Harnessing Deep Learning and Language Models for Protein Function Prediction: A CAFA5-Based Study
Keywords:
Deep Learning, CAFA5, T5 Embedding, LSTM, GRU, Bi-LSTM, Attention Mechanism

Abstract
Recent advances in deep learning have brought remarkable progress in predicting protein functions from amino acid sequences. These predictions play a crucial role in accelerating drug evaluation and uncovering how cells work. This research investigated several deep learning models for protein function prediction: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Deep Neural Networks (DNN), Bidirectional LSTM (Bi-LSTM), and Bi-LSTM coupled with an attention mechanism. The models were evaluated on the multi-label protein function prediction task using the CAFA5 dataset together with T5 embeddings derived from it. Performance was measured with standard metrics, including ROC-AUC, Hamming loss, and binary accuracy. The analysis demonstrates that the Bi-LSTM-with-attention and DNN models outperformed the baseline recurrent models in both loss minimization and accuracy. The Bi-LSTM-plus-attention model performed best, achieving a ROC-AUC of 0.9239 with consistent prediction reliability. These results show that deep learning models with integrated attention layers yield more scalable and accurate protein function predictions, demonstrating their usefulness in practical bioinformatics tasks.
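The Bi-LSTM-plus-attention architecture summarized above pools the per-residue recurrent hidden states with learned attention weights, then feeds the pooled context vector to a sigmoid output layer that assigns one independent probability per GO-term label. As a rough illustration only (this is not the authors' code; all shapes, names, and parameters are assumptions), the attention-pooling and multi-label output steps can be sketched in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(hidden, w):
    """Attention pooling over Bi-LSTM hidden states.

    hidden: (seq_len, 2*units) concatenated forward/backward states
    w:      (2*units,) learned scoring vector (hypothetical parameter)
    Returns one (2*units,) context vector: a weighted sum of the
    per-residue states, with weights from a softmax over scores.
    """
    scores = hidden @ w        # (seq_len,) relevance score per residue
    alpha = softmax(scores)    # attention weights, sum to 1
    return alpha @ hidden      # weighted sum -> context vector

def multilabel_head(context, W_out, b_out):
    """Sigmoid output layer: one independent probability per label."""
    logits = context @ W_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))

# Toy example: 10-residue sequence, 8-dim Bi-LSTM state, 5 GO-term labels
rng = np.random.default_rng(0)
hidden = rng.normal(size=(10, 8))
w = rng.normal(size=8)
W_out = rng.normal(size=(8, 5))
b_out = np.zeros(5)

ctx = attention_pool(hidden, w)
probs = multilabel_head(ctx, W_out, b_out)  # per-label probabilities in (0, 1)
```

Unlike a plain Bi-LSTM that typically keeps only the final hidden state, the softmax weighting lets the model emphasize the residues most relevant to each prediction, which is one plausible reason the attention variant led on ROC-AUC.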
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under a CC BY 4.0 International License.