Harnessing Deep Learning and Language Models for Protein Function Prediction: A CAFA5-Based Study
Keywords:
Deep Learning, CAFA5, T5 Embedding, LSTM, GRU, Bi-LSTM, Attention Mechanism

Abstract
Recent advances in deep learning have brought remarkable progress in predicting protein functions from amino acid sequences. These predictions play a crucial role in accelerating drug evaluation and uncovering how cells work. This research investigated several deep learning models for protein function prediction: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Deep Neural Networks (DNN), Bidirectional LSTM (Bi-LSTM), and Bi-LSTM coupled with an attention mechanism. The models were evaluated on the multi-label protein function prediction task using the CAFA5 dataset together with T5 embeddings derived from it. Performance was measured with standard metrics, including ROC-AUC, Hamming loss, and binary accuracy. The analysis demonstrates that the Bi-LSTM-with-attention and DNN models outperformed the baseline recurrent models in both loss minimization and accuracy. The Bi-LSTM-plus-attention model performed best, achieving a ROC-AUC of 0.9239 with consistent prediction reliability. These results show that deep learning models with integrated attention layers yield more scalable and accurate protein function predictions, demonstrating their usefulness in practical bioinformatics tasks.
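The Bi-LSTM-plus-attention architecture summarized above pools the per-residue recurrent hidden states with learned attention weights, then feeds the pooled context vector to a sigmoid output layer that assigns one independent probability per GO-term label. As a rough illustration only (this is not the authors' code; all shapes, names, and parameters are assumptions), the attention-pooling and multi-label output steps can be sketched in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(hidden, w):
    """Attention pooling over Bi-LSTM hidden states.

    hidden: (seq_len, 2*units) concatenated forward/backward states
    w:      (2*units,) learned scoring vector (hypothetical parameter)
    Returns one (2*units,) context vector: a weighted sum of the
    per-residue states, with weights from a softmax over scores.
    """
    scores = hidden @ w        # (seq_len,) relevance score per residue
    alpha = softmax(scores)    # attention weights, sum to 1
    return alpha @ hidden      # weighted sum -> context vector

def multilabel_head(context, W_out, b_out):
    """Sigmoid output layer: one independent probability per label."""
    logits = context @ W_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))

# Toy example: 10-residue sequence, 8-dim Bi-LSTM state, 5 GO-term labels
rng = np.random.default_rng(0)
hidden = rng.normal(size=(10, 8))
w = rng.normal(size=8)
W_out = rng.normal(size=(8, 5))
b_out = np.zeros(5)

ctx = attention_pool(hidden, w)
probs = multilabel_head(ctx, W_out, b_out)  # per-label probabilities in (0, 1)
```

Unlike a plain Bi-LSTM that typically keeps only the final hidden state, the softmax weighting lets the model emphasize the residues most relevant to each prediction, which is one plausible reason the attention variant led on ROC-AUC.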
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under a CC BY 4.0 International License.