Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction
0
by Akash Bahai, Chee Keong Kwoh, Yuguang Mu, Yinghui Li
The 3D structure of RNA critically influences its functionality, and understanding this structure is vital for deciphering RNA biology. Experimental methods for determining RNA structures are labour-intensive, expensive, and time-consuming. Computational approaches have emerged as valuable tools, leveraging physics-based-principles and machine learning to predict RNA structures rapidly. Despite advancements, the accuracy of computational methods remains modest, especially when compared to protein structure prediction. Deep learning methods, while successful in protein structure prediction, have shown some promise for RNA structure prediction as well, but face unique challenges. This study systematically benchmarks state-of-the-art deep learning methods for RNA structure prediction across diverse datasets. Our aim is to identify factors influencing performance variation, such as RNA family diversity, sequence length, RNA type, multiple sequence alignment (MSA) quality, and deep learning model architecture. We show that generally ML-based methods perform much better than non-ML methods on most RNA targets, although the performance difference isn’t substantial when working with unseen novel or synthetic RNAs. The quality of the MSA and secondary structure prediction both play an important role and most methods aren’t able to predict non-Watson-Crick pairs in the RNAs. Overall among the automated 3D RNA structure prediction methods, DeepFoldRNA has the best prediction followed by DRFold as the second best method. Finally, we also suggest possible mitigations to improve the quality of the prediction for future method development.