Supporting data for "learnMSA: Learning and Aligning Large Protein Families"

29.09.2022 03:00

Gigadb.org

The alignment of large numbers of protein sequences is a challenging task, and its importance grows rapidly along with the size of biological datasets. State-of-the-art algorithms tend to produce less accurate alignments as the number of sequences increases. This is a fundamental problem, since many downstream tasks rely on accurate alignments.
We present learnMSA, a novel statistical learning approach for profile hidden Markov models (pHMMs) based on batch gradient descent. In a fundamental departure from popular aligners, we fit a custom recurrent neural network architecture for (p)HMMs to potentially millions of sequences with respect to a maximum a posteriori objective and decode an alignment. We rely on automatic differentiation of the log-likelihood, so our approach differs from existing HMM training algorithms such as Baum–Welch. Our method does not involve progressive, regressive or divide-and-conquer heuristics. We use uniform batch sampling to adapt to large datasets in linear time without requiring a tree. When tested on ultra-large protein families with up to 3.5 million sequences, learnMSA is both more accurate and faster than state-of-the-art tools. On the established benchmarks HomFam and BaliFam, which have smaller sequence sets, it matches state-of-the-art performance. All experiments were done on a standard workstation with a GPU.
Our results show that learnMSA does not share the counter-intuitive drawback of many popular heuristic aligners which can substantially lose accuracy when many additional homologs are input. LearnMSA is a future-proof framework for large alignments with many opportunities for further improvements.
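
To make the training objective described above more concrete, the following is a minimal sketch of the quantity being optimized: the log-likelihood of a sequence under an HMM, computed with the forward recursion and averaged over a uniformly sampled batch. It is not learnMSA's code. It assumes a toy two-state HMM over a two-letter alphabet instead of a profile HMM over amino acids, omits the prior term of the maximum a posteriori objective, and leaves out the gradient step, which an automatic differentiation framework would derive from this loss. All names (`log_likelihood`, `batch_loss`, `TOY_HMM`, `family`) are hypothetical.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(seq, log_init, log_trans, log_emit):
    """Forward algorithm in log space: returns log P(seq) under the HMM."""
    n_states = len(log_init)
    # alpha[s] = log P(symbols seen so far, current state = s)
    alpha = [log_init[s] + log_emit[s][seq[0]] for s in range(n_states)]
    for symbol in seq[1:]:
        alpha = [
            logsumexp([alpha[r] + log_trans[r][s] for r in range(n_states)])
            + log_emit[s][symbol]
            for s in range(n_states)
        ]
    return logsumexp(alpha)


def batch_loss(sequences, batch_size, params, rng):
    """Mean negative log-likelihood over a uniformly sampled batch."""
    batch = rng.sample(sequences, batch_size)  # uniform, without replacement
    return -sum(log_likelihood(s, *params) for s in batch) / batch_size


# Toy model (assumed values): initial, transition and emission probabilities.
TOY_HMM = (
    [math.log(0.6), math.log(0.4)],
    [[math.log(0.7), math.log(0.3)], [math.log(0.2), math.log(0.8)]],
    [
        {"A": math.log(0.9), "B": math.log(0.1)},
        {"A": math.log(0.2), "B": math.log(0.8)},
    ],
)

family = ["ABBA", "AAB", "BBBA", "ABAB", "BA"]  # stand-in for a protein family
loss = batch_loss(family, 3, TOY_HMM, random.Random(0))
print(f"batch loss: {loss:.4f}")
```

Because each step looks only at a fixed-size batch, the cost per update does not depend on the total number of sequences, which is the property the uniform batch sampling in the text relies on.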
