Guide tree and multiple sequence alignment

The main difficulty in resolving deep phylogenies lies in obtaining an accurate multiple sequence alignment (MSA), and many problems remain unresolved. First, what is the best guide tree for the commonly used progressive approach to multiple sequence alignment? One might believe that the true tree should be the best guide tree. However, this belief conflicts with the basic principle that multiple sequence alignment should start with the most similar sequences and progress toward less similar ones. The conflict is illustrated by the following true tree:

((S1:0.001, S2:0.1):0.001, (S3:0.001, S4:0.1):0.001).

S1 and S3 are the most similar sequences, with a pairwise distance of only 0.004 (the sum of the four branches of length 0.001 that connect them), compared with 0.101 for S1 and S2 or for S3 and S4. Following the principle stated above, they should be aligned first. However, the true tree would not allow S1 and S3 to be aligned first; it would force S1 and S2 (or S3 and S4) to be aligned first. This is one reason why widely used multiple sequence alignment programs, such as MAFFT and MUSCLE, use a modified version of UPGMA to reconstruct the guide tree: UPGMA will cluster S1 and S3 together, so such a guide tree ensures that S1 and S3 are aligned first. Will such a guide tree distort the resulting MSA and the subsequent phylogenetic reconstruction?
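The conflict can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the path lengths on the true tree as a stand-in for estimated pairwise distances, and it runs plain UPGMA rather than the modified versions used by MAFFT or MUSCLE. The internal node names A, B, and root, and all function names, are hypothetical.

```python
from itertools import combinations

# The true tree from the text as parent pointers: child -> (parent, branch length).
# "A", "B" and "root" are hypothetical labels for the unnamed internal nodes.
PARENT = {
    "S1": ("A", 0.001), "S2": ("A", 0.1),
    "S3": ("B", 0.001), "S4": ("B", 0.1),
    "A": ("root", 0.001), "B": ("root", 0.001),
}

def path_to_root(leaf):
    """Map each node on the path from leaf to root to its distance from leaf."""
    dist, node, path = 0.0, leaf, {leaf: 0.0}
    while node in PARENT:
        parent, length = PARENT[node]
        dist += length
        path[parent] = dist
        node = parent
    return path

def patristic(a, b):
    """Path length between two leaves through their most recent common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    return min(pa[n] + pb[n] for n in pa if n in pb)

def upgma_merge_order(leaves, dist):
    """Return the sequence of cluster pairs joined by plain UPGMA."""
    clusters = [(leaf,) for leaf in leaves]
    merges = []
    while len(clusters) > 1:
        def average(pair):
            x, y = pair
            total = sum(dist[frozenset((a, b))] for a in x for b in y)
            return total / (len(x) * len(y))
        x, y = min(combinations(clusters, 2), key=average)
        merges.append((x, y))
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
    return merges

leaves = ["S1", "S2", "S3", "S4"]
dist = {frozenset(p): patristic(*p) for p in combinations(leaves, 2)}
merges = upgma_merge_order(leaves, dist)

for pair in combinations(leaves, 2):
    print(pair, round(dist[frozenset(pair)], 3))
print("first UPGMA merge:", merges[0])
```

The first merge joins S1 and S3, even though they are not sister taxa on the true tree. The later merges involve a near tie between S2 and S4, so their order should not be read as meaningful.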

Second, supposing we could agree on what the best guide tree should be, how can we obtain it? The guide tree is conventionally built from pairwise alignments. Will three-way alignment lead to a better guide tree? For four sequences, there are four triplets: {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, and {2, 3, 4}, so the distance between sequences 1 and 2, D_12, could be computed from the first two triplets (D_12_triplet1, D_12_triplet2). Should D_12 be the arithmetic mean of D_12_triplet1 and D_12_triplet2, or a weighted mean based on how similar sequences 3 and 4 are to sequences 1 and 2?
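The two candidate averages can be compared in a few lines. This is a minimal sketch, not the method of the cited study: the two D_12 estimates, the distances of the third sequence, and the inverse-distance weighting are all hypothetical values and choices made up for illustration.

```python
from itertools import combinations

sequences = [1, 2, 3, 4]
triplets = list(combinations(sequences, 3))

# Triplets that contain both sequence 1 and sequence 2.
with_12 = [t for t in triplets if 1 in t and 2 in t]

# Hypothetical D_12 estimates, one from each three-way alignment.
d12_estimates = {(1, 2, 3): 0.30, (1, 2, 4): 0.36}

# Hypothetical mean distance from the third sequence to sequences 1 and 2.
third_distance = {(1, 2, 3): 0.40, (1, 2, 4): 1.20}

# Option 1: plain arithmetic mean of the triplet-based estimates.
arithmetic = sum(d12_estimates[t] for t in with_12) / len(with_12)

# Option 2 (an assumed scheme): weight each estimate by the inverse of the
# third sequence's distance, so a closer third sequence counts for more.
weights = {t: 1.0 / third_distance[t] for t in with_12}
weighted = sum(weights[t] * d12_estimates[t] for t in with_12) / sum(weights.values())

print("triplets:", triplets)
print("arithmetic mean of D_12:", round(arithmetic, 4))
print("weighted mean of D_12:", round(weighted, 4))
```

With these made-up numbers the weighted mean is pulled toward the estimate from triplet {1, 2, 3}, whose third sequence is closer to sequences 1 and 2. Whether that improves the guide tree is exactly the open question.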

Third, what criterion should be used to choose the optimal MSA? If phylogenetic reconstruction is the ultimate goal, then phylogenetic accuracy should obviously be the ultimate criterion for choosing the best MSA. Because the true phylogeny is generally unknown, this criterion cannot be used in practice. Does the commonly used weighted SPS (sum-of-pairs score) criterion serve as a good proxy for phylogenetic accuracy?
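For readers unfamiliar with the SPS criterion, the sketch below scores a toy alignment. The scoring values (match +1, mismatch -1, gap -2, gap against gap 0), the toy sequences, and the pair weights are assumptions chosen for illustration; real programs use substitution matrices, affine gap penalties, and their own weighting schemes.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one aligned column for one pair of sequences (assumed scheme)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs(alignment, weights=None):
    """Sum the column scores over all sequence pairs, optionally weighting each pair."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0.0
    for x, y in combinations(sorted(alignment), 2):
        w = 1.0 if weights is None else weights[(x, y)]
        total += w * sum(pair_score(a, b) for a, b in zip(alignment[x], alignment[y]))
    return total

# Toy alignment; the sequences are hypothetical.
msa = {"S1": "ACGT-A", "S2": "ACGTTA", "S3": "AC-TTA"}

# Hypothetical pair weights, e.g. down-weighting a closely related pair.
pair_weights = {("S1", "S2"): 0.5, ("S1", "S3"): 1.0, ("S2", "S3"): 1.0}

unweighted_sps = sum_of_pairs(msa)
weighted_sps = sum_of_pairs(msa, pair_weights)
print("unweighted SPS:", unweighted_sps)
print("weighted SPS:", weighted_sps)
```

A higher score means the alignment agrees better with the scoring scheme; it does not by itself say that the alignment yields a more accurate tree.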

This study represents the first step towards addressing these questions:

Askari Rad, M.; Kruglikov, A.; Xia, X. Three-Way Alignment Improves Multiple Sequence Alignment of Highly Diverged Sequences. Algorithms 2024, 17, 205. https://doi.org/10.3390/a17050205

Dr. Reza Rahavi

Experimental Medicine, Faculty of Medicine, UBC, Vancouver | Medical Content Writing

4 months ago

How can incorporating deep learning techniques enhance the accuracy and efficiency of multiple sequence alignment in phylogenetic analysis? https://lnkd.in/gsaeuFhu
