13
Multiple-sequence alignments
Contents
Multiple alignments
Progressive pairing
Alignment consensus and profiles
Generating a consensus sequence
Generating an alignment profile
Profile alignments
Generating simple multiple alignments in Python
Profile-based multiple alignment
Interfacing multiple-alignment programs
Using ClustalW from Python
Multiple alignments
Expanding from an alignment of just two sequences, the more sequences you can align
together, the more information you have and the more accurate your alignment is likely
to be. The caveat is that the closeness of the relationship between the sequences matters
and should also be taken into account. For two very closely related sequences, the
differences are significant and the similarities less so, because similarity is expected.
In contrast, for distantly related sequences residue differences are expected, so the
similarities are more significant and the differences less so.
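One simple, widely used way to put a number on how closely two aligned sequences are related is percent identity: the fraction of aligned positions that hold the same residue. The function below is a minimal sketch. The name `percent_identity` is hypothetical, and leaving gapped columns out of the denominator is an assumption, since different tools use different conventions here.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of aligned columns holding identical residues.

    Assumption: columns where either sequence has a gap are skipped,
    so they count neither for nor against identity.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    # Avoid dividing by zero if every column contains a gap
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("GGGCATGCGGCGCAT", "GCGCATGCCCCGCAT"))  # 80.0
```

A high value suggests close relatives, where the remaining differences are the informative part; a low value suggests distant relatives, where conserved positions stand out.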
Having described how alignments can be made for pairs of sequences, the next topic is
how to include more than two sequences in an alignment to make a multiple-sequence
alignment, and how the overall or average properties of such an alignment can be
measured. As a naïve example of the benefits of multiple-sequence alignment, consider
the following two alternative alignments of the same pair of sequences:
GCGCATG--GCGCAT
GGGCATGCGGCGCAT

GCGCAT--GGCGCAT
GGGCATGCGGCGCAT
From these two sequences alone there is no way to know which alignment is better: both
give twelve identical pairs, one mismatch and a two-residue gap, so the gap is equally
good in either position. However, if a third sequence supports one scenario over the
other, we can make a better judgement; in this instance it supports the first scenario:
GCGCATG--GCGCAT
GGGCATGCGGCGCAT
GCGCATGCCCCGCAT
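The comparison can be made concrete with a small scoring function. This is a sketch under stated assumptions: the scoring values are illustrative, not taken from the text, and the total for the three-sequence case uses a "sum-of-pairs" score (the pairwise scores of every pair of rows added together). That is one common way to score a multiple alignment, but not necessarily the one used later in the chapter. The names `pair_score` and `sum_of_pairs` are hypothetical.

```python
from itertools import combinations

# Illustrative scoring values (an assumption, not from the text):
# identical residues +1, different residues -1, residue against gap -2.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(row_a, row_b, gap="-"):
    """Score two equal-length alignment rows column by column."""
    score = 0
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue  # a column of two gaps contributes nothing
        elif a == gap or b == gap:
            score += GAP
        elif a == b:
            score += MATCH
        else:
            score += MISMATCH
    return score


def sum_of_pairs(rows):
    """Add up the pairwise score of every pair of rows."""
    return sum(pair_score(a, b) for a, b in combinations(rows, 2))


second = "GGGCATGCGGCGCAT"
third = "GCGCATGCCCCGCAT"
gap_late = "GCGCATG--GCGCAT"   # first scenario
gap_early = "GCGCAT--GGCGCAT"  # second scenario

print(pair_score(gap_late, second), pair_score(gap_early, second))  # 7 7
print(sum_of_pairs([gap_late, second, third]))   # 23
print(sum_of_pairs([gap_early, second, third]))  # 21
```

Both two-sequence alignments contain twelve identities, one mismatch and two gap positions, so any scheme that scores each column independently rates them equally. Adding the third sequence gives the first scenario one more identity and one fewer mismatch against the third row, so it wins whenever a match scores higher than a mismatch.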
When aligning pairs of sequences, which you can imagine as a two-dimensional
problem, we can use the dynamic programming trick. However, as the number of
sequences in a multiple alignment increases, so does the complexity of the problem; in
effect each extra sequence adds a new dimension of possibilities. If the comparison of
two sequences requires a grid of points, then three require a volume of points, four a
4D hypervolume, and so on. In essence the alignment problem becomes complex very
quickly as sequences are added: for N sequences of length L there are of the order of
L^N points to fill, and each point has 2^N - 1 possible preceding points. The objective is
to find the gap placements in the multiple-sequence alignment that give the optimum
alignment score for all of the sequences at the same time. Overall, there is no general,
fast method that guarantees an optimal multiple-sequence alignment.
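The extra dimension can be seen by writing out exact dynamic programming for three sequences. The sketch below is a minimal, unoptimised illustration, not the chapter's own code. It assumes sum-of-pairs scoring with a linear gap penalty, and the scoring values and the names `align_three`, `column_score` and `pair` are hypothetical. Each point in the 3D volume can be reached from up to seven neighbours (every non-empty subset of the three sequences advancing by one residue). A fourth sequence would add another nested loop and raise the number of moves per point to fifteen, which is why this exhaustive approach is impractical beyond a handful of sequences.

```python
from itertools import product

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed, illustrative scoring values


def pair(a, b):
    """Score one pair of characters; '-' marks a gap."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(column):
    """Sum-of-pairs score for one alignment column of three characters."""
    x, y, z = column
    return pair(x, y) + pair(x, z) + pair(y, z)


def align_three(s1, s2, s3):
    """Exact sum-of-pairs alignment of three sequences over a 3D grid."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    # A move advances any non-empty subset of sequences: 2**3 - 1 = 7 moves
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    score = {(0, 0, 0): 0}
    back = {}
    # Nested loops visit every predecessor before the point that needs it
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best = None
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # move would step outside the volume
                    column = (s1[pi] if di else "-",
                              s2[pj] if dj else "-",
                              s3[pk] if dk else "-")
                    candidate = score[(pi, pj, pk)] + column_score(column)
                    if best is None or candidate > best:
                        best = candidate
                        back[(i, j, k)] = ((pi, pj, pk), column)
                score[(i, j, k)] = best
    # Trace back from the far corner of the volume to recover the columns
    columns = []
    cell = (n1, n2, n3)
    while cell != (0, 0, 0):
        cell, column = back[cell]
        columns.append(column)
    columns.reverse()
    rows = ["".join(col[r] for col in columns) for r in range(3)]
    return score[(n1, n2, n3)], rows


best, rows = align_three("GCGCATGGCGCAT", "GGGCATGCGGCGCAT", "GCGCATGCCCCGCAT")
print(best)
for row in rows:
    print(row)
```

Running it on the three example sequences prints the best score and the three gapped rows. Because the search is exhaustive, no other gap placement for these sequences can score higher under the same scheme.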