Optimising pairwise alignment
Given that we have discussed the principle of how we can measure the match quality of an
aligned pair of sequences, we now turn to the problem of determining which alignment out
of all the possible combinations is the best (highest scoring). Consider for a moment the
following three examples. The last alignment is the best, with the middle one a close
second.
ALIGNMENTS--- A--LIGN-MENTS --ALIGN-MENTS
ANALIGDPVENTS ANALIGDPVENTS ANALIGDPVENTS
We can represent these same alignments as a comparison matrix, where each element
represents the pairing of a residue from one sequence with a residue from the other. In this
case we have indicated the aligned (paired) residues for a given row and column with ‘x’.
Each alternative alignment can be viewed as a different route through the comparison
matrix and gaps are simply jumps in one sequence or another; down rows or across
columns.
ALIGNMENTS ALIGNMENTS ALIGNMENTS
Ax…...... Ax…...... A….......
N.x…..... N…....... N….......
A..x….... A…....... Ax…......
L…x…... L.x…..... L.x….....
I….x….. I..x….... I..x…....
G…..x…. G…x…... G…x…...
D…...x… D….x….. D….x…..
P…....x.. P…....... P….......
V….....x. V…..x…. V…..x….
E…......x E…...x… E…...x…
N…....... N…....x.. N…....x..
T…....... T….....x. T….....x.
S…....... S…......x S…......x
It is clear that each separate alignment possibility is a different way of placing the
dashes, which represent the gaps. With a total alignment length of 13 and placing three
gaps inside the shorter sequence we have (13×12×11)/(3×2×1) = 286 ways of arranging
the gaps. If we extend this calculation to the not unreasonable and biologically typical
scenario of five gaps in 100 positions then the result is (100×99×98×97×96)/(5×4×3×2×1)
= 75,287,520. And this still does not consider another class of alignment possibilities with
gaps being present in both sequences like:
------ALIGNMENTS
ANALIGDPVENTS---
As you can see the number of possible combinations grows very rapidly with the length
of the sequence. Indeed, for almost all situations it is impractical to check them all when
doing an alignment. Nevertheless we can still find the best alignment of a pair of
sequences by using a clever trick, which allows us to neglect checking the vast majority of
alignments. The principle behind this is commonly referred to dynamic programming.
This is perhaps a misleading name, because the idea doesn’t actually involve anything
especially dynamic, and isn’t anything novel in terms of computer programming.
However, the idea is really very cunning.
Do'stlaringiz bilan baham: |