route through the network without having to consider all possible routes. By finding the
best route to each intermediate location the algorithm excludes many route possibilities.
As illustrated, starting at A you must go via B or C then through D or E, then F or G
before finally getting to H. Each leg of the journey takes a different time and all of these
times are added to get the total travel time. If I want to find the quickest journey from A to
H then I have to potentially consider all of the possible routes, A, B, D, F, H, A, C, E, F, H
etc. … It might seem that you have to calculate the time for all possible journeys before
you know which way is best. However, this problem can be broken into smaller sub-
problems which allow us to discard certain routes at an early stage. For example, we can
calculate the shortest journey time from A to D and A to E; potentially these could go
through either B or C, but if both of the shortest routes go through B we need never
consider point C ever again. When extending the routes through F and G we only have to
extend our best routes from D and E, which we have already determined only use point B.
In general, if we add up journey times as we go, and given that at each point there are only
two previous directions that we could have come from, we can choose the best direction to
get to that point and ignore all routes that come from the other direction. Thus in the end
when we reach H there are only two routes to choose between, one from F and one from
G. At the start we couldn’t know which of F or G would be optimal in the end, but in
calculating the routes (sub-problems) to these final points we have only been following the
best paths for each stage, which is much more efficient than following them all. Sequence
alignment is analogous to this kind of journey analysis, the difference being that we want
the route of maximum alignment score (not minimum time) and we have three routes into
each point rather than two; the three routes are (i) to put a gap in one sequence, (ii) put a
gap in the other sequence or (iii) align two residues.
If we start from the beginning of two sequences and consider the alignment growing by
inserting a gap in one sequence or the other, or by putting the next two residues together,
at each point the number of possibilities grows by a multiple of three each time. However,
to get to a particular residue pairing there are three routes that it could have come from;
and like in the above example only the best one to this point needs to be continued.
Specifically, what sequence alignment does is to extend the (sub-total) scores of these
three routes with gap penalties or a similarity score for the next residue pair. By taking the
maximum of these scores the other two are discarded and their routes cut; two of the sub-
alignments that get to this point do not need to be extended any further, because there is
already a better alignment to get to the same place. Every time the alignment is extended,
multiple new possibilities are generated, but at the same time previous sub-alignments can
be disregarded. By repeatedly extending the alignment, and pruning sub-optimal
alternatives along the way, the end of the sequences is eventually reached. Initially as the
sequences are compared, there is no way to know which sub-alignments are discarded and
which win out, but the decisions that are taken (whether to add gaps or pair up residues) at
each step along the way can be remembered. So, starting from the end point the winning
decisions can be followed, backwards, to find what gave the best score at each point and
hence the best alignment.
Do'stlaringiz bilan baham: