Profile alignments
Once we have calculated a profile for an alignment we can align it with something else,
just as we do in a pairwise sequence alignment. We actually require the ability to align
pairs of profiles for the demonstration function to create multiple alignments. When we
create a multiple-sequence alignment by progressively adding sequences, we have to
combine profiles for one alignment (one branch of the tree) with another alignment (a
different branch); we cannot do a plain sequence alignment.
We won’t go through the profile alignment function in great detail because it is so
similar to the regular sequence alignment function discussed in
Chapter 12
. Nonetheless,
there are still a few key differences. Firstly and most obviously, we are not passing-in
sequences (strings of letters), but rather profiles (lists of residue fraction dictionaries).
Secondly, when we calculate the residue similarity scores, rather than comparing one
residue with another, we compare all of the residue fractions with each other. Accordingly,
say that at a given position we have 50% ‘A’ and 50% ‘G’ in one profile and 25% ‘C’ and
75% ‘G’ in the other profile, then the final similarity score comes from the addition of four
values; the combinations A:C, A:G, G:C and G:G each using a different similarity value
from the substitution matrix, multiplied by the weight for that pair. You can imagine the
weights for each combination being the area within a square defined by edges that have
lengths defined by the individual profile weights. For example, the A:C combination is
50% times 25%, giving 12.5% to the score. Thirdly, the function also modifies the gap
penalties because the input profiles are generated from alignments that may carry gaps.
The modification is to multiply the penalties by the total weight for the profiles at each
point; any missing weight will be due to gaps so the total is not necessarily 100%. This
multiplication reduces the gap penalties, thus gaps are penalised less when gaps were
present in the input profiles.
def profileAlign(profileA, profileB, simMatrix, insert=8, extend=4):
numI = len(profileA) + 1
numJ = len(profileB) + 1
scoreMatrix = [[0] * numJ for x in range(numI)]
routeMatrix = [[0] * numJ for x in range(numI)]
for i in range(1, numI):
routeMatrix[i][0] = 1
for j in range(1, numJ):
routeMatrix[0][j] = 2
for i in range(1, numI):
for j in range(1, numJ):
penalty1 = insert
penalty2 = insert
if routeMatrix[i-1][j] == 1:
penalty1 = extend
elif routeMatrix[i][j-1] == 2:
penalty2 = extend
fractionsA = profileA[i-1]
fractionsB = profileB[j-1]
similarity = 0.0
totalWeight = 0.0
for residueA in fractionsA:
for residueB in fractionsB:
weight = fractionsA[residueA] * fractionsB[residueB]
totalWeight += weight
similarity += weight * simMatrix[residueA][residueB]
penalty1 *= totalWeight
penalty2 *= totalWeight
paths = [scoreMatrix[i-1][j-1] + similarity, # Route 0
scoreMatrix[i-1][j] - penalty1, # Route 1
scoreMatrix[i][j-1] - penalty2] # Route 2
best = max(paths)
route = paths.index(best)
scoreMatrix[i][j] = best
routeMatrix[i][j] = route
profileOutA = []
profileOutB = []
i = numI-1
j = numJ-1
score = scoreMatrix[i][j]
while i > 0 or j > 0:
route = routeMatrix[i][j]
if route == 0: # Diagonal
profileOutA.append(profileA[i-1])
profileOutB.append(profileB[j-1])
i -= 1
j -= 1
elif route == 1: # Gap in profile B
profileOutA.append(profileA[i-1])
profileOutB.append(None)
i -= 1
elif route == 2: # Gap in profile A
profileOutA.append(None)
profileOutB.append(profileB[j-1])
j -= 1
profileOutA.reverse()
profileOutB.reverse()
return score, profileOutA, profileOutB
alignA = ['SRPAPVV--LII', 'TVPAPVVIILII']
alignB = ['HHFSEPEITLIIF', 'H-FSELVIALIIF']
print(profileAlign(profile(alignA), profile(alignB), BLOSUM62))
The output of the function is a numeric score and two aligned, gapped profiles. Note
that the output profiles were made by using the .append() function to add to the end of the
lists, even though we went through the sequence positions in reverse order (decreasing i
and j). This results in the profile lists being created in the reverse order to what we require,
hence reverse() is used to flip the order at the end.
3
A gap is indicated in an output profile
by using the None object; this is arbitrary but something that evaluates as false is handy.
Do'stlaringiz bilan baham: |