Python Programming for Biology: Bioinformatics and Beyond

Phylogenetic trees using neighbour-joining

Download 7,75 Mb.

Pdf ko'rish

bet	205/514
Sana	30.12.2021
Hajmi	7,75 Mb.
	#91066

1 ... 201 202 203 204 205 206 207 208 ... 514

Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Phylogenetic trees using neighbour-joining

Tree-building can be a very difficult task. This is basically because the number of branch

combinations increases very rapidly with the number of input taxa. The number of

combinations can be so vast that we cannot routinely test all combinations to give the

optimum tree arrangement. Hence what is often done, and which we will do here, is to

generate a quite good, but not optimal, tree using a computationally fast method. The

method used here is the one introduced by Saitou and Nei; the neighbour-joining method.

This is a quick way of generating a tree that is usually not so far from a global optimum

and which is also relatively easy to understand mechanistically. The generated tree will

not necessarily be at the global optimum because the algorithm is ‘greedy’ in that it only

optimises the local solution for pairing up sequences, i.e. it never considers the tree as a

whole. Nonetheless it is still a speedy and useful example to be used in subsequent

analyses, and it can also provide a starting point for more globally aware optimising

methods, if more accuracy is required. As with most of the examples in this book, the aim

of the tree-generating example is not to give a high-performance, gold-standard program,

but rather to lead you through a relatively simple piece of code that can be readily

understood and which illustrates the major points of the topic.

The neighbour-joining algorithm involves repeatedly finding the closest pair from

amongst the input sequences (and sub-trees after the first cycle) and joining these to form

a new larger sub-tree until all sequences have been considered and only one, fully joined

tree remains. Thus the first step for the tree-building example is to calculate what is

termed a distance matrix, where the rows and columns of the matrix represent the different

input sequences and the matrix elements represent the evolutionary distance between those

sequences. Here we will only compare isolated input sequences; however, the method

could be adapted to consider multiple sequences for each taxon (e.g. each input is a group

of sequences representing a different species) so that the evolutionary distance between

organisms is better represented as a whole. The tree that will result in the end is what is

called an unrooted tree. This means that the tree has no indication of which taxon came

first, evolutionarily speaking, and hence where the common (but unseen) ancestor fits in.

By making an unrooted tree we are only showing the branch relationships between the

input taxa. We recommend further reading if you wish to build rooted trees: those with an

ancestor start point.

The evolutionary distance that separates two sequences in this instance will be

estimated using the scores that are generated by performing pairwise sequence alignment.

Thus the values will depend on the similarity of paired residues as defined by a

substitution matrix. With DNA sequences it is common to use a more complex

evolutionary model, to say how each change relates to distance; there are many more

sophisticated ways of calculating the distance between sequences, not all of which can be

used with the relatively simple neighbour-joining method used here. However, by using

the substitution matrices we will keep things very simple for illustrative purposes and be

able to use existing Python functions. For further reading on distance matrix generation,

and tree generation in general, the PHYLIP

software package is a good starting point.

Download 7,75 Mb.

Do'stlaringiz bilan baham:

1 ... 201 202 203 204 205 206 207 208 ... 514