Python Programming for Biology: Bioinformatics and Beyond



[Tim J. Stevens, Wayne Boucher] Python Programming

Phylogenetic trees

Given a group of homologous sequences we can often go beyond saying that they are related and build a phylogenetic tree to say how they are related to one another. The idea with such a tree is to reconstruct the way that sequences have diverged during evolution. This can be used to reconstruct how genes, non-coding regions and even protein domains arose. Given enough information we can look at a large scale to say how whole species are related and, if we look at the fine details, how individuals within a family are related. Of course, on some occasions we already know the inheritance tree from knowledge of parentage. This enables us to follow traits including physical differences, biochemical differences (e.g. blood groups) and inherited disease symptoms. However, it is only by studying the inherited differences at the biological sequence level that we can understand the molecular reasons for them, which in turn improves medicine and biology.

Historically, evolutionary and family trees were built according to observable characteristics. If two species shared certain anatomical characteristics they would be deemed to be more closely related. This works well in some cases, but not in others (such as knowing where to place the elephant, whale and duck-billed platypus in the evolution of mammals). The reason for this difficulty is that people were following only a few subjective measurements. DNA sequencing allows us to place evolutionary lineages with much more confidence, because determining a sequence is precise and there are vastly more data points to follow: potentially every base pair, gene and transposon. Nevertheless, we sometimes still have to resort to anatomical comparisons when DNA is unavailable, as with dinosaurs, but the more bones the better.

When constructing a phylogenetic tree of sequences the basic principle is to treat the most similar sequences as the most closely related, analogous to the anatomical means of grouping organisms. When looking at sequence evolution we often think in terms of the most frugal explanation, or parsimony; it is reasonable to assume that minimal changes are the most likely, so we would think that a nucleotide is less likely to change from, say, T to G and then to C than it is to go directly from T to C. Accordingly, when we build a phylogenetic tree we assume that the correct one is, or is close to, the one that involves the minimum amount of overall sequence change. Absolute parsimony isn’t a good idea in all situations: with distantly related sequences, and those with a high rate of change, the chance of intermediate residue changes is significant, so it is better to think in terms of the long-term equilibrium of sequence. Also, some things may be similar by chance and not because of a common ancestor, although this becomes increasingly unlikely overall as we consider more sequence data. However, there may simply not be enough data to form a firm opinion, even if building some sort of optimised tree is computationally possible.
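
The idea that similar sequences are closely related, and the caveat about hidden intermediate changes, can be sketched in a few lines of Python. The function names and example sequences below are illustrative assumptions, not taken from the text. The first function measures the fraction of aligned positions that differ. The second applies the Jukes-Cantor correction, one standard way (not named in the text above) of estimating how many substitutions really occurred when some sites may have changed more than once. It assumes DNA with equal base frequencies and equal substitution rates, which is a simplification.

```python
from math import log


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Ignore any column where either sequence has a gap character
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")

    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Estimated substitutions per site, allowing for repeated changes.

    Assumes the Jukes-Cantor model for DNA: four bases, all equally
    frequent and all substitutions equally likely.
    """
    if p >= 0.75:
        raise ValueError("sequences are too divergent for this correction")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned DNA fragments, for illustration only
seq_1 = "ACGTACGT"
seq_2 = "ACGAACGC"

observed = p_distance(seq_1, seq_2)   # two of eight positions differ
corrected = jukes_cantor(observed)    # always at least as large as observed
print(observed, corrected)
```

For small distances the observed and corrected values are nearly equal, which is when simple parsimony is most reasonable. As the observed difference approaches 0.75 the corrected value grows rapidly, reflecting how little can be inferred from sequences that are close to equilibrium.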

When trying to work out real inheritance and evolutionary relationships, more information will yield better results. Thus when we look at the relationships between species it is best to consider as much sequence and as many sequences as possible, although given the choice it is better to have sequences that sample a tree widely and evenly. Tree-building becomes less accurate, with regard to the underlying truth, the longer the branch, so it is best to have lots of linking sequences and hence shorter branches. Also, some sequences (genes, proteins or whatever) may be better than others at uncovering the relationships, particularly if the rate of sequence change is of the right magnitude; too many changes and the assumption of parsimony is weaker, but too few changes and there isn’t enough evidence to support a hypothesis. Accordingly, when we study fast-moving things, like the mutation of viruses, we look at rapidly changing genes, and for slow things like speciation we look at slowly changing things: ribosomal RNA genes, mitochondrial ‘housekeeping’ genes and rare transposon and duplication events.

When we have confidently built a phylogenetic tree, analysis of sequence variation gives us more information than can be obtained from alignments alone. We will be able to spot which changes occurred first and whether the same change has occurred more than once. As illustrated in Figure 14.4, consider for example four sequences A, B, C and D, two of which, A and B, have residue W at a position and two of which, C and D, have residue Y at the same position. If we know that A and B form one closely related pair and C and D form another, judged over the sequences as a whole, then we know that one residue substitution was enough to produce the observed situation; the ancestor of the sequences might have had W or Y, but one substitution is enough to generate the A and B (W) branch or the C and D (Y) branch. Conversely, if the most closely related pairs overall are A and C, and B and D, then each pair contains a mix of W and Y residues. In this case there must have been at least two substitution events, one on each branch from the ancestor, which could only have had one of the two residues. Accordingly, by considering the overall relationship between sequences we can make much better measurements of the rate of change than we can from just a multiple alignment.
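
The counting argument for sequences A, B, C and D can be checked with a short sketch. It uses Fitch's small-parsimony procedure, which is a choice made here for illustration and is not named in the text, to find the minimum number of substitutions that a given rooted binary tree needs to explain the residues at one position. The function name and the nested-tuple tree representation are assumptions made for this example.

```python
def fitch_changes(tree, residues):
    """Minimum number of substitutions needed at one alignment position.

    tree     : rooted binary tree as nested 2-tuples of leaf names,
               e.g. (("A", "B"), ("C", "D"))
    residues : dict mapping each leaf name to its residue at the position
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            # A leaf: the only possible state is the observed residue
            return {residues[node]}

        left = visit(node[0])
        right = visit(node[1])
        common = left & right
        if common:
            # The children can share a state, so no change is needed here
            return common

        # The children cannot agree, so at least one substitution occurred
        changes += 1
        return left | right

    visit(tree)
    return changes


# Residues at the position discussed above
residues = {"A": "W", "B": "W", "C": "Y", "D": "Y"}

# A with B, and C with D: a single substitution explains the data
print(fitch_changes((("A", "B"), ("C", "D")), residues))  # 1

# A with C, and B with D: each pair mixes W and Y, so two are needed
print(fitch_changes((("A", "C"), ("B", "D")), residues))  # 2
```

The same four residues therefore imply different numbers of events depending on the tree, which is why a confidently built tree supports better estimates of the rate of change than the alignment alone.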




