Python Programming for Biology: Bioinformatics and Beyond





[Tim J. Stevens, Wayne Boucher] Python Programming

Figure 14.3. The classification of homologous sequences as orthologues or paralogues. DNA sequences in different contexts may be similar to one another because they share a common ancestor (they are homologous) and any differences have arisen by evolutionary divergence. There are two general ways in which such homologues can arise: an original sequence may follow two different evolutionary routes because different species separated from one another, thus creating orthologues, with different versions of the sequence in each genome. Alternatively, within a single genome a sequence may be duplicated and the different copies may then take on different roles, diverging from one another, creating paralogues.
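
As a rough illustration of this distinction, the sketch below labels a pair of homologues according to the kind of event (speciation or duplication) at their most recent common ancestor in a gene tree. The tree, the gene names and the function names are all hypothetical and were invented for this example; a real analysis would infer the tree and its events from sequence data.

```python
# Toy gene tree: each node maps to (parent, event at that node).
# Leaves are present-day genes and carry no event (None).
# All names are hypothetical, chosen only for illustration.
GENE_TREE = {
    "root":    (None,    "duplication"),  # ancestral gene is duplicated
    "copyA":   ("root",  "speciation"),   # copy A is split by speciation
    "copyB":   ("root",  "speciation"),   # copy B is split by speciation
    "human_A": ("copyA", None),
    "mouse_A": ("copyA", None),
    "human_B": ("copyB", None),
    "mouse_B": ("copyB", None),
}


def ancestors(node, tree):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path


def classify_homologues(gene1, gene2, tree):
    """Label two different leaf genes by the event at their common ancestor."""
    if gene1 == gene2:
        raise ValueError("need two different genes")

    others = set(ancestors(gene2, tree))

    # The first shared node on the way up is the most recent common ancestor
    for node in ancestors(gene1, tree):
        if node in others:
            event = tree[node][1]
            return "orthologues" if event == "speciation" else "paralogues"

    raise ValueError("genes do not share an ancestor in this tree")


print(classify_homologues("human_A", "mouse_A", GENE_TREE))
print(classify_homologues("human_A", "human_B", GENE_TREE))
```

With this toy tree the two A copies in different species separate at a speciation node, whereas the A and B copies trace back to the duplication at the root.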

When looking through sequences to detect homologues we often find that only sections of a gene or protein are similar, while the remainder of the sequence is distinctly dissimilar. The reason is that the limits of genes are not static in evolution; via sequence changes they can expand, contract and be recombined in new ways, so that only parts of their common ancestry remain. It is common for only part of a gene to be duplicated, and in some instances duplication can merge genes or parts of genes. The units that are most commonly shuffled around in this way correspond to exons, i.e. the protein-coding parts of genes between the introns. By shuffling whole exons it is more likely that a sensible coding sequence will be maintained; breaking up exons is more likely to result in codon frame-shifts and nonsense protein code. At the protein level the shuffling of exons gives rise to shuffling, duplication and recombination of entire domains, the functional units of proteins, which are typically autonomously folding and globular. Multi-domain proteins, where each part of the gene has a potentially different ancestry, are very common in many genomes, including the human genome.




Accordingly, if you search for protein homologues you are typically searching for domains that share a common ancestor, rather than whole genes. As far as the analysis of the family is concerned, it often does not matter whether protein homologues comprise whole coding regions or are just part of a larger gene. So long as they have a common ancestor you can tell something about conservation, protein structure and function.
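
The earlier point about whole exons versus broken exons can be made concrete with a small sketch. Here we translate a toy coding sequence built from three invented "exons", then compare removing a whole exon with removing only part of one. The sequences and names are hypothetical, and we assume for simplicity that each toy exon is a whole number of codons, which real exons need not be. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA in frame 0; an incomplete trailing codon is ignored."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE[dna[i:i + 3]])
    return "".join(protein)


# Hypothetical exons, each a whole number of codons (an assumption)
exon1 = "ATGGCTGAA"
exon2 = "CTGAAACGT"
exon3 = "GGTTTCTGA"

full_gene = exon1 + exon2 + exon3
whole_exon_lost = exon1 + exon3             # exon 2 removed cleanly
partial_exon_lost = exon1 + exon2[4:] + exon3  # first 4 bases of exon 2 lost

print(translate(full_gene))          # MAELKRGF*
print(translate(whole_exon_lost))    # MAEGF*
print(translate(partial_exon_lost))  # MAENVVS
```

Removing the whole toy exon leaves the flanking residues and the stop codon intact, while losing four bases shifts the reading frame so that everything downstream, including the stop, is read differently.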

Often we seek to link sequences as orthologues and paralogues. This helps us understand the process of evolution that has given rise to the observed sequences. Also, if we have some knowledge, experimental or otherwise, about one sequence then we can also say something about the related sequences; often they have a similar function. We can group them into a family where we can see general trends more clearly; multiple-sequence alignment is improved and we can use conservation to identify features important to function (e.g. binding sites and catalytic sites) and protein structure. Indeed the presence of a common ancestor for proteins, and the preservation of 3D coordinates between homologues, is at the heart of a process called comparative modelling, which will be discussed more in Chapter 15. Often there is a choice between studying DNA sequence or protein sequence, and which you do on a given occasion depends on what you are interested in. Protein structure is far more conserved than amino acid sequence, which is in turn more conserved than nucleotide sequence, so for detecting the conservation in remote protein homologues, with similar 3D folds, you would use protein sequences. However, given that DNA changes are the underlying mechanism, and thus the most sensitive measure, nucleotide sequences are used when precise changes are studied.
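
To see why nucleotide sequences are the more sensitive measure, consider the following minimal sketch. Two invented DNA sequences differ only by synonymous codon changes, so they differ at the nucleotide level while encoding the same protein. The sequences and function names are hypothetical, and the simple position-by-position identity assumes the sequences are already aligned and of equal length.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA in frame 0; an incomplete trailing codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - 2, 3))


def percent_identity(seq1, seq2):
    """Percentage of matching positions in two equal-length sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq1, seq2) if a == b)
    return 100.0 * matches / len(seq1)


# Hypothetical sequences differing only by synonymous codon changes
dna1 = "ATGGCTGAACTGAAACGT"
dna2 = "ATGGCCGAGTTAAAGCGC"

dna_identity = percent_identity(dna1, dna2)
protein_identity = percent_identity(translate(dna1), translate(dna2))

print("DNA identity: %.1f%%" % dna_identity)
print("Protein identity: %.1f%%" % protein_identity)
```

In this toy case a third of the nucleotide positions differ, yet the translated proteins are identical, so only the DNA comparison records that any change has occurred.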

