Python Programming for Biology: Bioinformatics and Beyond

[Tim J. Stevens, Wayne Boucher] Python Programming

Figure 14.3. The classification of homologous sequences as orthologues or

paralogues. DNA sequences in different contexts may be similar to one another because

they share a common ancestor (they are homologous) and any differences have arisen by

evolutionary divergence. There are two general ways in which such homologues can arise:

an original sequence may follow two different evolutionary routes because different

species separated from one another, thus creating orthologues, with different versions of

the sequence in each genome. Alternatively, within a single genome a sequence may be

duplicated and the different copies may then take on different roles, diverging from one

another, creating paralogues.
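
The distinction drawn in Figure 14.3 can be sketched in Python. This is only an illustration under stated assumptions, not code from the book: it supposes we already know, for a small made-up gene tree, whether each ancestral node was a speciation or a duplication event. The names geneTree, getAncestors and classifyHomologues, and the gene labels, are hypothetical. Two genes are labelled orthologues if their most recent common ancestor is a speciation node, and paralogues if it is a duplication node.

```python
# Hypothetical gene tree: each node maps to (parent node, event at that node).
# Leaves are present-day genes; internal nodes are ancestral events.
geneTree = {
    'root':        (None,       'duplication'),  # ancestral gene was duplicated
    'alphaAnc':    ('root',     'speciation'),   # species split, alpha copy
    'betaAnc':     ('root',     'speciation'),   # species split, beta copy
    'human_alpha': ('alphaAnc', None),
    'mouse_alpha': ('alphaAnc', None),
    'human_beta':  ('betaAnc',  None),
    'mouse_beta':  ('betaAnc',  None),
}

def getAncestors(tree, node):
    """Return the ancestors of a node, nearest ancestor first."""
    ancestors = []
    parent = tree[node][0]
    while parent is not None:
        ancestors.append(parent)
        parent = tree[parent][0]
    return ancestors

def classifyHomologues(tree, geneA, geneB):
    """Classify two genes by the event at their most recent common ancestor."""
    ancestorsB = set(getAncestors(tree, geneB))
    for ancestor in getAncestors(tree, geneA):
        if ancestor in ancestorsB:
            event = tree[ancestor][1]
            if event == 'speciation':
                return 'orthologues'
            return 'paralogues'
    raise ValueError('genes share no common ancestor in this tree')

print(classifyHomologues(geneTree, 'human_alpha', 'mouse_alpha'))
print(classifyHomologues(geneTree, 'human_alpha', 'human_beta'))
```

With this toy tree the first pair is reported as orthologues (separated by the species split) and the second as paralogues (separated by the duplication). In practice the hard part is inferring the tree and its events from sequence data, which this sketch takes as given.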

When looking through sequences to detect homologues it is often the case that we find

only sections of a gene or protein that are similar, while the remainder of the sequence is

distinctly dissimilar. The reason for this is that the limits of genes are not static in

evolution; via sequence changes they can expand and contract and be recombined in new

ways, so that only parts of their common ancestry remain. It is a common occurrence to

have only part of a gene duplicated, and in some instances duplication can merge genes or

parts of genes. The units that are most commonly shuffled around in this way correspond

to exons, i.e. the protein-coding parts of genes between the introns. By shuffling whole

exons it is more likely that a sensible coding sequence will be maintained; breaking up

exons is more likely to result in codon frame-shifts and nonsense protein code. At a

protein level it is clear that the shuffling of exons gives rise to shuffling, duplication and

recombination of entire domains, the functional units of proteins which are typically

autonomously folding and globular. Multi-domain proteins, where each part of the gene

has a potentially different ancestry, are very common in many genomes, including human.

Accordingly, if you search for protein homologues you are typically searching for domains

that share a common ancestor, rather than whole genes. It often does not matter whether

protein homologues comprise whole coding regions or are just part of a larger gene as far

as the analysis of the family is concerned. So long as they do have a common ancestor you

can tell something about conservation, protein structure and function.
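
The point about whole exons preserving the reading frame can be illustrated with a small Python sketch. Everything here is assumed for the sake of the example: the three "exons" are short invented sequences, each a whole number of codons long (real exons need not be), and the translate function is a minimal stand-in using the standard genetic code, not the book's own routine. Removing a whole exon leaves the downstream codons intact, whereas removing only part of one shifts the frame.

```python
# Standard genetic code, built from the conventional TCAG ordering.
bases = 'TCAG'
aminoAcids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codonTable = {}
for i, a in enumerate(bases):
    for j, b in enumerate(bases):
        for k, c in enumerate(bases):
            codonTable[a + b + c] = aminoAcids[16 * i + 4 * j + k]

def translate(dnaSeq):
    """Translate DNA in frame 0; stop codons appear as '*'.
    Any trailing partial codon is ignored."""
    protein = []
    for i in range(0, len(dnaSeq) - 2, 3):
        protein.append(codonTable[dnaSeq[i:i + 3]])
    return ''.join(protein)

# Invented exons, each a multiple of three nucleotides long.
exon1 = 'ATGGCTGAA'
exon2 = 'CTGAAAGGT'
exon3 = 'ATAACGCAC'

fullGene = exon1 + exon2 + exon3
wholeExonLost = exon1 + exon3             # exon2 removed as a unit
partExonLost = exon1 + exon2[4:] + exon3  # first 4 nt of exon2 removed

print(translate(fullGene))       # MAELKGITH
print(translate(wholeExonLost))  # MAEITH
print(translate(partExonLost))   # MAEKV*R
```

In the last case the four-nucleotide deletion changes every downstream codon and, for these particular sequences, introduces a premature stop codon, which is the kind of nonsense protein code described above.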

Often we seek to link sequences as orthologues and paralogues. This helps us

understand the process of evolution that has given rise to the observed sequences. Also, if

we have some knowledge, experimental or otherwise, about one sequence then we can

also say something about the related sequences; often they have a similar function. We can

group them into a family where we can see general trends more clearly; multiple-sequence

alignment is improved and we can use conservation to identify features important to

function (e.g. binding sites and catalytic sites) and protein structure. Indeed the presence

of a common ancestor for proteins, and the preservation of 3D coordinates between

homologues, is at the heart of a process called comparative modelling, which will be

discussed more in Chapter 15. Often there is a choice between studying DNA sequence or

protein sequence, and which you do on a given occasion depends on what you are

interested in. Protein structure is far more conserved than amino acid sequence, which is

in turn more conserved than nucleotide sequence, so for detecting the conservation in

remote protein homologues, with similar 3D folds, you would use protein sequences.

However, because DNA changes are the underlying mechanism, and thus the most

sensitive measure, nucleotide sequences are used when precise changes are studied.
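
To make the difference in sensitivity concrete, the following Python sketch compares percentage identity at the DNA and protein levels. It is an illustration under assumptions rather than a general method: the two DNA sequences are invented, already aligned, equal in length, and deliberately differ only at synonymous codon positions. The function names calcIdentity and translate are hypothetical helpers written for this example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
bases = 'TCAG'
aminoAcids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codonTable = {}
for i, a in enumerate(bases):
    for j, b in enumerate(bases):
        for k, c in enumerate(bases):
            codonTable[a + b + c] = aminoAcids[16 * i + 4 * j + k]

def translate(dnaSeq):
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    codons = [dnaSeq[i:i + 3] for i in range(0, len(dnaSeq) - 2, 3)]
    return ''.join(codonTable[codon] for codon in codons)

def calcIdentity(seqA, seqB):
    """Percentage of identical positions in two pre-aligned,
    equal-length sequences (no gap handling)."""
    if len(seqA) != len(seqB):
        raise ValueError('sequences must be the same length')
    matches = sum(1 for a, b in zip(seqA, seqB) if a == b)
    return 100.0 * matches / len(seqA)

# Invented sequences differing only by synonymous substitutions.
dnaA = 'ATGGCTGAACTGAAAGGT'
dnaB = 'ATGGCCGAGTTGAAGGGA'

print('DNA identity:     %.1f%%' % calcIdentity(dnaA, dnaB))
print('Protein identity: %.1f%%' % calcIdentity(translate(dnaA), translate(dnaB)))
```

Here 13 of the 18 nucleotides match (about 72%), yet both sequences encode the same peptide, MAELKG, so the protein identity is 100%. The nucleotide comparison records changes that the protein comparison cannot see, while the protein comparison retains a clear signal of relatedness.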
