Figure 14.3. The classification of homologous sequences as orthologues or
paralogues. DNA sequences in different contexts may be similar to one another because
they share a common ancestor (they are homologous) and any differences have arisen by
evolutionary divergence. There are two general ways in which such homologues can arise:
an original sequence may follow two different evolutionary routes because different
species separated from one another, thus creating orthologues, with different versions of
the sequence in each genome. Alternatively, within a single genome a sequence may be
duplicated and the different copies may then take on different roles, diverging from one
another, creating paralogues.
When looking through sequences to detect homologues we often find that only sections
of a gene or protein are similar, while the remainder of the sequence is
distinctly dissimilar. The reason for this is that the limits of genes are not static in
evolution; via sequence changes they can expand and contract and be recombined in new
ways, so that only parts of their common ancestry remain. It is common for
only part of a gene to be duplicated, and in some instances duplication can merge genes or
parts of genes. The units that are most commonly shuffled around in this way correspond
to exons, i.e. the protein-coding parts of genes between the introns. By shuffling whole
exons it is more likely that a sensible coding sequence will be maintained; breaking up
exons is more likely to result in codon frame-shifts and nonsense protein code. At a
protein level it is clear that the shuffling of exons gives rise to shuffling, duplication and
recombination of entire domains, the functional units of proteins which are typically
autonomously folding and globular. Multi-domain proteins, where each part of the gene
has a potentially different ancestry, are very common in many genomes, including human.
Accordingly, if you search for protein homologues you are typically searching for domains
that share a common ancestor, rather than whole genes. It often does not matter whether
protein homologues comprise whole coding regions or are just part of a larger gene as far
as the analysis of the family is concerned. So long as they do have a common ancestor you
can tell something about conservation, protein structure and function.
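The point about whole exons preserving the reading frame can be illustrated with a small sketch. The three "exon" sequences below are invented for illustration, and the sketch assumes the simplest case, where every exon starts and ends on a codon boundary; real exons often do not, so this is only a toy model of the idea. The function name translate and the variable names are our own choices.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = 'TCAG'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna):
    """Translate DNA from its first base; a trailing partial codon is ignored."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE[dna[i:i + 3]])
    return ''.join(protein)


# Hypothetical exons, each a whole number of codons (an assumption of this sketch)
exon_a = 'ATGGCTGAA'
exon_b = 'CTGAAAGGT'
exon_c = 'TTCGACTAA'

original = exon_a + exon_b + exon_c
whole_exon_duplicated = exon_a + exon_b + exon_b + exon_c
partial_exon_inserted = exon_a + exon_b[:4] + exon_c  # only 4 bases of exon_b

print(translate(original))               # MAELKGFD*
print(translate(whole_exon_duplicated))  # MAELKGLKGFD*
print(translate(partial_exon_inserted))  # MAELIRL
```

Duplicating the whole exon simply repeats its amino acids and leaves the downstream residues (FD and the stop codon) intact. Inserting only four bases shifts the frame, so everything after the break is read as different codons and the original stop codon is lost.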
Often we seek to link sequences as orthologues and paralogues. This helps us
understand the process of evolution that has given rise to the observed sequences. Also, if
we have some knowledge, experimental or otherwise, about one sequence then we can
also say something about the related sequences; often they have a similar function. We can
group them into a family where we can see general trends more clearly; multiple-sequence
alignment is improved and we can use conservation to identify features important to
function (e.g. binding sites and catalytic sites) and protein structure. Indeed the presence
of a common ancestor for proteins, and the preservation of 3D coordinates between
homologues, is at the heart of a process called comparative modelling, which will be
discussed more in Chapter 15. Often there is a choice between studying DNA sequence or
protein sequence, and which you do on a given occasion depends on what you are
interested in. Protein structure is far more conserved than amino acid sequence, which is
in turn more conserved than nucleotide sequence, so for detecting the conservation in
remote protein homologues, with similar 3D folds, you would use protein sequences.
However, because changes to DNA are the underlying mechanism of evolution, nucleotide
sequences give the most sensitive measure and are used when precise changes are studied.
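As a rough illustration of why protein sequences are preferred for remote homologues, the sketch below compares two made-up coding sequences that differ only by synonymous codon changes. The sequences, and the function names translate and percent_identity, are hypothetical choices for this example; the sequences are assumed to be already aligned, without gaps and of equal length.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = 'TCAG'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna):
    """Translate DNA from its first base; a trailing partial codon is ignored."""
    return ''.join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - 2, 3))


def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError('Sequences must be the same length')
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * matches / len(seq_a)


# Two invented coding sequences that differ only at synonymous positions
dna_1 = 'ATGGCTGAACTGAAAGGT'
dna_2 = 'ATGGCCGAGTTAAAGGGA'

print('DNA identity:     {:.1f}%'.format(percent_identity(dna_1, dna_2)))
print('Protein identity: {:.1f}%'.format(
    percent_identity(translate(dna_1), translate(dna_2))))
```

Here the nucleotide sequences match at only 12 of 18 positions, yet both translate to the same peptide (MAELKG). The DNA comparison records every individual change, while the protein comparison still shows the relationship clearly.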