When comparing different versions of the same gene or protein, say in a multiple
alignment, the sequences analysed are all related to one another. Saying that they are the
same kind of gene is not only to say that they do the same job, but also that they have a
common ancestor. All species of mammals on Earth use
haemoglobin to transport oxygen in the blood, and they all have globin genes to make the
protein part of this. Thus we also know that the common ancestor of mammals had
haemoglobin particles and globin genes. The origins of globin undoubtedly go back even
further than this to the time when backboned animals were new to our planet. The globin
genes have diverged as the various species have split from one another, with any sequence
change being carried on to descendants of that line. Genes or proteins that are known to be
related to one another by the fact that they share a common ancestor are said to be
homologous. It is a common mistake to confuse sequence similarity with homology. It is
often stated that you can ‘measure sequence homology’ when, strictly speaking, what is
meant is that the sequences are sufficiently similar for us to infer homology: a common
ancestry. Similarity is a quantity we measure; homology is a conclusion we draw from it.
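To make the distinction concrete, here is a minimal Python sketch of the part that really can be measured: the percentage identity between two sequences that have already been aligned. The function name percent_identity, the use of '-' as the gap symbol and the two short sequences are all illustrative assumptions, not real globin data. Note that the function returns only a number; deciding whether that number is high enough to infer a common ancestor is a separate judgement.

```python
def percent_identity(seq_a, seq_b, gap='-'):
    """Percentage of gap-free alignment columns holding identical residues.

    Both sequences are assumed to be already aligned, so they must have
    the same length. Columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError('Aligned sequences must have equal length')

    compared = 0  # Columns where both sequences have a residue
    matches = 0   # Columns where those residues are the same

    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0

    return 100.0 * matches / compared


# Made-up aligned fragments, purely for illustration
toy_a = 'MKTAYIAKQR'
toy_b = 'MKT-YLAKQR'

print(percent_identity(toy_a, toy_b))
```

Skipping gapped columns is only one of several possible conventions; counting them as mismatches would give a lower figure for the same alignment.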
Returning to the globin genes, as you may already know, there are different kinds of
globin gene even within a single genome. Normal haemoglobins are a combination of
alpha and beta versions of globin; two copies of each protein make the final particle. If we
also consider the globins used in an embryo and fetus, along with minor adult forms, even
more globin versions are present: gamma, delta, epsilon, zeta. Each version comes from a
different gene, and because they are so similar we know that they all have a common
ancestor and were generated by gene duplication within a genome. So there are two basic
means by which homologues are generated: when species separate or when genes duplicate.
Accordingly, for a pair of homologous genes or proteins, we can say whether they are
orthologues or paralogues.
Orthologues are different versions of the same gene in different species, which exist
because the species share a common ancestor that also had the gene. For closely related
species this is usually a straightforward concept, but for more distantly related species the
definition becomes fuzzier, given that functions can diverge and genes can be copied
within an organism. For example, the human PAX6 gene (involved in formation of the iris
of the eye) has two orthologues in fruit fly, named eyeless (ey) and twin of eyeless (toy).
Paralogues are genes that are related by the fact that they arose from gene duplication
within a genome. The eyeless and twin of eyeless genes already mentioned are good
examples; there was one ancestor gene and a duplication event generated the homologues.
This is not to say that the duplication occurred in the fruit fly we see today, but rather in
some ancestor that gave rise to many species, including the common laboratory fruit fly
Drosophila. Looking at the globin genes, where we have six close paralogues, it is clear
that there must have been multiple gene duplication events.
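The orthologue/paralogue distinction can be sketched in Python as a question about a gene tree: find the most recent common ancestor of two genes and ask whether that branch point was a species split or a gene duplication. The sketch below is illustrative only. The dictionary-based tree, the helper names (leaf, node, leaf_names, classify_pair) and the event labels are assumptions made for this example, and the tiny tree simply encodes the PAX6, eyeless and twin of eyeless relationships as described in the text. It also assumes every gene name in the tree is unique.

```python
def leaf(name):
    """A gene at the tip of the tree."""
    return {'gene': name, 'children': []}


def node(event, *children):
    """An ancestral branch point labelled 'speciation' or 'duplication'."""
    return {'event': event, 'children': list(children)}


def leaf_names(tree):
    """Set of all gene names found at or below this point in the tree."""
    if not tree['children']:
        return {tree['gene']}

    names = set()
    for child in tree['children']:
        names |= leaf_names(child)

    return names


def classify_pair(tree, gene_a, gene_b):
    """Label two genes by the event at their most recent common ancestor."""
    names = leaf_names(tree)
    if gene_a == gene_b or gene_a not in names or gene_b not in names:
        raise ValueError('Need two different genes that are both in the tree')

    # Walk down the tree for as long as one child still contains both genes
    current = tree
    descended = True
    while descended:
        descended = False
        for child in current['children']:
            child_names = leaf_names(child)
            if gene_a in child_names and gene_b in child_names:
                current = child
                descended = True
                break

    if current['event'] == 'duplication':
        return 'paralogues'

    return 'orthologues'


# Hypothetical tree: a species split separates the human and fly lineages,
# then a duplication on the fly side gives rise to ey and toy
gene_tree = node('speciation',
                 leaf('human PAX6'),
                 node('duplication', leaf('fly ey'), leaf('fly toy')))

print(classify_pair(gene_tree, 'human PAX6', 'fly ey'))
print(classify_pair(gene_tree, 'fly ey', 'fly toy'))
```

In practice the hard part is knowing which branch points are duplications and which are speciations; here that information is simply supplied by hand.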