UGG  W  Trp  Tryptophan    AGG
CUU  L  Leu  Leucine       GUU  V  Val
CUC                        GUC
CUA                        GUA
CUG                        GUG
CCU  P  Pro  Proline       GCU  A  Ala
CCC                        GCC
CCA                        GCA
CCG                        GCG
CAU  H  His  Histidine     GAU  D  Asp
CAC                        GAC
CAA  Q  Gln  Glutamine     GAA  E  Glu
CAG                        GAG
CGU  R  Arg  Arginine      GGU  G  Gly
CGC                        GGC
CGA                        GGA
CGG                        GGG
An important complication in the way RNA transmits its sequence message is that it
usually has large non-coding sections removed before it becomes a mature mRNA and
its sequence is translated into protein. The regions of an RNA chain that are removed
are called introns and those that remain are called exons. The RNA is said to be
spliced: the ends of the exons are joined as the introns are lost. Introns are very
common in the human genome and their presence makes it significantly more difficult
to detect which parts of a gene are actually used to make protein sequences.
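As a rough illustration of splicing, the sketch below joins the exon regions of an RNA
string and discards everything in between. The function name splice, the example
sequence and the exon positions are all hypothetical, chosen only for illustration; real
exon boundaries have to be found from experimental data or gene annotation.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of an RNA string, dropping the introns.

    exons is a list of (start, end) index pairs, counted from zero,
    with the end position excluded (as in Python slices).
    """
    return ''.join(pre_mrna[start:end] for start, end in exons)


# Hypothetical unspliced RNA: exon, intron (lower case for visibility), exon
pre_mrna = 'AUGGCC' + 'guaaguuuucag' + 'UGGUAA'
exon_positions = [(0, 6), (18, 24)]

mature_mrna = splice(pre_mrna, exon_positions)
print(mature_mrna)  # AUGGCCUGGUAA
```

Only the two exon slices survive in the result, which mirrors the way the ends of the
exons are joined as the intron is lost.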
Even though DNA, RNA and protein really are sequences of chemical compounds,
linked together into a chain, it is often sufficient to represent them simply as a sequence of
letters or residue codes. You can perform many useful analyses simply by knowing what
the order of amino acids or nucleotides is, without having to consider all of the atoms that
are present in the real molecule.
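A small example shows how far a plain string of letters can take you. The sketch below
works out the fraction of each letter in a sequence and, from that, the combined
fraction of G and C. The function name composition and the example sequence are
assumptions made for illustration only.

```python
from collections import Counter


def composition(seq):
    """Return the fraction of each letter found in a sequence string."""
    total = len(seq)
    if total == 0:
        return {}
    counts = Counter(seq)
    return {letter: n / total for letter, n in counts.items()}


dna = 'ATGGCCTGGTAA'  # hypothetical example sequence
fractions = composition(dna)

# Combined G and C fraction; letters that are absent count as zero
gc_content = fractions.get('G', 0.0) + fractions.get('C', 0.0)
print(fractions)
print(gc_content)
```

Nothing here depends on the chemistry of the molecule: the same function works on a
protein sequence written in one-letter residue codes.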
DNA sequencing
Today the majority of the sequence information for DNA, RNA and protein in various
organisms comes from sequencing DNA alone. Because of the rules of nucleotide
pairing and because of the genetic code (three nucleotides give one amino acid) it is easy
to determine the RNA and protein sequences once you know the gene-coding regions in the
DNA. It may be difficult to work out where the coding regions of a gene start and end in a
large section of DNA, but the conversion to the different types of sequence is trivial.
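The conversion between sequence types can be sketched in a few lines. The example
below assumes that you already have the coding strand of a gene-coding region (the
strand that reads the same as the RNA, with T in place of U), that it starts at the first
codon, and that the standard genetic code applies. The function names and the example
DNA are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
# '*' marks a stop codon.
BASES = 'UCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
CODON_TABLE = {''.join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_dna):
    """Give the RNA sequence for a DNA coding strand (T becomes U)."""
    return coding_dna.upper().replace('T', 'U')


def translate(rna):
    """Read an RNA sequence three letters at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == '*':
            break
        protein.append(amino_acid)
    return ''.join(protein)


coding_dna = 'ATGGCCTGGCATCGGTAA'  # hypothetical coding region
rna = transcribe(coding_dna)
protein = translate(rna)
print(rna)      # AUGGCCUGGCAUCGGUAA
print(protein)  # MAWHR
```

The lookup itself is just a dictionary access per codon. The hard part, as noted above,
is knowing where in a large section of DNA the coding region starts and ends.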
DNA is sequenced with a special kind of chemical reaction, which these days is often
performed by a computerised machine. In essence many copies of a DNA strand are made
using an enzyme (a protein that catalyses the required chemical reaction), and the
nucleotides that are added to the end of the growing strands are detected. A common way
(used in Sanger and Illumina sequencing methods) of detecting the nucleotide added is to
have the reaction occasionally stop, when an inhibiting compound is incorporated at the
growing end. There are four different inhibitors, one taking the place of each of the four
DNA nucleotides. The aim is to get some of the copied DNA strands to stop growing at
every single nucleotide position. The sequence is revealed by detecting which inhibitor
stopped the chain growing at each position; i.e. which nucleotide is at the end of each
length of strand. The different inhibitors at the end of the DNA strands are designed to
glow with different colours to make them easy to identify. The reaction can proceed in
distinct cycles (e.g. the Illumina method), each cycle reading the next nucleotide, or the
DNA strands can be sorted by size so that the end nucleotide can be detected afterwards
(the Sanger method).
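The size-sorting idea can be mimicked with a toy model. The sketch below is a heavy
simplification rather than a description of any real instrument: it assumes that exactly
one stopped strand exists for every position, and it records each one as a pair of
(length, end letter), where the letter stands in for the colour of the inhibitor. The
function names and the example strand are hypothetical.

```python
import random


def make_fragments(new_strand, seed=0):
    """Return one stopped fragment per position, in jumbled order.

    Each fragment is stored as (length, end_letter); the end letter
    plays the role of the coloured inhibitor at the end of the strand.
    """
    fragments = [(n + 1, letter) for n, letter in enumerate(new_strand)]
    random.Random(seed).shuffle(fragments)  # the reaction mixture is unordered
    return fragments


def read_by_size(fragments):
    """Sort the fragments by length and read off the end letters in order."""
    return ''.join(letter for length, letter in sorted(fragments))


strand = 'ATGGCCTGGTAA'  # hypothetical strand being copied
fragments = make_fragments(strand)
print(read_by_size(fragments))  # ATGGCCTGGTAA
```

Sorting by length puts the shortest fragment first, so the end letters come out in the
order in which the nucleotides were added to the growing strand.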
The actual DNA that is used in the sequencing reaction commonly comes from an
organism’s set of chromosomes (which collectively are referred to as a genome), but it is
also possible to have DNA which comes from the amplification of a small section of a
genome (i.e. a small quantity is copied to give a large amount) or to use DNA that has
been copied from RNA (i.e. opposite to the usual flow of information) using a special
enzyme called reverse transcriptase.