When we do not have direct experimental evidence for the structure of a protein, we can
often build a model of it based on the known structure of a closely related protein. This
method is known as comparative modelling or homology modelling. Strictly speaking, even
the structures of proteins determined by X-ray crystallography and NMR can be thought of
as models, as prior information about normal molecular geometries is used and there is
always some uncertainty. However, direct experimental data constrain the models much
more tightly (crystallography generally more so than NMR), and the more data you have,
the closer the model will be to the native conformation.
Comparative modelling relies on the observation that, as proteins evolve, their
structures change more slowly than their amino acid sequences do. Hence, if we can
detect two proteins that have sufficiently similar amino acid sequences, and thus infer a
common ancestry, then we can be confident that they have structural similarities. Also, the
closer the sequence similarity between two such homologous proteins, the closer their
structural similarity will be. There are two basic steps when building a structural model
based upon the structure of a protein homologue: find a homologue of known structure
and then use the homologue’s structure to guide the building of a model.
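
The link between sequence similarity and expected structural similarity can be made concrete with a small sketch. The Python below computes percent identity over the aligned columns of a pairwise alignment and picks the candidate template with the highest value. It assumes the alignments have already been produced elsewhere; the function names and the choice of percent identity as the ranking measure are illustrative assumptions, and real template searches use more sensitive scoring.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percentage of aligned (non-gap) columns in which both residues are identical."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # gapped columns carry no residue equivalence
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def pick_template(candidates):
    """Return the name of the candidate with the highest percent identity.

    `candidates` maps a (hypothetical) template name to a tuple of
    (aligned query sequence, aligned template sequence).
    """
    return max(candidates, key=lambda name: percent_identity(*candidates[name]))
```

For instance, `pick_template({"tmplA": ("ACDE-G", "ACDQKG"), "tmplB": ("ACDEFG", "AQDQFG")})` would prefer `tmplA`, whose aligned columns are more often identical to the query.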
For the query protein of unknown structure we use its sequence to find potential
homologues which do have a known structure, to act as the structural template. Template
detection uses a special kind of sequence alignment, which is especially sensitive and
accurate. Rather than using the regular, general substitution matrices like BLOSUM or
PAM, comparative modelling tends to use family-specific scoring matrices or, for even
better homologue detection, substitution tables that are specific to the structural
environment. Using environment-specific substitution data allows an alignment to be
sensitive to the way that amino acid changes in evolution depend on structure. For
example, serine swapping for proline is more common in turns than in alpha-helices,
because proline tends to disrupt helices. We know the structural environment for each
position in a sequence alignment because we know the structure of the template, and the
best guess is that the query sequence has the same structural environment, even if the
residues differ. Such structural environments are typically defined by combining side-chain
hydrogen bonding, solvent exposure and secondary-structure categories. Thus, for
example, you would have substitution matrices for exposed alpha-helix; buried, side-chain
hydrogen-bonding alpha-helix; exposed beta-sheet, to name but a few.
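
As a rough illustration of environment-specific scoring, the sketch below looks up a substitution score in a table chosen by the structural environment of each template position. The environment labels, the score values and the function names are all invented for illustration; real tables are derived from alignments of proteins of known structure and cover every residue pair.

```python
# Hypothetical scores, invented for illustration only.
# Keys are (template residue, query residue) within one structural environment.
ENV_SCORES = {
    "turn": {("S", "P"): 1, ("P", "S"): 1},
    "helix_exposed": {("S", "P"): -3, ("P", "S"): -3},
}
MATCH_SCORE = 4        # assumed score for an unchanged residue
DEFAULT_MISMATCH = -1  # assumed score for substitutions missing from the toy tables


def substitution_score(env, template_res, query_res):
    """Score one aligned column using the table for the template's environment."""
    if template_res == query_res:
        return MATCH_SCORE
    return ENV_SCORES.get(env, {}).get((template_res, query_res), DEFAULT_MISMATCH)


def score_alignment(template_seq, template_envs, query_seq):
    """Sum the column scores of an ungapped sequence-structure alignment."""
    if not (len(template_seq) == len(template_envs) == len(query_seq)):
        raise ValueError("sequences and environment labels must have the same length")
    return sum(
        substitution_score(env, t, q)
        for t, env, q in zip(template_seq, template_envs, query_seq)
    )
```

With these toy numbers, a serine-to-proline change scores higher when the template position is in a turn than when it is in an exposed helix, which is the behaviour the text describes.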
Once a template is selected, and the sequence-structure alignment tells us which query
residues are equivalent to which template residues, the next step is to build the computer
model. Generally the backbone of the model is built first, then the side chains, given that
these may vary significantly between the query and template, and finally loops are
modelled in the regions that were not aligned, i.e. where there were gaps. The initial
model may be built by assembling the query structure from fragments whose
conformations are borrowed from one or more templates. Alternatively, it may be built by the
application of spatial restraints, derived from the templates, on to the model of the query
polypeptide. The model is then subjected to a minimisation procedure to find the
conformation that best satisfies these restraints. A popular program for such restraint-
based modelling is MODELLER.
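
The idea of satisfying spatial restraints by minimisation can be sketched with a toy example. This is not MODELLER's algorithm or API: it is a minimal illustration that assumes only harmonic distance restraints between points and uses plain gradient descent, with made-up coordinates and target distances.

```python
import math


def restraint_energy(coords, restraints):
    """Sum of squared deviations from the target distances.

    Each restraint is a tuple (i, j, target_distance).
    """
    return sum(
        (math.dist(coords[i], coords[j]) - target) ** 2
        for i, j, target in restraints
    )


def minimise(coords, restraints, step=0.05, iterations=2000):
    """Move the points by gradient descent to reduce the restraint energy."""
    coords = [list(p) for p in coords]  # work on a copy
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (d - target) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for point, grad in zip(coords, grads):
            for k in range(3):
                point[k] -= step * grad[k]
    return coords


# Three points with assumed target distances, as might be taken from a template.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.5, 0.0)]
restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 6.0)]
model = minimise(start, restraints)
```

Comparing `restraint_energy(start, restraints)` with `restraint_energy(model, restraints)` shows how far the minimisation has moved the points towards satisfying the restraints. A real modelling program combines many more restraint types and a more sophisticated optimiser.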