Python Programming for Biology: Bioinformatics and Beyond

[Tim J. Stevens, Wayne Boucher] Python Programming

Comparative modelling

When we do not have direct experimental evidence for the structure of a protein, we can sometimes still come up with a good guess, called a model, if we know the structure of a closely related protein. This method is known as comparative modelling or homology modelling. Strictly speaking, even the structures of proteins determined by X-ray crystallography and NMR can be thought of as models, as prior information about normal molecular geometries is used and there is always some uncertainty. However, direct experimental data constrains the models much more (and generally crystallography more so than NMR), and the more data you have the closer the model will be to the native conformation.

Comparative modelling relies on the observation that, as proteins evolve, their structures change more slowly than their amino acid sequences do. Hence, if we can detect two proteins that have sufficiently similar amino acid sequences, and thus infer a common ancestry, then we can be confident that they have structural similarities. Also, the closer the sequence similarity between two such homologous proteins, the closer their structural similarity will be. There are two basic steps when building a structural model based upon the structure of a protein homologue: find a homologue of known structure, and then use the homologue’s structure to guide the building of a model.
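As a rough illustration of "sequence similarity" between a query and a candidate template, the sketch below computes the percentage identity over the aligned, non-gap columns of a pairwise alignment. The function name `percent_identity` and the two short sequences are made up for this example; real template searches use more sensitive scoring than simple identity.

```python
def percent_identity(aligned_a, aligned_b, gap='-'):
    """Percentage of aligned (non-gap) columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError('Aligned sequences must have equal length')

    n_compared = 0
    n_identical = 0

    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # Skip columns where either sequence has a gap

        n_compared += 1
        if res_a == res_b:
            n_identical += 1

    if n_compared == 0:
        return 0.0

    return 100.0 * n_identical / n_compared


# Hypothetical, already-aligned sequences (one-letter amino acid codes)
query = 'MKT-AYIAKQR'
template = 'MKSLAYLAK-R'

identity = percent_identity(query, template)
print('Identity: %.1f%%' % identity)
```

Here 7 of the 9 columns without gaps are identical, so the reported identity is about 77.8%.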

For the query protein of unknown structure, we use its sequence to find potential homologues that do have a known structure, to act as the structural template. Template detection uses a special kind of sequence alignment that is especially sensitive and accurate. Rather than using the regular, general substitution matrices like BLOSUM or PAM, comparative modelling tends to use family-specific scoring matrices or, for even better homologue detection, substitution tables that are specific to the structural environment. Using environment-specific substitution data allows an alignment to be sensitive to the way that amino acid changes in evolution depend on structure. For example, serine swapping for proline is more common in turns than in alpha-helices, because proline tends to disrupt helices. We know the structural environment for each position in a sequence alignment because we know the structure of the template, and the best guess is that the query sequence has the same structural environment, even if the residues differ. Such structural environments are typically defined by combining side-chain hydrogen bonding, solvent exposure and secondary-structure categories. Thus, for example, you would have separate substitution matrices for exposed alpha-helix; buried, side-chain hydrogen-bonding alpha-helix; and exposed beta-sheet, to name but a few.
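The idea of environment-specific scoring can be sketched as a dictionary of small substitution tables, one per structural environment, where the table used for each alignment column is chosen from the template's environment at that position. Everything here is an assumption for illustration: the environment names, the function names and especially the score values are invented, and only echo the serine/proline example above. Real tables are derived from alignments of proteins of known structure.

```python
# Hypothetical scores, NOT real substitution data. Each table is keyed by a
# residue pair; pairs are treated as symmetric by pair_score() below.
ENV_SCORES = {
    'helix_exposed': {('A', 'A'): 3, ('S', 'S'): 4, ('P', 'P'): 7,
                      ('S', 'P'): -3},
    'turn_exposed': {('A', 'A'): 2, ('S', 'S'): 3, ('P', 'P'): 6,
                     ('S', 'P'): 1},
}

DEFAULT_SCORE = -1  # Assumed score for pairs missing from a table


def pair_score(env, res_a, res_b):
    """Look up a residue pair in the table for one structural environment."""
    table = ENV_SCORES[env]

    if (res_a, res_b) in table:
        return table[(res_a, res_b)]

    if (res_b, res_a) in table:
        return table[(res_b, res_a)]

    return DEFAULT_SCORE


def score_alignment(query, template, environments, gap='-', gap_penalty=-4):
    """Sum column scores, using the template environment of each column."""
    total = 0

    for q_res, t_res, env in zip(query, template, environments):
        if q_res == gap or t_res == gap:
            total += gap_penalty  # Simple fixed penalty per gapped column
        else:
            total += pair_score(env, q_res, t_res)

    return total


query = 'APA'
template = 'ASA'

helix_score = score_alignment(query, template, ['helix_exposed'] * 3)
turn_score = score_alignment(query, template, ['turn_exposed'] * 3)

print(helix_score, turn_score)
```

With these made-up values the same serine-to-proline change is penalised when the template position is helical and mildly favoured when it is in a turn, so the alignment score depends on where in the structure the substitution falls.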

Once a template is selected, and the sequence-structure alignment tells us which query residues are equivalent to which template residues, the next step is to build the computer model. Generally the backbone of the model is built first, then the side chains, given that these may vary significantly between the query and template, and finally loops are modelled in the regions that were not aligned, i.e. where there were gaps. The initial model may be built by assembling fragments of the query structure using the conformations borrowed from one or more templates. Alternatively, it may be built by applying spatial restraints, derived from the templates, to the model of the query polypeptide. The model is then subjected to a minimisation procedure to find the conformation that best satisfies these restraints. A popular program for such restraint-based modelling is MODELLER.
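The bookkeeping that precedes model building can be sketched as follows: from a query-template alignment we list which query residues have an equivalent template residue (and so can borrow backbone coordinates) and which stretches of the query fall opposite template gaps (and so would need loop modelling). This is a minimal sketch, not how MODELLER or any particular program works; the function name `map_alignment` and the sequences are hypothetical, and only insertions in the query are reported as loop segments.

```python
def map_alignment(aligned_query, aligned_template, gap='-'):
    """Return (equivalences, loop_segments) for a pairwise alignment.

    equivalences: (query_index, template_index) pairs, zero-based and
        counted in the ungapped sequences.
    loop_segments: (start, end) query index ranges, end exclusive, that
        have no equivalent template residue.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError('Aligned sequences must have equal length')

    equivalences = []
    loop_segments = []
    q_idx = 0
    t_idx = 0
    loop_start = None

    for q_res, t_res in zip(aligned_query, aligned_template):
        if q_res != gap and t_res != gap:
            if loop_start is not None:
                # An aligned column closes any open unaligned stretch
                loop_segments.append((loop_start, q_idx))
                loop_start = None

            equivalences.append((q_idx, t_idx))

        elif q_res != gap and loop_start is None:
            # Query residue opposite a template gap opens a loop segment
            loop_start = q_idx

        # Advance the ungapped position counters
        if q_res != gap:
            q_idx += 1
        if t_res != gap:
            t_idx += 1

    if loop_start is not None:
        loop_segments.append((loop_start, q_idx))

    return equivalences, loop_segments


# Hypothetical alignment: the query has a three-residue insertion
query = 'MKTGGSAYIAK'
template = 'MKT---AYLAK'

equivalences, loops = map_alignment(query, template)

print(equivalences)
print(loops)
```

In this example query residues 3 to 5 (GGS) have no template counterpart, so they are reported as a single loop segment, while the remaining eight residues each map to a template position.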
