The converse operation from the previous section is to load a structure from the database
into a Structure object. Standard SQL allows all the data to be fetched in one query of the
database, and this is the way it should be done for reasons of efficiency. The Structure
structure, chain, residue, atom WHERE structure.pdbId=? AND
structure.conformation=? AND structure.id=chain.structureId AND
chain.id=residue.chainId AND residue.id=atom.residueId ORDER BY
chain.id, residue.id, atom.id"
Note that all the tables are included in the FROM clause and all the parent-child ids are
explicitly checked as being equal in the WHERE clause (these are table ‘joins’), in
addition to the usual check on pdbId and conformation. Where the column names are not
unique we have to include the table name as a qualifier. Here, for the sake of clarity, we
have always included the table name, even when the column name is unique.
In the function we first get the cursor object.
from Structures import Structure, Chain, Residue, Atom
def getStructureFromDb(connection, pdbId, conformation=1,
placeHolder='%s'):
cursor = connection.cursor()
We wrap the code in a try/finally block so that we always close the cursor at the end,
whether or not there is an error. We then get the complete record from the database. If that
fails then we raise an error exception.
try:
stmt = "SELECT structure.name, chain.molType, chain.code,
residue.seqId, residue.code, atom.name, atom.x, atom.y, atom.z,
atom.element FROM structure, chain, residue, atom WHERE
structure.pdbId=%s AND structure.conformation=%s AND
structure.id=chain.structureId AND chain.id=residue.chainId AND
residue.id=atom.residueId ORDER BY chain.id, residue.id, atom.id"
stmt = formatStatement(stmt, placeHolder)
cursor.execute(stmt, (pdbId, conformation))
result = cursor.fetchall()
if not result:
msg = 'structure with (pdbId=%s, conformation=%s) not known'
raise Exception( msg % (pdbId, conformation))
We then loop over all the records in the result. Each record represents one atom, so the
code is similar in style to that in the getStructuresFromFile() function in
Chapter 8
. We
create the structure object the first time through the loop.
structure = chain = residue = atom = None
for (structureName, chainMolType, chainCode,
residueSeqId, residueCode, atomName,
atomX, atomY, atomZ, atomElement) in result:
if not structure:
structure = Structure(structureName, conformation, pdbId)
We then create the residue and chain objects if they are needed.
if not chain or chain.code != chainCode:
chain = Chain(structure, chainCode, chainMolType)
if not residue or residue.chain != chain \
or residue.seqId != residueSeqId:
residue = Residue(chain, residueSeqId, residueCode)
We then (always) create an atom object.
coords = (atomX, atomY, atomZ)
Atom(residue, atomName, coords, atomElement)
At the end we close the cursor and return the structure.
finally:
cursor.close()
return structure
We add some test code. For sqlite3 we have:
if __name__ == '__main__':
import sys
if len(sys.argv) not in (3, 4):
print('need to specify database, PDB id, [conformation=1]')
sys.exit(1)
database = sys.argv[1]
pdbId = sys.argv[2]
if len(sys.argv) == 3:
conformation = 1
else:
conformation = int(sys.argv[3])
import sqlite3
connection = sqlite3.connect(database)
placeHolder = '?'
try:
structure = getStructureFromDb(connection, pdbId,
conformation, placeHolder)
finally:
connection.close()
For MySQLdb we instead have the slightly more complicated:
import MySQLdb
import getpass
user = getpass.getuser()
pwd = getpass.getpass()
connection = MySQLdb.connect(db=database, user=user, passwd=pwd)
placeHolder = '%s'
The test code does not do anything with the generated Structure object but the code
could be extended to do so. More generally, once you have a Structure object then you can
do all the manipulations on a structure that were discussed in
Chapter 8
. Having all the
structures in a database makes doing this on many structures in one go a relatively easy
matter.
Do'stlaringiz bilan baham: