Python interaction with the database
In
Chapter 8
we read a PDB file and created one or more Structure objects as a result. In
this section we show how to put a Structure object into a database. The Python wrapping
around SQLite is handled in a package called sqlite3, which is automatically included with
the standard Python distribution. The Python wrapping around MySQL is called
MySQLdb, which requires a separate installation. Mechanically they both work in the
same way; the main differences in the Python modules correspond to any differences that
exist for the underlying SQL commands.
The first thing that we need to do is to make a connection to the database. For most
database implementations this optionally allows a username and password to be supplied,
and in a real-world situation this would normally be required. In SQLite the concept of a
username and password does not exist, and instead access is totally determined by the
user’s permissions to the file that contains the database. As an example, with sqlite3, to
connect to a database, we just have to do:
import sqlite3
connection = sqlite3.connect(database)
whereas with MySQLdb we would instead do, if a username, assumed to be stored in the
variable user, and password, assumed to be stored in the variable pwd, were required:
import MySQLdb
connection = MySQLdb.connect(db=database, user=user, passwd=pwd)
When we are finished with a connection we can just close it:
connection.close()
There is a further subtlety with databases, compared to storing data in regular files: at
any point you can decide to commit or roll back any changes you have made to the
database since the last time you committed changes, or, if you have not previously
committed any changes, since you connected to the database. This is an exceedingly
powerful ability. So if the changes are acceptable, then you can commit them:
connection.commit()
and otherwise you can discard them and go back to your previous state:
connection.rollback()
Note that if you close a connection without doing a commit then all changes will have
been lost. For many database implementations, connection is a relatively heavyweight
operation to carry out, so normally we would want to make a connection to the database,
carry out many commands and then disconnect, rather than connecting and disconnecting
for every single command.
A second step is required before any SQL command can be executed, which is that a
cursor needs to be created for the connection. This is much more lightweight, so can be
done on a regular basis. An SQL command is executed using the cursor, and the result is
then returned. In some sense, you can think of a cursor as an opaque handle into the
database, in much the same way that the Python open() command supplies an object to
handle a file on disk. To get hold of a cursor from a connection we just have to do:
cursor = connection.cursor()
When you are done with a cursor you can close it:
cursor.close()
The cursor allows execution of SQL commands. For example, suppose you want to find
the records in the Structure table with pdbId=‘1AFO’. Then you can do:
stmt = "select * from structure where pdbId='1AFO'"
cursor.execute(stmt)
Note that the statement has no semicolon (‘;’) at the end.
The execute() function does not return the result from executing the command. There
are a couple of further functions you can call to get at the actual result, if there is one
(some commands, such as insertions, never return a result). In the way that file handles
have two functions, readline() and readlines() to either read the next line in the file or all
the remaining lines, a cursor has the equivalent functionality for the result from a cursor
execution. So the function fetchone() gives the next record, or None if there are no more.
And the function fetchall() returns all the remaining records, or an empty list if there are
no more.
Each record in the result is a tuple, but what is in the tuple depends on the command
executed. For example, the above query uses ‘*’ in the query so returns all the columns for
the table, which in this case are four in number, so the tuple for every result record will be
of length four. Thus we could have the following loop:
for (structureId, name, pdbId, conformation) in cursor.fetchall():
print(pdbId, conformation) # or whatever
The way we have coded it, the above query just uses the constant ‘1AFO’ to do the
query. In most normal applications the conditions would be provided by variables rather
than by constants. This raises a slightly tricky issue. So it would be natural, from a Python
point of view, to do the following:
stmt = "select * from structure where pdbId='%s'" % pdbId
cursor.execute(stmt)
But this is not the recommended methodology, because it makes the application
vulnerable to an ‘SQL injection attack’. So you, as a developer, might think that the pdbId
variable is harmless, because it is just a PDB id. But unless you can guarantee the source
of the information in this variable, it might contain malicious SQL code. For example, if
you ask the user for a PDB id and they enter:
"0' or '0'='0"
the statement would then become:
stmt = "select * from structure where pdbId='0' or '0'='0'"
The second condition in the where clause is always true so this query returns all the
records in the Structure table, which is not the intended result. Imagine the trouble that
would be caused if instead of a query this were a delete command. The recommendation is
instead to use a placeholder for each value, and to set them in the execute function. In
sqlite3 the placeholder is a question mark (‘?’):
stmt = "select * from structure where pdbId=?"
values = (pdbId,)
cursor.execute(stmt, values)
In MySQLdb the placeholder is ‘%s’, otherwise the syntax is the same. So in
MySQLdb the statement would be
stmt = "select * from structure where pdbId=%s"
Note that there are no single quotation marks around the placeholder in the statement,
which is nice, because it means you do not have to worry about whether a column is a
string or not. The second argument to execute() has to be a tuple, even if only one
condition is being set. As an example with two conditions, consider the query (syntax for
sqlite3):
stmt = "select * from structure where pdbId=? and conformation=?"
values = (pdbId, conformation)
cursor.execute(stmt, values)
Do'stlaringiz bilan baham: |