Figure 26.2. Examples of graphical interfaces for simple DNA sequence analysis. The two windows result from the Python examples constructed using the Tkinter (left) and Qt/PySide (right) graphical object libraries. The widget styles are the defaults for each library, as they appear on the same Linux operating system.
Using Tkinter
Before getting into the main example we will demonstrate ‘hello world’ code for Tkinter. Naturally the examples assume that the Tkinter library is installed and available to Python. See http://www.cambridge.org/pythonforbiology for Tkinter download and install instructions, as well as links to full documentation. The module is called Tkinter in Python 2 and tkinter in Python 3, so we try the latter and if that does not work then try the former. We then create the top-level rootWindow, which is an object of the Tk class.
try:
    import tkinter             # Python 3
except ImportError:
    import Tkinter as tkinter  # Python 2

rootWindow = tkinter.Tk()
Then we make a widget to put inside the window, which in this case is a simple text
label, of the tkinter.Label class. Note that we construct the label object using the
rootWindow as the first argument, which is the means of specifying that the label belongs
to the window; in GUI speak rootWindow is the parent and label is the child.
label = tkinter.Label(rootWindow, text='Hello World')
Once the label is created we must specify where in the window it will go. Here the pack
geometry manager is used (because it is simple), which by default adds widgets to their
parent from top to bottom. Unless we use a geometry manager the label will not appear,
because Tkinter will not know where to draw it.
label.pack()
To actually see the result, the mainloop() method is called on the top-most parent widget (often called the ‘root’). If we did not make this call, the Python interpreter would create all the graphical objects but a script would then end immediately, without displaying anything. Starting the main event loop keeps the program running: Tk remains active and waits to detect and respond to graphical events, like clicking on a button or resizing a window, until the window is closed.
rootWindow.mainloop()
With these basic principles in mind we move on to the definition of the graphical interface class for simple DNA sequence analysis. As usual, we first make the appropriate imports. The re module is imported because we will do a precautionary check of the DNA sequences, to remove any whitespace. The Tkinter imports now include filedialog and messagebox in Python 3, or equivalently tkFileDialog and tkMessageBox in Python 2, which are pre-constructed Tk elements for selecting files and displaying pop-up messages. These larger compound widgets make some of the most common operations easy to perform. The remaining imports are from the Sequences module, which refers to the examples from Chapter 11 of this book and can be downloaded from the supporting online material.
import re

try:
    import tkinter                            # Python 3
    from tkinter import filedialog, messagebox
except ImportError:
    import Tkinter as tkinter                 # Python 2
    import tkFileDialog as filedialog
    import tkMessageBox as messagebox

from Sequences import proteinTranslation, STANDARD_GENETIC_CODE
To construct the GUI a new class is defined as a subclass of tkinter.Tk, so it inherits all of the properties of the Tk main window. Little of the Tk class will be changed; rather, we will augment the new class definition to embed sub-widgets (text boxes, buttons etc.) and add a number of methods, including those that actually perform the specialist science operations. Immediately after the class statement the __init__ method, which is called when a new object of this type is made, is redefined. All Python classes have an __init__(), so here we are overriding the one from the Tk superclass. However, the first thing we do is call the __init__ of tkinter.Tk directly on self (which represents the current instance of the object), so the superclass initialisation is still done. The reason to override the method in this way is to keep the original functionality, i.e. actually making a GUI window in this case, while at the same time creating a place where customisation can occur. Hence, in the remainder of the __init__ method we add extra code that creates this specialist window, including its internal sub-widgets.
class SequenceTkGui(tkinter.Tk):

    def __init__(self):
        tkinter.Tk.__init__(self)
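The override-and-extend pattern used in __init__ can be seen in isolation with plain classes, which also lets it run without a display. In this sketch Window is a hypothetical stand-in for tkinter.Tk and SequenceWindow for SequenceTkGui; the Forgetful class shows what goes wrong if the superclass __init__ is never called.

```python
class Window:
    """Hypothetical stand-in for tkinter.Tk, so no display is needed."""
    def __init__(self):
        self.title = 'Untitled'
        self.children = []

class SequenceWindow(Window):
    """Overrides __init__ but still runs the superclass set-up first."""
    def __init__(self):
        Window.__init__(self)   # same style as tkinter.Tk.__init__(self)
        # super().__init__() would work equally well in Python 3
        self.title = 'DNA sequence analysis'
        self.children.append('seqTextBox')

class Forgetful(Window):
    """Overrides __init__ without calling Window.__init__."""
    def __init__(self):
        self.title = 'No base set-up'

w = SequenceWindow()
f = Forgetful()
print(w.title, w.children)        # customised, and base attributes exist
print(hasattr(f, 'children'))     # False: the base initialisation never ran
```

For a real Tk subclass the missing base call is worse than a missing attribute, because the underlying window would never be created.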
Unlike the simple ‘hello world’ example above, we will use a different geometry manager, called grid, to lay out the widgets inside the top-level window. In the authors’ experience a grid is easier to manage overall, because its results are more predictable. As the name suggests, using grid means that we place widgets in the main window by specifying the row and column they lie within. Also, when required, a widget can be made to span multiple rows or columns, which adds a lot of flexibility.
The next commands configure the behaviour of the rows and columns of the grid within which the graphical widgets will be placed. By default all rows and columns have weight=0, which means that they do not expand to fill any extra space beyond the size of the item they contain. Setting weight=1 below specifically for column 5, row 1 and row 4 (counting from zero) means that these will be the expanding column and rows in our window. Hence, when the main window is resized these will resize too. If required, weights greater than 1 can be used if one part needs to expand more than another.
        self.grid_columnconfigure(5, weight=1)
        self.grid_rowconfigure(1, weight=1)
        self.grid_rowconfigure(4, weight=1)
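The effect of weights can be illustrated with a little arithmetic. The function below is a simplified model, not Tk's actual algorithm (which also deals with minimum sizes, uniform groups and rounding): extra space is shared between rows or columns in proportion to their weights, and anything with weight 0 gets none. The name share_extra_space is hypothetical.

```python
def share_extra_space(weights, extra):
    """Simplified model: split `extra` pixels in proportion to `weights`."""
    total = sum(weights)
    if total == 0:
        return [0] * len(weights)          # nothing is allowed to expand
    shares = [extra * w // total for w in weights]
    # Give any integer rounding remainder to the last weighted slot
    last = max(i for i, w in enumerate(weights) if w > 0)
    shares[last] += extra - sum(shares)
    return shares

# Six columns where only column 5 has weight=1, as in SequenceTkGui
print(share_extra_space([0, 0, 0, 0, 0, 1], 120))
# Two columns with weights 1 and 2: the second grows twice as much
print(share_extra_space([1, 2], 90))
```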
The first graphical widget that is added is a text label, as demonstrated above. In this example widgets are added to the window class in display order, from top to bottom and left to right. This is just good practice to make visual inspection easier, not an absolute requirement. The tkinter.Label is created with self as its parent and with the required text. The object is assigned to the self.label1 attribute so that we can access it anywhere inside the class (without self it would only be a local variable of the immediate function). In keeping with the intention to use a grid layout we invoke the .grid() method, which is available to all Tk widgets. As the arguments indicate, the label is placed at grid position row=0, column=0 and spans six columns, i.e. the whole of the top row. The final sticky argument states how the widget will adhere to the edges of its grid cell. Tkinter describes the edges using the cardinal compass directions, i.e. North, South, East and West. This may seem odd for a screen, but makes sense if you imagine a map with North at the top, so that East is right and West is left. Accordingly the specification tkinter.EW here means to stick to both the left- and right-hand edges.
        self.label1 = tkinter.Label(self, text='Enter 1-Letter DNA Sequence:')
        self.label1.grid(row=0, column=0, columnspan=6, sticky=tkinter.EW)
The next widget is a tkinter.Text, which will allow us to display multiple lines of text. It
is placed into the grid on the next row with NSEW stickiness, i.e. to stick to all four edges
of the grid cell.
        self.seqTextBox = tkinter.Text(self)
        self.seqTextBox.grid(row=1, column=0, columnspan=6,
                             sticky=tkinter.NSEW)
Below the text box comes a row of buttons that the user can ‘push’ by clicking with the mouse cursor.
The button objects are defined using the tkinter.Button class and assigned to respective self attributes. The arguments for constructing the buttons are self (the parent), some text to display on the button and a command. The command is a Python callback function, passed by name without brackets, which will be triggered when the user pushes the button. Here the callbacks are custom methods that will be defined later in the class. Naturally the text of each button reflects the function it calls. The callback functions take no arguments, but arguments can be supplied by wrapping the call in a lambda function. All of the buttons are placed in separate columns within row 2, sticking to the left of their grid cell (tkinter.W), except for the Find button, which sticks to both sides (tkinter.EW).
        self.clearButton = tkinter.Button(self, text='Clear',
                                          command=self.clearSeq)
        self.clearButton.grid(row=2, column=0, sticky=tkinter.W)

        self.loadButton = tkinter.Button(self, text='Load FASTA',
                                         command=self.loadFasta)
        self.loadButton.grid(row=2, column=1, sticky=tkinter.W)

        self.transButton = tkinter.Button(self, text='Translate',
                                          command=self.seqTranslate)
        self.transButton.grid(row=2, column=2, sticky=tkinter.W)

        self.compButton = tkinter.Button(self, text='Composition',
                                         command=self.seqComposition)
        self.compButton.grid(row=2, column=3, sticky=tkinter.W)

        self.findButton = tkinter.Button(self, text='Find:',
                                         command=self.seqFind)
        self.findButton.grid(row=2, column=4, sticky=tkinter.EW)
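Because command expects a callable that takes no arguments, a callback that needs an argument has to be wrapped. The sketch below shows two standard ways, a lambda and functools.partial, using a hypothetical search() function instead of a real button so it runs without a display. It also shows a common pitfall when callbacks are created in a loop: a plain lambda looks up its variable only when called. Inside a class the same idea would read command=lambda: self.someMethod(arg), where someMethod is hypothetical.

```python
from functools import partial

def search(query):
    """Hypothetical callback that needs an argument."""
    return 'searching for %s' % query

# Both of these are zero-argument callables, as command= requires
cmd1 = lambda: search('ATG')
cmd2 = partial(search, 'TAA')

# Pitfall: each lambda looks up q when it is *called*, so all see 'G'
late = [lambda: search(q) for q in ['A', 'C', 'G']]

# Binding the current value as a default argument captures it immediately
bound = [lambda q=q: search(q) for q in ['A', 'C', 'G']]

print(cmd1(), '|', cmd2())
print([f() for f in late])
print([f() for f in bound])
```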
The last widget in row 2 is a tkinter.Entry rather than a button. An Entry object allows
the user to type in a small piece of text. This will be used to enter a query DNA sequence,
which will be searched for within the main sequence.
        self.findEntry = tkinter.Entry(self)
        self.findEntry.grid(row=2, column=5, sticky=tkinter.EW)
The next two rows contain another Label, giving the title for the section, and a second large Text box to display the textual output for the user. Both of these widgets span all six columns, remembering that columns 0 to 5 inclusive were filled above.
        self.label2 = tkinter.Label(self, text='Text output:')
        self.label2.grid(row=3, column=0, columnspan=6, sticky=tkinter.W)

        self.outTextBox = tkinter.Text(self)
        self.outTextBox.grid(row=4, column=0, columnspan=6,
                             sticky=tkinter.NSEW)
The final widget is placed in a row on its own. This is another Button, and it calls the self.destroy method. This method is built into all Tk widgets; calling it on the main window removes the window, which stops the Tkinter mainloop() and so causes the program to end.
        self.closeButton = tkinter.Button(self, text='Quit',
                                          command=self.destroy)
        self.closeButton.grid(row=5, column=5, sticky=tkinter.EW)
        self.closeButton.config(bg='yellow')  # Yellow background
With the widget construction done, the remainder of the class defines the methods that underpin the graphics and make things work. All of the methods take at least self as an argument, so they can access the self. attributes from within the object. However, as discussed previously, self is not passed in brackets when calling a method; rather it is implicit because of the dot notation.
First come methods to clear and set the DNA sequence text within self.seqTextBox (the upper text area). Note that Tk uses a ‘line.column’ string to identify positions within the text of a Text widget. Lines are numbered from 1 and columns from 0, so ‘1.0’ is strictly the first character; Tk clamps out-of-range values, so ‘0.0’, as used here, also refers to the beginning. tkinter.END represents the end, wherever that is. Thus clearing the sequence means applying delete() to all the text. When setting the sequence, the text box is cleared before the text that was passed in as an argument is added.
    def clearSeq(self):
        self.seqTextBox.delete('0.0', tkinter.END)

    def setSequence(self, text):
        self.clearSeq()
        self.seqTextBox.insert(tkinter.END, text)
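The ‘line.column’ convention can be modelled on an ordinary string. The helper text_index_to_offset below is hypothetical and deliberately simplified (the real widget also keeps a trailing newline and accepts many other index forms, such as marks and tags), but it shows why ‘0.0’ and ‘1.0’ land on the same character and how a column past the end of a line is clamped.

```python
def text_index_to_offset(text, index):
    """Hypothetical, simplified model of a Tk 'line.column' index.

    Lines count from 1 and columns from 0; out-of-range values are clamped.
    """
    line_str, col_str = index.split('.')
    lines = text.split('\n')
    line = min(max(int(line_str), 1), len(lines))
    col = min(max(int(col_str), 0), len(lines[line - 1]))
    # Each earlier line contributes its length plus one newline character
    return sum(len(l) + 1 for l in lines[:line - 1]) + col

seq = 'ACGT\nTTGA'
print(text_index_to_offset(seq, '0.0'))   # clamped to the first character
print(text_index_to_offset(seq, '2.1'))   # second character of line 2
```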
The method to get the DNA sequence from the upper box extracts all the widget text between the start and end points. The re (regular expression) module (see Appendix 5) is used to tidy the sequence by removing any whitespace, including tabs and line returns. The sequence is also converted to upper case. Mostly these checks are present as examples to remind us that whenever the user provides input that is supposed to have some meaning (here it should be a DNA sequence, not a shopping list), our program should aim to detect or remove nonsense. At the end of the method the curated sequence string seq is returned.
    def getSequence(self):
        seq = self.seqTextBox.get('0.0', tkinter.END)
        seq = re.sub(r'\s+', '', seq)  # raw string: avoids an invalid '\s' escape warning
        seq = seq.upper()
        return seq
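getSequence() only removes whitespace; it does not yet detect letters that are not DNA. As a sketch of how the ‘detect nonsense’ idea could go one step further, the hypothetical helper clean_dna() below tidies the text in the same way and also reports any characters outside an allowed alphabet. The choice of ACGTN (N for an unknown base) as the allowed set is an assumption.

```python
import re

def clean_dna(text, allowed='ACGTN'):
    """Hypothetical helper: strip whitespace, upper-case, and list any
    characters that are not in the (assumed) allowed DNA alphabet."""
    seq = re.sub(r'\s+', '', text).upper()
    bad = sorted(set(seq) - set(allowed))
    return seq, bad

seq, bad = clean_dna(' acg t\nTTA\tx ')
print(seq, bad)
```

In the GUI, a non-empty bad list could be reported with messagebox.showwarning(), in the same way as the blank-query check used later for searching.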
Two methods control the contents of the lower text area, self.outTextBox. The showText() method adds text to the box. This is similar to setSequence(), but we do not clear the text area first. Also, an explicit check ensures that all added text ends with a ‘\n’ (newline) character, so that each piece of text appears on its own line. The clearOutput() method removes all output text using the Text.delete() call with a range, as for clearSeq().
    def showText(self, text):
        if not text.endswith('\n'):  # unlike text[-1], also safe for empty text
            text += '\n'
        self.outTextBox.insert(tkinter.END, text)

    def clearOutput(self):
        self.outTextBox.delete('0.0', tkinter.END)
With the methods to control the text areas defined, attention now turns to the methods that are called by pressing the buttons, i.e. the callbacks connected via command. The first of these loads a sequence from a FASTA-format file. It uses functionality from Biopython, as discussed in Chapter 11, to read the entries. Here we only take the first sequence from the file, i.e. there is a break in the loop, but we could take more sequences if the GUI were adjusted accordingly. The notable part of this method is that it uses filedialog, which comes with Tkinter and allows us to easily create a widget that lets the user select a file. The .askopenfile() call displays the file-requesting dialog and gives back an open file object (the same as if using the open() function), although we have to check for None in case no file was selected.
    def loadFasta(self):
        # Plain 'r' mode: the old 'U' flag is rejected by Python 3.11 and later
        fileObj = filedialog.askopenfile(parent=self, mode='r',
                                         title='Choose a FASTA file')
        if fileObj:
            from Bio import SeqIO
            for entry in SeqIO.parse(fileObj, 'fasta'):
                self.setSequence(str(entry.seq))  # Seq object -> plain text
                break
            fileObj.close()
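If Biopython is not available, reading just the first record of a FASTA file needs only a few lines of plain Python. The function read_first_fasta below is a hypothetical stand-in, not part of Biopython: it takes any iterable of lines (such as the file object returned by askopenfile) and returns the header and joined sequence of the first record, or None if there is no record.

```python
import io

def read_first_fasta(fileObj):
    """Hypothetical sketch: return (header, sequence) for the first FASTA
    record in fileObj, or None if the file contains no record."""
    header, parts = None, []
    for line in fileObj:
        line = line.strip()
        if line.startswith('>'):
            if header is not None:
                break              # reached the second record: stop
            header = line[1:]
        elif header is not None and line:
            parts.append(line)     # sequence may be split over many lines
    if header is None:
        return None
    return header, ''.join(parts)

demo = io.StringIO('>seq1 test\nACGT\nTTGA\n>seq2\nGGGG\n')
print(read_first_fasta(demo))
```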
Next comes the first scientific method of the class. As the name hints, seqTranslate() translates the DNA sequence (in the upper panel) into protein sequences, written with three-letter amino acid codes, which are displayed in the lower text panel. The self.getSequence() method is called to extract the currently displayed DNA sequence. The output area is cleared and we use showText() to display a title. Then comes a for loop inside which the sequence translation occurs. A loop is used so that we can define indent (as 0, 1, 2) to specify where in the DNA sequence we start translating, remembering that each amino acid is coded by three DNA bases. Thus by using the loop we get translations of all three forward reading frames.
    def seqTranslate(self):
        seq = self.getSequence()
        self.clearOutput()
        self.showText('DNA sequence')
        self.showText(seq)
        self.showText('Protein sequence')

        for indent in range(3):
In the loop the protein sequence is obtained by calling the proteinTranslation() function defined earlier in the book. We translate with the standard genetic code, so that is passed in; the GUI could be expanded so that the user may select from among several genetic codes. The translated protein sequence is initially a list of Python strings, which is then joined into one long line of text. The variable spaces acts as padding in the output, moving each subsequent reading frame one space further to the right, so that the amino acid codes are staggered and lie exactly under their DNA codon triplets. At the end of the loop the elements are combined to give the output text, which is displayed in the GUI using showText().
            proteinSeq = proteinTranslation(seq[indent:], STANDARD_GENETIC_CODE)
            proteinSeq = ''.join(proteinSeq)
            spaces = ' ' * indent
            text = 'Reading frame %d\n%s%s' % (indent, spaces, proteinSeq)
            self.showText(text)
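The three-frame logic can be tried without the GUI or the Sequences module. proteinTranslation() and STANDARD_GENETIC_CODE are not reproduced here, so this sketch builds its own hypothetical translate() and GENETIC_CODE, keyed by DNA codons and giving three-letter codes (None for stop codons), using the well-known compact listing of the standard code in TCAG order. Like the approach described above, translation stops at the first stop codon; mapping unknown codons to ‘Xaa’ is an assumption of this sketch, and details may differ from the Chapter 11 version.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop)
BASES = 'TCAG'
AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
THREE = {'A': 'Ala', 'R': 'Arg', 'N': 'Asn', 'D': 'Asp', 'C': 'Cys',
         'Q': 'Gln', 'E': 'Glu', 'G': 'Gly', 'H': 'His', 'I': 'Ile',
         'L': 'Leu', 'K': 'Lys', 'M': 'Met', 'F': 'Phe', 'P': 'Pro',
         'S': 'Ser', 'T': 'Thr', 'W': 'Trp', 'Y': 'Tyr', 'V': 'Val'}

# DNA codon -> three-letter amino acid code, or None for a stop codon
GENETIC_CODE = {''.join(codon): (None if aa == '*' else THREE[aa])
                for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, code=GENETIC_CODE):
    """Translate codon by codon, stopping at the first stop codon.
    Codons missing from the table (e.g. containing N) become 'Xaa'."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = code.get(seq[i:i+3], 'Xaa')
        if aa is None:
            break
        protein.append(aa)
    return protein

def three_frames(seq):
    """One text block per forward frame, staggered under its codons."""
    return ['Reading frame %d\n%s%s' % (indent, ' ' * indent,
                                         ''.join(translate(seq[indent:])))
            for indent in range(3)]

for block in three_frames('ATGGCCTAAGGG'):
    print(block)
```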
A second scientific method gets the DNA sequence and counts the different letters, using each letter as a key in the counts dictionary. The letters are then sorted and for each one the composition, as a percentage of the total length, is calculated. The data is then used to make a line of text, which is passed to self.showText() for display.
    def seqComposition(self):
        self.clearOutput()
        seq = self.getSequence()
        n = 0.0
        counts = {}

        for letter in seq:
            counts[letter] = counts.get(letter, 0) + 1
            n += 1.0

        letters = sorted(counts)  # dict keys have no .sort() in Python 3

        text = "Composition:"
        for letter in letters:
            text += ' %s;%.2f%%' % (letter, counts[letter] * 100 / n)

        self.showText(text)
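Outside the GUI, the same counting can be written more compactly with collections.Counter from the standard library. The function composition below is a hypothetical standalone version: it returns (letter, percentage) pairs sorted by letter and guards against an empty sequence. The output string keeps the formatting used in seqComposition().

```python
from collections import Counter

def composition(seq):
    """Percentage of each letter in seq, as (letter, percent) sorted by
    letter; an empty sequence gives an empty list."""
    if not seq:
        return []
    counts = Counter(seq)
    n = len(seq)
    return [(letter, 100.0 * counts[letter] / n) for letter in sorted(counts)]

result = composition('AACG')
text = 'Composition:' + ''.join(' %s;%.2f%%' % item for item in result)
print(text)
```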
The last method in the SequenceTkGui class is used to locate a query sub-sequence within the main DNA sequence. The query sequence is obtained using the .get() method of the self.findEntry widget, which gives back the contents of the box. Any whitespace at the edges of the query is removed with .strip(). Then a check is made to ensure that we are not searching with something blank. If query is empty we use messagebox to create a pre-constructed Tk dialog that informs the user that the search could not be done. After the warning the return statement immediately exits the function. Otherwise, if the search query was defined, the main sequence, seq, is fetched. Then it is a relatively simple matter to see whether the query sequence is present. If it is, we loop through the main sequence to find all occurrences, i.e. query is compared with seq[i:i+win], where i is the position and win is the query width. Whether or not the search found a match is indicated by the text that is passed into self.showText().
    def seqFind(self):
        self.clearOutput()
        query = self.findEntry.get()
        query = query.strip().upper()  # match the upper-case main sequence

        if not query:
            messagebox.showwarning("Warning", "Search sequence was blank")
            return

        seq = self.getSequence()

        if query in seq:
            text = "Locations of %s" % (query)
            self.showText(text)
            win = len(query)

            for i in range(len(seq)-win+1):  # +1 so a match at the very end is found
                if seq[i:i+win] == query:
                    self.showText(' %d' % i)

        else:
            text = "Sub-sequence %s not found" % (query)
            self.showText(text)
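The position search can also be done with standard string methods. The hypothetical function find_all below repeatedly calls str.find(), restarting one position after each hit so that overlapping matches (such as AA within AAAA) are all reported; find_all_re does the same with a zero-width lookahead in re.finditer(). Note that a window-based loop must visit every start position from 0 to len(seq) - win inclusive, otherwise a match at the very end of the sequence is missed.

```python
import re

def find_all(seq, query):
    """Start positions of every, possibly overlapping, match of query."""
    positions = []
    start = seq.find(query)
    while start != -1:
        positions.append(start)
        start = seq.find(query, start + 1)   # +1 permits overlapping hits
    return positions

def find_all_re(seq, query):
    """Same result using a zero-width lookahead, so matches can overlap."""
    return [m.start() for m in re.finditer('(?=%s)' % re.escape(query), seq)]

print(find_all('ATGATG', 'ATG'))   # includes the match that ends the sequence
print(find_all_re('AAAA', 'AA'))   # overlapping matches
```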
Finally, at the end of the class and function definitions we can add code to test the GUI. Note that this is subject to the __name__ == ‘__main__’ clause, which only runs the test if the Python file is executed directly. This allows SequenceTkGui to be imported by other Python modules without the test code being run. The testing is done by creating window as a SequenceTkGui object, and then calling the mainloop() method, which the class inherits from Tk, to view the graphics.
if __name__ == '__main__':
    window = SequenceTkGui()
    window.mainloop()