Python Programming for Biology: Bioinformatics and Beyond



Figure 26.2. An example of graphical interfaces to operate simple DNA sequence
analysis functionality. The two windows result from the Python examples constructed
using the Tkinter (left) and Qt/PySide (right) graphical object libraries. The widget styles
are the defaults for each situation and reflect what happens under the same Linux
operating system.




Using Tkinter

Before getting into the main example we will first demonstrate ‘hello world’ code for
Tkinter. Naturally the examples assume that the Tkinter library is installed and available to
Python. See http://www.cambridge.org/pythonforbiology for Tkinter download and install
instructions, as well as links to full documentation. The module is called Tkinter in Python
2 and tkinter in Python 3, so we try the latter and if that fails we try the former.
We then create the top-level rootWindow, which is an object of the Tk class.

try:
  import tkinter
except ImportError:
  import Tkinter as tkinter

rootWindow = tkinter.Tk()

Then  we  make  a  widget  to  put  inside  the  window,  which  in  this  case  is  a  simple  text

label,  of  the  tkinter.Label  class.  Note  that  we  construct  the  label  object  using  the

rootWindow as the first argument, which is the means of specifying that the label belongs

to the window; in GUI speak rootWindow is the parent and label is the child.

label = tkinter.Label(rootWindow, text='Hello World')

Once the label is created we must specify where in the window it will go. Here the pack

geometry  manager  is  used  (because  it  is  simple),  which  by  default  adds  widgets  to  their

parent from top to bottom. Unless we use a geometry manager the label will not appear,

because Tkinter will not know where to draw it.

label.pack()

Then to actually see the result the mainloop() function call is issued from the top-most
parent widget (often called the ‘root’). If we did not issue this function call the Python
interpreter would make all the graphical objects but the program would then immediately
end, without displaying anything. By invoking a main graphics loop the system is
informed that it should not end the program. Instead Tk remains active and waits to detect
graphical events, like clicking on a button or resizing a window.

rootWindow.mainloop()

With these basic principles in mind we move on to the definition of the graphical
interface class for simple DNA sequence analysis. As usual, we first make the appropriate
imports. The re module is imported because we will do a precautionary check of the DNA
sequences, to remove any whitespace. The Tkinter imports now include filedialog and
messagebox in Python 3, or equivalently tkFileDialog and tkMessageBox in Python 2,
which are pre-constructed Tk elements for finding files and displaying pop-up messages.
These larger compound widgets exist to make some of the most common operations easy.
The remaining imports are from the Sequences module, which refers to the examples from
Chapter 11 in this book that can be downloaded from the supporting on-line material.

import re

try:
  import tkinter
  from tkinter import filedialog, messagebox
except ImportError:
  import Tkinter as tkinter
  import tkFileDialog as filedialog
  import tkMessageBox as messagebox

from Sequences import proteinTranslation, STANDARD_GENETIC_CODE

To construct the GUI a new class is defined as a subclass of tkinter.Tk. Thus it inherits
all of the properties of this Tk main window. Little of the Tk class will be changed;
rather we will augment the new object definition to embed sub-widgets (text boxes, buttons
etc.) and add a number of bound methods, which include function calls to actually do the
specialist science operations. Immediately after the class statement the __init__ function is
redefined (this is called when a new object of this type is made). All Python objects
have an __init__(), so here we are overriding the one from the Tk superclass. However,
we invoke the __init__ for tkinter.Tk directly on self (which represents the current
instance of an object) as the first task, so the superclass initialisation is still done. The
reason to override the function in this way is to keep the original functionality, i.e.
actually making a GUI window in this case, but at the same time create a place where
customisation can occur. Hence, in the remainder of the __init__ function we add extra
code that creates this specialist window, including adding any internal sub-widgets.

class SequenceTkGui(tkinter.Tk):

  def __init__(self):

    tkinter.Tk.__init__(self)
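
The override pattern described above does not depend on Tkinter. As a minimal sketch, the two classes below (Window and SequenceWindow are hypothetical names used only for illustration, not part of Tkinter or of the book's code) show the superclass initialisation still running before the customisation is added:

```python
class Window:
    """Stand-in for a toolkit window class (hypothetical, not tkinter.Tk)."""

    def __init__(self):
        self.children = []          # Set up by the superclass
        self.title = 'Untitled'


class SequenceWindow(Window):
    """Subclass that keeps the parent's setup and then customises it."""

    def __init__(self):
        Window.__init__(self)       # Same style as tkinter.Tk.__init__(self)
        self.title = 'DNA sequence tools'
        self.children.append('label')


window = SequenceWindow()
print(window.title, window.children)
```

In Python 3 the same superclass call is commonly written as super().__init__().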

Unlike the simple ‘hello world’ example above we will use a different geometry
manager called grid to create the layout of the widgets inside the top-level window. Using
a grid is an easier way to manage things overall (in the authors’ own experience) because
it is easier to predict the results. As the name suggests, using grid means that we will be
placing widgets in the main window by specifying the row and column they lie within.
Also, when required, a widget can be made to span multiple rows or columns, which adds
lots of flexibility.

The next commands configure the behaviour of the rows and the columns of the grid
system within which the graphical widgets will be placed. By default all rows and
columns have weight=0, which means that they do not expand to fill any extra space
beyond the immediate size of the item they contain. Setting weight=1 below
specifically for column 5, row 1 and row 4 (counting from zero) means that these will be
the expanding rows and column in our window. Hence, when the main window is resized
these will resize too. If required, weights greater than 1 could be used if one part needs to
expand more than another.

    self.grid_columnconfigure(5, weight=1)
    self.grid_rowconfigure(1, weight=1)
    self.grid_rowconfigure(4, weight=1)
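
To make the effect of the weights concrete, the following sketch shares out extra space in proportion to weight. It is a simplified model for illustration only and should not be read as Tk's actual layout algorithm; the function name shareExtraSpace and the pixel sizes are assumptions.

```python
def shareExtraSpace(sizes, weights, extra):
    """Return new sizes after dividing 'extra' pixels in proportion to weight.

    Simplified model: rows or columns with weight 0 keep their size.
    """
    total = sum(weights)
    if total == 0:
        return list(sizes)          # Nothing is allowed to expand
    return [size + extra * weight / float(total)
            for size, weight in zip(sizes, weights)]


# Six rows, as in the GUI: only rows 1 and 4 have weight=1
rowSizes = [20, 200, 30, 20, 200, 30]
rowWeights = [0, 1, 0, 0, 1, 0]
newSizes = shareExtraSpace(rowSizes, rowWeights, extra=100)
print(newSizes)
```

In this model the two text-box rows each take half of the extra height, while the label and button rows stay the same size.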




The first graphical widget that is added will be a text label, as was demonstrated above.

In  this  example  widgets  are  added  to  the  window  class  in  display  order,  from  top  to

bottom and left to right. This is just good practice to make visual inspection easier and not

an absolute requirement. The tkinter.Label  is  created  as  belonging  to  self and having the

required  text.  The  object  is  assigned  to  the  self.label1  variable  so  that  we  can  access  it

anywhere  inside  the  class  (without  self  it  would  only  be  accessible  in  the  immediate

function).  In  keeping  with  the  intention  to  use  a  grid  layout  we  invoke  the  .grid()  call,

which is available to all Tk widgets. As the arguments indicate, the label is placed at grid

position row=0, column=0 and spans six columns, i.e. the whole of the top row. The last

sticky argument states how the widget inside the grid will adhere to the edges of its cell.

The system Tkinter uses is based on the cardinal compass coordinates, i.e. North, South,

East and West. This can seem a bit odd, given that compass directions depend on which

way  you  are  facing,  but  can  be  imagined  if  you  are  facing  a  map  with  North  at  the  top.

Accordingly the specification tkinter.EW here means to stick to both the left- and right-hand edges.

    self.label1 = tkinter.Label(self, text='Enter 1-Letter DNA Sequence:')
    self.label1.grid(row=0, column=0, columnspan=6, sticky=tkinter.EW)

The next widget is a tkinter.Text, which will allow us to display multiple lines of text. It

is placed into the grid on the next row with NSEW stickiness, i.e. to stick to all four edges

of the grid cell.

    self.seqTextBox = tkinter.Text(self)
    self.seqTextBox.grid(row=1, column=0, columnspan=6,
                         sticky=tkinter.NSEW)

Below the text box comes a row of buttons that the user can ‘push’ by clicking with the
mouse cursor. The button objects are defined using the tkinter.Button class and assigned to
respective self. variables. The arguments for constructing the buttons are self (the parent),
some text to display on the button and a command. The command is the name of a Python
callback function which will be triggered when the user pushes the button. Here the callback
functions are custom ones that will be defined later in the class structure. Naturally the
text of the buttons reflects the functions they call. The functions take no arguments, but
arguments may be added via a lambda function. All of the buttons are placed in separate
columns within row 2, sticking to the left of the grid cell (tkinter.W), except for the
‘Find:’ button, which sticks to both sides of its cell (tkinter.EW).

    self.clearButton = tkinter.Button(self, text='Clear',
                                      command=self.clearSeq)
    self.clearButton.grid(row=2, column=0, sticky=tkinter.W)

    self.loadButton = tkinter.Button(self, text='Load FASTA',
                                     command=self.loadFasta)
    self.loadButton.grid(row=2, column=1, sticky=tkinter.W)

    self.transButton = tkinter.Button(self, text='Translate',
                                      command=self.seqTranslate)
    self.transButton.grid(row=2, column=2, sticky=tkinter.W)

    self.compButton = tkinter.Button(self, text='Composition',
                                     command=self.seqComposition)
    self.compButton.grid(row=2, column=3, sticky=tkinter.W)

    self.findButton = tkinter.Button(self, text='Find:',
                                     command=self.seqFind)
    self.findButton.grid(row=2, column=4, sticky=tkinter.EW)
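
As noted above, a command callback is called with no arguments, but a lambda (or functools.partial) can supply them. The following is a minimal sketch that needs no widgets; the function report and the list log are hypothetical names used only for illustration.

```python
from functools import partial

log = []

def report(name, times=1):
    """Example callback that needs arguments."""
    log.append(name * times)

# Each of these is a zero-argument callable, suitable for command=...
clearCommand = lambda: report('clear')
findCommand = partial(report, 'find', times=2)

# A toolkit would call these, without arguments, when a button is pushed
clearCommand()
findCommand()
print(log)
```

With Tkinter such a callable could be passed when making a button, for example tkinter.Button(self, text='Clear', command=clearCommand).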

The last widget in row 2 is a tkinter.Entry rather than a button. An Entry object allows

the user to type in a small piece of text. This will be used to enter a query DNA sequence,

which will be searched for within the main sequence.

    self.findEntry = tkinter.Entry(self)
    self.findEntry.grid(row=2, column=5, sticky=tkinter.EW)

The next two rows contain another Label, giving the title for the section, and a second
large Text box to display the textual output for the user. Both of these widgets span all six
columns, remembering that columns 0 to 5 inclusive were filled above.

    self.label2 = tkinter.Label(self, text='Text output:')
    self.label2.grid(row=3, column=0, columnspan=6, sticky=tkinter.W)

    self.outTextBox = tkinter.Text(self)
    self.outTextBox.grid(row=4, column=0, columnspan=6,
                         sticky=tkinter.NSEW)

The final widget is placed in a row on its own. This is another Button and it calls the

self.destroy function. This function is inbuilt into all Tk() objects and provides a means of

removing  the  main  window,  which  stops  the  Tkinter  mainloop()  and  so  causes  the

program to end.

    self.closeButton = tkinter.Button(self, text='Quit',
                                      command=self.destroy)
    self.closeButton.grid(row=5, column=5, sticky=tkinter.EW)
    self.closeButton.config(bg='yellow') # Yellow background

With the widget construction done, the remainder of the class involves defining the
functions that underpin the graphics to make things work. All of the functions take at least
self as an argument so they can access the self. names from within the object. As
discussed previously, however, self is not passed in brackets when calling the function;
rather it is implicit because of the dot notation.

First are functions to clear and set the DNA sequence text within self.seqTextBox (the
upper text area). Note that Tk uses a string-based (line.column) system to identify parts of
the text within the Text widget; ‘0.0’ is treated as the beginning (strictly, Tk numbers lines
from 1, so the first character is ‘1.0’), and tkinter.END represents the end, wherever that
is. Thus clearing the sequence means applying delete() to all the text. When setting the
sequence the text box is cleared before the text that was passed in as an argument is added.

  def clearSeq(self):

    self.seqTextBox.delete('0.0', tkinter.END)

  def setSequence(self, text):

    self.clearSeq()
    self.seqTextBox.insert(tkinter.END, text)
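
Text widget positions are strings of the form 'line.column', where Tk counts lines from 1 and columns from 0. As an illustration, the sketch below converts a plain character offset into such a string. The helper name offsetToIndex is an assumption, and the function does not call Tkinter itself.

```python
def offsetToIndex(text, offset):
    """Convert a 0-based character offset in text to a 'line.column' string."""
    before = text[:offset]
    line = before.count('\n') + 1                    # Lines start at 1
    column = len(before) - (before.rfind('\n') + 1)  # Columns start at 0
    return '%d.%d' % (line, column)


dna = 'GATTACA\nCCGG\n'
print(offsetToIndex(dna, 0))   # Start of the first line
print(offsetToIndex(dna, 9))   # Second character of the second line
```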

The function to get the DNA sequence from the upper box extracts all the widget text,
between start and end points. The re module (regular expressions; see Appendix 5) is used
to tidy the sequence by removing any whitespace, including tabs and line returns. The
sequence is also forced to be upper case. Mostly these checks are present as examples to
remind us that whenever the user provides input that is supposed to have some meaning
(so here it should be a DNA sequence not a shopping list) our program should aim to
detect or remove nonsense. At the end of the function the curated sequence string seq is
returned.

  def getSequence(self):

    seq = self.seqTextBox.get('0.0', tkinter.END)
    seq = re.sub(r'\s+', '', seq)  # Raw string for the regular expression
    seq = seq.upper()

    return seq
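
The same tidying can be tried outside the GUI, and extended to detect nonsense rather than only remove whitespace. The following is a minimal sketch: the function name cleanDnaSequence and the choice of permitted letters (A, C, G, T and N) are assumptions, not part of the class above.

```python
import re

def cleanDnaSequence(text):
    """Remove whitespace, upper-case, and reject non-DNA letters."""
    seq = re.sub(r'\s+', '', text).upper()
    invalid = sorted(set(seq) - set('ACGTN'))   # Assumed DNA alphabet
    if invalid:
        raise ValueError('Unexpected characters: %s' % ''.join(invalid))
    return seq


cleaned = cleanDnaSequence(' gat\ttaca\nCCgg \n')
print(cleaned)
```

In a GUI the ValueError could be caught and reported to the user with a message box instead of being raised.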

Two functions control the contents of the lower text area, self.outTextBox. The
showText() function is for adding text to the box. This is similar to setSequence() but we
do not clear the text area first. Also, an explicit check is made to ensure that all added text
ends with a ‘\n’ (new line) character, so that each call to this function finishes its own
line. The clearOutput() function removes all output text using the Text.delete() call with
ranges, as mentioned for clearSeq().

  def showText(self, text):

    if not text.endswith('\n'):  # Also safe when text is empty
      text += '\n'

    self.outTextBox.insert(tkinter.END, text)

  def clearOutput(self):

    self.outTextBox.delete('0.0', tkinter.END)

With the functions to control the text areas defined, attention now turns to the functions
that are called by pressing the buttons, i.e. callbacks connected via command. The first of
these is a function to load a sequence from a FASTA-format file. It uses functionality from
BioPython, as discussed in Chapter 11, to read the entries. Here we only take the first
sequence from the file, i.e. there is a break in the loop, but we could take more sequences
if the GUI was adjusted accordingly. The notable part of this function is that it uses the
filedialog, which comes with Tkinter and allows us to easily create a widget that lets the
user select a file. The .askopenfile() call actually displays the file-requesting widget and
gives back an open file object (the same as if using the built-in open() function), although
we have to check for None in case no file was selected.

  def loadFasta(self):

    # Plain text read mode (the old 'U' flag is not accepted by Python 3.11+)
    fileObj = filedialog.askopenfile(parent=self, mode='r',
                                     title='Choose a FASTA file')

    if fileObj:
      from Bio import SeqIO

      for entry in SeqIO.parse(fileObj, 'fasta'):
        self.setSequence(str(entry.seq))  # Seq object converted to plain text
        break

      fileObj.close()
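
For readers without BioPython installed, the idea of taking only the first entry can be sketched in plain Python. This is not how SeqIO works internally; it assumes a simple FASTA layout (a ‘>’ header line followed by sequence lines), and the function name readFirstFasta is hypothetical.

```python
import io

def readFirstFasta(fileObj):
    """Return (header, sequence) for the first FASTA record, or None."""
    header = None
    lines = []
    for line in fileObj:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if header is not None:
                break               # A second record starts: stop reading
            header = line[1:]
        elif header is not None:
            lines.append(line)
    if header is None:
        return None
    return header, ''.join(lines)


# An in-memory file object stands in for the one given by the file dialog
example = io.StringIO(u'>seq1 test\nGATTACA\nCCGG\n>seq2\nTTTT\n')
record = readFirstFasta(example)
print(record)
```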

Next  comes  the  first  scientific  function  of  the  class.  As  the  name  hints  seqTranslate()

will  translate  the  DNA  sequence  (in  the  upper  panel)  into  three-letter  protein  sequences

that  are  displayed  in  the  lower  text  panel.  The  self.getSequence()  function  is  called  to

extract  the  currently  displayed  DNA  sequence.  The  output  area  is  cleared  and  we  use

showText() to display a title. Then comes a for loop inside which the sequence translation

occurs.  A  loop  is  used  so  that  we  can  define  indent  (as  0,  1,  2)  to  specify  where  in  the

DNA sequence we start translating, remembering that a protein’s amino acids are coded by

three  DNA  bases.  Thus  by  using  the  loop  we  will  get  translations  of  all  three  forward

reading frames.

  def seqTranslate(self):

    seq = self.getSequence()
    self.clearOutput()
    self.showText('DNA sequence')
    self.showText(seq)
    self.showText('Protein sequence')

    for indent in range(3):

In the loop the protein sequence is obtained by calling the proteinTranslation() defined

earlier in the book. We translate with the standard genetic code, so that is passed in. The

GUI could be expanded so that the user may select from among several genetic codes. The

translated protein sequence is initially a list of Python strings, but is then joined into one

long line of text. The variable spaces is defined, which will act as padding in the output, to

move the indentation of each subsequent translated reading frame one space to the right,

i.e. so the amino acid codes are staggered and lie exactly under their DNA codon triplet.

At  the  end  of  the  loop  the  elements  are  combined  to  give  the  output  text,  which  is

displayed in the GUI using showText().

      proteinSeq = proteinTranslation(seq[indent:], STANDARD_GENETIC_CODE)
      proteinSeq = ''.join(proteinSeq)
      spaces = ' ' * indent

      text = 'Reading frame %d\n%s%s' % (indent, spaces, proteinSeq)
      self.showText(text)
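
The staggering of the three reading frames can be seen in a stand-alone sketch. The tiny codon table DEMO_CODE and the function translate below are hypothetical stand-ins for STANDARD_GENETIC_CODE and proteinTranslation() from the Sequences module, which may differ in detail; codons missing from the demonstration table are shown as ‘Xaa’.

```python
# Tiny, incomplete codon table for illustration only (three-letter codes)
DEMO_CODE = {'ATG': 'Met', 'GCC': 'Ala', 'TGG': 'Trp',
             'CCT': 'Pro', 'GGC': 'Gly', 'TAA': 'Stop'}

def translate(seq, geneticCode):
    """Translate complete codons; unknown codons become 'Xaa'."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(geneticCode.get(seq[i:i+3], 'Xaa'))
    return protein


seq = 'ATGGCCTAA'
frames = []
for indent in range(3):
    proteinSeq = ''.join(translate(seq[indent:], DEMO_CODE))
    frames.append(' ' * indent + proteinSeq)

print(seq)
for line in frames:
    print(line)
```

Because each amino acid code has three letters, padding by indent spaces places every code directly under the codon it was translated from.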

A second scientific function is one that gets the DNA sequence and counts the different
letters. Each letter is used as a key to the counts dictionary. The letters are then sorted and
for each kind the composition, as a percentage of the total, is calculated. The data is then
used to make a line of text, which is passed to self.showText() for display.

  def seqComposition(self):

    self.clearOutput()
    seq = self.getSequence()

    n = 0.0
    counts = {}
    for letter in seq:
      counts[letter] = counts.get(letter, 0) + 1
      n += 1.0

    letters = sorted(counts)  # Sorted list of keys; works in Python 2 and 3

    text = "Composition:"

    for letter in letters:
      text += ' %s;%.2f%%' % (letter, counts[letter] * 100 / n)

    self.showText(text)
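
An alternative way to do the counting, shown here only as a sketch, is collections.Counter from the standard library. The function name composition is an assumption, and the output line uses the same format as above.

```python
from collections import Counter

def composition(seq):
    """Return {letter: percentage} with letters in sorted order."""
    counts = Counter(seq)
    n = float(len(seq))
    return {letter: counts[letter] * 100 / n for letter in sorted(counts)}


percentages = composition('GATTACA')
text = 'Composition:'
for letter in sorted(percentages):
    text += ' %s;%.2f%%' % (letter, percentages[letter])
print(text)
```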

The last function in the SequenceTkGui class is used to locate a query sub-sequence
within the main DNA sequence. The query sequence is obtained using the .get() call that
goes with the self.findEntry widget; this gives back the contents of the box. Any
whitespace at the edges of the query is removed with .strip(). Then a check is made to
ensure that we are not searching with something blank. Thus if query is empty we use
messagebox to create a pre-constructed Tk widget that informs the user that the search
could not be done. After the warning the return statement immediately quits the function.
Otherwise, if the search query was defined, the main sequence, seq, is fetched. Then it is a
relatively simple matter to see if the query sequence is present. If it is we loop through the
main sequence to find all occurrences, i.e. query is compared with seq[i:i+win], where i is
the position and win is the query width. Whether the search made a match or not is
indicated by the text that is passed into self.showText().

  def seqFind(self):

    self.clearOutput()
    query = self.findEntry.get()
    query = query.strip()

    if not query:
      messagebox.showwarning("Warning", "Search sequence was blank")
      return

    seq = self.getSequence()

    if query in seq:
      text = "Locations of %s" % (query)
      self.showText(text)
      win = len(query)

      for i in range(len(seq)-win+1):  # +1 so a match at the very end is found
        if seq[i:i+win] == query:
          self.showText(' %d' % i)

    else:
      text = "Sub-sequence %s not found" % (query)
      self.showText(text)
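
The search itself can also be written with the string .find() method, stepping on by one position after each hit so that overlapping matches and a match at the very end of the sequence are both reported. This is a sketch of an alternative, not the method used in the class; the function name findAll is an assumption.

```python
def findAll(seq, query):
    """Return 0-based start positions of every (overlapping) match."""
    positions = []
    start = seq.find(query)
    while start != -1:
        positions.append(start)
        start = seq.find(query, start + 1)   # Step by 1 to allow overlaps
    return positions


overlapping = findAll('GATATATC', 'ATA')
atEnd = findAll('GATTACA', 'ACA')
print(overlapping)
print(atEnd)
```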

Finally, at the end of the class and function definitions we can write code to test the class.
Note that this is subject to the __name__ == ‘__main__’ clause, which only runs the test if
the Python file is used directly. This allows SequenceTkGui to be imported by
other Python modules without the test code being run. The testing is done by creating
window as a SequenceTkGui object and then calling mainloop(), which the class inherits
from Tk, to view the graphics.

if __name__ == '__main__':

  window = SequenceTkGui()
  window.mainloop()

