Python Programming for Biology: Bioinformatics and Beyond


Using ClustalW from Python



Download 7,75 Mb.
Pdf ko'rish
bet187/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   183   184   185   186   187   188   189   190   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Using ClustalW from Python

We  will  use  input  to  the  ClustalW  program  in  the  simple  FASTA  format,  which  you  are

hopefully already familiar with. Of course, if the data is not already in a FASTA file we

will have to make one; indeed the following example will make the files for you starting

from  a  simple  list  of  one-letter  sequences  as  we  have  been  using  in  the  examples.

Although plain sequence strings are fine for basic analyses you may wish to consider more

detailed  sequence  representations,  like  the  Bio.Seq  module,  when  situations  are  more

complicated.

Firstly, in our example we import all of the external modules we will need upfront. It is

generally a good idea to do this because you can quickly look at the start of a file and see

all of the modules you are depending upon. Most of the modules are BioPython

7

libraries,



so it is assumed that this is installed on your system.

import os

from Bio.Seq import Seq

from Bio.SeqRecord import SeqRecord

from Bio.Alphabet import IUPAC

from Bio import SeqIO, AlignIO

Two file names are defined; one for the input to the alignment program (.fasta) and one

for the output from the program (.aln).

fastaFileName = "test2.fasta"

alignFileName = "test2.aln"

Next we loop though the sequences, seqs, defined as needed, to make a list of sequence

record objects. For each one-letter sequence we first make a BioPython Seq object using

the standard IUPAC

8

protein sequence alphabet. If there were an invalid protein sequence



we  would  get  an  error.  The  basic  sequence  object  is  then  combined  with  a  name  and  a

description  to  make  a  fully  fledged  sequence  record  object.  The  SeqRecord  object  is

required for writing a FASTA file.

records = []

for i, seq in enumerate(seqs):

seqObj = Seq(seq, IUPAC.protein)

name = 'test%d' % i

recordObj = SeqRecord(seqObj, id=name, description='demo only')

records.append(recordObj)

We  use  the  open()  function  to  create  a  file  handle  object  into  which  the  sequence

records  are  written.  The  SeqIO  module  of  BioPython  has  a  .write()  function  that  can

output various formats, so naturally we use the desired ‘fasta’ option. With the file written

the file handle is closed; this is not essential here but a good habit to get into.

outFileObj = open(fastaFileName, "w")

SeqIO.write(records, outFileObj, "fasta")

outFileObj.close()

With the input file made, the next job is to run the external alignment program; in this

case  ClustalW.  This  requires  we  have  installed  the  alignment  program  and  that  it  can  be




run as the command ‘clustalw’ from the operating system. We will invoke the command

using the subprocess.call() function, as we illustrated for BLAST in

Chapter 12

.  Also,  in

this instance we specify -INFILE and -OUTFILE  options,  to  define  the  input  and  output

file  names  respectively,  formatted  in  the  form  –OPTION=VALUE  (unlike  with  BLAST

where  the  option  name  and  associated  value  are  separated  with  a  space).  If  desired,  we

could read the ClustalW manual and include other options to control other aspects like gap

penalties and the substitution matrix.

from subprocess import call

cmdArgs = ['clustalw',

'-INFILE=' + fastaFileName,

'-OUTFILE=' + alignFileName]

call(cmdArgs)

A  perhaps  better,  but  more  complicated,  alternative  to  the  call()  is  to  use  the

subprocess.Popen class, which amongst other things provides a means of executing jobs in

parallel; very handy to do if you have lots of alignments and a multi-core computer.

After  the  system  execution  we  then  open  and  read  the  output  alignments  file.  We  use

the  AlignIO  module  from  BioPython  to  do  the  interpretation  of  the  file,  specifying  the

read format as ‘clustal’. Note that we use the .read() function because we only have one

alignment in the file, if there were several we would use .parse() instead, to get back a list

of alignments. The reading function makes an alignment object from which we can print

out attributes, like the length, and loop through to get at the individual records. Note that

the record variables are BioPython objects and thus come with .seq and .id attributes from

which you can easily get at the one-letter sequence and name.

fileObj = open(alignFileName)

alignment = AlignIO.read(fileObj, "clustal")

print("Alignment length %i" % alignment.get_alignment_length())

for record in alignment:

print(record.seq, record.id)

Finally,  we  will  illustrate  how  you  can  write  the  alignment  object  out  into  a  different

format  (other  than  the  Clustal-format  file  we  already  have).  This  is  a  simple  matter  of

making a list of alignments and an output file handle object, with a new name, which we

pass on to the AlignIO.write() function. In this example we are using the ‘phylip’ format

corresponding  to  the  PYHLIP  suite

9

 of  programs  that  makes  and  analyses  phylogenetic



trees.

alignments = [alignment,]

outputHandle = open("test2.phylip", "w")

AlignIO.write(alignments, outputHandle, "phylip")




Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   183   184   185   186   187   188   189   190   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish