Python Programming for Biology: Bioinformatics and Beyond


Reading and writing FASTA files



[Tim J. Stevens, Wayne Boucher] Python Programming


To read a FASTA-format file using BioPython we use the SeqIO module, which in this
case takes an open file object and extracts each sequence in the file in turn, creating a
special object (a SeqRecord) for each record. These objects have the attributes .id and
.seq, which respectively give the identifier as a Python string and the one-letter
sequence as a BioPython Seq object. The latter behaves much like a string and can be
converted to one with str().

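First we import the SeqIO module from BioPython and create an open file object for
the sequence file (in FASTA format) that we wish to read. The file is opened in plain "r"
mode: the older "rU" (universal newlines) mode was removed in Python 3.11, and in
Python 3 text files are read with universal newlines by default.

from Bio import SeqIO

fileObj = open("examples/demoSequences.fasta", "r")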

The SeqIO module has a parse() function that takes the file object and a file format
string and yields sequence record objects. Here the records are extracted in a for loop
and assigned to the protein variable, which we can then interrogate. In this example we
send the sequence to the estimateIsoelectric function defined above.

for protein in SeqIO.parse(fileObj, 'fasta'):
    print(protein.id)
    print(protein.seq)
    print(estimateIsoelectric(protein.seq))

fileObj.close()
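To make it clearer what a FASTA parser has to do, the following is a minimal pure-Python sketch of a generator that yields one record at a time from an open file object. It is only an illustration and not BioPython's implementation: the function name readFastaRecords and the demonstration sequences are hypothetical, and each record is a simple (identifier, sequence) tuple rather than a SeqRecord.

```python
import io

def readFastaRecords(fileObj):
    """Yield (identifier, sequence) tuples from an open FASTA file object."""
    identifier = None
    seqLines = []

    for line in fileObj:
        line = line.strip()

        if not line:
            continue  # Skip blank lines

        if line.startswith(">"):
            # A new header line: pass back the previous record, if any
            if identifier is not None:
                yield identifier, "".join(seqLines)

            # Take the first word after '>' as the identifier
            words = line[1:].split()
            identifier = words[0] if words else ""
            seqLines = []

        elif identifier is not None:
            seqLines.append(line)

    # Pass back the final record in the file
    if identifier is not None:
        yield identifier, "".join(seqLines)

# Hypothetical two-record FASTA text, held in memory instead of on disk
demoFile = io.StringIO(">seqA test protein\nMKTAYIAK\nQRQISFVK\n>seqB\nGAVLI\n")

records = list(readFastaRecords(demoFile))

for identifier, sequence in records:
    print(identifier, sequence)
```

Because the function is a generator, only one record needs to be held in memory at a time, which is the same reason a for loop is used with SeqIO.parse().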

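Writing a FASTA file using BioPython is slightly trickier because we first have to
create the right type of BioPython object (SeqRecord), which we then pass into a function
for writing. Despite the complication of making these objects there is the added benefit
that each sequence is stored together with its identifier and, in older BioPython releases,
a declared alphabet stating which set of letters it is expected to contain. Note that
BioPython does not itself reject letters outside this alphabet when the objects are made,
so any strict check of the letters has to be done separately before writing.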



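We make several more imports from the BioPython library. The SeqRecord is the final
object we wish to make, and which will be written out. The Seq object is needed to hold
the sequence inside a SeqRecord, and IUPAC supplies the alphabet that states which
letters the sequence should use, according to the IUPAC standard. Note that the
Bio.Alphabet module was removed in BioPython 1.78, so the last import only works with
earlier releases.

from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC  # BioPython releases before 1.78 only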

An open file object is created in writing mode, with the desired output file name. If we
were prudent we would first check that we are not overwriting an existing file (using
os.path.exists()).

fileObj = open("output.fasta", "w")
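The prudent check mentioned above could be sketched as follows. This is only one possible approach, under stated assumptions: the helper name openNewFile is hypothetical, and the demonstration works inside a temporary directory so that no real files are touched.

```python
import os
import tempfile

def openNewFile(filePath):
    """Open filePath for writing, refusing to overwrite an existing file."""
    if os.path.exists(filePath):
        raise IOError("File '%s' already exists" % filePath)

    return open(filePath, "w")

# Demonstration in a temporary directory, which is removed afterwards
with tempfile.TemporaryDirectory() as dirPath:
    outPath = os.path.join(dirPath, "output.fasta")

    fileObj = openNewFile(outPath)  # The file does not exist yet, so this works
    fileObj.write(">TEST\n")
    fileObj.close()

    try:
        openNewFile(outPath)  # The file now exists, so this is refused
        refused = False
    except IOError:
        refused = True

print(refused)
```

In Python 3 a similar effect can be had without a separate check by opening the file with mode "x" instead of "w", which raises FileExistsError if the file is already present.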

Next a Seq object is made, which accepts a one-letter sequence in its construction
and, in BioPython releases before 1.78, an alphabet specification, which in this case is
the IUPAC protein codes. From version 1.78 onwards alphabets no longer exist and the
object is made with Seq(proteinSeq) alone. This sequence object is in turn used to make
a SeqRecord, which associates the sequence object with an identifier (and potentially
other kinds of annotation).

seqObj = Seq(proteinSeq, IUPAC.protein)  # BioPython releases before 1.78 only
proteinObj = SeqRecord(seqObj, id="TEST")
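If a strict check of the sequence letters is wanted before a record is made, it can be written in a few lines of plain Python. The following is a minimal sketch, assuming that only the 20 standard amino acid one-letter codes are acceptable; the function names and test sequences are hypothetical and are not part of BioPython.

```python
# The 20 standard amino acid one-letter codes
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")

def findInvalidLetters(sequence, validLetters=PROTEIN_LETTERS):
    """Return a sorted list of the letters in sequence that are not valid."""
    return sorted(set(sequence.upper()) - validLetters)

def checkProteinSeq(sequence):
    """Return sequence unchanged, or raise ValueError for invalid letters."""
    invalid = findInvalidLetters(sequence)

    if invalid:
        raise ValueError("Invalid protein letters: %s" % ", ".join(invalid))

    return sequence

goodSeq = "MKTAYIAKQRQISFVKSHFSRQ"
badSeq = "MKTAYIAKQRZ1"

print(findInvalidLetters(goodSeq))  # []
print(findInvalidLetters(badSeq))   # ['1', 'Z']
```

A call such as checkProteinSeq(proteinSeq) could then be placed just before the Seq object is constructed, so that a bad sequence stops the program before anything is written.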

The proteinObj (a SeqRecord object) is then written to file using the SeqIO.write()
function. Note that this takes a list of sequence records, as we can have many sequences
in one file, hence we put proteinObj in a list of one, using square brackets. The other
arguments to this function are the open file object to write to and the format type of
the file.

SeqIO.write([proteinObj,], fileObj, 'fasta')

fileObj.close()
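For comparison, the layout of the resulting file can be illustrated with a small pure-Python writer. This is a sketch under stated assumptions and not BioPython's own code: the name writeFasta is hypothetical, records are simple (identifier, sequence) tuples, and sequence lines are wrapped at 60 letters, which matches BioPython's default FASTA output.

```python
import io

def writeFasta(fileObj, records, lineWidth=60):
    """Write (identifier, sequence) tuples in FASTA format.

    Returns the number of records written.
    """
    count = 0

    for identifier, sequence in records:
        fileObj.write(">%s\n" % identifier)

        # Split the sequence into lines of at most lineWidth letters
        for start in range(0, len(sequence), lineWidth):
            fileObj.write(sequence[start:start + lineWidth] + "\n")

        count += 1

    return count

# Write one hypothetical 70-residue record to an in-memory file
outFile = io.StringIO()
numWritten = writeFasta(outFile, [("TEST", "A" * 70)])
fastaText = outFile.getvalue()

print(numWritten)
print(fastaText)
```

Like SeqIO.write(), this function accepts several records at once and passes back how many were written, which is a convenient sanity check after saving a file.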

