Python Programming for Biology: Bioinformatics and Beyond



[Tim J. Stevens, Wayne Boucher] Python Programming

File formats

When  data  is  stored  in  a  plain  text  file,  although  interpreting  the  individual  component

characters is trivial, if we are to understand what the contents of the file actually mean we

have to understand the way in which the data inside the file is structured. This is just like

written language, where knowing the alphabet is not enough; we also need to understand

concepts  like  words,  sentences  and  punctuation.  Ultimately  it  is  the  decision  of  the

computer programmer as to how the data in saved files is structured. However, where it is

important  that  files  should  be  understood  by  a  variety  of  programs  the  data  will  be

represented in a standardised, and hopefully documented, way. The data standard for a file

is often referred to as a file format. Virtually all plain text formats consider the stored data

according to lines; they are subdivided by special end-of-line control characters at the end

of each line.
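As a minimal sketch of this line-based view, the following Python snippet splits a block of text at its end-of-line control characters. The sample text is invented for illustration; it mixes the three common end-of-line conventions, all of which str.splitlines() recognises.

```python
# Illustrative text using three different end-of-line conventions:
# Unix (\n), Windows (\r\n) and classic Mac OS (\r).
text = "first line\nsecond line\r\nthird line\rfourth line"

# splitlines() breaks the text at any of these end-of-line markers
# and, by default, removes them from the resulting strings.
lines = text.splitlines()

# keepends=True retains the control characters so they can be inspected.
raw_lines = text.splitlines(keepends=True)

for number, line in enumerate(lines, start=1):
    print(number, line)
```

When reading from a real file, iterating over the file object yields the lines one at a time in a similar way, with the end-of-line characters still attached.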

A very common file structure is to have one record of data per line, often with a single

header  line  at  the  top  of  the  file,  to  describe  the  contents  of  the  lines.  One  possibility  is

that  the  file  represents  a  table  with  each  line  describing  one  row.  The  fields  (or  cells)  in

each row, one for each column of the table, may be demarcated in various ways: this could

be according to the position within the row, i.e. the position of a character relative to the

start, or specified by special separating characters, like commas or whitespace (tabs, blank

spaces  etc.).  An  alternative  to  a  fixed  order  of  fields  is  that  the  lines  consist  of  pairs  of

named  keys  (identifiers)  and  corresponding  values.  Here  the  keys  will  generally  come

from a fixed, allowed set of keys and in some instances the data values that are addressed

by a single key may span many lines.
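The three layouts described above can be sketched in Python as follows. All of the field names, column widths and data values here are hypothetical, chosen only to illustrate the idea; real formats define their own rules.

```python
import csv
import io

# Hypothetical delimited table: a header line followed by one record
# per line, with the fields separated by commas.
delimited_text = "name,length,organism\nP53,393,human\nLysozyme,129,chicken\n"

# csv.DictReader uses the header line to name the fields of each row.
rows = list(csv.DictReader(io.StringIO(delimited_text)))

# Hypothetical fixed-width layout: a field is identified by its character
# position within the line (columns 0-9, 10-15, then 16 onwards).
template = "{:<10}{:<6}{}\n"
fixed_text = template.format("P53", 393, "human")
fixed_text += template.format("Lysozyme", 129, "chicken")

fixed_rows = []
for line in fixed_text.splitlines():
    name = line[0:10].strip()
    length = int(line[10:16])
    organism = line[16:].strip()
    fixed_rows.append((name, length, organism))

# Hypothetical key-value layout: each line holds a key from a fixed,
# allowed set, a colon, and then the corresponding value.
key_value_text = "ID: P53\nLENGTH: 393\nORGANISM: human\n"
allowed_keys = {"ID", "LENGTH", "ORGANISM"}

record = {}
for line in key_value_text.splitlines():
    key, value = line.split(":", 1)
    key = key.strip()
    if key not in allowed_keys:
        raise ValueError("Unexpected key: " + key)
    record[key] = value.strip()

print(rows[0]["name"], fixed_rows[0], record["ORGANISM"])
```

Note that the delimited and key-value readers return every value as a string, so any conversion to numbers has to be done explicitly, as the fixed-width example does with int().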



Another  common  file  structure  is  to  have  tags  that  identify  and  specify  the  beginning

and  end  of  a  record.  Often  these  tags  can  be  nested,  one  inside  the  other,  thus  denoting

containment.  For  example,  the  XML  (eXtensible  Markup  Language)  data  standard  uses

tags where the record starts with text like ‘<NAME>’ and ends with the text ‘</NAME>’,

where here ‘NAME’ is the identifier for the element.
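To illustrate nested tags, here is a small sketch that uses the xml.etree.ElementTree module from the Python standard library. The element names (PROTEIN, NAME, SEQUENCE, RESIDUE) and their contents are invented for this example and do not come from any particular XML standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML text: RESIDUE elements are contained within SEQUENCE,
# which is in turn contained within PROTEIN.
xml_text = """<PROTEIN>
  <NAME>Lysozyme</NAME>
  <SEQUENCE>
    <RESIDUE>LYS</RESIDUE>
    <RESIDUE>VAL</RESIDUE>
  </SEQUENCE>
</PROTEIN>"""

# Parse the text into a tree of elements, starting at the outermost tag.
root = ET.fromstring(xml_text)

# find() returns the first direct child element with the given tag.
name = root.find("NAME").text

# iter() visits matching elements at any depth of nesting.
residues = [element.text for element in root.iter("RESIDUE")]

print(root.tag, name, residues)
```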

Sometimes a programming language will have its own, inbuilt formats for representing

data structures created in that language. Converting data into such a format is referred to as serialisation. In

general  the  serialisation  format  will  be  specific  to  the  language  in  question,  and  may

require  special  modules  to  be  installed.  However,  if  data  is  only  going  to  be  used  in  a

single language environment, using serialisation can offer an efficient means to store the

active data, in terms of both speed and programming ease. Python has a serialisation

method which is referred to as pickling. The original pickle protocol (protocol 0) produces textual files, while the later protocols, including the Python 3 defaults, are binary; in either case the files are

not so easy for a person to interpret. For example, the Python list data structure:

x = [1, 2, 'a', 'b', True, None]

is saved by the text-based pickling protocol (here under Python 2) as:

(lp1
I1
aI2
aS'a'
aS'b'
aI01
aNa.
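A minimal sketch of a pickling round trip, using the standard pickle module, might look like the following. The exact bytes produced depend on the Python version and the protocol chosen, so no particular output is assumed here.

```python
import pickle

x = [1, 2, 'a', 'b', True, None]

# Protocol 0 is the original, text-based pickle format.
text_pickle = pickle.dumps(x, protocol=0)

# With no protocol given, the default for the running Python version
# is used; in Python 3 this is a binary format.
binary_pickle = pickle.dumps(x)

# loads() detects the protocol automatically and rebuilds the list.
restored = pickle.loads(text_pickle)

print(text_pickle.decode("ascii"))
print(restored)
```

To store the data in a file rather than in memory, pickle.dump() and pickle.load() take a file object opened in binary mode. Pickle data should only be loaded from trusted sources, because unpickling can execute arbitrary code.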


For most file formats, it is normally easier to write files than it is to read them; it is

easier  to  extract  data  from  a  controlled  and  standardised  in-memory  representation  (e.g.

Python  structures)  than  it  is  to  interpret  someone  else’s  text,  which  may  not  be

standardised or even fully understood. So when writing a program to read a file you need

to  parse  the  file  (determine  its  syntactic  structure)  and  also  confirm  that  the  content  is

valid, however that may be defined. When writing a file you just have to make sure that

you  are  following  the  rules  for  the  file  layout.  A  common  programming  paradigm  is  to

first  read  one  or  more  files,  do  some  processing  and  then  write  out  one  or  more  files.

Although  the  processing  is  normally  the  major  objective  of  the  program,  it  is  not

uncommon to have situations where most effort needs to be spent creating the code to do

the reading and writing of the files, especially for simple programs. If there is already an

existing  piece  of  tested  code,  such  as  the  importable  BioPython  module,  which  can  be

used to read and write files, then this is often used in preference to spending time writing

something new. See Chapter 11 for examples that use BioPython to read and write files.
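The read, process and write pattern can be sketched as below. This is only an illustration under stated assumptions: the input format (one tab-separated name and DNA sequence per line), the function names read_records and write_lengths, and the rule that only the letters A, C, G and T are valid are all hypothetical. In-memory io.StringIO objects stand in for real files so that the example is self-contained.

```python
import io

# Assumption for this sketch: sequences may only contain these letters.
VALID_LETTERS = set("ACGT")


def read_records(file_obj):
    """Parse 'name<TAB>sequence' lines and check that each one is valid."""
    records = []
    for line_number, line in enumerate(file_obj, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # Skip blank lines.

        # Parsing: work out the syntactic structure of the line.
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError("Line %d: expected 2 fields, found %d"
                             % (line_number, len(fields)))
        name, sequence = fields

        # Validation: check the content against the assumed rules.
        if not set(sequence) <= VALID_LETTERS:
            raise ValueError("Line %d: invalid sequence letters"
                             % line_number)
        records.append((name, sequence))
    return records


def write_lengths(records, file_obj):
    """Write a header line, then one 'name<TAB>length' line per record."""
    file_obj.write("name\tlength\n")
    for name, sequence in records:
        file_obj.write("%s\t%d\n" % (name, len(sequence)))


input_file = io.StringIO("seq1\tACGTAC\nseq2\tGGC\n")
output_file = io.StringIO()

records = read_records(input_file)    # Read and validate.
write_lengths(records, output_file)   # Process (measure) and write.
result = output_file.getvalue()
print(result)
```

Even in this small example the reading function is longer than the writing function, because it has to both parse and validate, whereas the writer only has to follow the layout rules.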



