Python Programming for Biology: Bioinformatics and Beyond



[Tim J. Stevens, Wayne Boucher] Python Programming

Reading lines of data

The file object has certain functionalities associated with it, which allow the underlying data in the file to be read. The most commonly used functions are: read(), readline() and readlines(). The read() function is used if you want to read an entire file in one go, into one long string.

data = fileObj.read()

The read() function has an optional argument that specifies the maximum number of bytes (addressable units of information) to load, or the maximum number of characters for a file opened in text mode in Python 3, but reading the entire file in one go is more common for this function. Of course, if the file is huge this might not be a good idea, because of memory limitations. In that case the readline() function, which reads only one line from the file, is often placed in a loop to process multiple lines without having to load everything at once.

line = fileObj.readline()
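
The optional size argument to read() offers another way to limit memory use. The following is a minimal sketch, assuming Python 3 and a plain "r" mode; the function name count_characters and the default chunk size are hypothetical choices made for illustration.

```python
def count_characters(path, chunk_size=1024):
    """Count the characters in a text file, reading a limited amount at a time."""
    fileObj = open(path, "r")
    total = 0
    chunk = fileObj.read(chunk_size)      # at most chunk_size characters
    while chunk:                          # read() gives "" at the end of the file
        total += len(chunk)
        chunk = fileObj.read(chunk_size)
    fileObj.close()
    return total
```

Only one chunk is held in memory at any time, so the size of the file does not matter.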

As far as readline() and readlines() are concerned, each line of the file is defined as a text string ending in the newline character, or a string that stops at the end of the file without a newline. In other words the newline character separates the lines. Note that the newline character at the end of the line is not removed when using this function, i.e. it is included as the last character of the returned string.
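
As a small illustration of the retained newline, the sketch below returns the first line both as read and with the trailing newline removed. The function name first_line_raw_and_stripped is hypothetical, and Python 3 with a plain "r" mode is assumed.

```python
def first_line_raw_and_stripped(path):
    """Return the first line as read, and again without its trailing newline."""
    fileObj = open(path, "r")
    raw = fileObj.readline()        # keeps the final "\n", if the line has one
    fileObj.close()
    stripped = raw.rstrip("\n")     # removes only trailing newline characters
    return raw, stripped
```

Using rstrip("\n") rather than a bare rstrip() leaves any trailing spaces or tabs in place, which matters when whitespace is part of the data.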

For Unix-derived computer systems (e.g. Linux, OS X) the ‘\n’ newline character is the normal convention. However, for Windows computers the normal convention is that a line ends with the two characters ‘\r\n’, but that also works for the above example because the last character is still ‘\n’. In some situations a file might have lines that only end with ‘\r’ and, given the way we have opened the file here, this would not automatically be recognised by ‘readline’ as the end of the lines, even though it is intended to be.

Python provides a convenient way to deal with the ‘\r’ versus ‘\n’ end-of-line issue. (The newline ‘\n’ and carriage return ‘\r’ concepts are originally from the humble mechanical typewriter.) The mode argument to the open() function can include the character ‘U’, to specify universal line interpretation, so, for example:

fileObj = open(path, "rU")

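This means that when the file is read every occurrence of ‘\r\n’ is replaced with ‘\n’ and every occurrence of just ‘\r’ is replaced with ‘\n’. In Python 2 this is the recommended method of opening a text file when you are not sure of its line endings, unless of course the ‘\r’ characters (singly or in combination with ‘\n’) are required and mean something specific for the file being considered. In Python 3 this translation is already the default for files opened in text mode; the ‘U’ flag is deprecated there and was removed in Python 3.11, where open(path, "rU") raises an error, so a plain "r" should be used instead.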
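
For Python 3, where line-ending translation is controlled by the newline argument of open() rather than by a mode character, the behaviour can be sketched as follows. The helper name read_lines and the sample data are hypothetical; newline=None (the default) translates the endings, while newline="" still splits on all three conventions but returns the endings unchanged.

```python
import os
import tempfile


def read_lines(path, newline=None):
    """Return all lines of a text file; newline is passed on to open()."""
    fileObj = open(path, "r", newline=newline)
    lines = fileObj.readlines()
    fileObj.close()
    return lines


# Write the sample data in binary mode so the mixed line endings are stored exactly.
handle, path = tempfile.mkstemp()
os.close(handle)
fileObj = open(path, "wb")
fileObj.write(b"one\r\ntwo\rthree\n")
fileObj.close()

translated = read_lines(path)                # ['one\n', 'two\n', 'three\n']
untranslated = read_lines(path, newline="")  # ['one\r\n', 'two\r', 'three\n']
os.remove(path)
```

The second form is the one to choose when the ‘\r’ characters mean something specific for the file being considered.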

Every time you read part of a file, for example using readline(), a record of the position from which the next read will start, the file pointer, advances through the file. Hence, the next time you read some more of the file, you read from where the previous read ended. When the file pointer reaches the end of the file the next readline() gives back an empty string, which conveniently evaluates as False. (A blank line within the file is not an empty string, because it still contains its newline character.) Thus if you want to process one line at a time in a file you could do the following, where the loop continues for as long as the line evaluates as True:

fileObj = open(path, "rU")
line = fileObj.readline()
while line:
    # process line
    line = fileObj.readline()
fileObj.close()
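
The way successive reads continue from the file pointer can be seen by mixing different read calls on one file object. This is only a sketch: the function name read_in_stages is hypothetical, and Python 3 with a plain "r" mode is assumed.

```python
def read_in_stages(path):
    """Read a file in several steps, each starting where the last one ended."""
    fileObj = open(path, "r")
    start = fileObj.read(4)             # the first four characters
    rest_of_line = fileObj.readline()   # the remainder of the first line
    next_line = fileObj.readline()      # the whole second line
    at_end = fileObj.readline()         # "" if the file has only two lines
    fileObj.close()
    return start, rest_of_line, next_line, at_end
```

For a file containing the two lines ACGTACGT and TTGA, the four values returned would be "ACGT", "ACGT\n", "TTGA\n" and the empty string.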

However, there is a more elegant alternative to using readline() repeatedly: from Python 2.2 onwards you don’t have to manage the lines yourself. Rather, the open file acts as an iterable object, which leads to much simpler code: you can loop through the file as if it were a list, with each line yielded in turn inside the loop:

fileObj = open(path, "rU")
for line in fileObj:
    pass # process line
fileObj.close()
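
Because the file object is iterable, it can also be passed to built-in helpers such as enumerate(). The sketch below reports over-long lines together with their line numbers; the function name find_long_lines and the 60-character default are hypothetical choices for illustration, and Python 3 with a plain "r" mode is assumed.

```python
def find_long_lines(path, max_length=60):
    """Return (line number, length) pairs for lines longer than max_length."""
    long_lines = []
    fileObj = open(path, "r")
    for number, line in enumerate(fileObj, start=1):
        length = len(line.rstrip("\n"))   # do not count the newline itself
        if length > max_length:
            long_lines.append((number, length))
    fileObj.close()
    return long_lines
```

Only one line is held in memory at a time, so this works for files that are too large to read in one go.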

The function readlines() reads all the lines in the file in one go, and returns a list of the lines; a list of strings. Accordingly, an alternative way to process an entire file would be to do:


fileObj = open(path, "rU")
lines = fileObj.readlines()
fileObj.close()
for line in lines:
    pass # process line

Again, as with the read() function, this is a reasonable approach if the file is not too large. There is also an optional argument for readlines() giving an approximate number of bytes to read, whereupon about that amount of data will be read, including any extra bit required to complete a final, otherwise partial line. Another option, which is slicker, but arguably less clear, is to open and read the file in a single statement:

for line in open(path, "rU"):
    pass # process line

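Here the file is closed implicitly, because it was not assigned to a variable: once the loop has finished nothing refers to the file object any more, and the standard (CPython) interpreter closes it when the object is discarded. Other Python implementations may delay this, so the approach is best kept to short scripts, where it is an acceptable coding style. It is obvious, given that no explicit variable name is stated, that the file is no longer used once the loop has finished.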

Another alternative to manually closing a file object is to use the with … as statement, which was introduced in Python 2.5. For example, we could write:

with open(path, "rU") as fileObj:
    for line in fileObj:
        pass # process line

Here the with statement assigns the opened file object to the fileObj variable in a special way. We won’t go into the precise details of what is happening, but the basic principle is that a file object has inbuilt methods (__enter__ and __exit__) to deal with its setup and release. In this case the result is that the file is closed at the end of the with code block. Note that the with and as keywords are a general part of Python, and not specifically related to files.
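
The closing behaviour can be checked with the closed attribute of the file object. The following is a sketch with hypothetical function names, assuming Python 3; the second function illustrates that the file is also closed when the block is left because of an error.

```python
def first_line_and_state(path):
    """Return the first line, plus whether the file was open inside and closed after."""
    with open(path, "r") as fileObj:
        line = fileObj.readline()
        open_inside = not fileObj.closed   # still open within the block
    return line, open_inside, fileObj.closed


def closed_after_error(path):
    """Leave the with block through an exception and report whether the file closed."""
    try:
        with open(path, "r") as fileObj:
            raise ValueError("simulated processing failure")
    except ValueError:
        pass
    return fileObj.closed
```

In both cases the final value is True, which is why the with form is generally preferred over calling close() by hand.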



