Python Programming for Biology: Bioinformatics and Beyond

[Tim J. Stevens, Wayne Boucher] Python Programming

Reading lines of data

The file object provides methods that allow the underlying data in the file to be read. The most commonly used are read(), readline() and readlines(). The read() function is used if you want to read an entire file in one go, into one long string.

data = fileObj.read()

The read() function has an optional argument that specifies the required number of bytes (addressable units of information) to load, but reading the entire file in one go is more common for this function. Of course, if the file is huge this might not be a good idea, because of memory limitations. By contrast, the readline() function reads only one line from the file, and is often placed in a loop to process multiple lines, without having to load everything at once.

line = fileObj.readline()
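
As an illustration of the optional size argument to read(), the following is a minimal sketch that loads a file a few bytes at a time. The function name countBytesInChunks, the chunk size and the demonstration data are hypothetical, chosen only for this example; the loop relies on read() returning an empty result at the end of the file.

```python
import os
import tempfile

def countBytesInChunks(path, chunkSize=4):
    """Read a file in fixed-size pieces; return (number of chunks, total bytes)."""
    numChunks = 0
    total = 0
    fileObj = open(path, "rb")        # binary mode, so sizes are counted in bytes
    chunk = fileObj.read(chunkSize)
    while chunk:                      # an empty result means the end of the file
        numChunks += 1
        total += len(chunk)
        chunk = fileObj.read(chunkSize)
    fileObj.close()
    return numChunks, total

# Hypothetical demonstration data written to a temporary file
handle, demoPath = tempfile.mkstemp()
os.write(handle, b"ACGTACGTAC")       # 10 bytes
os.close(handle)

chunkResult = countBytesInChunks(demoPath, chunkSize=4)
os.remove(demoPath)
print(chunkResult)                    # (3, 10)
```

Only one chunk is held in memory at any time, which is the point of passing a size when the file is large.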

As far as readline() and readlines() are concerned, each line of the file is defined as a text string ending in the newline character, or a string that stops at the end of the file without a newline. In other words, the newline character separates the lines. Note that the newline character at the end of the line is not removed when using these functions, i.e. it is included as the last character of the returned string.
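
The retained newline can be seen in a small sketch like the one below. Here an in-memory io.StringIO object stands in for an open text file, which is an assumption made only so that the example is self-contained; the sequence text is invented.

```python
import io

# io.StringIO stands in for an open text file (assumption for this sketch)
fileObj = io.StringIO("ATGGCC\nTTGACA")   # the second line has no final newline

first = fileObj.readline()
second = fileObj.readline()

print(repr(first))     # 'ATGGCC\n'
print(repr(second))    # 'TTGACA'

# rstrip("\n") removes a trailing newline if present and leaves other text alone
cleaned = [first.rstrip("\n"), second.rstrip("\n")]
print(cleaned)         # ['ATGGCC', 'TTGACA']
```

Stripping only "\n", rather than calling strip() with no argument, keeps any leading or trailing spaces that might be meaningful in the data.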

For Unix-derived computer systems (e.g. Linux, OS X) the ‘\n’ newline character is the normal convention. However, for Windows computers the normal convention is that a line ends with the two characters ‘\r\n’, but that also works for the above example because the last character is still ‘\n’. In some situations a file might have lines that end only with ‘\r’ and, given the way we have opened the file here, this would not automatically be recognised by readline() as the end of a line, even though it is intended to be.

Python provides a convenient way to deal with the ‘\r’ versus ‘\n’ end-of-line issue. (The newline ‘\n’ and carriage return ‘\r’ concepts are originally from the humble mechanical typewriter.) The mode argument to the open() function can include the character ‘U’, to specify universal line interpretation, so, for example:

fileObj = open(path, "rU")

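This means that when the file is read every occurrence of ‘\r\n’ is replaced with ‘\n’ and every occurrence of just ‘\r’ is replaced with ‘\n’. This is the recommended method of opening a text file when you are not sure of its line endings, unless of course the ‘\r’ characters (singly or in combination with ‘\n’) are required and mean something specific for the file being considered. Note that the ‘U’ character matters only for Python 2: in Python 3 a file opened in text mode has this translation applied by default, and the ‘U’ character was deprecated and then removed in Python 3.11, where plain "r" should be used instead.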
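
A minimal sketch of the same translation under Python 3 follows, where it is controlled by the newline argument of open() rather than by a mode character. The temporary file and its contents are hypothetical demonstration data, mixing all three line-ending conventions.

```python
import os
import tempfile

# Hypothetical demo file mixing Windows, bare carriage return and Unix endings
handle, demoPath = tempfile.mkstemp()
os.write(handle, b"one\r\ntwo\rthree\n")
os.close(handle)

# Default (newline=None): '\r\n' and lone '\r' are both translated to '\n'
fileObj = open(demoPath, "r")
translated = fileObj.readlines()
fileObj.close()

# newline="\n": only '\n' ends a line and nothing is translated
fileObj = open(demoPath, "r", newline="\n")
untranslated = fileObj.readlines()
fileObj.close()

os.remove(demoPath)
print(translated)      # ['one\n', 'two\n', 'three\n']
print(untranslated)    # ['one\r\n', 'two\rthree\n']
```

The second result shows the problem described above: without translation, the line ending in a lone ‘\r’ is merged with the line that follows it.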

Every time you read part of a file, for example, using readline(), a record of the current position in the file, the file pointer, advances. Hence, the next time you read some more of the file, you read from where the previous read ended. When the file pointer reaches the end of the file the next readline() gives back an empty string, which is conveniently a False value. Thus if you want to process one line at a time in a file you could do the following, where the loop continues as long as the line is non-empty (and so counts as True):

fileObj = open(path, "rU")
line = fileObj.readline()
while line:
  # process line
  line = fileObj.readline()
fileObj.close()
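
The advancing file pointer can be observed with the file object's tell() method, as in the following sketch. The file is opened in binary mode here, an assumption made so that tell() reports a plain byte offset; the temporary file and its contents are hypothetical.

```python
import os
import tempfile

# Hypothetical demo file: two lines of five bytes each (four letters plus '\n')
handle, demoPath = tempfile.mkstemp()
os.write(handle, b"ACGT\nTTGA\n")
os.close(handle)

fileObj = open(demoPath, "rb")     # binary mode: tell() is a byte offset
positions = [fileObj.tell()]       # position before anything is read
line = fileObj.readline()
while line:
    positions.append(fileObj.tell())
    line = fileObj.readline()
atEnd = fileObj.readline()         # reading again at the end of the file
fileObj.close()
os.remove(demoPath)

print(positions)      # [0, 5, 10]
print(repr(atEnd))    # b''
```

Each readline() moves the pointer past one complete line, and any further read at the end of the file returns an empty value without raising an error.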

However, there is a more elegant alternative to using readline() repeatedly: from Python 2.2 onwards you don’t have to manage the lines yourself. Rather, the open file acts as an iterable object, which leads to much simpler code: you can loop through the file as if it were a list, with each line yielded in turn inside the loop:

fileObj = open(path, "rU")
for line in fileObj:
  pass # process line
fileObj.close()
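
To show what "process line" might look like in practice, here is a small sketch that loops over a FASTA-style file, counting header lines and sequence letters. The file contents and the variable names numRecords and numResidues are hypothetical, and the file is opened with Python 3's default text mode.

```python
import os
import tempfile

# Hypothetical FASTA-style demonstration data
handle, demoPath = tempfile.mkstemp()
os.write(handle, b">seq1\nACGT\n>seq2\nTTGACA\n")
os.close(handle)

numRecords = 0
numResidues = 0
fileObj = open(demoPath, "r")
for line in fileObj:
    line = line.rstrip("\n")        # drop the retained newline
    if line.startswith(">"):
        numRecords += 1             # a header line starts a new record
    else:
        numResidues += len(line)    # a sequence line
fileObj.close()
os.remove(demoPath)

print(numRecords, numResidues)      # 2 10
```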

The function readlines() reads all the lines in the file in one go, and returns them as a list of strings. Accordingly, an alternative way to process an entire file would be to do:

fileObj = open(path, "rU")
lines = fileObj.readlines()
fileObj.close()
for line in lines:
  pass # process line

Again, as with the read() function, this is a reasonable approach if the file is not too large. There is also an optional argument for readlines() giving a number of bytes to read, whereupon that amount of data will be read, including any extra bit required to complete a final, otherwise partial line. Another option, which is slicker, but arguably less clear, is to open and read the file in a single statement:

for line in open(path, "rU"):
  pass # process line

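Here the file is closed implicitly, because it was not assigned to a variable, and this is a case where that is acceptable coding style. (Strictly, the file is closed when the unreferenced file object is cleaned up, which the standard CPython interpreter does promptly but other Python implementations may delay.) It is obvious, given that no explicit variable name is stated, that the file is no longer used once the loop has finished.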

Another alternative to manually closing a file object is to use the with … as statement, which was introduced in Python 2.5. For example, we could write:

with open(path, "rU") as fileObj:
  for line in fileObj:
    pass # process line

Here the with statement assigns the opened file object to the fileObj variable in a special way. We won’t go into the precise details of what is happening, but the basic principle is that a file object has inbuilt methods (__enter__ and __exit__) to deal with its setup and release. In this case the result is that the file is closed at the end of the with code block. Note that the with and as keywords are a general part of Python, and not specifically related to files.
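
As a rough illustration of the principle, any class that defines __enter__ and __exit__ can be used in a with statement. The Tracker class below is hypothetical, written only to record the order of events; the second part checks the closed attribute of a real file object before and after the block, using a temporary file as demonstration data.

```python
import os
import tempfile

class Tracker:
    """Hypothetical class that records when a with block starts and ends."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self                  # this value is bound by 'as'

    def __exit__(self, excType, excValue, traceback):
        self.events.append("exit")
        return False                 # do not suppress any exception

with Tracker() as tracker:
    tracker.events.append("body")

print(tracker.events)                # ['enter', 'body', 'exit']

# The same mechanism closes a file at the end of its with block
handle, demoPath = tempfile.mkstemp()
os.close(handle)
with open(demoPath, "r") as fileObj:
    closedInside = fileObj.closed
closedAfter = fileObj.closed
os.remove(demoPath)

print(closedInside, closedAfter)     # False True
```

The __exit__ method is called even if the code inside the block raises an exception, which is why this form is generally safer than calling close() by hand.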
