Python Programming for Biology: Bioinformatics and Beyond


Tim J. Stevens, Wayne Boucher

File reading examples

Reading whitespace-separated files

For our first practical example we will begin by reading a simple yet commonly used kind of file, one where each line has several fields separated by whitespace. By ‘whitespace’ we mean tab stops ('\t') or one or more spaces. An example of such a file is the following, where we first have a descriptive header line and then subsequent lines with three text fields: the first is the name of a chromosome, the second is a base-pair position in the chromosome and the last is an experimentally determined value for that position:

chromosome position value
chr1 3417953 0.74634
chrX 152662801 0.50036
chr7 55281536 0.82376
chr4 9168943 0.73375
chr1 13170641 0.42181

For the purposes of our example we will assume that the above lines are in a file called ‘chromoData.tsv’, which lies in the ‘examples’ sub-directory of the current working directory; the ‘.tsv’ extension hints that the format is tab-separated values. To process this file we first read the header line separately with readline(), since it doesn't contain data we are interested in. Then we loop through the remaining lines by iterating over the file object, and for each line we use the string method split() to separate the line into a list of substrings. Without any arguments split() separates the fields on any run of whitespace, which is what we want here. For a different file format we could specify a different separator: for comma-separated fields we would use split(',') and for tab-separated fields split('\t'). Unlike the no-argument form, both of these keep any spaces that occur inside a data item.
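The difference between these separator choices is easiest to see on a few lines. The sketch below applies them to made-up example strings (the strings themselves are illustrative assumptions, not lines from chromoData.tsv):

```python
# Default split(): any run of spaces or tabs counts as one separator,
# and leading/trailing whitespace (including the newline) is discarded
line = 'chr1   3417953\t0.74634\n'
fields = line.split()
print(fields)  # ['chr1', '3417953', '0.74634']

# split('\t'): only tab characters separate fields, so internal spaces survive,
# but the trailing newline stays attached to the last field
tsvLine = 'chr1\tgene of interest\t0.74634\n'
tsvFields = tsvLine.split('\t')
print(tsvFields)  # ['chr1', 'gene of interest', '0.74634\n']

# float() ignores surrounding whitespace, so the newline does no harm here
print(float(tsvFields[-1]))  # 0.74634

# The same line with split() breaks the multi-word field apart
print(tsvLine.split())  # ['chr1', 'gene', 'of', 'interest', '0.74634']

# split(','): comma-separated fields, internal spaces kept
csvLine = 'chrX,second sample,0.50036'
print(csvLine.split(','))  # ['chrX', 'second sample', '0.50036']
```

The no-argument form is the most forgiving for files padded with a variable number of spaces, whereas an explicit separator is the safer choice whenever a field may itself contain spaces.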

values = []

with open('examples/chromoData.tsv') as fileObj:  # file is closed automatically
  header = fileObj.readline() # Don't need this first line

  for line in fileObj:
    data = line.split()  # split on any whitespace
    chromosome, position, value = data
    position = int(position)
    value = float(value)
    values.append(value)

mean = sum(values)/len(values)
print('Mean value', mean)

For each line we obtain a list with three items and these are extracted into separate chromosome, position and value variables. Initially these will be text strings, given that they were just read from the file, but in the case of the position and value we generally want to convert them from strings into integer and floating point number data types respectively (though in this simple example we have not used the position). Accordingly we use the int() and float() functions to do the conversion. Once a variable is a numeric data type we can then perform mathematical operations, like finding the mean value as illustrated.
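Why the conversion matters can be seen by treating a few of the sample values first as strings and then as numbers. This is a short illustrative sketch using values copied from the example file:

```python
# Two positions from the example file, exactly as read (strings)
posA, posB = '9168943', '13170641'

# String comparison works character by character: '9' > '1', so this is True
print(posA > posB)            # True
# After int() conversion the numeric order is correct
print(int(posA) > int(posB))  # False

# For strings, '+' joins the text rather than adding the numbers
valA, valB = '0.74634', '0.50036'
print(valA + valB)            # 0.746340.50036
# After float() conversion '+' performs arithmetic
print(float(valA) + float(valB))
```

Sorting positions or summing values without the conversion therefore gives results that look plausible but are wrong, which is why the conversion is done as soon as each line is split.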

We will consider field-delimited formats again in the readListFile() function below, where we handle things in a more general way, allowing different data type conversion functions and field separators to be specified as function arguments.
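The general idea can be sketched ahead of time with a hypothetical helper. The name readDelimited() and its parameters are our own assumptions for illustration and are not the book's readListFile(); the sample data is read from an in-memory io.StringIO object so that the sketch runs without the examples directory:

```python
import io

def readDelimited(fileObj, converters, separator=None, skipHeader=True):
  """Hypothetical helper: read delimited rows from an open file object,
  applying one conversion function (e.g. str, int, float) per column.
  separator=None splits on any whitespace, just as str.split() does."""

  if skipHeader:
    fileObj.readline()  # discard the header line

  rows = []
  for line in fileObj:
    line = line.rstrip('\r\n')
    if not line.strip():
      continue  # skip blank lines

    fields = line.split(separator)
    if len(fields) != len(converters):
      raise ValueError('Expected %d fields, got %d: %r'
                       % (len(converters), len(fields), line))

    # Apply each converter to the field in the same column
    row = [convert(field) for convert, field in zip(converters, fields)]
    rows.append(row)

  return rows


sampleText = """chromosome position value
chr1 3417953 0.74634
chrX 152662801 0.50036
chr7 55281536 0.82376
chr4 9168943 0.73375
chr1 13170641 0.42181
"""

rows = readDelimited(io.StringIO(sampleText), [str, int, float])
values = [value for chromosome, position, value in rows]
mean = sum(values)/len(values)
print('Mean value', mean)
```

With a real file the same call would take an object from open() instead of io.StringIO, and passing separator='\t' would keep spaces inside fields intact.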
