Python Programming for Biology: Bioinformatics and Beyond

Date: 30.12.2021
[Tim J. Stevens, Wayne Boucher] Python Programming

Collections

To see if a collection (list, set, tuple, dictionary) is empty, just test its truth value: an empty collection is logically false and a non-empty one is logically true. So rather than:

if len(myList) == 0:
  doSomething()

instead do:

if not myList:
  doSomething()
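As a minimal sketch of the truth-value rule (the variable names here are illustrative only), the following checks that several kinds of empty collection are logically false while their non-empty counterparts are logically true:

```python
# Empty collections are logically false; non-empty ones are logically true,
# even when the single element they hold is itself "falsy" (0, None, '').
emptyCollections = [[], (), set(), {}, '']
nonEmptyCollections = [[0], (None,), {''}, {'key': 1}, ' ']

allEmptyAreFalse = all(not c for c in emptyCollections)
allNonEmptyAreTrue = all(bool(c) for c in nonEmptyCollections)

print(allEmptyAreFalse, allNonEmptyAreTrue)
```

Note that None is also logically false, so if a variable may be either None or an empty list and the difference matters, an explicit comparison is still needed.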

To copy a list you can use the built-in list() function or the [:] slice notation; both make a shallow copy (a new list that refers to the same elements):

duplicateList = list(firstList)
duplicateList = firstList[:]

Slice notation can also be used to get a reversed copy of a list (remembering that the last element of the slice notation is the step):

revList = firstList[::-1]

This is more compact than copying and then using reverse():

revList = list(firstList)
revList.reverse()

or using the reversed() iterator, which is handy when going through loops in reverse order (for example, for x in reversed(aList):), but needs an explicit conversion to make a duplicate list:

revList = list(reversed(firstList))
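One thing worth keeping in mind is that all of these copies are shallow. The sketch below, using made-up nested data, shows that the outer list is duplicated but the inner lists are shared, and that copy.deepcopy() from the standard library is one way to get a fully independent copy:

```python
import copy

firstList = [[1, 2], [3, 4]]
duplicateList = firstList[:]          # New outer list, same inner lists

duplicateList.append([5, 6])          # Only the copy grows
duplicateList[0].append(99)           # The inner list is shared with firstList

print(firstList)      # [[1, 2, 99], [3, 4]]
print(duplicateList)  # [[1, 2, 99], [3, 4], [5, 6]]

independentList = copy.deepcopy(firstList)  # Copies the inner lists too
independentList[0].append(-1)
print(firstList[0])   # [1, 2, 99]
```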

For dictionaries don’t forget the .get() and .setdefault() methods. So:

if x in myDict:
  y = myDict[x]
else:
  y = defaultValue

becomes:

y = myDict.get(x, defaultValue)

or if the default value is None simply:

y = myDict.get(x)

If the default value should actually be put in the dictionary then you can do:

y = myDict.setdefault(x, defaultValue)
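A common use of .setdefault() is grouping items under a key. The following is a small sketch with invented data, grouping words by their first letter, and contrasting it with .get(), which leaves the dictionary untouched:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

groups = {}
for word in words:
  # Creates the empty list the first time a letter is seen, then appends
  groups.setdefault(word[0], []).append(word)

print(groups)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

missing = groups.get('z', [])  # Default is returned but NOT stored
print(missing, 'z' in groups)  # [] False
```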

In Python 2, if you simply want to enquire whether something is present in a dictionary it is simpler, and slightly faster, to use in rather than call has_key():

if myDict.has_key(key):
  doSomething()

becomes:

if key in myDict:
  doSomething()

In Python 3 dictionaries no longer have the has_key() method.

It may sometimes be helpful to construct a dictionary from a list. Rather than going through a loop, a list of 2-tuples with (key, value) pairs can be used:

listData = [(1,'Apples'), (2, 'Bananas'), (3, 'Cherries')]
dictData = dict(listData)
print(dictData[2]) # Gives Bananas

In Python 2, to do the reverse you can use .items() to get a list of all pairs, or .iteritems() to get an iterator object, which can be looped through like a list but which yields one item at a time, and so saves memory by not making the complete list:

for k, v in dictData.items(): # Makes a list
  print('Key: %d, Value: %s' % (k,v))

for k, v in dictData.iteritems(): # Uses an efficient iterator
  print('Key: %d, Value: %s' % (k,v))

In Python 3 there is no .iteritems() method and .items() returns an iterable view on the items in the dictionary, rather than a list.
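To illustrate what a view means in Python 3, here is a short sketch (assuming Python 3.7 or later, where dictionaries keep insertion order). The view tracks later changes to the dictionary, and list() gives a fixed snapshot when one is needed:

```python
dictData = {1: 'Apples', 2: 'Bananas'}
pairs = dictData.items()     # A view object, not a list

dictData[3] = 'Cherries'     # The view reflects later changes
print(len(pairs))            # 3

pairList = list(pairs)       # Explicit conversion when a real list is needed
print(pairList)              # [(1, 'Apples'), (2, 'Bananas'), (3, 'Cherries')]
```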

The built-in zip function can be used to combine corresponding elements from multiple lists, which is handy for dictionaries when you initially have separate lists for the keys and values:

keys = [1, 2, 3]
values = ['Apples', 'Bananas', 'Cherries']
listData = zip(keys, values) # Pairs: (1,'Apples'), (2,'Bananas'), (3,'Cherries')
dictData = dict(listData)

The next tip was mentioned before, but we repeat it in the compendium, and it can reverse the above operation (although for dictionaries, .keys() and .values() also do the job). If you already have data in a list of lists (or tuples) then zip can neatly extract the elements which share the same index:

listData = [(1,'Apples'), (2,'Bananas'), (3,'Cherries')]
numbers, fruits = zip(*listData)

The way to imagine this one is that the call is actually zip((1,'Apples'), (2,'Bananas'), (3,'Cherries')), with the * extracting the items in the list as separate arguments. The zip then combines the first elements and the second elements together, exactly as above (note that numbers and fruits are tuples rather than lists). This is neater than using the equivalent list comprehensions:

listData = [(1,'Apples'), (2,'Bananas'), (3,'Cherries')]
numbers = [x[0] for x in listData]
fruits = [x[1] for x in listData]
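The round trip can be sketched as follows, assuming Python 3, where zip() returns a one-shot iterator rather than a list. The last lines show a pitfall of that: once the iterator has been consumed, a second use yields nothing.

```python
keys = [1, 2, 3]
values = ['Apples', 'Bananas', 'Cherries']

listData = list(zip(keys, values))   # Make a real list so it can be reused
numbers, fruits = zip(*listData)     # "Unzip" back into two tuples
print(numbers)  # (1, 2, 3)
print(fruits)   # ('Apples', 'Bananas', 'Cherries')

pairs = zip(keys, values)            # Python 3: a one-shot iterator
firstDict = dict(pairs)              # Consumes the iterator
secondDict = dict(pairs)             # Nothing left, so this is empty
print(firstDict, secondDict)
```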

The zip can also come in handy as a compact notation for looping through two lists, although in Python 2 it does make a new list, so is not so space efficient. Accordingly, something like:

for i, aValue in enumerate(aList):
  bValue = bList[i]
  print(aValue, bValue)

could become:

for aValue, bValue in zip(aList, bList):
  print(aValue, bValue)
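One difference between the two loops is worth noting: zip() stops silently at the end of the shortest input, whereas indexing bList[i] would raise an IndexError if bList were shorter. A small sketch with made-up lists, using itertools.zip_longest from the standard library when padding is preferred:

```python
from itertools import zip_longest

aList = [1, 2, 3]
bList = ['x', 'y']                   # Deliberately one element shorter

truncated = list(zip(aList, bList))  # Stops at the shortest list
padded = list(zip_longest(aList, bList, fillvalue=None))

print(truncated)  # [(1, 'x'), (2, 'y')]
print(padded)     # [(1, 'x'), (2, 'y'), (3, None)]
```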

In Python the set data type is sometimes overlooked, especially by those who started with early versions of Python. Nonetheless, it is exceedingly useful and can avoid the need to do looping with lists, as long as order is not important (or can be reconstructed). There is a caveat to such set operations, however: the elements must be hashable, which in practice means they cannot be internally modifiable, a requirement to keep things unique. In essence, sets can contain most objects, including numbers, strings, frozen sets and tuples (provided the tuple’s own elements are hashable), but cannot contain other sets, lists or dictionaries.

Looking up elements in a set is fast, so where you have lots of look-ups to do, instead of:

for x in firstList:
  if x in veryLongList:
    doSomething()

you can make things quicker with:

bigSet = set(veryLongList)

for x in firstList:
  if x in bigSet:
    doSomething()

Note this assumes that the speed gained using bigSet for look-up makes up for the time spent creating the set in the first place.
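Whether the conversion pays off can be checked with the standard timeit module. The sketch below uses arbitrary list sizes chosen only for illustration; the actual timings depend on the machine and the data, so no particular numbers are claimed here.

```python
import timeit

veryLongList = list(range(100000))
firstList = [5, 99999, -1, 123456]
bigSet = set(veryLongList)

# Both approaches find the same elements
hitsFromList = [x for x in firstList if x in veryLongList]
hitsFromSet = [x for x in firstList if x in bigSet]

# Time 100 look-ups of an element near the end of the list
listTime = timeit.timeit(lambda: 99999 in veryLongList, number=100)
setTime = timeit.timeit(lambda: 99999 in bigSet, number=100)
print('list: %.6f s, set: %.6f s' % (listTime, setTime))
```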

Sets provide a neat way of removing duplicates from a list. As long as you don’t need to preserve order, you just convert to a set and back to a list again:

myList = ['apple', 'banana', 'lemon', 'apple', 'lemon', 'lemon']
uniqueList = list( set(myList) ) # e.g. ['lemon', 'apple', 'banana'] (order is arbitrary)
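If the original order does matter, one alternative (not from the text above, and assuming Python 3.7 or later, where dictionaries preserve insertion order) is to use dictionary keys instead of a set:

```python
myList = ['apple', 'banana', 'lemon', 'apple', 'lemon', 'lemon']

# dict.fromkeys() keeps the first occurrence of each key, in order
uniqueInOrder = list(dict.fromkeys(myList))
print(uniqueInOrder)  # ['apple', 'banana', 'lemon']
```

As with sets, this only works when the elements are hashable.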

Using set operations to get the common elements of several lists is neat and efficient, although it may be prudent to simply work with sets in the first place:

a = ['G','S','T','P','A']
b = ['A','V','I','L','P']
intersection = set(a) & set(b)
commonList = list(intersection) # 'A' and 'P', in arbitrary order

Likewise to find elements that are present in either list:

a = ['G','S','T','P','A']
b = ['A','V','I','L','P']
union = set(a) | set(b)
combinedList = list(union) # 'A', 'G', 'I', 'L', 'P', 'S', 'T', 'V' in arbitrary order
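The same style extends to the other set operators. As a brief sketch with the same example lists, - gives the elements only in the first set and ^ gives those in exactly one of the two; sorted() is used here to get a reproducible order for printing:

```python
a = ['G', 'S', 'T', 'P', 'A']
b = ['A', 'V', 'I', 'L', 'P']

onlyInA = sorted(set(a) - set(b))      # In a but not in b
inExactlyOne = sorted(set(a) ^ set(b)) # In a or b, but not both

print(onlyInA)       # ['G', 'S', 'T']
print(inExactlyOne)  # ['G', 'I', 'L', 'S', 'T', 'V']
```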

When constructing lists it can be quicker and more compact to use list comprehensions than loops. For example:

squares = []
for x in range(1001): # in Python 2 use xrange(1001)
  squares.append( x * x )

is slower than:

squares = [x*x for x in range(1001)]

Also, if we don’t need the whole list, but just need to iterate through it, we can use round parentheses to make a generator object (which has no length as such and does not have indices):

squares = (x*x for x in range(1001)) # Using () not []

for y in squares:
  doSomething()

squares[3] # Fail: This will not work on () generators (raises TypeError).
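A further property of generators, sketched below with a deliberately small range, is that they can only be consumed once. After the first pass there is nothing left, so a generator should be recreated (or a list used) if the values are needed again:

```python
squares = (x*x for x in range(5))  # 0, 1, 4, 9, 16 produced on demand

total = sum(squares)               # First pass consumes the generator
leftover = list(squares)           # Second pass finds it exhausted

print(total)     # 30
print(leftover)  # []

try:
  squares[3]                       # Generators cannot be indexed
except TypeError:
  indexingFailed = True
```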

It is sometimes overlooked that list comprehensions can contain several for clauses (and if conditions), which act like nested loops, although it is easy to take this sort of thing too far:

[(x,y) for x in range(3) for y in range(3)]
# Gives [(0, 0), (0, 1), (0, 2),
#        (1, 0), (1, 1), (1, 2),
#        (2, 0), (2, 1), (2, 2)]

[(x,y) for x in range(3) for y in range(x,3) if x+y > 1]
# Gives [(0,2), (1,1), (1,2), (2,2)]

Sometimes you may wish to construct a list of blank lists, to put items into later. For this it is tempting to do:

data = [[]] * 3
print(data) # Gives [[], [], []]

but here the same list object was repeated three times internally:

data[1].append(True)
print(data) # Gives [[True], [True], [True]]

so try a list comprehension instead:

data = [[] for x in range(3)]
data[1].append(True)
print(data) # Gives [[], [True], []]
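The difference can be confirmed with the is operator, which tests object identity. The sketch below also shows that the * shortcut is harmless for immutable elements such as numbers, because assigning to an index replaces the element rather than modifying a shared object:

```python
shared = [[]] * 3                   # Three references to ONE list
separate = [[] for _ in range(3)]   # Three distinct lists

print(shared[0] is shared[1])       # True
print(separate[0] is separate[1])   # False

zeros = [0] * 3                     # Fine: integers are immutable
zeros[1] = 5                        # Replaces one element only
print(zeros)                        # [0, 5, 0]
```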

Although perhaps not such common operations, the built-in any and all functions can be used to find whether any or all elements in a list satisfy a certain condition. Accordingly:

for x in myList:
  if x < 2:
    doSomething()
    break

becomes:

if any(x < 2 for x in myList):
  doSomething()

Likewise:

if len(myList) == len([x for x in myList if x < 2]):
  doSomething()

is the same as:

if all(x < 2 for x in myList):
  doSomething()
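A quick sketch with sample numbers, including the edge case of an empty list, where any() gives False and all() gives True (there is no element that fails the condition):

```python
myList = [5, 1, 7]

anySmall = any(x < 2 for x in myList)  # True: 1 is less than 2
allSmall = all(x < 2 for x in myList)  # False: 5 and 7 are not

anyOfNothing = any(x < 2 for x in [])  # False for an empty list
allOfNothing = all(x < 2 for x in [])  # True for an empty list

print(anySmall, allSmall, anyOfNothing, allOfNothing)
```

Both functions stop as soon as the answer is known, so with a generator expression the remaining elements are never tested.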

For obtaining a sorted list, the inbuilt sorted function is useful when you don’t want to modify the original list. So instead of:

b = list(a)
b.sort()

you can do:

b = sorted(a)

If you want to sort a list on something other than the items’ innate value you can construct a list of 2-tuples which will be sorted on the first item (which contains the values to sort on). Here we sort according to the length of the strings:

aList = ['homer', 'bart', 'maggie', 'lisa', 'marge']
bList = [(len(x), x) for x in aList]
bList.sort()
aList = [x for (lenX, x) in bList]
# Gives ['bart', 'lisa', 'homer', 'marge', 'maggie']

However, the key option of sort() is much more nifty and allows you to pass in the function that is used to generate the sort key:

aList = ['homer', 'bart', 'maggie', 'lisa', 'marge']
aList.sort(key=len)
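The same key option is accepted by sorted(), and it combines with reverse. The sketch below relies on Python's sort being stable, so items with equal keys keep their original relative order, including when reverse=True is used:

```python
aList = ['homer', 'bart', 'maggie', 'lisa', 'marge']

byLength = sorted(aList, key=len)                    # Shortest first
longestFirst = sorted(aList, key=len, reverse=True)  # Longest first

print(byLength)      # ['bart', 'lisa', 'homer', 'marge', 'maggie']
print(longestFirst)  # ['maggie', 'homer', 'marge', 'bart', 'lisa']
print(aList[0])      # 'homer': the original list is unchanged
```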

Sometimes when dealing with objects we would like to sort on the value of a particular attribute. You can readily write a function to fetch that attribute (for any object in the list, as required by the sort operation), and thus generate a key for the sort. So, for example:

def getSortAttr(obj):
  return obj.something

objList = [objA, objB, objC]
objList.sort(key=getSortAttr)

However, you can also use the key option in combination with the operator module. The function operator.attrgetter() uses the name of an attribute to create a separate on-the-fly function which sends back the value of an attribute, which in this case is the value to sort with. So an alternative to the above is:

from operator import attrgetter

objList = [objA, objB, objC]
objList.sort(key=attrgetter('something')) # Name of attribute as a string

The functions operator.itemgetter (for selecting items in a collection) and operator.methodcaller (for invoking a named method on each object) can also be used in a similar manner.
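As a brief sketch of these two helpers, using invented data: itemgetter(n) builds a function that fetches element n of each item, and methodcaller('lower') builds one that calls .lower() on each item, here giving a case-insensitive sort.

```python
from operator import itemgetter, methodcaller

listData = [(3, 'Cherries'), (1, 'apples'), (2, 'Bananas')]

byNumber = sorted(listData, key=itemgetter(0))  # Sort on first element
byName = sorted(listData, key=itemgetter(1))    # Uppercase sorts before lowercase

names = [name for (number, name) in listData]
ignoringCase = sorted(names, key=methodcaller('lower'))

print(byNumber)      # [(1, 'apples'), (2, 'Bananas'), (3, 'Cherries')]
print(byName)        # [(2, 'Bananas'), (3, 'Cherries'), (1, 'apples')]
print(ignoringCase)  # ['apples', 'Bananas', 'Cherries']
```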

Loops

We’ve been using enumerate() throughout the book, but it is still something novices occasionally overlook. So instead of:

myList = ['e', 'f', 'g']
for i in range(len(myList)):
  print(i, myList[i])

do:

myList = ['e', 'f', 'g']
for i, val in enumerate(myList):
  print(i, val)

And from Python 2.6 you can use a second argument to specify the start point for the index:

for i, val in enumerate(myList, 5):
  print(i, val)

# Gives:
# 5 e
# 6 f
# 7 g

In Python 2, when looping through sequential numbers, such as indices, consider using xrange() rather than range(). This saves space because it only yields numbers on demand. Helpfully an xrange still has a length and can be indexed. In Python 3 xrange is effectively renamed range, replacing the old function that built a complete list.

for x in xrange(100, 1000000): # Doesn't make all the numbers (Python 2)
  doSomething()
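In Python 3 the equivalent behaviour can be checked directly on a range object. This short sketch shows that it supports len(), indexing and membership tests without ever building the full list of numbers:

```python
bigRange = range(100, 1000000)   # Python 3: numbers are produced on demand

rangeLength = len(bigRange)      # Length is computed, not counted
firstValue = bigRange[0]
lastValue = bigRange[-1]
containsValue = 500 in bigRange  # Membership test for an integer

print(rangeLength, firstValue, lastValue, containsValue)
# 999900 100 999999 True
```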

To make an indefinite loop, use a while loop that tests something that is logically true, although don’t forget to break out of the loop eventually:

while True:
  test = doSomething()
  if test:
    break

Because loops are constructs that allow you to repeat operations many times, when thinking about speed a general principle is to put as few operations into the loop as possible. For example, when doing function calls in a loop to construct a list using .append(), a speed improvement can be made if the dot notation call is done only once outside the loop. For example:

aList = []
for x in someBigList:
  if testFunc(x):
    aList.append(x)

becomes the faster:

aList = []
addToList = aList.append
for x in someBigList:
  if testFunc(x):
    addToList(x)
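How much this helps varies between Python versions and is often small, so it is worth measuring before adopting it. The following is a rough sketch using the standard timeit module; the list size, the even-number test and the function names are arbitrary choices for illustration, and no particular timing result is claimed.

```python
import timeit

someBigList = list(range(100000))

def withDotLookup():
  aList = []
  for x in someBigList:
    if x % 2 == 0:
      aList.append(x)       # Attribute looked up on every pass
  return aList

def withLocalName():
  aList = []
  addToList = aList.append  # Attribute looked up once
  for x in someBigList:
    if x % 2 == 0:
      addToList(x)
  return aList

dotTime = timeit.timeit(withDotLookup, number=10)
localTime = timeit.timeit(withLocalName, number=10)
print('dot: %.4f s, local: %.4f s' % (dotTime, localTime))
```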

Related to the above, if you know how long a list will be it is usually faster to pre-construct it in a quick manner and fill it in using indices, rather than appending repeatedly:

aList = [0] * n
bList = [0] * n

for i in range(n):
  aList[i] = someCall(i)
  bList[i] = anotherCall(i)

If you need two loops and have to break out of both of them, cunning use of else, continue and break can do the job without having to set any flags:

for a in oneList:
  for b in anotherList:
    if discoverSomething(a,b):
      # Quit inner loop and subsequently the outer too
      break
  else:
    # Without a break we get here at the end of the inner loop
    # Continuing the outer loop the next break is skipped
    continue
  # Only get here due to the first break
  break
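An alternative that some find easier to read, offered here only as a sketch and not as the approach described above, is to put both loops in a function and leave them with return. The function name findFirstPair and the test data are hypothetical.

```python
def findFirstPair(oneList, anotherList, discoverSomething):
  # Returns the first (a, b) pair that passes the test, or None
  for a in oneList:
    for b in anotherList:
      if discoverSomething(a, b):
        return a, b         # Leaves both loops at once
  return None

pair = findFirstPair([1, 2, 3], [10, 20, 30], lambda a, b: a * b == 40)
noPair = findFirstPair([1, 2, 3], [10, 20, 30], lambda a, b: a * b == 7)

print(pair)    # (2, 20)
print(noPair)  # None
```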

If you have a loop that may cause an error (throw an exception) then it may be tempting to do a precautionary check to stop errors before they occur. However, when the error is rare it is generally quicker to let the exception happen and then catch it in a safe way. This is because with try: there is no repeated checking and extra time is taken only if an error is encountered. So, for example:

for x in bigList:
  if rareEvent(x):
    rareEventOccurred(x)
  else:
    commonTask(x)

can be modified into the following, where SpecialException stands for whatever exception commonTask() raises in the rare case:

for x in bigList:
  try:
    commonTask(x)
  except SpecialException:
    rareEventOccurred(x)
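As a concrete, runnable instance of this pattern (with made-up input data), the sketch below converts strings to integers and catches only the specific ValueError that int() raises for bad input, rather than checking every string beforehand:

```python
values = ['3', '7', 'n/a', '12']

numbers = []
rejected = []
for text in values:
  try:
    numbers.append(int(text))  # Common case: conversion succeeds
  except ValueError:           # Rare case: not a valid integer
    rejected.append(text)

print(numbers)   # [3, 7, 12]
print(rejected)  # ['n/a']
```

Catching a specific exception type, rather than using a bare except:, avoids hiding unrelated errors. If the exceptional case turns out to be frequent, an explicit check may be the quicker option, since raising and catching exceptions carries its own cost.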
