Improving speed
When it comes to making Python programs faster there are a few general points to
consider before going through specifics. We can think of program execution time in terms
of the number of operations performed and how long individual operations take. Hence, to
improve speed we try to reduce the number of operations and/or perform faster individual
operations. Also, an important aspect of this is how the number of operations scales with
the size of the problem. For example, the time taken could stay roughly constant, increase
linearly or increase in an exponential manner (to name only a few possibilities) as the
number of data items increases. Naturally the scaling depends on the algorithm in
question. So if things seem too slow it is worth questioning whether a particular
algorithmic approach is a smart one, especially if the number of data items is large and the
speed scales poorly with the size of this data. As a simple, albeit somewhat
contrived, example, to add all the integers between a and b inclusive you could do:
s = 0
for x in range(a, b+1):
    s += x
or
s = sum(range(a, b+1))
or
s = (b+1-a) * (a+b) / 2.0
The first two examples involve an operation for each item in the range (whether explicitly
in the loop or implicitly inside sum()), so their execution time is roughly proportional to the length of the range. The
last formula involves a small, fixed number of operations, which is much quicker for large
ranges, although you could argue that the code is not as easy for a novice to understand.
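The scaling difference can be checked directly with the standard timeit module. The sketch below times the three approaches from above; the absolute numbers will vary by machine, and only the relative trend matters:

```python
# Timing the three summation approaches with the standard timeit module.
import timeit

a, b = 1, 1000000

loop_time = timeit.timeit(
    'total = 0\nfor x in range(a, b+1):\n    total += x',
    globals={'a': a, 'b': b}, number=10)
sum_time = timeit.timeit('sum(range(a, b+1))',
                         globals={'a': a, 'b': b}, number=10)
formula_time = timeit.timeit('(b+1-a) * (a+b) / 2.0',
                             globals={'a': a, 'b': b}, number=10)

# The constant-time formula should be far quicker than either loop form.
print(loop_time, sum_time, formula_time)
```
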
Whether analyses of this kind are worthwhile will depend on the context of the operation
within the program as a whole. For example, if you were doing this kind of summation
many times, in a long loop, the overall gains from optimisation would be significant, but if
it is only done occasionally or if you’re only adding a few values the difference is
probably negligible. On the whole we advise opting for clarity if an operation is not a
performance bottleneck. Sometimes it is easy to spot where the bottlenecks are in your
code and you can simply time how long something takes, as illustrated below. However, if
it is not easy, there are a number of profiler programs, such as the inbuilt Python module
cProfile, that can be used to inspect how long the various operations take and thus show
you where to focus attention.
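As a brief illustration of cProfile, the sketch below profiles a hypothetical function (slow_sum is invented here for the example) and prints the entries with the largest cumulative time:

```python
# Profiling with the inbuilt cProfile module to locate bottlenecks.
# slow_sum is a hypothetical example function, not from the text.
import cProfile
import pstats
import io

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()

# Collect the statistics into a string and show the top five entries
# sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
print(stream.getvalue())
```
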
Even when keeping the same basic Python algorithm, some surprisingly simple
programmatic changes or rearrangements can yield speed differences. This chapter will
subsequently go through many more specific examples of these, but common things to
look out for are: avoiding unnecessary function calls (which are relatively slow in
Python), taking care to remove unnecessary operations from long loops and making good
use of Python collections. For the examples below we are using usingTimer(), as
described in Chapter 5, as a decorator to wrap the test code in a function that times how
long the operation takes to run. Firstly, we have an example which uses the math module
to take the square root for a range of numbers, the result of which is added to a total:
import math

@usingTimer
def testFunc(x):
    y = 0
    for i in range(x):
        y += math.sqrt(i)
    return y
Testing with a large number, the call takes just over 2 seconds:
testFunc(10000000)
>>> Function call took 2.046932 seconds
>>> 21081849486.439312
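The usingTimer() decorator used here is defined in Chapter 5; for completeness, a minimal sketch of such a timing decorator (the exact Chapter 5 version may differ in its details) would be:

```python
import time

def usingTimer(function):
    # Wrap a function so that each call reports its elapsed wall-clock
    # time before returning the original result.
    def timedFunction(*args, **kwargs):
        start = time.time()
        result = function(*args, **kwargs)
        elapsed = time.time() - start
        print('Function call took %f seconds' % elapsed)
        return result
    return timedFunction
```
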
Now a small modification is made to import sqrt once at the start, rather than accessing it
repeatedly from the math module inside the loop via dot notation:
from math import sqrt

@usingTimer
def testFunc2(x):
    y = 0
    for i in range(x):
        y += sqrt(i)
    return y
Testing reveals that this new version is about a third of a second quicker:
testFunc2(10000000)
>>> Function call took 1.672865 seconds
>>> 21081849486.439312
Rewriting the function to use a generator expression, thus avoiding an explicit Python loop,
is quicker still. Note that in this version we have also avoided constructing large lists. In
Python 2 we would use the iterable xrange() instead of the list-creating range(), but in
Python 3 the range() function already has the lazy behaviour of xrange() in Python 2. And by
using round parentheses for the comprehension-style expression we create a generator object, rather
than an actual large, filled list.
@usingTimer
def testFunc3(x):
    yGen = (sqrt(i) for i in range(x)) # in Python 2 use xrange(x)
    return sum(yGen)
testFunc3(10000000)
>>> Function call took 1.360537 seconds
>>> 21081849486.439312
Another common tactic to make Python run faster is to investigate whether there are
any fast, pre-constructed libraries that you can import (as Python modules) to do the job.
An obvious example would be to use NumPy for mathematical work involving arrays.
Here the imported code is written in a fast compiled language underneath and avoids
having to go through Python loops. An example of the gain from using numpy.array
objects is illustrated below, and the improvement is impressive, over nine times faster than
the original:
from numpy import arange, sqrt

@usingTimer
def testFunc4(x):
    yArray = sqrt(arange(x))
    return yArray.sum()
testFunc4(10000000)
>>> Function call took 0.221356 seconds
>>> 21081849486.439312
If there is a requirement to make a Python program even faster, above and beyond what
we can achieve with changes to coding style, then we might consider parallelisation (using
multiple processors) or writing a module in a fast compiled language like C. We cover these
more advanced topics in Chapter 27.
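As a taste of the parallelisation idea, one standard-library route is to split the work across processes with multiprocessing. The sketch below divides the square-root summation into chunks; it is only an illustration of the principle, not the approach covered in Chapter 27:

```python
# Splitting the square-root summation across worker processes with
# multiprocessing.Pool. The chunk layout here is an illustrative choice.
from math import sqrt
from multiprocessing import Pool

def partialSum(bounds):
    # Sum square roots over a half-open range [start, stop).
    start, stop = bounds
    return sum(sqrt(i) for i in range(start, stop))

if __name__ == '__main__':
    n = 1000000
    numProcs = 4
    step = n // numProcs
    chunks = [(p * step, (p + 1) * step) for p in range(numProcs)]
    chunks[-1] = (chunks[-1][0], n)  # ensure the last chunk reaches n

    with Pool(numProcs) as pool:
        total = sum(pool.map(partialSum, chunks))
    print(total)
```
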