Note that with the above example we fire off jobs but don’t actually take any measures to collect the results back into the main program. We could, however, collect results using a Queue object, which can be passed as an argument to the worker function, to be filled by the sub-jobs; within each sub-job the results are recorded by calling queue.put(). Note that here we put (n, m, result) on the queue so that we know which results go with which inputs, given that the jobs will not necessarily finish in any particular order.
job2.start()
job1.join()   # wait for each sub-job to finish
job2.join()
print("Result", queue.get())   # each get() returns one (n, m, result) tuple
print("Result", queue.get())
queue.close()
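For completeness, a minimal, self-contained sketch of the whole pattern might look like the following, where calcFuncWithQ is an illustrative worker (not necessarily the same as the function used earlier) that performs a placeholder calculation and passes its inputs back through the queue together with the result:

from multiprocessing import Process, Queue

def calcFuncWithQ(queue, n, m):
    # Placeholder calculation; a real worker would do the actual work here
    result = sum(i * i % m for i in range(n))
    queue.put((n, m, result))   # record the inputs alongside the result

if __name__ == '__main__':
    queue = Queue()
    job1 = Process(target=calcFuncWithQ, args=(queue, 100000, 7))
    job2 = Process(target=calcFuncWithQ, args=(queue, 200000, 13))
    job1.start()
    job2.start()
    job1.join()
    job2.join()
    print("Result", queue.get())
    print("Result", queue.get())
    queue.close()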
Perhaps a simpler way of running parallel jobs, without having to worry about queues, is to use a Pool object to manage the processing. As mentioned above, this also has the advantage of allocating an arbitrary number of jobs to a fixed number of processors/cores.
As an example, we consider a list of input values, and a pool is created to organise the allocation of work. By default a Pool will use the total number of central processing cores available on the computer (as reported by multiprocessing.cpu_count()) as the maximum number of jobs to run at any one time, but a different number of parallel workers may be passed in when the pool is created.
inputList = [37645, 8374634, 3487584, 191981, 754967, 12345]
pool = Pool()
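If we wanted to limit the number of simultaneous worker processes, rather than accept the default, the number could be passed in when the pool is constructed. A small illustration, not used in the example that follows, where smallPool is just an arbitrary name:

from multiprocessing import Pool, cpu_count

print(cpu_count())              # the number of cores, and hence the default pool size
smallPool = Pool(processes=4)   # a pool that runs at most four jobs at a time
smallPool.close()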
With the worker pool created, we next set up the individual jobs via pool.apply_async(), which, as the name suggests, will start the calculations in an asynchronous manner. As with the earlier examples, the basic point is to associate a worker function with a set of arguments for that sub-job. The job objects themselves are collected in a list so that we can get the return value of each calcFunc call, which in this case is gathered for us without our having to take any special measures:
jobs = []
for value in inputList:
    inputArgs = (value, 2)
    job = pool.apply_async(calcFunc, inputArgs)
    jobs.append(job)
We can then collect the results directly from the job objects using .get():
results = []
for job in jobs:
    result = job.get()
    results.append(result)
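As an aside, not part of the example being built up here, the objects returned by apply_async() also allow progress to be checked without blocking: ready() reports whether a sub-job has finished, and get() accepts an optional timeout, raising multiprocessing.TimeoutError if the result is not available in time. For instance:

import multiprocessing

pending = [job for job in jobs if not job.ready()]   # sub-jobs still running
print(len(pending), 'sub-jobs not yet finished')

try:
    firstResult = jobs[0].get(timeout=10)   # wait at most ten seconds
except multiprocessing.TimeoutError:
    print('First sub-job did not finish within ten seconds')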
Finally we do some clean-up, closing the worker pool and making sure that the main program does not proceed until the sub-processes have fully terminated, before printing the results.
pool.close()   # no further jobs may be submitted to the pool
pool.join()    # wait until all worker processes have terminated
print(results)
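For a situation like this, where the same function is applied to every item of a list, the Pool also offers map-style calls that perform the submission and the collection of results in a single step. A brief alternative sketch, assuming the calcFunc and inputList defined above and using starmap (available in Python 3) to unpack each argument tuple:

with Pool() as pool:
    # starmap unpacks each (value, 2) tuple, replacing the apply_async loop
    results = pool.starmap(calcFunc, [(value, 2) for value in inputList])
print(results)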
If speed improvement is still really critical after optimising (cf. Chapter 10) and perhaps parallelising Python, then it might be worth not using Python at all, or at least not entirely. Python code can only be sped up to a point, after which you can try one of the compiled languages like C, C++ or Fortran for the slow parts, which may then be interfaced with the main Python program. With regard to parallel execution, libraries like OpenMP will allow C and C++ modules to control efficient, fine-grained parallel processing on a computer system with a single, shared memory.