Figure 27.1. A schematic of how a program may be executed in parallel by
spawning sub-processes. Parallelism in programming is often achieved by invoking a
function call from an originating ‘parent’ process, which then waits for the outcomes from
the ‘child’ sub-processes that it started. In general the same program code will be executed
by the child processes (here labelled ‘f’), although they will commonly operate on
different input arguments. For the examples illustrated in this chapter the sub-processes
will be run on the same computer, although it is also possible to interact with other
computers via a network. The ability to run several jobs in parallel on one computer will
depend on the number of processing cores that a computer has, although on most modern
operating systems it is possible to initiate more jobs than there are cores. In this case the
different processing tasks (the threads) will be shared at a fine scale by the processor(s),
and will not all run at their maximum speed.
Firstly, we define a function that will do the actual work, and this will be called
separately for the different jobs. This is a regular Python function, and for the example we
perform an arbitrary mathematical calculation that takes a few seconds to run and prints
out the job status, if only to illustrate the principles. The function calcFunc takes two
numbers as input arguments, but we could send any input data to the function as long as
that data is pickleable, i.e. can be converted from an in-memory to a serial representation;
this includes all the regular Python data structures.
def calcFunc(n, m):
print("Running %d %d" % (n,m))
result = sum([x*x for x in range(n) if x % m == 0])
print("Result %d %d : %d " % (n, m, result))
return result
To set up the parallel jobs we first use the Process class to create objects that represent
each of the parallel jobs. The target argument is the name of the function that will be
called upon to do the work and args is a tuple of the values that will be input to the
function when it is actually run. We create two job specifications, running the same
operation, but with different inputs thus:
job1 = Process(target=calcFunc, args=(8745678, 2))
job2 = Process(target=calcFunc, args=(2359141, 3))
Each of the jobs can be started as required, so this is where the main flow of the Python
script separates from the sub-jobs:
job1.start()
job2.start()
Then it is a simple matter of having the main, parent process wait until the sub-jobs are
complete before continuing any further operations. The waiting is easily done using the
.join() method, which optionally takes a time in seconds to wait before proceeding
regardless (this is called a timeout).
job1.join()
job2.join()
Do'stlaringiz bilan baham: |