Figure 21.6. An example output of the Binomial distribution. A graph of the output
generated from the binomialProbability() function, tested for 100 trials with event
probability 0.01 and illustrating the probability for discrete numbers of events in the range
from 0 to 6. Note that the line is only to guide the eye; because the distribution is discrete,
it is only defined for whole numbers.
It should be noted that the combinatorial factor gets very large as the number of trials n
gets large, unless k is near 0 or near n, so the binomial probability can be computationally
tricky to calculate. For example, returning to the restriction enzyme HindIII, which may
cut a random DNA sequence with a probability of 0.0025, for a total DNA length of 10
megabases we may try to estimate the probability of having 2500 cuts (the mean value) as
follows:
p = 0.00025
n = 10000000
print( binomialProbability(n, 2500, p) ) # Fails!
Although this was a reasonable question to ask the code fails because of the large
number of combinations involved. All is not lost, however, because the scipy.stats module
provides an implementation that is more robust (and quicker). To use this we define a
Python object that represents a binomial random variable with given parameters, which
here are the number of trials and event probability. This object has various methods
(functions bound to it), and one of these is .pmf(), which represents the probability mass
function, i.e. a function that calculates the probability for a given value of the random
variable, just like binomialProbability().
from scipy.stats import binom
p = 0.0025
n = 10000000
binomRandomVar = binom(n, p)
print( binomRandomVar.pmf(2500) )# Succeeds: 0.007979577
The random variable object also has other handy functions, as described in the SciPy
documentation:
print( binomRandomVar.mean() ) # Most likely value : 2500.0
print( binomRandomVar.std() ) # Standard deviation : 49.99
Also, as you might expect, the SciPy functions not only operate on single numbers, but
also work with NumPy arrays. Hence, we can create a whole array of values for the counts
k and calculate the array of probabilities for these with .pmf():
from scipy.stats import binom
from numpy import array
binomRandomVar = binom(10000000, 0.00025)
counts = array(range(2300, 2700))
probs = binomRandomVar.pmf(counts)
pyplot.plot(counts, probs)
pyplot.show()
We can also calculate the cumulative probabilities, the sum of the probabilities from
zero to each value of k (which is very handy for statistical testing, as discussed in
Chapter
22
), using .cdf():
cumulative = binomRandomVar.cdf(counts)
pyplot.plot(counts, cumulative)
pyplot.show()
Do'stlaringiz bilan baham: |