Binomial distribution
Given an event with a fixed probability of occurrence, the binomial distribution is the
probability distribution of the number of events that occur after a specified number of
independent trials. A simple example of this would be the event of rolling a six on a die,
i.e. with probability 1/6, where after a specified total number of rolls we can count the
number of times that a six came up. Repeating the same experiment (with the same total
number of rolls) will result in a distribution of different counts for rolling a six. The
probability of getting a given count of sixes is described by the binomial distribution. For
a given event probability and given number of trials, the probability of a count can be
calculated using the formula presented below. This is based on the notion that the
probability of a count depends on the number of arrangements in which the count can be
obtained. To take the example of rolling a die three times, where there are 216 (6×6×6)
possible outcomes, there is only one way of getting a count of three sixes, but there are 15
ways of getting two sixes (a non-six can occur in three positions, and there are five
possibilities for each), 75 ways of getting one six (a six can occur at three positions and
there are five times five possibilities for the non-sixes) and 125 ways of getting no six
(five possibilities for each position).
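The counts quoted above can be checked by brute force. The following is a minimal sketch,
not part of the original text, which enumerates all 216 ordered outcomes of three rolls
using itertools.product from the standard library and tallies how many contain each number
of sixes. The names rolls and counts are illustrative only.

```python
from collections import Counter
from itertools import product

# Enumerate every ordered outcome of three rolls of a six-sided die
rolls = list(product(range(1, 7), repeat=3))

# Tally the outcomes according to how many sixes each one contains
counts = Counter(outcome.count(6) for outcome in rolls)

for numSixes in range(4):
    print(numSixes, counts[numSixes])
```

Dividing each tally by the total number of outcomes, len(rolls), gives the probability of
that count of sixes.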
The probability Pr(k) of observing k events from n independent trials given event
probability p is:
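$$Pr(k) = \frac{n!}{k!\,(n-k)!} \, p^k (1-p)^{n-k}$$
This is often written using $\binom{n}{k}$, which is notation for the combinatorial factor,
giving the number of ways of choosing k items from a total of n:
$$Pr(k) = \binom{n}{k} \, p^k (1-p)^{n-k}$$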
If we seek the probability of getting two sixes from three rolls we multiply the probability
of getting two sixes, $p^k = (1/6)^2$, by the probability of getting a non-six in the other
rolls, $(1-p)^{n-k} = (5/6)^{3-2}$, by the number of ways of choosing two successes from
three rolls, $\binom{3}{2} = 3$, and the result is indeed 15/216.
We can define a function to calculate this in Python, using the handy comb, which we
can import from SciPy to calculate the combinatorial factor:
# comb now lives in scipy.special; it was removed from scipy.misc in later SciPy releases
from scipy.special import comb

def binomialProbability(n, k, p):
    return comb(n, k) * p**k * (1-p) ** (n-k)
To test this we can again calculate the probability of getting two sixes from three rolls
of a die:
p = 1/6.0 # Probability of event
n = 3 # Number of trials
k = 2 # Number of events sought
print( binomialProbability(n, k, p) )
# Result is 0.069444444 = 15/216
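As a further sanity check, the probabilities for every possible count should sum to one.
The sketch below is an addition to the text. It assumes Python 3.8 or later, where
math.comb is available in the standard library, and uses a hypothetical helper name,
binomialProb, so that it can be run without SciPy.

```python
from math import comb, isclose

def binomialProb(n, k, p):
    # Same binomial formula, using the standard library's exact integer comb
    return comb(n, k) * p**k * (1-p)**(n-k)

n = 3        # Number of rolls
p = 1/6.0    # Probability of rolling a six

# Probabilities of getting 0, 1, 2 and 3 sixes
probs = [binomialProb(n, k, p) for k in range(n+1)]
total = sum(probs)

print(probs)
print(isclose(total, 1.0))
```

The four values should correspond to 125/216, 75/216, 15/216 and 1/216, subject to
floating point rounding.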
As a biological example we could investigate the distribution in the number of
sequencing errors (i.e. calling the wrong nucleotide) we expect when determining a DNA
sequence of a given length. If the sequencing machine has a random error rate of 0.01 and
reads the sequence for a total of 100 nucleotides, then the distribution of the number of
errors can be plotted as follows:
from matplotlib import pyplot

p = 0.01
n = 100
xVals = []
yVals = []

for k in range(7):
    pk = binomialProbability(n, k, p)
    xVals.append(k)
    yVals.append(pk)

pyplot.plot(xVals, yVals)
pyplot.show()
This (plotted in Figure 21.6) shows that although the expectation is to have one error
every 100 nucleotide positions, around 37% of the time there will be no errors.
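Both figures can be checked numerically. The following is a small sketch added for
illustration, assuming Python 3.8 or later for math.comb. It computes the probability of
zero errors and the mean number of errors directly from the binomial formula. The
variable names pNoErrors and meanErrors are illustrative only.

```python
from math import comb

p = 0.01   # Error rate per nucleotide
n = 100    # Number of nucleotides read

# Probability of each possible error count, from 0 to n
probs = [comb(n, k) * p**k * (1-p)**(n-k) for k in range(n+1)]

# Probability of an error-free read; equivalent to (1-p)**n
pNoErrors = probs[0]

# Mean error count, weighting each count by its probability; close to n*p
meanErrors = sum(k * pk for k, pk in enumerate(probs))

print(round(pNoErrors, 3))
print(round(meanErrors, 3))
```

The first value should come out at about 0.366 and the second at about 1.0, in line with
an expectation of n times p errors.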