Python Programming for Biology: Bioinformatics and Beyond

Date: 30.12.2021

[Tim J. Stevens, Wayne Boucher] Python Programming

Binomial distribution

Given an event with a fixed probability of occurrence, the binomial distribution is the probability distribution of the number of events that occur after a specified number of independent trials. A simple example of this would be the event of rolling a six on a die, i.e. with probability 1/6, where after a specified total number of rolls we can count the number of times that a six came up. Repeating the same experiment (with the same total number of rolls) will result in a distribution of different counts for rolling a six. The probability of getting a given count of sixes is described by the binomial distribution. For a given event probability and given number of trials, the probability of a count can be calculated using the formula presented below. This is based on the notion that the probability of a count depends on the number of arrangements in which the count can be obtained. To take the example of rolling a die three times, where there are 216 (6×6×6) possible outcomes, there is only one way of getting a count of three sixes, but there are 15 ways of getting two sixes (a non-six can occur in three positions, and there are five possibilities for each), 75 ways of getting one six (a six can occur at three positions and there are five times five possibilities for the non-sixes) and 125 ways of getting no six (five possibilities for each position).
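The counts quoted for three rolls can be checked by brute force. The following is only a small sketch using the standard library; the variable names outcomes and ways are our own and are not part of the original text.

```python
from collections import Counter
from itertools import product

# Enumerate every ordered outcome of three rolls of a six-sided die
outcomes = list(product(range(1, 7), repeat=3))
print(len(outcomes))  # 216

# Tally the outcomes by how many of the three rolls show a six
ways = Counter(rolls.count(6) for rolls in outcomes)

for count in sorted(ways):
    print(count, ways[count])
```

Each line of the loop output pairs a count of sixes with the number of arrangements that give it, which should match the 125, 75, 15 and 1 described above.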

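The probability Pr(k) of observing k events from n independent trials given event probability p is:

$$\Pr(k) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$$

This is often written using $\binom{n}{k}$, which is notation for the combinatorial factor, giving the number of ways of choosing k items from a total of n:

$$\Pr(k) = \binom{n}{k}\, p^k (1-p)^{n-k}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$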
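To make the combinatorial factor concrete, it can be computed directly from factorials. This is a minimal sketch: the helper name nChooseK is hypothetical, and the comparison uses math.comb, which assumes Python 3.8 or later.

```python
from math import comb, factorial

def nChooseK(n, k):
    # Ways of choosing k items from n: n! / (k! (n-k)!)
    return factorial(n) // (factorial(k) * factorial(n - k))

print(nChooseK(3, 2))  # 3

# Compare with the standard library implementation for small n
agrees = all(nChooseK(n, k) == comb(n, k)
             for n in range(10) for k in range(n + 1))
print(agrees)
```

Integer division is used because the factorial ratio is always a whole number, so the result stays an exact int.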

If we seek the probability of getting two sixes from three rolls we multiply the probability of getting two sixes, $p^k = (1/6)^2$, by the probability of getting a non-six in the other rolls, $(1-p)^{n-k} = (5/6)^{3-2}$, by the number of ways of choosing two successes from three rolls, $\binom{3}{2} = 3$, and the result is indeed 15/216.
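The same arithmetic can be followed step by step with exact fractions, which avoids any floating point rounding. This is only an illustrative sketch; the variable names are our own, and math.comb assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)  # Probability of rolling a six
n, k = 3, 2         # Three rolls, two sixes sought

probSixes = p ** k               # (1/6)^2 = 1/36
probOthers = (1 - p) ** (n - k)  # (5/6)^1 = 5/6
arrangements = comb(n, k)        # 3 ways to place the two sixes

result = arrangements * probSixes * probOthers
print(result)                      # 5/72, the reduced form of 15/216
print(result == Fraction(15, 216)) # True
```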

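We can define a function to calculate this in Python, using the handy comb function, which we can import from SciPy to calculate the combinatorial factor:

# comb lives in scipy.special; the older scipy.misc.comb
# has been removed from recent SciPy releases
from scipy.special import comb

def binomialProbability(n, k, p):
    # Number of arrangements times the probability of each arrangement
    return comb(n, k) * p**k * (1-p) ** (n-k)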

To test this we can again calculate the probability of getting two sixes from three rolls of a die:

p = 1/6.0 # Probability of event
n = 3 # Number of trials
k = 2 # Number of events sought

print( binomialProbability(n, k, p) )
# Result is 0.069444444 = 15/216
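A further sanity check is that the probabilities for every possible count add up to one, and that they agree with the 125, 75, 15 and 1 arrangements out of 216 counted earlier. The sketch below redefines binomialProbability with math.comb from the standard library (Python 3.8 or later) as a stand-in for the SciPy function, purely so that it runs on its own.

```python
from math import comb, isclose

def binomialProbability(n, k, p):
    # Standard library stand-in for the SciPy-based version
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 3      # Number of rolls
p = 1/6.0  # Probability of a six

# Probability of each possible count of sixes: 0, 1, 2, 3
probs = [binomialProbability(n, k, p) for k in range(n + 1)]

for k, pk in enumerate(probs):
    print(k, pk, pk * 216)  # Last column recovers the arrangement counts

print(isclose(sum(probs), 1.0))
```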

As a biological example we could investigate the distribution of the number of sequencing errors (i.e. calling the wrong nucleotide) we expect when determining a DNA sequence of a given length. If the sequencing machine has a random error rate of 0.01 per nucleotide and reads a sequence of 100 nucleotides, then the distribution of the number of errors can be plotted as follows:

from matplotlib import pyplot

p = 0.01 # Error rate per nucleotide
n = 100 # Number of nucleotides read

xVals = []
yVals = []

for k in range(7): # Error counts 0 to 6
    pk = binomialProbability(n, k, p)
    xVals.append(k)
    yVals.append(pk)

pyplot.plot(xVals, yVals)
pyplot.show()

This (plotted in Figure 21.6) shows that although the expectation is to have one error every 100 nucleotide positions, about 36.6% of the time there will be no errors.
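Both figures quoted here, the expected number of errors and the chance of an error-free read, can be computed rather than read off the plot. The sketch below again uses a standard library stand-in for binomialProbability (math.comb, Python 3.8 or later); the variable names are our own.

```python
from math import comb

def binomialProbability(n, k, p):
    # Standard library stand-in for the SciPy-based version
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.01  # Error rate per nucleotide
n = 100   # Number of nucleotides read

# Probability of an error-free read, equal to 0.99**100
probNoErrors = binomialProbability(n, 0, p)

# Expected number of errors: each count weighted by its probability
meanErrors = sum(k * binomialProbability(n, k, p) for k in range(n + 1))

print(round(probNoErrors, 3))      # 0.366
print(round(1 - probNoErrors, 3))  # 0.634, chance of at least one error
print(round(meanErrors, 3))        # 1.0, i.e. n * p
```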
