Python Programming for Biology: Bioinformatics and Beyond

Figure 21.6. An example output of the Binomial distribution

Download 7,75 Mb.

Pdf ko'rish

bet	325/514
Sana	30.12.2021
Hajmi	7,75 Mb.
	#91066

1 ... 321 322 323 324 325 326 327 328 ... 514

Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Figure 21.6. An example output of the Binomial distribution. A graph of the output

generated from the binomialProbability() function, tested for 100 trials with event

probability 0.01 and illustrating the probability for discrete numbers of events in the range

from 0 to 6. Note that the line is only to guide the eye; because the distribution is discrete,

it is only defined for whole numbers.

It should be noted that the combinatorial factor gets very large as the number of trials n

gets large, unless k is near 0 or near n, so the binomial probability can be computationally

tricky to calculate. For example, returning to the restriction enzyme HindIII, which may

cut a random DNA sequence with a probability of 0.0025, for a total DNA length of 10

megabases we may try to estimate the probability of having 2500 cuts (the mean value) as

follows:

p = 0.00025

n = 10000000

print( binomialProbability(n, 2500, p) ) # Fails!

Although this was a reasonable question to ask the code fails because of the large

number of combinations involved. All is not lost, however, because the scipy.stats module

provides an implementation that is more robust (and quicker). To use this we define a

Python object that represents a binomial random variable with given parameters, which

here are the number of trials and event probability. This object has various methods

(functions bound to it), and one of these is .pmf(), which represents the probability mass

function, i.e. a function that calculates the probability for a given value of the random

variable, just like binomialProbability().

from scipy.stats import binom

p = 0.0025

n = 10000000

binomRandomVar = binom(n, p)

print( binomRandomVar.pmf(2500) )# Succeeds: 0.007979577

The random variable object also has other handy functions, as described in the SciPy

documentation:

print( binomRandomVar.mean() ) # Most likely value : 2500.0

print( binomRandomVar.std() ) # Standard deviation : 49.99

Also, as you might expect, the SciPy functions not only operate on single numbers, but

also work with NumPy arrays. Hence, we can create a whole array of values for the counts

k and calculate the array of probabilities for these with .pmf():

from scipy.stats import binom

from numpy import array

binomRandomVar = binom(10000000, 0.00025)

counts = array(range(2300, 2700))

probs = binomRandomVar.pmf(counts)

pyplot.plot(counts, probs)

pyplot.show()

We can also calculate the cumulative probabilities, the sum of the probabilities from

zero to each value of k (which is very handy for statistical testing, as discussed in

Chapter

), using .cdf():

cumulative = binomRandomVar.cdf(counts)

pyplot.plot(counts, cumulative)

pyplot.show()

Download 7,75 Mb.

Do'stlaringiz bilan baham:

1 ... 321 322 323 324 325 326 327 328 ... 514