Figure 22.4. A normal distribution with the mean and one and two standard deviations
marked. Corresponding to the example of human heights, the graph is the probability
density function for a normal (or Gaussian) distribution with a mean (μ) of 1.76 metres
and a standard deviation (σ) of 0.075 metres. The values corresponding to one and two
standard deviations above and below the mean are marked.
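As a rough illustration of what the marked values in the figure mean, the following sketch computes the heights at one and two standard deviations either side of the mean, and the fraction of the distribution that lies between them. The variable names are illustrative only, and the parameters are simply those quoted in the caption.

```python
from scipy.stats import norm

meanVal = 1.76   # Mean height in metres, as quoted in the caption
stdDev = 0.075   # Standard deviation in metres

heightDist = norm(meanVal, stdDev)

for numDevs in (1, 2):
    lower = meanVal - numDevs * stdDev
    upper = meanVal + numDevs * stdDev
    # Probability mass between the two marked values
    fraction = heightDist.cdf(upper) - heightDist.cdf(lower)
    print(numDevs, lower, upper, fraction)
```

For any normal distribution roughly 68% of the probability lies within one standard deviation of the mean and roughly 95% within two, so the fractions printed here do not depend on the particular mean and standard deviation chosen.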
The normal distribution is important because of the central limit theorem, which says
that, under fairly weak assumptions, the distribution of the average of a number of
independent and identically distributed random variables approaches a normal
distribution as the number of random variables increases. To see why, consider two
random variables: if at each point in the distribution of the first we superimpose the
spread that arises from the distribution of the second, then the distribution of their sum
(formally, the convolution of the two densities) is a ‘smoothed’ probability distribution.
The more independent random variables we add, the closer the overall density gets to the
normal distribution.
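The smoothing idea can be sketched numerically. The example below is only an illustration under stated assumptions: it takes a uniform distribution between 0 and 1 as the starting density (an arbitrary choice), approximates it on a grid, and uses NumPy's convolve function to superimpose the spread of each extra variable. The function names sumDensity and maxGapToNormal are hypothetical helpers. The code works with the sum rather than the average of the variables, which only rescales the horizontal axis and does not change the shape.

```python
import numpy as np
from scipy.stats import norm

def sumDensity(numVars, numPoints=201):
    # Grid approximation of the Uniform(0, 1) density, scaled to unit area
    step = 1.0 / (numPoints - 1)
    base = np.ones(numPoints)
    base /= base.sum() * step
    density = base.copy()
    for i in range(numVars - 1):
        # Convolution superimposes the spread of one more variable
        density = np.convolve(density, base) * step
    xVals = np.arange(len(density)) * step
    return xVals, density

def maxGapToNormal(xVals, density):
    # Compare with the normal density that has the same mean and variance
    step = xVals[1] - xVals[0]
    meanVal = (xVals * density).sum() * step
    variance = ((xVals - meanVal) ** 2 * density).sum() * step
    normDensity = norm(meanVal, np.sqrt(variance)).pdf(xVals)
    return abs(density - normDensity).max()

for numVars in (1, 2, 8):
    xVals, density = sumDensity(numVars)
    print(numVars, maxGapToNormal(xVals, density))
```

The largest gap between the summed density and the matching normal density should shrink as more variables are added: one variable gives a flat box, two give a triangle, and by eight the curve is already close to the familiar bell shape.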
This commonly applies in science because observed values are often complicated
combinations of multiple random variables, i.e. different factors, that all contribute to an
observed distribution of values, and data samples are often assumed to be independent and
identically distributed. Hence, the central limit theorem is often invoked to justify
considering the measurement of some property to be distributed normally. For the example
of the heights of (male) humans, the independent factors that contribute to the final value
may be things like multiple genetic factors (each with its own probability distribution of
outcomes), nutrition, mother’s weight, etc., and it is the combination of all these random
factors that gives rise to the single statistic of height.
In the same manner as for the discrete probability distributions, we can create a simple
function to do one-tailed and two-tailed probability tests for the normal distribution using
functions from the scipy.stats module.
from scipy.stats import norm

def normalTailTest(values, meanVal, stdDev, oneSided=True):

  normRandVar = norm(meanVal, stdDev)       # Frozen normal distribution
  diffs = abs(values-meanVal)               # Distances from the mean
  result = normRandVar.cdf(meanVal-diffs)   # Distrib is symmetric

  if not oneSided:
    result *= 2                             # Count both tails

  return result
We can test this for an array of test values (i.e. human heights):
from numpy import array

mean = 1.76
stdDev = 0.075
values = array([1.8, 1.9, 2.0])
result = normalTailTest(values, mean, stdDev, oneSided=True)
print('Normal one tail', result)
# Result is approximately: [0.297, 0.03097, 0.000687]
Assuming the normal distribution and its parameters are a good model for male human
height, the results estimate that 29.7% are 1.8 metres or taller, 3.1% are 1.9 metres or taller
and 0.069% are 2.0 metres or taller.
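As a cross-check on these figures, the same upper-tail probabilities can be obtained by converting each height to a Z-score (the number of standard deviations from the mean) and using the survival function, norm.sf, of the standard normal distribution from scipy.stats. This is only a sketch of an alternative route to the same numbers; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

meanVal = 1.76
stdDev = 0.075
values = np.array([1.8, 1.9, 2.0])

# Number of standard deviations each height lies above the mean
zScores = (values - meanVal) / stdDev

# Survival function is 1 - cdf: the probability of a value this large or larger
upperTail = norm.sf(zScores)

print('Z-scores', zScores)
print('Upper tail', upperTail)
```

Because all three test heights lie above the mean, the upper-tail values here match the one-tailed results above. For a height below the mean, norm.sf would give the probability of being that height or taller, which is not the same as the tail probability measured away from the mean.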