Z-scores and Z-test
A Z-score (also called the standard score) is the number of standard deviations an
observed value is different from the mean. So for our human height example, values of
1.685 metres and 1.910 metres have Z-scores of −1.0 and 2.0 because they are
respectively 1σ below and 2σ above the mean. This is formalised in the following
equation, i.e. subtract the mean and divide by the standard deviation:

z = (x − μ) / σ
If we apply this to a whole distribution of values then we will centre it (the mean) at zero
and give it a standard deviation of 1.0, e.g. to create the standard normal distribution,
whose random variable is often labelled Z. We can easily calculate a Z-score in Python,
here taking parameters from the human height example:
from numpy import abs, array
mean = 1.76
stdDev = 0.075
values = array([1.8, 1.9, 2.0])
zScores = abs(values - mean)/stdDev
print('Z scores', zScores)
Thus we estimate that 1.8, 1.9 and 2.0 metres respectively correspond to about 0.5, 1.9
and 3.2 standard deviations from the mean. Note that SciPy provides the stats.zscore()
function, but it operates differently because it estimates its own sample mean and sample
standard deviation from the input values:
from scipy.stats import zscore, norm
samples = norm.rvs(mean, stdDev, size=25) # Values for testing
zScores = zscore(samples, ddof=1) # Unbiased estimators
print('Est. Z scores ', zScores)
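This difference is worth seeing directly. The sketch below (with arbitrary illustrative values) shows that zscore() standardises with the sample mean, so its output is always centred at zero, whereas z-scores computed from the known population parameters preserve any offset of the samples from that population:

```python
import numpy as np
from scipy.stats import zscore

mean = 1.76    # known population parameters, as above
stdDev = 0.075
samples = np.array([1.80, 1.85, 1.82, 1.79, 1.84])  # all above the mean

knownZ = (samples - mean) / stdDev  # uses the known parameters
estZ = zscore(samples, ddof=1)      # estimates parameters from the data

print(knownZ.mean())  # positive: the samples sit above 1.76
print(estZ.mean())    # essentially zero, by construction
```
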
A related concept to this is the Z-test, which can be used when we have samples that are
taken from a normal distribution where the true mean and standard deviation are known.
The Z-test is effectively the calculation of a Z-score for a sample mean. A common
situation for use of the Z-test is where a large population is known to have a mean, μ₀, and
standard deviation, σ, and where some other population of size n is measured to have a
sample mean, x̄, and the same standard deviation. We want to know whether this is
significantly different and the null hypothesis would be that the two populations have the
same mean. For the Z-test the Z-score is defined as:

z = (x̄ − μ₀) / (σ/√n)
As discussed above, in the context of the standard error of the mean, the standard
deviation of the sample mean is a factor of √n smaller than the standard deviation of
the distribution. The analysis also works if the distribution is not normal but the number of
samples, n, is large, by the central limit theorem (assuming the conditions for the theorem
are satisfied). If the standard deviation is not known, then the T-test described in the next
section should be used instead.
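The √n shrinkage of the sample mean's standard deviation can be checked with a quick simulation, here reusing the human height parameters (the trial count and seed are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
mean, stdDev = 1.76, 0.075
n, trials = 25, 100000

# Draw many independent samples of size n and take the mean of each
sampleMeans = rng.normal(mean, stdDev, size=(trials, n)).mean(axis=1)

print(sampleMeans.std())    # close to...
print(stdDev / np.sqrt(n))  # ...0.075 / 5 = 0.015
```
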
Given a standard normal distribution (μ = 0, σ = 1), the probability of observing a value at
least as extreme as the Z-score, in either direction, gives a two-tailed test. If this
probability is low then the two populations are
deemed to have a significantly different mean, and the null hypothesis is rejected. If z
were positive we could also consider a one-tailed test, which is the probability of
observing a result at least this positive. For the Z-test there is no direct SciPy function to
perform the whole calculation of tail probabilities. Hence we need to take specific steps to
find the integral of the probability distribution from the Z-score. Fortunately this is partly
solved by having a cumulative distribution available: the summation up to a threshold of
the probability density function. The cumulative distribution of the standard normal (Φ) is
required for the tailed test. This is easily calculated in Python using the error function
available in SciPy, which is related to the cumulative distribution of the standard normal:
Φ(z) = ½(1 + erf(z/√2)), and thus solves the integral we require without too much
hassle.
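The relation between the error function and Φ can be verified numerically, since SciPy also exposes the cumulative distribution directly as norm.cdf():

```python
import numpy as np
from scipy.special import erf
from scipy.stats import norm

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Phi(z) written via the error function, as in the relation above
phiViaErf = 0.5 * (1.0 + erf(z / np.sqrt(2.0)))

print(np.allclose(phiViaErf, norm.cdf(z)))  # True
```
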
The code to calculate the Z-test probability in SciPy involves calculating the Z-scores
for the standard error of the means and then using the error function erf() to derive the
cumulative probability:
from numpy import sqrt
from scipy.special import erf

def zTestMean(sMean, nSamples, normMean, stdDev, oneSided=True):
    zScore = abs(sMean - normMean) / (stdDev / sqrt(nSamples))
    prob = 1 - erf(zScore / sqrt(2))
    if oneSided:
        prob *= 0.5
    return prob
The calculation of the probability involves a trivial bit of arithmetic, remembering that
we want 1 − Φ, the tail of the cumulative distribution of the standard normal, and noting
that the initial cumulative probability calculation is the two-tailed result (i.e. twice 1 − Φ),
which we halve for the one-tailed result. This can be tested with some example data values
which are roughly normal:
from numpy import array

samples = array([1.752, 1.818, 1.597, 1.697, 1.644, 1.593,
                 1.878, 1.648, 1.819, 1.794, 1.745, 1.827])
mean = 1.76
stdDev = 0.075
result = zTestMean(samples.mean(), len(samples),
                   mean, stdDev, oneSided=True)
print( 'Z-test', result) # Result is 0.1179
The resulting probability of the sample mean coming from the normal distribution is
11.8%, so we generally wouldn’t want to reject the notion that the samples were generated
from it.
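As a cross-check, the same one-tailed probability can be obtained without going through erf() by hand, using SciPy's survival function norm.sf(), which gives the upper tail 1 − Φ directly:

```python
import numpy as np
from scipy.stats import norm

samples = np.array([1.752, 1.818, 1.597, 1.697, 1.644, 1.593,
                    1.878, 1.648, 1.819, 1.794, 1.745, 1.827])
mean, stdDev = 1.76, 0.075

# Z-score for the sample mean, using the standard error of the mean
zScore = abs(samples.mean() - mean) / (stdDev / np.sqrt(len(samples)))

print(norm.sf(zScore))  # one-tailed probability, ~0.118 as before
```
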
As another example, suppose we have a large database of DNA sequences and the G:C
content of sequences in the database has mean 0.59 and standard deviation 0.1. The G:C
content would not usually be modelled using a normal distribution, but if we have 100
sequences not in the database, and measure the G:C content of each, then we could still
reasonably apply the Z-test, thus informing us whether they are likely to be from the same
population of sequences. Suppose that the average G:C content in these 100 sequences is
0.61. The one-tailed test is given by
result = zTestMean(0.61, 100, 0.59, 0.1)
with result 0.023. The two-tailed test gives twice this, so 0.046. In both cases, if 5% is the
significance level used, then the null hypothesis is rejected, and it is concluded that the
100 sequences have a significantly different G:C content than the sequences in the
database.
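The arithmetic here is simple enough to check directly: the Z-score is (0.61 − 0.59)/(0.1/√100) = 2.0, and the tail probabilities quoted above follow from norm.sf():

```python
from numpy import sqrt
from scipy.stats import norm

zScore = (0.61 - 0.59) / (0.1 / sqrt(100))  # = 2.0

print(norm.sf(zScore))      # one-tailed, ~0.023
print(2 * norm.sf(zScore))  # two-tailed, ~0.046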
T-tests
The Z-test we described relied on knowledge of a distribution’s standard deviation (or
having a good estimate from a large population). However, in many situations we do not
know the underlying mean and standard deviations of the probability distributions. This is
often the natural outcome of having small statistical samples. Nonetheless, we may still
want to evaluate whether statistical samples are significantly different from one another.
This is where the idea of T-tests comes in.
T-tests are based on the notion of the T-statistic, which is similar to the Z-score
discussed before. Accordingly, the T-statistic is the measure of the number of standard
errors a measured parameter value is from its true value. In many cases the parameter
we’re interested in is the mean of a normal distribution, in which case the T-statistic could
be the number of standard errors that the sample mean (x̄) lies from the true mean (μ).