Python Programming for Biology: Bioinformatics and Beyond



Tim J. Stevens, Wayne Boucher

Probability intervals

So far in this chapter we have been using tailed tests to calculate the probability of obtaining a given value from a statistical sample and then, on the basis of this probability, deciding whether we deem the sample to be significantly different from a null hypothesis using a threshold probability, say 5%. However, we can also take the reverse approach and fix a probability threshold upfront, then calculate the test statistic that corresponds to this limiting value. In turn, this leads to a corresponding interval in the actual measurements.
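The two directions can be compared in a few lines with scipy.stats. This is a minimal sketch rather than code from the text: the 11 degrees of freedom and the observed statistic of 2.5 are illustrative assumptions. The survival function sf() takes a statistic to a tail probability, while the inverse survival function isf() takes a chosen threshold probability to the critical statistic.

```python
from scipy.stats import t

# Illustrative assumption: a T-distribution with 11 degrees of freedom,
# as for a one-sample test on 12 measurements.
dist = t(11)

# Forward direction: from an observed test statistic to a one-tailed probability.
tStat = 2.5                     # hypothetical observed T-statistic
tailProb = dist.sf(tStat)       # survival function, P(T >= tStat) = 1 - cdf(tStat)

# Reverse direction: fix the threshold probability first, then find the
# critical statistic that leaves exactly that much probability in the upper tail.
threshold = 0.05
critical = dist.isf(threshold)  # inverse survival function

print('P(T >= %.2f) = %.4f' % (tStat, tailProb))
print('Critical T for a %.0f%% upper tail: %.4f' % (100 * threshold, critical))

# Both routes give the same decision: the statistic exceeds the critical value
# exactly when its tail probability falls below the threshold.
print('Significant:', tStat > critical, tailProb < threshold)
```

Either route leads to the same accept/reject decision; the reverse route is what makes an interval in measurement units possible.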

Returning to the one-sample T-test example of comparing a sample mean, $\bar{x}$, with the true mean, $\mu$, we can determine a confidence interval for the true mean at a specified probability, given the sample mean and the unbiased sample standard deviation.



Mathematically, we want to determine the interval size $I$ such that there is a specified probability that $\mu$ lies within $I$ of $\bar{x}$.
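One way to see what the specified probability means is by simulation. The sketch below is an illustration added here, not part of the original method: it assumes a normal population with a made-up true mean and standard deviation, repeatedly draws samples of 12 values, builds a two-sided 95% interval around each sample mean and counts how often that interval contains $\mu$. The fraction should come out close to 0.95.

```python
import numpy as np
from scipy.stats import t

# Assumed setup for illustration: a normal population with a known true mean.
rng = np.random.default_rng(0)
trueMean, trueSd = 1.75, 0.1      # hypothetical population parameters (metres)
n, confidence, trials = 12, 0.95, 10000

# Two-sided critical value: half of the 5% tail probability goes in each tail.
critical = t(n - 1).ppf(0.5 * (1 + confidence))

# Draw `trials` independent samples of size n, one sample per row.
samples = rng.normal(trueMean, trueSd, size=(trials, n))
means = samples.mean(axis=1)
intervals = critical * samples.std(axis=1, ddof=1) / np.sqrt(n)

# Fraction of samples whose interval I around the sample mean contains mu.
coverage = np.mean(np.abs(means - trueMean) <= intervals)
print('Fraction of intervals containing the true mean: %.3f' % coverage)
```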

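This is a two-tailed test; the one-sided equivalent would be the probability that $\mu$ is larger (or smaller) than some value. To calculate the interval, we note that the probability that the absolute difference between the means is at most $I$ is the same as the probability that the magnitude of a T-distributed variable, with $n-1$ degrees of freedom, is at most the interval divided by the standard error. This simply comes from rearranging the formula for the T-statistic, $T = (\bar{x} - \mu)/(s/\sqrt{n})$, where $s$ is the unbiased sample standard deviation and $n$ is the number of samples:

$$P\left(|\bar{x} - \mu| \le I\right) = P\left(|T| \le \frac{I\sqrt{n}}{s}\right)$$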

We need to invert this function to determine $I$ for a given probability. To do this in practice we use the quantile function, also called the percent point function. This does the inverse job to the cumulative distribution function: we pass in a probability and get out the threshold value at or below which the random variable lies with that probability. Fortunately, in Python the percent point function is available, as the ppf() method, for all the common probability distributions in the scipy.stats module, so we generally don't have to worry about its precise formulation. Once we have calculated the inverse for a given probability, we simply multiply by the standard error to obtain the measurement interval.
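The inverse relationship between ppf() and cdf() can be checked directly. This is a small sketch using scipy.stats; the 11 degrees of freedom and the standard error of 0.028 are placeholder values chosen for illustration, not values from the text.

```python
from scipy.stats import t, norm

dist = t(11)          # illustrative: 11 degrees of freedom (12 samples)
prob = 0.975

# ppf() is the inverse of cdf(): a probability in, a threshold value out.
q = dist.ppf(prob)
print('t(11).ppf(%.3f) = %.4f' % (prob, q))
print('t(11).cdf(q)    = %.4f' % dist.cdf(q))   # recovers the input probability

# The same method is available on other scipy.stats distributions.
z = norm.ppf(prob)
print('norm.ppf(%.3f) = %.4f' % (prob, z))

# Multiplying the quantile by a standard error gives an interval in measurement
# units; this standard error value is a made-up placeholder.
stdErr = 0.028
print('Interval: %.4f' % (q * stdErr))
```

The T quantile is larger than the normal one for the same probability, reflecting the heavier tails of the T-distribution when the standard deviation is estimated from a small sample.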

We now provide a Python function to calculate the value of the interval for a required confidence level. The input is a list or NumPy array of samples and a confidence level (e.g. 0.95 for 95% confidence); the optional isOneSided argument, True by default, selects between the one-sided and two-sided forms. The result is the sample mean (sampleMean) and the interval. For the two-sided case this means that the true mean lies between sampleMean-interval and sampleMean+interval with the probability given by the confidence level.

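from numpy import mean, std, sqrt
from scipy.stats import t

def tConfInterval(samples, confidence, isOneSided=True):

    n = len(samples)
    sampleMean = mean(samples)
    sampleStdDev = std(samples, ddof=1)  # Unbiased estimate (divides by n-1)

    if not isOneSided:
        # Two-sided: split the tail probability equally between both tails
        confidence = 0.5 * (1 + confidence)

    # Quantile of the T-distribution with n-1 degrees of freedom,
    # scaled by the standard error of the mean
    interval = t(n-1).ppf(confidence) * sampleStdDev / sqrt(n)

    return sampleMean, interval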

Inside the function, if the test is two-tailed we adjust the confidence value so that the tail probability used is half that for a single tail. For example, for an input confidence of 95% (5% tail probability) we find the interval corresponding to a one-tailed confidence of 97.5% (2.5% tail probability), because there are two tails that both contribute. Next, using scipy.stats.t, we pass the appropriate number of degrees of freedom (n-1) into the T-distribution and use its percent point function, ppf(). The value obtained is actually $I\sqrt{n}/s$, so we scale this by the standard error, $s/\sqrt{n}$, to get the required interval $I$. The function can be tested with our previous example, using a sample of human heights:

from numpy import array

samples = array([1.752, 1.818, 1.597, 1.697, 1.644, 1.593,
                 1.878, 1.648, 1.819, 1.794, 1.745, 1.827])

sMean, intvl = tConfInterval(samples, 0.95, isOneSided=False)
print('Sample mean: %.3f, 95%% interval: %.4f' % (sMean, intvl))

Note that the double ‘%%’ in the print() statement is needed because, in %-style string formatting, a single ‘%’ marks the start of a conversion specifier (such as ‘%.3f’); ‘%%’ produces a literal percent sign.
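A quick illustration of the escaping rule, with an arbitrary value: the same label built with %-style formatting and with an f-string. The f-string alternative is an aside rather than the book's approach; it is shown only because there a ‘%’ needs no escaping.

```python
confidence = 95

# In %-style formatting, '%' starts a conversion specifier such as %d or %.4f,
# so a literal percent sign must be written as '%%'.
label = '%d%% interval' % confidence
print(label)

# In an f-string (or str.format), '%' is an ordinary character.
label2 = f'{confidence}% interval'
print(label2)
```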

Hence, the difference from the mean that we would accept at a 95% confidence level, assuming the underlying probability distribution, is an interval of 0.0615 metres. If the mean of our null hypothesis distribution is actually 1.76 metres, then we would accept the sample mean of 1.734 metres, because it is 0.0257 metres away from that mean and thus lies within the interval.
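As a cross-check, which is an addition rather than part of the original example, the same interval can be obtained from scipy's t.interval(), which returns the lower and upper bounds directly, and the accept/reject decision can be compared with a two-sided one-sample T-test from scipy.stats.ttest_1samp(). The sketch recomputes everything from the heights so that it runs on its own; 1.76 metres is the null-hypothesis mean used above.

```python
from numpy import array, mean, std, sqrt
from scipy.stats import t, ttest_1samp

samples = array([1.752, 1.818, 1.597, 1.697, 1.644, 1.593,
                 1.878, 1.648, 1.819, 1.794, 1.745, 1.827])
n = len(samples)
sMean = mean(samples)
stdErr = std(samples, ddof=1) / sqrt(n)   # standard error of the mean

# Two-sided 95% interval from the percent point function, as in the text
intvl = t(n - 1).ppf(0.975) * stdErr

# scipy's built-in equivalent returns the (lower, upper) bounds directly
lower, upper = t.interval(0.95, n - 1, loc=sMean, scale=stdErr)

# Null-hypothesis mean from the text
nullMean = 1.76
inside = abs(sMean - nullMean) <= intvl

# A two-sided one-sample T-test should agree: p > 0.05 exactly when
# the null-hypothesis mean lies inside the 95% interval.
stat, pValue = ttest_1samp(samples, nullMean)

print('95%% interval: %.4f to %.4f' % (lower, upper))
print('Within interval:', inside, ' p-value: %.3f' % pValue)
```

Checking whether a hypothesised mean falls inside the interval and checking whether its two-tailed p-value exceeds 5% are two views of the same calculation.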



