Python Programming for Biology: Bioinformatics and Beyond



Figure 22.5. Comparing the underlying mean of a probability distribution and the sample mean. A standard normal distribution, with a mean, μ, of 0.0, is superimposed on a set of data with a sample mean, x̄, of 0.36. Given that the sample mean has an associated error depending on the number of samples taken, we can use a T-test to assess whether the separation between the two means is significant, and thus whether the probability distribution is a good model for the data.

In mathematical parlance, for the two-sample T-test the T-statistic turns out to follow a T-distribution with n_x + n_y − 2 degrees of freedom. Similarly the one-sample T-test has a T-statistic with n − 1 degrees of freedom. In statistics the notion of ‘degrees of freedom’ can be a somewhat tricky concept, but the principle is to know the number of independent data points that can truly vary. To take an arbitrary but simple example, where there are three sample values that have a mean of zero, once two values are known then there is no choice about the third, because we know it must give the known mean. Hence, in general, for a statistical analysis the number of degrees of freedom is the number of independent sample values, minus the number of restraining parameters.
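
To make the three-value example concrete, here is a minimal sketch (with made-up numbers) showing that once the mean is fixed, choosing two values leaves no freedom in the third:

from numpy import array

targetMean = 0.0
freeValues = array([1.2, -0.7])        # two values chosen freely
# The third value is forced by requiring that all three average to targetMean
thirdValue = 3 * targetMean - freeValues.sum()
print(thirdValue)                      # -0.5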

Once we have an appropriate T-statistic, with an appropriate number of degrees of freedom, in order to perform a statistical test we use the distribution of how the T-statistic itself varies when taking different samplings. This probability distribution is the Student T-distribution,13 which assumes the samples are independent and have the same normal distribution. We won’t go into details of this distribution, only to say that it is a bell shape, like the normal distribution but with thicker tails. For Python the scipy.stats module has some pre-packaged T-test functions as well as facilities to access the Student T-distribution, which allow us to easily make probability estimates to assess the variation due to sample variance.
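
As an illustration of accessing the Student T-distribution directly, the scipy.stats.t object provides its cumulative distribution function; the T-statistic and degrees of freedom below are simply chosen to match the two-sample example shown later:

from scipy.stats import t

tStat = -2.072    # an illustrative T-statistic
degFree = 10      # degrees of freedom, e.g. 6 + 6 - 2 for two samples of six values
# Two-tailed probability: the chance of a T-statistic at least this extreme,
# in either direction, if the null hypothesis were true
twoTailProb = 2 * t.cdf(-abs(tStat), degFree)
# Result is approximately 0.065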

The complete T-test functions available in scipy.stats are ttest_1samp, ttest_ind and ttest_rel, and these accept samples, represented as arrays of values, and perform the appropriate two-tailed test to estimate a probability. Also, because the T-distribution is symmetric, for a one-tailed test we can simply halve the probability. Illustrating each of these functions in turn, ttest_1samp finds the probability of a sample mean being the same as the true mean from a distribution (e.g. a null hypothesis), and thus uses the one-sample T-statistic described above. Note that the T-statistic as well as the two-tailed probability are passed back by the function:

from numpy import array
from scipy.stats import ttest_1samp

trueMean = 1.76
samples = array([1.752, 1.818, 1.597, 1.697, 1.644, 1.593,
                 1.878, 1.648, 1.819, 1.794, 1.745, 1.827])

tStat, twoTailProb = ttest_1samp(samples, trueMean)
# Result is: -0.918, 0.378
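
As noted above, because the T-distribution is symmetric a one-tailed probability is obtained by simply halving the two-tailed value; continuing the example:

oneTailProb = twoTailProb / 2.0
# Approximately 0.189: the probability of a T-statistic at least this
# extreme in one particular direction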

The function ttest_ind performs the two-sample T-test, testing whether two independent samples have the same underlying, true mean, based on their respective sample means, described as x̄ and ȳ above:

from numpy import array
from scipy.stats import ttest_ind

samples1 = array([1.752, 1.818, 1.597, 1.697, 1.644, 1.593])
samples2 = array([1.878, 1.648, 1.819, 1.794, 1.745, 1.827])

tStat, twoTailProb = ttest_ind(samples1, samples2)
# Result is: -2.072, 0.0650

There is an extra option to this function, to relax the requirement that both samples have the same variance, in which case the test is called Welch’s T-test,14 though the difference for our test case is slight:

tStat, twoTailProb = ttest_ind(samples1, samples2, equal_var=False)
# Result is: -2.072, 0.0654

Lastly, the ttest_rel function again works with two samples in the same way as above, but assumes that the samples are dependent, i.e. that the values in the pair of samples are related to one another (they must have the same variance, hence there is no equal_var option). An example of this would be to take some measure from a group of people as the first samples and then to take repeated measurements for the same people at a different time (perhaps after some treatment) or using a different method.
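
The excerpt gives no code for this case, but a minimal sketch, reusing the sample arrays above as hypothetical ‘before’ and ‘after’ measurements for the same six people, would be:

from numpy import array
from scipy.stats import ttest_rel

# Hypothetical paired measurements for the same six individuals
before = array([1.752, 1.818, 1.597, 1.697, 1.644, 1.593])
after  = array([1.878, 1.648, 1.819, 1.794, 1.745, 1.827])

tStat, twoTailProb = ttest_rel(before, after)
# The test is based on the differences between corresponding pairs of values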

