Python Programming for Biology: Bioinformatics and Beyond



[Tim J. Stevens, Wayne Boucher] Python Programming

Samples and significance

One of the key principles that underpins most statistical analyses is the idea that the data we collect contains a limited number of samples from some kind of underlying probability distribution. This probability distribution can be thought of as the mechanism by which the data values are generated. Naturally, the data actually arises from some physical process, so by ascribing a probability distribution we are merely forming a mathematical model, often a considerably simplified one, that approximates the data-generation process.
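To make the idea of a distribution as a data-generation mechanism concrete, here is a minimal sketch using a frozen scipy.stats.norm object as the model. The centre (5.0), spread (2.0), sample size and seed are arbitrary illustrative choices, not values from the text. Each "experiment" is a limited sample from the same model, so the sample means differ from each other and from the model's true mean.

```python
import numpy as np
from scipy import stats

# Hypothetical data-generation model: normal with centre 5.0 and spread 2.0
model = stats.norm(loc=5.0, scale=2.0)

# Seeded generator so the illustration is reproducible
rng = np.random.default_rng(seed=1)

# Two "experiments", each a limited sample drawn from the same model
experimentA = model.rvs(size=20, random_state=rng)
experimentB = model.rvs(size=20, random_state=rng)

print(experimentA.mean(), experimentB.mean())  # near 5.0, but not exactly
print(model.mean(), model.std())               # the model's true parameters
```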

For a given situation, if we have an idea of what type of underlying probability distribution would be appropriate, then by looking at the observed data we can begin to estimate the parameters of the distribution, such as where its centre is and how much it spreads. Given parameter estimates, we can then begin to answer questions that relate to the probabilistic model, such as how likely it is that a given value is generated by the model. In virtually all cases the answer is not certain; rather, it is given as being true with a certain probability, which for parameter estimation is often called a confidence level. A 95% probability is often considered a suitable confidence level for inferring significance, but even at this seemingly strict level, on average 5% (1 in 20) of the sampled values would lie outside the quoted range.
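As a sketch of estimating parameters and then asking questions of the fitted model, the following assumes the data are normally distributed. The "observed" data here are simulated (centre 10, spread 3, 1000 points) purely for illustration, and the threshold value 17 is arbitrary. It fits the centre and spread with stats.norm.fit, takes the central 95% range with stats.norm.interval, and checks that roughly 5% of the samples fall outside that range.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
# Simulated stand-in for observed data (assumption: normal, centre 10, spread 3)
data = rng.normal(10.0, 3.0, 1000)

# Estimate the centre and spread, assuming a normal distribution is appropriate
muEst, sigmaEst = stats.norm.fit(data)

# Central 95% range of the fitted model (about muEst +/- 1.96 * sigmaEst)
low, high = stats.norm.interval(0.95, loc=muEst, scale=sigmaEst)

# Probability under the fitted model of a value at least as large as 17.0
pHigh = stats.norm.sf(17.0, loc=muEst, scale=sigmaEst)

# Fraction of the samples lying outside the 95% range: roughly 1 in 20
fracOutside = np.mean((data < low) | (data > high))

print(muEst, sigmaEst, low, high, pHigh, fracOutside)
```

The first argument to stats.norm.interval is passed positionally because its keyword name has changed between SciPy versions.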

In Python, several of the commonly used probability distributions are available in the scipy.stats module, which we will routinely refer to in this chapter, and also in the numpy.random module, which allows us to draw random samples from a distribution. Here we illustrate creating random samples with different numbers of points, drawn from a normal distribution using random.normal, each of which we then show as a histogram:

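from matplotlib import pyplot
from numpy import random

mean = 0.0    # centre of the normal distribution
stdDev = 1.0  # spread (standard deviation) of the distribution

for nPoints in (10, 100, 1000, 10000, 100000):
    # Draw nPoints random values from the normal distribution
    sample = random.normal(mean, stdDev, nPoints)
    # density=True scales the histogram to unit area, like a probability density
    # (older Matplotlib used normed=True, which newer versions no longer accept)
    pyplot.hist(sample, bins=20, range=(-4, 4), density=True)
    pyplot.show()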
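The histograms should look increasingly like the normal curve as the number of points grows. One way to check this numerically, sketched here without plotting, is to compare the histogram heights with the normal probability density at each bin centre, using the same 20 bins over (-4, 4) as the plotting example. The seed and the choice of maximum absolute difference as the error measure are illustrative assumptions.

```python
import numpy as np
from scipy import stats

mean, stdDev = 0.0, 1.0
rng = np.random.default_rng(seed=0)

maxErrors = {}
for nPoints in (10, 100, 1000, 10000, 100000):
    sample = rng.normal(mean, stdDev, nPoints)
    # Histogram heights scaled as a density, 20 bins over (-4, 4)
    heights, edges = np.histogram(sample, bins=20, range=(-4, 4), density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Density of the generating distribution at each bin centre
    expected = stats.norm.pdf(centres, loc=mean, scale=stdDev)
    # Largest discrepancy between the histogram and the true density
    maxErrors[nPoints] = np.max(np.abs(heights - expected))

for nPoints, error in maxErrors.items():
    print(nPoints, error)  # the error generally shrinks as nPoints grows
```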

Predictions from a probability distribution are often coupled to the idea of a competing hypothesis. Here the probability distribution is often a model of what we expect at random, and the competing hypothesis would mean that something significantly non-random was happening. Hence, rather than inferring significance when this model appears to fit the data, we assert that there is significance if the random model is unlikely to explain the data samples, i.e. that our data does not fit the probability distribution of the random situation. So by applying a probabilistic model we are generally not assuming that we actually have a good physical model for our data, but rather that there is a mathematical approximation to the data-generation process, which is nonetheless useful for making predictions and for understanding key aspects of what we are studying.
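The logic of asserting significance when the random model is unlikely to explain the data can be sketched as a two-sided p-value. Here the "random" (null) model is assumed to be a standard normal distribution, and the observed scores 1.5 and 2.8 are hypothetical. The helper twoSidedPValue is an illustrative name, and it assumes the null distribution is symmetric about its mean.

```python
from scipy import stats

# Assumed null ("random") model: chance values follow a standard normal
nullModel = stats.norm(loc=0.0, scale=1.0)

def twoSidedPValue(value, model):
    """Probability under a symmetric model of a value at least this far from its mean."""
    distance = abs(value - model.mean())
    return 2.0 * model.sf(model.mean() + distance)

alpha = 0.05  # significance threshold matching a 95% confidence level

for observed in (1.5, 2.8):  # hypothetical observed scores
    pValue = twoSidedPValue(observed, nullModel)
    # Significant only if the random model is unlikely to produce such a value
    print(observed, pValue, pValue < alpha)
```

A score of 1.5 gives a p-value of about 0.13, which the random model explains comfortably, whereas 2.8 gives about 0.005, below the 0.05 threshold.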

Lastly, it is important to note that even in situations where the underlying probability distribution is not known, we can nonetheless estimate some statistical parameters. In the simplest situation, we might simply try to estimate the mean (average) or standard deviation (spread) of the distribution from the data, without worrying too much about what the distribution is.
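A minimal sketch of such distribution-free estimates with NumPy follows; the eight values are hypothetical. Note that NumPy's std divides by n by default (ddof=0); passing ddof=1 divides by n - 1, the usual sample estimate. The standard error of the mean indicates how precisely the sample mean estimates the true mean.

```python
import numpy as np

# Hypothetical measurements; no assumption is made about their distribution
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

meanEst = values.mean()

# ddof=1 divides the sum of squared deviations by (n - 1) rather than n
stdDevEst = values.std(ddof=1)

# Standard error of the mean: spread of the sample mean itself
stdErr = stdDevEst / np.sqrt(len(values))

print(meanEst, stdDevEst, stdErr)  # 5.0, about 2.138, about 0.756
```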

