Python Programming for Biology: Bioinformatics and Beyond


Conditional probabilities



[Tim J. Stevens, Wayne Boucher] Python Programming


Moving to a different kind of example, we will consider probabilities associated with the occurrence of a disease (D) and how this relates to the experimental observation of a particular mutant version (M) of a gene, i.e. one with a different DNA sequence. Here the probability that both occur, Pr(D and M), on its own does nothing to suggest whether the two are related. To investigate the link between the two we also need to know the probabilities of the events alone (having the disease and having the mutation), and thus whether the intersection of the two is more or less than we would expect if they were unrelated. By doing this we implicitly use the concept of hypothesis testing. As far as medical prediction and diagnosis are concerned it is helpful to consider the complementary events. In this case these are the event that there is no disease and the event that there is no mutation. With these we can compare the hypothesis that the disease and mutation are linked with an appropriate alternative and null hypothesis (see Chapter 22).

By counting occurrences of the different situations we can estimate the various conditional probabilities. For example, we can estimate the probability of having the disease given that the mutation is present, Pr(D given M), and compare it to the probability of having the disease given no mutation, Pr(D given noM), to see whether the mutation increases or decreases the chance of the disease. Also, if it is established that Pr(D given M) is much greater than Pr(D given noM), i.e. that the mutation is highly correlated with the disease, then knowing the probability of not having the disease given that the mutation is present, Pr(noD given M), is vital if we hope to use a genetic test to predict the disease outcome; in other words, we need to know whether there would be lots of false-positive results.
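As a minimal sketch of this counting approach, the following Python estimates the three conditional probabilities mentioned above from a table of joint counts. The counts, the labels and the function name conditional() are invented purely for illustration; they are not data from any real study.

```python
# Hypothetical counts for 1000 people (illustrative values only).
# Keys are (disease status, mutation status).
counts = {
    ("D", "M"): 30,       # disease and mutation
    ("D", "noM"): 20,     # disease, no mutation
    ("noD", "M"): 70,     # no disease, mutation
    ("noD", "noM"): 880,  # no disease, no mutation
}


def conditional(counts, outcome, given):
    """Estimate Pr(outcome given 'given') from joint counts."""
    # Total number of observations where the conditioning event holds
    n_given = sum(n for key, n in counts.items() if given in key)
    # Number of those observations where the outcome also holds
    n_both = sum(n for key, n in counts.items()
                 if given in key and outcome in key)
    if n_given == 0:
        raise ValueError("Conditioning event was never observed")
    return n_both / n_given


p_d_given_m = conditional(counts, "D", "M")
p_d_given_nom = conditional(counts, "D", "noM")
p_nod_given_m = conditional(counts, "noD", "M")

print("Pr(D given M)   =", p_d_given_m)
print("Pr(D given noM) =", p_d_given_nom)
print("Pr(noD given M) =", p_nod_given_m)
```

With these made-up counts the mutation raises the estimated chance of disease considerably, yet most people carrying the mutation do not have the disease, which is the false-positive concern raised above.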

We can also think of the dependent DNA events in the HindIII restriction enzyme example in terms of conditional probabilities, for example, what the probability of having a cut site (AAGCTT) is in a region of DNA given a G:C content greater than 60%. It should be noted that this is a distinct question from asking what the probability of one event and another is, though the two are related. For this example the probability that both events occur considers the outcomes from all the possible DNA sequences, while the probability that one occurs given the other considers only the situations where the second event has definitely occurred. The probability that they are both true is the same as the probability of one occurring multiplied by the probability of the second occurring given that the first has already occurred. So for two arbitrary events X and Y we have:

Pr(X and Y) = Pr(X) × Pr(Y given X)
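The relationship above can be illustrated with a rough simulation of the HindIII example, taking X to be "G:C content greater than 60%" and Y to be "contains a cut site". This is only a sketch under assumed conditions: the region length, the number of regions and the way each region's base composition is drawn are all arbitrary choices made here, not a model of real genomic DNA.

```python
import random

random.seed(1)  # fixed seed so that the sketch is repeatable

SITE = "AAGCTT"  # HindIII recognition sequence
NUM_REGIONS = 2000  # assumed number of simulated regions
LENGTH = 1000       # assumed region length in bases

n_high_gc = 0  # regions with G:C content > 60% (event X)
n_both = 0     # regions with high G:C and a cut site (X and Y)

for _ in range(NUM_REGIONS):
    # Assumption: each region has its own underlying G:C fraction
    gc = random.uniform(0.3, 0.7)
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    seq = "".join(random.choices("ACGT", weights=weights, k=LENGTH))

    gc_content = (seq.count("G") + seq.count("C")) / LENGTH
    high_gc = gc_content > 0.6
    has_site = SITE in seq

    if high_gc:
        n_high_gc += 1
        if has_site:
            n_both += 1

pr_x = n_high_gc / NUM_REGIONS   # Pr(X), over all regions
pr_both = n_both / NUM_REGIONS   # Pr(X and Y), over all regions
pr_y_given_x = n_both / n_high_gc if n_high_gc else float("nan")  # only high G:C regions

print("Pr(X and Y)           =", pr_both)
print("Pr(X) * Pr(Y given X) =", pr_x * pr_y_given_x)
```

Note that the joint probability is estimated over every simulated region, whereas the conditional probability uses only the regions where the G:C content is already known to be high; the two printed values agree because the counts satisfy the product rule by construction.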

It doesn’t matter which way round we phrase this; swapping the roles of X and Y gives an equally valid statement:



Pr(X and Y) = Pr(Y) × Pr(X given Y)

The conditional probabilities are only defined if Pr(X) and Pr(Y) are not zero. Combining these two formulations we can say that one is equal to the other, i.e. that:

Pr(X) × Pr(Y given X) = Pr(Y) × Pr(X given Y)

which is often written in the form:

Pr(Y given X) = Pr(Y) × Pr(X given Y) / Pr(X)

This is a very important result which is called Bayes’ theorem. As we discuss in the next section, this formulation is commonly used for hypothesis testing.
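To check the theorem numerically, here is a small sketch that starts from an assumed table of joint probabilities for X and Y, derives the marginal and conditional probabilities, and confirms that the two routes to Pr(Y given X) agree. The function name bayes() and all of the probability values are illustrative assumptions.

```python
import math


def bayes(pr_x_given_y, pr_y, pr_x):
    """Return Pr(Y given X) = Pr(Y) * Pr(X given Y) / Pr(X)."""
    if pr_x <= 0.0:
        raise ValueError("Pr(X) must be greater than zero")
    return pr_y * pr_x_given_y / pr_x


# Assumed joint probabilities, keyed by (X occurs, Y occurs)
joint = {
    (True, True): 0.2,
    (True, False): 0.1,
    (False, True): 0.2,
    (False, False): 0.5,
}

# Marginal probabilities: sum over the outcomes of the other event
pr_x = sum(p for (x, y), p in joint.items() if x)
pr_y = sum(p for (x, y), p in joint.items() if y)

# Conditional probabilities directly from the definition
pr_x_given_y = joint[(True, True)] / pr_y
pr_y_given_x = joint[(True, True)] / pr_x

# The same quantity obtained via Bayes' theorem
pr_y_given_x_bayes = bayes(pr_x_given_y, pr_y, pr_x)

print("Direct:", pr_y_given_x)
print("Bayes: ", pr_y_given_x_bayes)
print("Agree: ", math.isclose(pr_y_given_x, pr_y_given_x_bayes))
```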

Returning to our medical example, for prognosis and appropriate treatment we might want to know the probability of getting the disease given the mutation, Pr(D given M). However, it may not be cost-effective to obtain statistics by genetically testing large numbers of people for the mutation, just for the chance that they would get a rare disease. Also, it might be that the disease is difficult to diagnose and doesn’t show immediately. By contrast, it may be easier to determine Pr(M given D) by testing a limited number of people who definitely do have the disease to discover whether they have the mutation. Using Bayes’ theorem we can then get the probability we want from the other:

Pr(D given M) = Pr(M given D) × Pr(D) / Pr(M)

Naturally we must also estimate Pr(D) and Pr(M), the probabilities of disease and mutation in the absence of any other information, from statistical data. However, Pr(D) could simply come from medical records and Pr(M) could come from testing any group of people, whether or not they had the rare disease.
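A short sketch of this calculation follows, using three separate and entirely hypothetical data sources: a small group of patients tested for the mutation, a disease frequency taken from records, and a general group tested for the mutation. None of the numbers are real; they only show how the pieces fit together.

```python
# Hypothetical data (illustrative values only)
patients_tested = 50         # people known to have the disease
patients_with_mutation = 40  # of those, how many carry the mutation

records_total = 1000         # people covered by medical records
records_with_disease = 1     # of those, how many have the disease

general_tested = 200         # people tested regardless of disease
general_with_mutation = 2    # of those, how many carry the mutation

# Estimates of the three probabilities on the right-hand side
pr_m_given_d = patients_with_mutation / patients_tested
pr_d = records_with_disease / records_total
pr_m = general_with_mutation / general_tested

if pr_m <= 0.0:
    raise ValueError("Pr(M) must be greater than zero")

# Bayes' theorem: Pr(D given M) = Pr(M given D) * Pr(D) / Pr(M)
pr_d_given_m = pr_m_given_d * pr_d / pr_m

# Estimates from separate samples can be mutually inconsistent
if pr_d_given_m > 1.0:
    raise ValueError("Inconsistent estimates: Pr(D given M) exceeds 1")

# Complementary event: mutation present but no disease (false positives)
pr_nod_given_m = 1.0 - pr_d_given_m

print("Pr(D given M)   = %.3f" % pr_d_given_m)
print("Pr(noD given M) = %.3f" % pr_nod_given_m)
```

Because the three estimates come from different samples, they need not be consistent with one another, which is why the sketch checks that the result does not exceed 1.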

