Python Programming for Biology: Bioinformatics and Beyond

Tim J. Stevens, Wayne Boucher

Conditional probabilities

Moving to a different kind of example, we will consider probabilities associated with the occurrence of a disease (D) and how these relate to the experimental observation of a particular mutant version (M) of a gene, i.e. one with a different DNA sequence. Here the probability that both occur, Pr(D and M), does not on its own suggest whether the two are related. Naturally, to investigate the link between the two we would need to know the probabilities of the events alone (having the disease and having the mutation), and thus whether the intersection of the two is more or less than we would expect if they were unrelated, which for independent events is Pr(D) × Pr(M). By doing this we implicitly use the concept of hypothesis testing. As far as medical prediction and diagnosis are concerned, it is helpful to consider the complementary events: in this case, the event that there is no disease (noD) and the event that there is no mutation (noM). With these we can compare the hypothesis that the disease and mutation are linked with an appropriate null hypothesis, that they are unrelated (see Chapter 22).
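As a minimal sketch of this comparison, the Python below uses hypothetical counts (invented for illustration, not real data) to set the observed Pr(D and M) against the value Pr(D) × Pr(M) expected if the disease and mutation were unrelated. The function name `observed_vs_expected` is our own; deciding whether any difference is significant still requires a proper hypothesis test.

```python
def observed_vs_expected(n_total, n_disease, n_mutation, n_both):
    """Return observed Pr(D and M) and the Pr(D) x Pr(M) expected if D and M were unrelated."""
    p_d = n_disease / n_total      # Pr(D)
    p_m = n_mutation / n_total     # Pr(M)
    observed = n_both / n_total    # Pr(D and M)
    expected = p_d * p_m           # independence: Pr(D and M) = Pr(D) x Pr(M)
    return observed, expected

# Hypothetical study of 1000 people (illustrative numbers only)
observed, expected = observed_vs_expected(n_total=1000, n_disease=50,
                                          n_mutation=80, n_both=30)
print(f"Observed Pr(D and M):  {observed:.4f}")
print(f"Expected if unrelated: {expected:.4f}")
```

With these invented numbers the observed joint probability (0.03) is 7.5 times the 0.004 expected under independence, which is the kind of discrepancy a formal test would then assess.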

By counting occurrences of the different situations we can estimate the various conditional probabilities. For example, we can estimate the probability of having the disease given that the mutation is present, Pr(D given M), and compare it to the probability of having the disease given no mutation, Pr(D given noM), to see whether the mutation increases or decreases the chance of the disease. Also, if it is established that Pr(D given M) is much greater than Pr(D given noM), i.e. that the mutation is highly correlated with the disease, then knowing the probability of not having the disease given that the mutation is present, Pr(noD given M), is vital if we hope to use a genetic test to predict the disease outcome; in other words, we need to know whether there would be many false-positive results.
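Estimating these conditional probabilities by counting can be sketched as follows. The 2×2 table of counts is hypothetical (illustrative only, not real data), and the helper `conditional_pr` is our own name. The key point is that each conditional probability divides by the size of the subgroup being conditioned on, not by the whole sample.

```python
# Hypothetical 2x2 table of counts (illustrative only, not real data).
# Keys are (has_disease, has_mutation).
counts = {
    (True, True): 30,     # D and M
    (True, False): 20,    # D and noM
    (False, True): 50,    # noD and M
    (False, False): 900,  # noD and noM
}

def conditional_pr(counts, disease, mutation):
    """Estimate Pr(disease state given mutation state) by counting.

    Only people with the given mutation state are considered, so the
    denominator is the size of that subgroup rather than the whole sample.
    """
    n_subgroup = counts[(True, mutation)] + counts[(False, mutation)]
    return counts[(disease, mutation)] / n_subgroup

p_d_given_m = conditional_pr(counts, disease=True, mutation=True)      # Pr(D given M)
p_d_given_nom = conditional_pr(counts, disease=True, mutation=False)   # Pr(D given noM)
p_nod_given_m = conditional_pr(counts, disease=False, mutation=True)   # Pr(noD given M)

print(f"Pr(D given M)   = {p_d_given_m:.3f}")
print(f"Pr(D given noM) = {p_d_given_nom:.3f}")
print(f"Pr(noD given M) = {p_nod_given_m:.3f}")
```

With these invented numbers the mutation raises the chance of disease from about 2% to 37.5%, yet 62.5% of carriers would not develop it, so a positive genetic test would often be a false positive. Note that Pr(noD given M) = 1 - Pr(D given M).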

We can also think of the dependent DNA events in the HindIII restriction enzyme example in terms of conditional probabilities: for example, the probability of having a cut site (AAGCTT) in a region of DNA given that its G:C content is greater than 60%. Note that this is a distinct question from asking what the probability of one event and the other is, though the two are related. For this example the probability that both events occur considers the outcomes for all possible DNA sequences, whereas the probability that one occurs given the other does not: it only considers situations where the second event has definitely occurred. The probability that both are true is the probability of one occurring multiplied by the probability of the other occurring given that the first has occurred. So for two arbitrary events X and Y we have:

Pr(X and Y) = Pr(X) × Pr(Y given X)

It doesn’t matter which way round we phrase this; with the roles of X and Y swapped the relation also holds:

Pr(X and Y) = Pr(Y) × Pr(X given Y)

This only makes sense if Pr(X) and Pr(Y) are not zero, because a conditional probability is undefined when the event it is conditioned on can never occur. Combining these two formulations, we can equate one with the other:

Pr(X) × Pr(Y given X) = Pr(Y) × Pr(X given Y)

which, dividing both sides by Pr(X), is often written in the form:

Pr(Y given X) = Pr(Y) × Pr(X given Y) / Pr(X)

This very important result is known as Bayes’ theorem. As we discuss in the next section, this formulation is commonly used for hypothesis testing.
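These relationships can be checked on simulated data for the HindIII example, taking X as "the window contains a HindIII site (AAGCTT)" and Y as "the window's G:C content is greater than 60%". The sketch below is not the book's code: it assumes a simple hypothetical model in which random 50-base windows each draw their own G:C bias uniformly between 20% and 80%, and the function names are our own. Because every estimate is a ratio of the same counts, the product rule and Bayes' theorem hold exactly for them, apart from floating-point rounding, while Pr(X and Y) and Pr(X given Y) come out as clearly different numbers.

```python
import random

HINDIII_SITE = "AAGCTT"

def random_window(rng, length, gc_prob):
    """Return a random DNA sequence in which G and C together occur with probability gc_prob."""
    at = (1.0 - gc_prob) / 2
    gc = gc_prob / 2
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))

def count_events(n_windows=20000, length=50, seed=1):
    """Count windows with a HindIII site (X), with high G:C content (Y), or both."""
    rng = random.Random(seed)
    n_cut = n_high = n_both = 0
    for _ in range(n_windows):
        # Assumed model: each window draws its own G:C bias between 20% and 80%
        seq = random_window(rng, length, rng.uniform(0.2, 0.8))
        high_gc = (seq.count("G") + seq.count("C")) / length > 0.6
        has_cut = HINDIII_SITE in seq
        n_cut += has_cut
        n_high += high_gc
        n_both += high_gc and has_cut
    return n_windows, n_cut, n_high, n_both

n, n_cut, n_high, n_both = count_events()

p_cut = n_cut / n                    # Pr(X)
p_high = n_high / n                  # Pr(Y)
p_both = n_both / n                  # Pr(X and Y): counted over all windows
p_cut_given_high = n_both / n_high   # Pr(X given Y): only high-G:C windows
p_high_given_cut = n_both / n_cut    # Pr(Y given X): only windows with a site

# Product rule, both ways round
print(p_both, p_cut * p_high_given_cut, p_high * p_cut_given_high)

# Bayes' theorem: Pr(Y given X) = Pr(Y) x Pr(X given Y) / Pr(X)
print(p_high_given_cut, p_high * p_cut_given_high / p_cut)
```

Since AAGCTT contains only two G or C bases, one would expect `p_cut_given_high` to come out lower than `p_cut` under this model.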

Returning to our medical example, for prognosis and appropriate treatment we might want to know the probability of getting the disease given the mutation, Pr(D given M). However, it may not be cost-effective to obtain these statistics by genetically testing large numbers of people for the mutation, only to follow the few who go on to develop a rare disease. Also, the disease might be difficult to diagnose and might not show immediately. Conversely, it may be easier to determine Pr(M given D) by testing a limited number of people who definitely have the disease to discover whether they carry the mutation. Using Bayes’ theorem we can then obtain the probability we want from the one we can measure:

Pr(D given M) = Pr(M given D) × Pr(D) / Pr(M)

Naturally, we must also estimate Pr(D) and Pr(M), the probabilities of disease and mutation in the absence of any other information, from statistical data. However, these are comparatively easy to obtain: Pr(D) could simply come from medical records, and Pr(M) from testing any group of people, whether or not they have the rare disease.
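A minimal sketch of this calculation follows; the function name `bayes` and all the numbers are hypothetical, chosen only for illustration. The extra check reflects the fact that Pr(M given D) × Pr(D) equals Pr(D and M), which cannot exceed Pr(M), so estimates taken from separate data sets that break this rule are inconsistent.

```python
def bayes(p_m_given_d, p_d, p_m):
    """Pr(D given M) = Pr(M given D) x Pr(D) / Pr(M), via Bayes' theorem."""
    if p_m <= 0:
        raise ValueError("Pr(M) must be greater than zero")
    result = p_m_given_d * p_d / p_m
    if result > 1:
        # Pr(M given D) x Pr(D) = Pr(D and M), which can never exceed Pr(M)
        raise ValueError("inconsistent estimates: Pr(M given D) x Pr(D) > Pr(M)")
    return result

# Hypothetical estimates (illustrative only, not real data)
p_m_given_d = 0.6   # fraction of diagnosed patients who carry the mutation
p_d = 0.05          # disease frequency, e.g. from medical records
p_m = 0.08          # mutation frequency in a general group of people

p_d_given_m = bayes(p_m_given_d, p_d, p_m)
print(f"Pr(D given M)   = {p_d_given_m:.3f}")
print(f"Pr(noD given M) = {1 - p_d_given_m:.3f}")
```

With these invented inputs the result is 0.375, so most carriers (62.5%) would not develop the disease, which is exactly the false-positive concern raised earlier.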
