Figure 21.5. Combining probabilistic events. The first event, that one nucleotide from
the 16 pairs contains an A, and the second event, that the nucleotides are different, are
subsets of the total set of outcomes. The intersection between the two events is the set of
outcomes common to both. Probabilities are calculated for the events assuming that all
outcomes are equally likely.
Something that follows from the basic axioms of probability is the notion that we can
use the probability of the intersection between events, Pr(E₁ and E₂), to calculate the
probability of the union between events, Pr(E₁ or E₂):
Pr(E₁ or E₂) = Pr(E₁) + Pr(E₂) – Pr(E₁ and E₂)
If the events E₁ and E₂ intersect, adding their two probabilities counts the overlapping
outcomes twice, so subtracting the probability of the intersection, that both E₁ and E₂
happen, corrects for this. This way each outcome that belongs to E₁ or E₂ is counted
exactly once. For mutually exclusive events the probability Pr(E₁ and E₂) is zero, in
which case Pr(E₁ or E₂) is just the sum of the individual probabilities.
We can show the calculation of Pr(E₁ or E₂) in Python by either creating the
appropriate set or by using the above equation:
union = event1 | event2 # Set with elements from both
pUnion = sum([probs[xy] for xy in union])
print(pUnion) # 0.81049
print(pEvent1 + pEvent2 - pEvent1and2) # 0.81049 - same
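For a version that runs on its own, the following is a minimal sketch that rebuilds the two events from the figure. It assumes, as the figure caption does, that all 16 nucleotide pairs are equally likely, and it reads the first event as "at least one nucleotide in the pair is an A"; both are assumptions made for illustration, so the result differs from the 0.81049 obtained above with unequal probabilities. The helper name prob is hypothetical.

```python
from fractions import Fraction
from itertools import product

bases = "ACGT"

# All 16 ordered nucleotide pairs, e.g. "AA", "AC", ...
outcomes = [x + y for x, y in product(bases, repeat=2)]

# Assumption: every pair is equally likely (exact fractions avoid rounding)
probs = {xy: Fraction(1, len(outcomes)) for xy in outcomes}

# Event 1 (assumed reading): the pair contains at least one A
event1 = {xy for xy in outcomes if "A" in xy}

# Event 2: the two nucleotides are different
event2 = {xy for xy in outcomes if xy[0] != xy[1]}


def prob(event):
    """Hypothetical helper: total probability of a set of outcomes."""
    return sum(probs[xy] for xy in event)


# Route 1: build the union set and sum its outcome probabilities
pUnionDirect = prob(event1 | event2)

# Route 2: add the event probabilities and subtract the intersection
pUnionFormula = prob(event1) + prob(event2) - prob(event1 & event2)

print(pUnionDirect, pUnionFormula)  # 13/16 13/16
```

Here seven pairs contain an A, twelve pairs have different nucleotides and six pairs satisfy both, so the union holds 7 + 12 – 6 = 13 of the 16 outcomes by either route.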
While we can treat combined dice rolls or DNA positions as discrete outcomes we can
also imagine these as arising from a chain of probabilistic selections. In the above
examples the trials are independent and the result of the first has no influence on the
second, which is reasonable for a fair die. However, for DNA (and many other analogous
situations in biology) the probabilities of the occurrence of a nucleotide at each position
may not only be different, as discussed before, but the probability for the second position
may also vary according to which base is present in the first position, or indeed many
other positions.
In this case we would say the positions were not independent and the probability of
observing the second nucleotide differs, depending on the outcome of the first. To
calculate the probability of getting each pair of nucleotides we get the probability of
obtaining the first nucleotide and multiply this by the probability of getting the second,
given the first. This is what is termed a conditional probability and in general we would
need to know what the probabilities for the four nucleotides were given each particular
preceding nucleotide.
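The multiplication described above, Pr(first and second) = Pr(first) × Pr(second given first), can be sketched as follows. All of the numbers here are made up purely for illustration and are not taken from any real sequence data; the names pFirst, pSecondGivenFirst and pPair are likewise hypothetical.

```python
bases = "ACGT"

# Hypothetical probabilities for the nucleotide at the first position
pFirst = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Hypothetical conditional probabilities:
# pSecondGivenFirst[x][y] is Pr(second is y, given first is x).
# Each inner dictionary must sum to 1.
pSecondGivenFirst = {
    "A": {"A": 0.40, "C": 0.10, "G": 0.30, "T": 0.20},
    "C": {"A": 0.20, "C": 0.30, "G": 0.10, "T": 0.40},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.10, "C": 0.20, "G": 0.30, "T": 0.40},
}

# Probability of each pair: first nucleotide, times second given the first
pPair = {}
for x in bases:
    for y in bases:
        pPair[x + y] = pFirst[x] * pSecondGivenFirst[x][y]

# The 16 pair probabilities should together account for all outcomes
total = sum(pPair.values())

print(pPair["AC"], pPair["CA"])
print(total)
```

Note that with these illustrative values the pair AC and the pair CA have different probabilities, because the chance of the second nucleotide depends on which nucleotide came first. If every inner dictionary were identical the positions would be independent.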