A Markov chain is defined by a set of possible states that represent the range of possible
occurrences for the trials that make up the chain. The probability of a given chain can
then be calculated using the conditional probabilities of moving from each state to the next
state along the chain. For the simple example of rolling dice, we can use a Markov chain
to model the situation where a fair die is occasionally swapped for a loaded (unfair) die,
i.e. where the probabilities of the six outcomes are not equal. Hence, according to which
die is used, the probabilities of the different roll outcomes change, and we could model this
by having two states: a fair die and a loaded die. Similarly for an otherwise random DNA
model we may have different probabilities of observing C:G versus A:T depending on
whether the region is a gene or not. Here the states would be gene and non-gene
nucleotides, each with associated conditional probabilities. Technically we are describing
a discrete-time homogeneous Markov chain.
It is discrete time because we have fixed positions or trials for the chain, which needn't be true in the general case, and homogeneous refers to the fact that the conditional probabilities don't vary along the chain (i.e. they do not change with 'time').
Expressing this in terms of conditional probability, we model the probability of a trial
having a particular state given the state of the previous trial. A consequence of this is that
we consider all the possible probabilities of going from one state to another. Generally this
is described as a matrix of conditional probabilities, which we call the transition matrix.
Here each element of the matrix (T) is the probability of observing a particular state (State
= j) at a position in the chain (n+1) given the occurrence of a potentially different state
(State = i) at the previous position (n), i.e. we transition from state i to j:
$$T_{i,j} = \Pr(\mathrm{State}_{n+1} = j \mid \mathrm{State}_{n} = i)$$
As before we use the typical mathematical notation where the ‘|’ symbol means ‘given’.
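To make this concrete, the two-state dice model above can be written as a small matrix of numbers. The following is a minimal sketch in Python, where the switching probabilities (e.g. a 5% chance of swapping the fair die for the loaded one) are illustrative assumptions rather than values from the text:

```python
import numpy as np

# States of the chain: index 0 is the fair die, index 1 is the loaded die.
states = ['fair', 'loaded']

# T[i,j] = Pr(State_n+1 = j | State_n = i); the values are assumed.
T = np.array([[0.95, 0.05],   # fair   -> fair, loaded
              [0.10, 0.90]])  # loaded -> fair, loaded

print(T[0, 1])  # probability of switching from the fair to the loaded die
```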
We assume that the transition matrix is the same across the whole Markov chain, i.e.
independent of n. Note that since the chain must transition to something, i.e. $\mathrm{State}_{n+1}$ must take some value, the summation over all destination states j for a given starting state i is one:

$$\sum_{j} T_{i,j} = 1$$
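Continuing the sketch above, this row-sum property gives a quick sanity check for any transition matrix we construct:

```python
import numpy as np

T = np.array([[0.95, 0.05],   # the same assumed fair/loaded matrix
              [0.10, 0.90]])

# From any state i the chain must go somewhere, so each row sums to one.
assert np.allclose(T.sum(axis=1), 1.0)
```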
Once we know the probability of going to the next state given the previous one, which
would often be derived from statistical observations of real data, we can then use a
Markov chain to generate sequences of outcomes or states. In essence this would mean
using the transition matrix repeatedly, albeit in a random manner according to its
probabilities, to produce a sample from the model.
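As a sketch of how such sampling might be implemented (again with the assumed fair/loaded matrix; none of these names or numbers come from the text), each step draws the next state at random using the transition-matrix row for the current state:

```python
import numpy as np

rng = np.random.default_rng()

states = ['fair', 'loaded']
T = np.array([[0.95, 0.05],   # assumed transition probabilities
              [0.10, 0.90]])

def sample_chain(start, n_steps):
    """Generate a sequence of states by repeatedly applying T."""
    chain = [start]
    for _ in range(n_steps):
        # Pick the next state with the probabilities in the current row.
        chain.append(rng.choice(len(states), p=T[chain[-1]]))
    return [states[i] for i in chain]

print(sample_chain(start=0, n_steps=10))
```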
From a relatively simple transition model we can then make long-term predictions. For example, we could have a Markov
chain that models the reproduction of a population of bacteria, with assigned probabilities
for the number of progeny in a generation given the number in the previous generation.
Although this process is really continuous we are considering a simplified model with
discrete time points by using the notion of a ‘generation’. Given different starting
populations we can then investigate the different long-term outcomes, e.g. whether the
population grows or dies out.
Here we would say the state is the size of the population, and the probability of the size at the next point is predicted only from the current population size. We can repeat this process and take another discrete step along the chain, to predict even further into the future, though naturally if we do this any given prediction becomes even more
uncertain. At a glance it may seem futile to extend the chain very far, basing guesses upon
guesses, but the real power of the Markov process comes from the ability to compare the
relative likelihood of different outcomes and how these may have arisen from a
combination of different intermediate states.
For our simplistic model of bacterial population growth, where we assume that
generations are distinct, we will use a probability distribution that describes the likelihood
of the number of progeny that each individual bacterium can give rise to in a generation.
In this case the states that the Markov chain can take are non-negative integers, i.e.
numbers zero and above, and we will use a Poisson distribution for the probabilities. A
state of zero means that the population dies out. A general transition matrix, which is
independent of both the individual bacterium and the generation, then derives from the
Poisson distribution being applied to each state. Although the distribution describes the likelihood of progeny for an individual, we combine these over all individuals within a given population; conveniently, the sum of independent Poisson-distributed variables is itself Poisson distributed, with a rate given by the sum of the individual rates.
In this example we do not explicitly calculate the complete transition matrix, given that
the population size is unbounded. Rather we will calculate the population of the next
generation based on the current population by generating random outcomes for each
individual bacterium, according to the probability distribution. In subsequent examples
where there are more limited states we will describe the whole transition matrix of
probabilities upfront. The general form of the transition matrix, where the subscripts i and j respectively represent the current and next population states and λ is the mean number of progeny per individual, is then:

$$T_{i,j} = \Pr(\mathrm{Pop}_{n+1} = j \mid \mathrm{Pop}_{n} = i) = \frac{(\lambda i)^{j} e^{-\lambda i}}{j!}$$
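A minimal simulation sketch of this scheme follows, assuming a hypothetical mean progeny number per bacterium (lam = 1.05) and a starting population of 10; these values, and the function name next_generation, are illustrative only:

```python
import numpy as np

rng = np.random.default_rng()

def next_generation(population, lam):
    """One step along the chain: each bacterium independently leaves a
    Poisson(lam)-distributed number of progeny."""
    if population == 0:
        return 0  # an extinct population stays extinct
    # One random draw per individual, summed to give the next state.
    return int(rng.poisson(lam, size=population).sum())

population = 10   # assumed starting population
lam = 1.05        # assumed mean progeny per bacterium per generation
history = [population]
for generation in range(50):
    population = next_generation(population, lam)
    history.append(population)

print(history)   # e.g. watch whether the population grows or dies out
```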