Python Programming for Biology: Bioinformatics and Beyond




Markov processes

A Markov chain is defined by a set of possible states that represent the range of possible occurrences for the trials that make up the chain. The probability of a given chain can then be calculated using the conditional probabilities of moving from each state to the next state along the chain. For the simple example of rolling dice, we can use a Markov chain to model the situation where a fair die is occasionally swapped for a loaded (unfair) die, i.e. one where the probabilities of the six outcomes are not equal. Hence, according to which die is used, the probabilities of the different roll outcomes change, and we could model this by having two states: a fair die and a loaded die. Similarly, for an otherwise random DNA model we may have different probabilities of observing C:G versus A:T depending on whether the region is a gene or not. Here the states would be gene and non-gene nucleotides, each with associated conditional probabilities. Technically we are describing a discrete-time homogeneous Markov chain. It is discrete time because we have fixed positions or trials for the chain, which needn't be true in the general case, and the notion of being homogeneous refers to the fact that the conditional probabilities don't vary along the chain (e.g. vary with 'time').

Expressing this in terms of conditional probability, we model the probability of a trial having a particular state given the state of the previous trial. A consequence of this is that we consider all the possible probabilities of going from one state to another. Generally this is described as a matrix of conditional probabilities, which we call the transition matrix. Here each element of the matrix (T) is the probability of observing a particular state (State = j) at a position in the chain (n+1) given the occurrence of a potentially different state (State = i) at the previous position (n), i.e. we transition from state i to j:



T_{i,j} = Pr(State_{n+1} = j | State_{n} = i)

As before we use the typical mathematical notation where the '|' symbol means 'given'. We assume that the transition matrix is the same across the whole Markov chain, i.e. independent of n. Note that since the chain must transition to something, i.e. State_{n+1} must take some value, the summation over all destination states j for a starting state i is one:

Σ_j T_{i,j} = 1
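The row-sum condition can be checked directly. Here is a minimal sketch using NumPy for the two-state fair/loaded die model described earlier; the switching probabilities (0.95, 0.05, 0.10, 0.90) are illustrative assumptions, not values from the text:

```python
import numpy as np

# Hypothetical transition matrix for the two dice states; each row i
# holds the probabilities of moving from state i to each state j.
states = ['fair', 'loaded']
T = np.array([[0.95, 0.05],   # fair   -> fair, loaded
              [0.10, 0.90]])  # loaded -> fair, loaded

# Each row must sum to one: from any state the chain must go somewhere
print(T.sum(axis=1))  # [1. 1.]
```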

Once we know the probability of going to the next state given the previous one, which would often be derived from statistical observations of real data, we can then use a Markov chain to generate sequences of outcomes or states. In essence this would mean using the transition matrix repeatedly, albeit in a random manner according to its probabilities, to produce a sample from the model. From a relatively simple transitioning model we can then make long-term predictions. For example, we could have a Markov chain that models the reproduction of a population of bacteria, with assigned probabilities for the number of progeny in a generation given the number in the previous generation. Although this process is really continuous, we are considering a simplified model with discrete time points by using the notion of a 'generation'. Given different starting populations we can then investigate the different long-term outcomes, e.g. whether the population grows or dies out.
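Repeatedly applying a transition matrix to draw a random sample of states might be sketched as follows; the matrix values, chain length and random seed here are illustrative assumptions:

```python
import numpy as np

# Generate a sample chain of states by repeatedly choosing the next
# state according to the transition probabilities of the current one.
rng = np.random.default_rng(7)
states = ['fair', 'loaded']
T = np.array([[0.95, 0.05],
              [0.10, 0.90]])

chain = [0]  # start in the 'fair' state
for _ in range(20):
    current = chain[-1]
    # pick the next state using row `current` of the transition matrix
    chain.append(rng.choice(len(states), p=T[current]))

print([states[i] for i in chain])
```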

Here we would say the state was the size of the population, and the probability for the size at the next point is predicted only from the current population size. We can repeat this process and take another discrete step along the chain, to predict even further into the future, but naturally if we do this the likelihood of any given outcome becomes even more uncertain. At a glance it may seem futile to extend the chain very far, basing guesses upon guesses, but the real power of the Markov process comes from the ability to compare the relative likelihood of different outcomes and how these may have arisen from a combination of different intermediate states.

For our simplistic model of bacterial population growth, where we assume that generations are distinct, we will use a probability distribution that describes the likelihood of the number of progeny that each individual bacterium can give rise to in a generation. In this case the states that the Markov chain can take are non-negative integers, i.e. numbers zero and above, and we will use a Poisson distribution for the probabilities. A state of zero means that the population dies out. A general transition matrix, which is independent of both the individual bacterium and the generation, then derives from the Poisson distribution being applied to each state. Although the distribution describes the likelihood of progeny for an individual, we combine these over all individuals within a given population size.

In this example we do not explicitly calculate the complete transition matrix, given that the population size is unbounded. Rather, we will calculate the population of the next generation based on the current population by generating random outcomes for each individual bacterium, according to the probability distribution. In subsequent examples where there are more limited states we will describe the whole transition matrix of probabilities upfront. The general form of the transition matrix is as follows, where the subscripts (i and j) respectively represent the current and next population states, and λ is the mean number of progeny per individual (the sum of i independent Poisson(λ) outcomes is itself Poisson distributed, with mean iλ):

T_{i,j} = e^{-iλ} (iλ)^j / j!
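The per-individual random sampling just described can be sketched as follows; the mean progeny count (1.05), starting population, number of generations and random seed are assumptions for illustration, not the book's own code:

```python
import numpy as np

# Sketch of the branching-process Markov chain: each bacterium leaves
# a Poisson-distributed number of progeny per generation.
rng = np.random.default_rng(1)

def next_generation(population, mean_progeny=1.05):
    if population == 0:
        return 0  # extinction: a population of zero is absorbing
    # one Poisson draw per individual, summed over the population;
    # equivalently one draw from Poisson(population * mean_progeny)
    return int(rng.poisson(mean_progeny, size=population).sum())

population = 10
for generation in range(50):
    population = next_generation(population)

print(population)  # grows, fluctuates or dies out depending on the run
```

Running this many times from different starting populations gives a sample of the long-term outcomes, e.g. the fraction of runs in which the population dies out.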