The forward-backward algorithm
The forward-backward algorithm is commonly used to estimate all the probabilities of the
hidden states of the HMM for any sequence of observations. With these hidden state
probabilities various things are possible: for example, we can compare the overall
probabilities of different hidden state sequences (which need not be the best one) for a set of observations; this in turn can be used to generate a probability distribution over state sequences, rather than the single best answer we obtained with the Viterbi algorithm. It is in
keeping with the Bayesian approach to consider an ensemble of different solutions, so we
have a better idea of the error in the prediction and whether there are competing solutions
with similar probabilities.
This method works by expressing the probability of a state at a given position as the
product of two components, the forward and backward parts. The forward part is the
probability of the states for a given point in the chain given the sequence of observations
up to that point. The backward part is the probability of getting the observations for the
remainder of the chain after a point, given the hidden state at that point. We will not describe the mathematical derivation of the algorithm, but in essence the result stems from using Bayes’ rule and the fact that, because of the no-memory property of the Markov chain, the observations on either side of a position are independent of one another once the hidden state at that point is known.
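In symbols (using a shorthand of our own, not taken from the rest of the text): if s_i denotes the hidden state at position i and o_1, …, o_n the observations, the relationship being exploited is

\[
P(s_i \mid o_{1:n}) \;\propto\; P(s_i \mid o_{1:i}) \, P(o_{i+1:n} \mid s_i),
\]

where the first factor is the forward part and the second the backward part.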
The forward-backward algorithm itself is yet another example of dynamic
programming, which makes the process reasonably efficient to compute. Both the forward
and backward parts calculate probabilities for subsequent points in the Markov chain,
albeit from different directions, by using the previously calculated result from the
neighbouring position. This can be thought of as a recursion (although we don’t code the Python in that way), where the result for each point extends the previous one and thus incorporates all of the calculations made before it.
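In the same ad hoc shorthand, and glossing over the re-normalisation, the forward and backward recursions at the heart of the algorithm can be sketched as

\[
P(s_{i+1} \mid o_{1:i+1}) \;\propto\; P(o_{i+1} \mid s_{i+1}) \sum_{s_i} P(s_{i+1} \mid s_i) \, P(s_i \mid o_{1:i}),
\]
\[
P(o_{i+1:n} \mid s_i) \;=\; \sum_{s_{i+1}} P(s_{i+1} \mid s_i) \, P(o_{i+1} \mid s_{i+1}) \, P(o_{i+2:n} \mid s_{i+1}),
\]

so each step needs only the value already computed at the neighbouring position.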
The example Python function that implements the algorithm makes use of NumPy to
perform matrix algebra, to give a fairly compact result (avoiding explicit loops over the hidden states), using the starting
probability matrix pStart, transition probability matrix pTrans and emission probability
matrix pEmit discussed before. Naturally these are inputs to the function, together with the
sequence of observations obs that we wish to estimate the probabilities for. Note that,
unlike in the Viterbi algorithm, we don’t work with logarithms and thus we will multiply
probabilities. Here we re-normalise as we go, to get relative probabilities, and so the
values don’t get especially small (unlike the probability of a long sequence).
from numpy import array, empty, identity, dot, ones

def forwardBackward(obs, pStart, pTrans, pEmit):
First in the function a few variables are initialised: the number of observations, the
number of hidden states and an identity matrix of an appropriate size.
    n = len(obs)
    nStates = len(pStart)
    I = identity(nStates)
Next a matrix fwd is initialised as an empty NumPy array, which will hold the hidden
state probabilities for the forward part of the algorithm. Note that this array is one longer
than the number of observations, so that we can include the starting probabilities for the
states at position 0.
    fwd = empty([n+1,nStates])
    fwd[0] = pStart
Now comes the main loop for the forward part of the algorithm, where we go through
each index that references each observation from start to end. The forward probability is
calculated by applying the transition matrix to the previous (or starting) hidden state
probabilities and then applying the emission probabilities for the observation val, all using
dot() to do the matrix multiplication. Effectively we are saying that a hidden state
probability derives from the transitions of the previous states and the likelihood that the
state generates the observation. Note the use of the identity matrix I, which turns the column of emission probabilities for the observation into a diagonal matrix of the right shape for the multiplication, and note that we fill index [i+1]: we designed our array to begin, at index 0, with the starting probabilities (which we will need later) rather than with the result from the first observation. The division by fProb.sum() ensures that
the probabilities are normalised and so sum to 1.0; this implementation doesn’t bother
calculating the various probability scaling constants.
    for i, val in enumerate(obs):
        fProb = dot( pEmit[:,val]*I, dot(pTrans, fwd[i]) )
        fwd[i+1] = fProb / fProb.sum()
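As an aside, the pEmit[:,val]*I product is simply a compact way of building a diagonal matrix from one column of the emission probabilities; the short, standalone check below (with made-up numbers, not part of the function) shows an equivalent form using NumPy’s diag().
from numpy import array, diag, identity, allclose

pEmitDemo = array([[0.8, 0.2],   # hypothetical emission probabilities; rows are
                   [0.3, 0.7]])  # hidden states, columns are observation values
val = 1                          # a hypothetical observation value

diagA = pEmitDemo[:,val] * identity(2)   # the form used in forwardBackward()
diagB = diag(pEmitDemo[:,val])           # an equivalent, arguably clearer form

print(allclose(diagA, diagB))            # prints True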
With the forward part done, we next turn to the backward part. Here we define bwd,
which will hold the ‘backward’ hidden state probabilities. Unlike the forward part we
don’t need to remember the whole array of probabilities (although you could if it was
useful) so this is just a simple vector that will be updated for each subsequent step. The
starting values in bwd are 1.0, from which we can multiply to get subsequent probabilities.
The smooth array will be filled with the final result, which is the ‘smoothed’ combination
of the probabilities from the forward and backward calculations; the fwd and bwd
probabilities are multiplied and re-normalised. A separate loop is not needed to calculate
this as it can be filled in at the same time as bwd. Note the last vector of smooth
probabilities is set upfront as the last of the forward values, given that the bwd values here
are 1.0 and so multiplication would have no effect.
    bwd = ones(nStates)
    smooth = empty([n+1,nStates])
    smooth[-1] = fwd[-1]
The backward loop goes through all the positions in reverse order and calculates bwd,
by applying the transition matrix to the matrix product of the emission probabilities for the
current observation obs[i] and the previous bwd (i.e. for the subsequent, i+1 point in the
sequence). Effectively we are saying that the new bwd, the probability of the observations from this point to the end of the chain given the hidden state at the preceding position, can be calculated recursively: we multiply the old bwd by the emission probability of the current observation and apply the transition matrix, thereby adding one extra observation and stepping back to the earlier position. The smooth probability matrix, which is passed back from the function,
is then the normalised product (element-wise) of the forward and backward components.
    for i in range(n-1, -1, -1):
        bwd = dot(pTrans, dot(pEmit[:,obs[i]]*I, bwd))
        bwd /= bwd.sum()
        prob = fwd[i] * bwd
        smooth[i] = prob / prob.sum()

    return smooth
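As a usage sketch, the function could be called as follows; the numbers here are made-up stand-ins for a two-state HMM, not the pStart, pTrans and pEmit defined earlier in the chapter.
from numpy import array

pStart = array([0.5, 0.5])          # hypothetical starting probabilities
pTrans = array([[0.7, 0.3],         # hypothetical transition probabilities
                [0.3, 0.7]])
pEmit  = array([[0.9, 0.1],         # hypothetical emission probabilities; rows are
                [0.2, 0.8]])        # hidden states, columns are observation values
obs = [0, 0, 1, 1, 0]               # a short sequence of observation indices

smooth = forwardBackward(obs, pStart, pTrans, pEmit)

for i, probs in enumerate(smooth):
    print(i, probs)                 # smoothed state probabilities at each position
Note that the returned array has one more row than there are observations: row 0 holds the smoothed probabilities for the starting position, before any observation has been made.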