After the gentle introduction, the next example illustrates a further improvement to the
Monte Carlo optimisation approach, giving what is termed the Metropolis-Hastings
approach. The Metropolis-Hastings algorithm is formally described in terms of a
probability distribution, so it fits well with optimisation problems that have a statistical
nature; it takes samples from a probability distribution. The example problem used here is
not actually probabilistic, but we can pretend that it is and the method will work well
nonetheless.
The algorithm works by estimating what is effectively a probability ratio, comparing
the value at a previous good point to a new test point. If the test point is better it is always
accepted, but if the test point is worse it may still be accepted, in a random way,
depending on how much worse it is. This is the key difference between the Metropolis-Hastings algorithm and the previous one, which only accepted new points (solutions to the problem) if they were better. By sometimes accepting worse points the algorithm has an
opportunity to jump out of what may be only a local optimum, to search for a better global
optimum. Nonetheless, the algorithm will still effectively home in on good solutions. This
approach is a way of taking samples from a probability distribution (or other function) and
is sometimes solely used for that purpose; finding the absolute optimum is not always the
objective. However, here we define a Monte Carlo function that does record the optimum
point found so far. Hence, in the example below we record bestPoint, the globally best
point so far, as well as prevPoint, which is the previous ‘good’ point that we are currently
searching from, but which may not be the absolute best.
The next example is more general than the previous one, and is defined as a Python
function rather than a bare loop. Thus we could use it for multiple different problems by
defining the test function and dimensionality appropriately. Compared to the full
Metropolis-Hastings algorithm the function has a simplification, given that we will
assume that the probability of a jump from the previous point to the test point is the same
as the reverse jump; the ‘proposal density’ is symmetric. For some problems this need not
be the case, but for this example it keeps things simple and means that the acceptance
probability (prob) is based only on the values of the function, and not on the way in which
search points are generated with a normal distribution.
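This symmetry is easy to verify directly: the Gaussian density for proposing a point x2 from x1 is the same as for proposing x1 from x2, so the proposal terms cancel in the acceptance ratio. A quick standalone check, using the textbook Gaussian density formula (this snippet is illustrative only and not part of the book's own code):

```python
from math import exp, pi, sqrt

def normalPdf(x, mu, sigma):
  # Gaussian probability density of x for mean mu, std dev sigma
  return exp(-(x-mu)**2 / (2.0*sigma**2)) / (sigma * sqrt(2.0*pi))

x1, x2, spread = 0.1, 0.3, 0.1
# Density of jumping x1 -> x2 equals that of jumping x2 -> x1
print(normalPdf(x2, x1, spread))
print(normalPdf(x1, x2, spread))
```

Because the two densities are equal, their ratio is 1.0 and drops out of the acceptance probability, which is why only the function values appear in the code below.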
The appropriate import of exp to calculate exponents is made (the uniform and normal functions are imported from NumPy, as in the earlier examples) and the function monteCarlo is defined, taking arguments representing the number of search steps, the function to test, the spread of the normal distribution that generates the next test point, and the number of dimensions of the problem (this defaults to 2, i.e. for x and y axes).
from math import exp
from numpy.random import uniform, normal

def monteCarlo(numSteps, testFunc, spread=0.1, nDims=2):
The initial coordinates for the best point so far and for the previous test point (which
may not be the best) are defined, here as the same random vector, with coordinates in the
range −1.0 to 1.0 (although other values could be used). The corresponding value of the
function being optimised is calculated for these points. Initially the best and previous
points and values are the same, but this will change after the main search starts.
  bestPoint = uniform(-1.0, 1.0, nDims)
  prevPoint = bestPoint
  bestValue = testFunc(bestPoint)
  prevValue = bestValue
The main for loop is constructed for the specified number of steps. As before, the test
point is based on a previous one, using the normal() function and the input amount of
spread. The value is then simply the value of the input function at this test point.
  for i in range(numSteps):
    testPoint = normal(prevPoint, spread, nDims)
    value = testFunc(testPoint)
The acceptance probability is the exponential of the difference between the previous value and the test value; this is equivalent to a ratio of two probabilities (e^(a−b) = e^a/e^b), so we are effectively saying that the probability of a point is proportional to the exponential of minus the value given by the test function (low values correspond to high probability, since we are minimising). Naturally, if proper probabilities are available for a particular problem these should be used instead.
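To make the acceptance rule concrete, a small standalone illustration (not part of the book's function) shows how this probability behaves for better, slightly worse and much worse test points:

```python
from math import exp

def acceptProb(prevValue, value):
  # Metropolis acceptance probability when minimising:
  # exponential of previous value minus test value
  return exp(prevValue - value)

print(acceptProb(2.0, 1.5))  # better test point: > 1.0, so always accepted
print(acceptProb(2.0, 2.1))  # slightly worse: just below 1.0, usually accepted
print(acceptProb(2.0, 5.0))  # much worse: close to zero, rarely accepted
```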
    prob = exp(prevValue - value)
Next we test whether the ‘probability’ score is greater than a random number (in the
range 0.0 to 1.0). If the test point gives a value that is better than the previous point the
value of prob will be greater than 1.0, so this test will definitely be passed. If the prob is
less than 1.0 (the test point is not as good as the previous one) the value may be accepted,
although with less likelihood the further prob is from 1.0. Effectively this can be viewed
as uniform() generating a threshold, and if prob exceeds this the test point is accepted and
the previous point for the next cycle is redefined.
    if prob > uniform():
      prevPoint = testPoint
      prevValue = value
A second check, subject to passing the first, determines whether the tested value is the
best overall so far inspected, and, if it is, this value and the corresponding point are
recorded. As before, we print out the best points to give an indication of optimisation progress. Because the number of dimensions of the points can vary from one problem to another, a Python list comprehension converts the numbers to strings, which are then joined with ', '.join() to make the coordinates into a single Python string. At the end of the function the best value and its corresponding point are passed back. Alternatively, all of the accepted points (prevPoint) could be recorded to give an indication of the ‘trajectory’ that the Monte Carlo search took; this may be plotted using matplotlib.
      if value < bestValue:
        bestPoint = testPoint
        bestValue = value
        coordinates = ', '.join(['%.3f' % v for v in testPoint])
        print('%5d [%s] value:%e' % (i, coordinates, value))

  return bestValue, bestPoint
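For convenience, the fragments above can be gathered into a single runnable sketch; here the progress print is left out for brevity, and uniform and normal are assumed to come from NumPy's random module, as in the earlier examples:

```python
from math import exp
from numpy.random import uniform, normal

def monteCarlo(numSteps, testFunc, spread=0.1, nDims=2):
  # Start from a random point; initially it is both the best
  # and the previous ('current') point
  bestPoint = uniform(-1.0, 1.0, nDims)
  prevPoint = bestPoint
  bestValue = testFunc(bestPoint)
  prevValue = bestValue

  for i in range(numSteps):
    # Propose a new point near the previous one
    testPoint = normal(prevPoint, spread, nDims)
    value = testFunc(testPoint)

    # Metropolis-Hastings acceptance test (symmetric proposal density)
    prob = exp(prevValue - value)
    if prob > uniform():
      prevPoint = testPoint
      prevValue = value

      # Record the best point visited so far
      if value < bestValue:
        bestPoint = testPoint
        bestValue = value

  return bestValue, bestPoint
```

Called on a simple quadratic bowl such as lambda p: (p*p).sum(), repeat runs should home in on values close to zero near the origin.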
The function is simply tested on the Rosenbrock function using a large number of steps. It is worth experimenting to see how repeat runs locate the global minimum at (1.0, 1.0); some search paths locate it more quickly than others: