Python Programming for Biology: Bioinformatics and Beyond


Figure 25.3.  Trajectories of subsequent trials from Metropolis-Hastings Monte



Download 7,75 Mb.
Pdf ko'rish
bet420/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   416   417   418   419   420   421   422   423   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Figure 25.3.  Trajectories of subsequent trials from Metropolis-Hastings Monte

Carlo searches to minimise a shallow function. In this plot the lighter the background

shade the smaller is the value of a two-dimensional test function. The trajectories of test

points that result from applying the Metropolis-Hastings Monte Carlo method are

visualised as darker lines. This method works by selecting subsequent test points (after a

random start) relative to the position of the previous point, but only accepting the new

solution if it is better and occasionally if it is worse, with a probability specified by the

Metropolis-Hastings criterion. In this way the routes of accepted points will follow the

function to its minimum but have occasional jumps, which may be important in

overcoming local minima.

The  travelling  salesman  problem  solely  involves  considering  the  distances  between

cities,  but  in  other,  more  biological,  situations  there  may  be  more  information  to  reduce

the complexity of the problem. For example, in nuclear magnetic resonance (NMR) it is a

common operation to assign signals that occur in a spectrum to the amino acid residues of



a  protein  chain.  The  problem  is  that  you  don’t  know  which  signal  corresponds  to  which

amino  acid  in  the  sequence.  Specific  NMR  experiments  allow  you  to  determine  which

signals  are  likely  to  be  neighbours  in  a  sequence  (although  not  unambiguously  so)  and

also what kinds of amino acid a signal could represent. Thus, although you can solve the

problem  is  the  same  basic  way  as  the  travelling  salesman,  to  determine  the  best  data

sequence  you  have  considerably  more  information  and  can  exclude  many  possibilities;

many data points are known to not be neighbours, and certain data points can be excluded

from particular locations in the sequence. Generalising for other biological problems, the

more information you have, to give better probabilities, the more robust the solution.

To  solve  the  travelling  salesman  problem  with  a  simple  Python  function  we  will  first

make  a  few  handy  imports  and  then  define  a  helper  function,  which  has  the  job  of

determining the total distance for a particular route between all the cities.

from math import sqrt, exp

from random import shuffle, randint

from numpy import array

The  function  getRouteLength()  takes  a  dictionary  containing  the  distances  between

points (cities in our example) and the route to calculate the distance for. It should be noted

that if the distances between all of the relevant points can be determined in advance then

this  will  generally  lead  to  a  quicker  program,  given  that  we  won’t  have  to  repeat  the

distance calculation. However, caching the distances in this way requires memory, which

may be insufficient for large problems. The innards of the function are fairly simple. The

initial  distance  is  initialised  as  zero,  then  we  go  through  each  pair  of  subsequent  points

visited in the route; the order of cities in the list defines the route. The inbuilt enumerate()

is used to extract the index and each location of the route at the same time. Note that this

loop only goes up to the penultimate point in the route (route[:-1]) so that we always have

room  to  fetch  the  next  point  in  a  pair:  route[i+1].  The  two  neighbouring  points  are  then

used as a key to look up the distance between them in the distanceData dictionary passed

in, which is then added to the total. If it is not practical to have a look-up table of distances

then  this  function  could  calculate  the  distances  from  the  data  points  themselves.

Construction  of  the  dictionary  of  city  distances  is  illustrated  in  the  function

calcCityDistances() discussed later.

def getRouteLength(distanceData, route):

distance = 0.0

for i, pointA in enumerate(route[:-1]):

pointB = route[i+1]

key = frozenset((pointA, pointB))

distance += distanceData[key]

return distance

Next  we  come  to  the  function  that  does  the  actual  solving  of  the  travelling  salesman

problem  using  the  Monte  Carlo  method.  The  general  idea  is  that  we  will  start  with  a

random  route  (order  of  cities)  and  then  swap  random  pairs  of  cities  on  the  route  to

generate the next test route. A test route will be accepted or rejected in the same manner




that we did for the previous function optimisation. Swapping pairs of cities is not the only

way  that  we  could  generate  new  routes  (states)  to  test:  for  example,  random  neighbours

could be swapped. However, swapping completely random points works well for the test

problem  and  generally  allows  a  wide  range  of  solutions  to  be  sampled.  Swapping  only

neighbours  gives  less  drastic  changes  and  may  be  better  at  refining  good  solutions,  but

will not be as quick to search a wide variety of routes. Subsequently this method that uses

random swaps will be refined, using simulated annealing, so that we get an initially wide

and unbiased search that subsequently narrows to a refined solution.

The function travellingSalesman() is defined as taking the distanceData dictionary that

provides the distance look-up, a list of the cities involved and the number of search steps

to use. Naturally, the number of steps to use will depend on the number of cities that are

considered and acceptable values may be determined by experimentation.

def travellingSalesman(distanceData, cities, numSteps=10000):

The  number  of  cities  in  the  problem  is  recorded  and  the  initial  bestRoute  list  is  a

randomly shuffled copy of the input list of cities.

n = len(cities)

bestRoute = cities[:]

shuffle(bestRoute)

The acceptance scores we use below will depend on the differences in route lengths, but

the  magnitude  of  these  distances  will  vary  according  to  the  situation  at  hand  (in  our

example we happen to be dealing with thousands of kilometres). Hence, so that it doesn’t

matter what the scale of the distances are, we calculate a normalising factor, which we call

scale. The heuristic we use here is to take half of the standard deviation of the inter-city

distances  as  the  scale  factor.  This  is  easily  calculated  with  the  function  std()  inbuilt  into

NumPy arrays:

dists = list(distanceData.values()) # list() not needed in Python 2

scale = 0.5 * array(dists).std()

The  corresponding  initial  bestDistance  uses  the  helper  function  getRouteLength()  and

the dictionary of distance information to calculate the length of the initial route. Copies of

the initial route and distance are then made, which will be the starting point for the Monte

Carlo search.

bestDistance = getRouteLength(distanceData, bestRoute)

prevRoute = bestRoute

prevDistance = bestDistance

A for  loop  is  defined  to  go  through  each  of  the  search  steps.  Two  indices  a  and  b  are

defined as random numbers, between zero and the index of the last city. These will be the

indices of the two cities that will be swapped in the route. The test route for this cycle is

initially defined as a copy of the previous ‘good’ route. Then the entries at the two random

indices are swapped; position a in the old route is b in the new one and vice versa.

for i in range(numSteps):




a = randint(0, n-1)

b = randint(0, n-1)

route = prevRoute[:]

route[a] = prevRoute[b]

route[b] = prevRoute[a]

With the test route defined, the length going through the cities is calculated. The score

(equivalent to a Metropolis-Hastings acceptance probability) is calculated as the exponent

of  the  difference  in  distances  between  the  test  route  and  the  prior  one,  divided  by  the

normalising distance scale factor. The result of this is that if a test route length is longer by

the value of scale there is an e

−1

(37.8%) chance of acceptance.



distance = getRouteLength(distanceData, route)

score = exp((prevDistance-distance)/scale)

The score is compared against a random number in the range 0.0 to 1.0. If the new route

is shorter the score will be greater than 1.0 and it will definitely be accepted, otherwise it

is  a  matter  of  chance  depending  on  how  much  longer  the  distance  is.  If  the  test  route  is

accepted it is recorded as prevRoute so that it will be the state to go from in the next cycle.

if score > uniform():

prevRoute = route

prevDistance = distance

A second test determines whether the distance is the overall shortest so far measured. If

so the route is remembered as the best one. The best distance and step number are printed

to indicate progress.

if distance < bestDistance:

bestRoute = route[:]

bestDistance = distance

print('%5d %.5f' % (i, distance))

return bestDistance, bestRoute

To  finally  test  the  function  some  distance  information  is  required.  Here  this  is  done

using a dictionary that contains the coordinates (latitude and longitude in degrees) of the

cities  concerned.  Of  course,  this  will  only  allow  us  to  calculate  approximate  distance

across  the  Earth’s  globe,  rather  than  practical  road  or  rail  distances.  Nonetheless,  this  is

good enough to illustrate the procedure.

cityCoords = {'Paris':(48.856667, 2.350833),

'Marseille':(43.296386, 5.369954),

'Lyon':(45.759723, 4.842223),

'Toulouse':(43.604503, 1.444026),

'Nice':(43.703393, 7.266274),

'Strasbourg':(48.584445, 7.748612),

'Nantes':(47.21806, -1.55278),

'Bordeaux':(44.838611, -0.578333),

'Montpellier':(43.61194, 3.87722),

'Rennes':(48.114722, -1.679444),

'Lille':(50.637222, 3.063333),



'Le Havre':(49.498889, 0.121111),

'Reims':(49.26278, 4.03472),

'Saint-Etienne':(45.434722, 4.390278),

'Toulon':(43.125, 5.930556)}

The  next  function  uses  the  city  coordinates  to  calculate  a  dictionary  containing  the

distances between each pair, hence we use two for loops to get all city pairs. The distance

metric derives from the length of the great arc between the cities, given that we are dealing

with positions on the surface of the Earth rather than a flat plane. Hence we first calculate

the central angle between the cities (in radians) and then multiply this by the radius of the

Earth. Note that the key to the distances dictionary is a Python frozenset object, thus the

key to get a distance value does not depend on the order of the two cities in the key. If we

used a Python tuple as a key then we would have to store both (cityA, cityB) and (cityB,

cityA) or always sort the key items. Here a simple Python set cannot be used, it must be

the frozen kind as dictionary keys cannot have modifiable values.

from math import acos, cos, sin, radians

def calcCityDistances(coordDict):

cities = list(coordDict.keys())

n = len(cities)

distances = {}

for i in range(n-1):

cityA = cities[i]

latA, longA = coordDict[cityA]

latA = radians(latA)

longA = radians(longA)

for j in range(i+1, n):

cityB = cities[j]

latB, longB = coordDict[cityB]

latB = radians(latB)

longB = radians(longB)

dLong = abs(longA - longB)

angle = acos(sin(latA)*sin(latB) + cos(latA)*cos(latB)*cos(dLong))

dist = angle * 6371.1 # Mean Earth radius (km)

key = frozenset((cityA, cityB))

distances[key] = dist

return distances

Using the distance information, the best route between the cities can hopefully be found

using the Monte Carlo method.

distances = calcCityDistances(cityCoords)

cities = list(cityCoords.keys()) # Use all the cities

dist, route = travellingSalesman(distances, cities, 1000000)

print('%.3f %s' % (dist, ', '.join(route)))



If all goes well, the test will give the following route, or its equally good reverse, with

the distance of 2465.56 km:

Strasbourg, Reims, Lille, Paris, Le Havre, Rennes, Nantes, Bordeaux,

Toulouse, Montpellier, Marseille, Toulon, Nice, Lyon, Saint-Etienne




Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   416   417   418   419   420   421   422   423   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish