Fig. 11. Illustration of AUV path planning with reinforcement learning. Source: Modified from Sutton and Barto (1998).
Kawano and Ura proposed a path planning algorithm combining Q-learning (Watkins and Dayan, 1992), a teaching method, and a Bayesian network for a non-holonomic AUV. The teaching method comprises intensive teaching, which suggests actions to the AUV, and global teaching, which concerns keeping a distance from the target point throughout the learning process. The learning experience is stored in a Bayesian network, which enables the AUV to deal with obstacles of any shape. In addition, the error caused by the coupling of the current and the yaw motion is taken as an input, and continuous iterative learning is used to better resist the current.
However, because of non-Markovian effects, the proposed method is slow to converge. They therefore further proposed a hierarchical reinforcement learning approach, in which the high level is a motion planning module that considers the position of the AUV, and the low level regulates the speed of the AUV to stabilize the yaw motion (Kawano and Ura, 2002a). As a result, the learning speed of the algorithm is shown to be significantly improved.
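As an illustration, a minimal sketch of the one-step tabular Q-learning update underlying such path planners is given below, with a hypothetical teaching hook that biases action selection toward a suggested heading. The grid size, reward constants, and the `suggest_action` hook are assumptions for illustration, not the original implementation.

```python
import numpy as np

# Minimal tabular Q-learning sketch for grid-based AUV path planning.
# Grid size, hyperparameters and the teaching hook are illustrative
# assumptions, not the method of Kawano and Ura.

N_STATES, N_ACTIONS = 100, 4          # e.g. a 10x10 grid, 4 candidate headings
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))

def suggest_action(state):
    """Hypothetical 'teaching' hook: return a suggested heading for this
    state (e.g. toward the target), or None if no suggestion applies."""
    return None

def select_action(state, rng):
    # A teaching suggestion takes priority; otherwise act epsilon-greedily.
    suggestion = suggest_action(state)
    if suggestion is not None:
        return suggestion
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def q_update(s, a, r, s_next):
    # Standard one-step Q-learning update (Watkins and Dayan, 1992).
    td_target = r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (td_target - Q[s, a])
```

The global-teaching idea of keeping the vehicle at a distance from the target could equally be folded into the reward signal rather than the action-selection hook; the sketch only shows where such guidance can enter the learning loop.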
In consideration of the high risk of trial and error, Chen et al. (2009) proposed to use a neural network and case-based Q-learning (Greenwald et al., 2003) for AUV path planning. The neural network with multi-layer error feedback has a strong approximation ability, which improves the generalization of Q-learning, and case-based Q-learning is used to guarantee convergence. With the information provided by a multi-beam forward sonar, the proposed method was shown to enable the AUV to find an optimal path among multiple obstacles.
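To make the role of the network concrete, the following sketch shows Q-value approximation with a small feed-forward network trained by error back-propagation on the temporal-difference error. The feature size, hidden width, learning rate, and action count are assumptions for illustration and do not reproduce the cited architecture.

```python
import numpy as np

# Sketch: Q-value approximation with a small back-propagation network.
# Feature size, hidden width and learning rate are illustrative assumptions.

rng = np.random.default_rng(0)
N_FEATURES, N_HIDDEN, N_ACTIONS = 8, 32, 5   # e.g. sonar ranges -> headings
W1 = rng.normal(0.0, 0.1, (N_FEATURES, N_HIDDEN))
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_ACTIONS))
LR, GAMMA = 1e-3, 0.95

def q_values(x):
    h = np.tanh(x @ W1)          # hidden layer activations
    return h, h @ W2             # one Q-value per steering action

def train_step(x, a, r, x_next, done):
    """One back-propagated TD update for the chosen action a."""
    global W1, W2
    h, q = q_values(x)
    _, q_next = q_values(x_next)
    target = r if done else r + GAMMA * np.max(q_next)
    err = np.zeros(N_ACTIONS)
    err[a] = target - q[a]                     # TD error on the taken action
    dh = (err @ W2.T) * (1.0 - h ** 2)         # back-propagate through tanh
    W2 += LR * np.outer(h, err)
    W1 += LR * np.outer(x, dh)
    return err[a]
```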
For the real-time obstacle avoidance of small AUVs, a single-beam sonar is used to measure obstacle information in turn, and the steering action is selected with a reinforcement learning method (Huang et al., 2014). When the AUV approaches an obstacle it receives a negative reward, and when it moves away from the obstacle it receives a positive reward. The simulation results show that the AUV can safely avoid obstacles within a 90-degree opening angle by learning to control the propeller and the course of the AUV.
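One possible encoding of this reward scheme is sketched below: the vehicle is penalised when the sonar-measured range to the obstacle shrinks and rewarded when it grows. The reward magnitudes and the collision threshold are assumptions, not values from the cited work.

```python
# Reward sketch: penalise approaching an obstacle, reward moving away.
# Magnitudes and the safety threshold are illustrative assumptions.

SAFE_RANGE = 5.0       # metres; hypothetical minimum safe range

def avoidance_reward(prev_range, curr_range):
    if curr_range < SAFE_RANGE:
        return -10.0                   # too close: strong penalty
    if curr_range < prev_range:
        return -1.0                    # approaching the obstacle
    if curr_range > prev_range:
        return 1.0                     # moving away from the obstacle
    return 0.0
```

The learned policy then selects the propeller and course commands that maximise the discounted sum of these rewards.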
In the presence of dynamic obstacles, Gore et al. (2019) show that, by allowing the AUV to obtain state information within a Markov decision process, it can learn to take the corresponding actions to obtain the path with minimum deviation from obstacles.
Noguchi and Maki (2019) applied SARSA(𝜆) to the path planning of an AUV and show that it can find collision-free paths to capture sea urchins in complex environments. In their method, considering the limitations and fuzziness of the information obtained from sonar sensors, a map based on occupancy probability is used to obtain the state information of the AUV.
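For reference, a minimal tabular SARSA(𝜆) update with accumulating eligibility traces is sketched below; the state index is assumed to come from a discretised occupancy map, and all constants are illustrative rather than those of the cited study.

```python
import numpy as np

# Minimal tabular SARSA(lambda) with accumulating eligibility traces.
# State indices are assumed to be discretised occupancy-map cells;
# all constants are illustrative assumptions.

N_STATES, N_ACTIONS = 400, 8
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.95, 0.9, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))
E = np.zeros_like(Q)                      # eligibility traces
rng = np.random.default_rng(0)

def epsilon_greedy(s):
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def sarsa_lambda_step(s, a, r, s_next, a_next):
    """One on-policy update; traces spread the TD error to recent states."""
    global Q, E
    delta = r + GAMMA * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0                        # accumulating trace
    Q += ALPHA * delta * E
    E *= GAMMA * LAMBDA                   # decay all traces
```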
In addition, Bhopale et al. (2019) proposed a modified Q-learning algorithm based on a back-propagation neural network to deal with unknown obstacles in the environment. The proposed method overcomes the curse of dimensionality and introduces a factor