good results. However, fuzzy rules are usually defined based on expert experience and cannot adapt to the environment. In a complex and uncertain underwater environment, no prior knowledge is available, and constructing fuzzy rules would be difficult or even impossible. Reinforcement learning (RL) based path planning, by contrast, can learn an optimal path through interaction with the environment without any prior knowledge. An AUV equipped with reinforcement learning can therefore adapt flexibly and work well in complex and uncertain environments. With deep reinforcement learning (DRL), it can even learn in high-dimensional and complex environments from raw sensory input data in an end-to-end way.
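As a minimal illustration of this idea (not a reconstruction of any of the surveyed methods), the sketch below shows tabular Q-learning on a small two-dimensional grid with a single obstacle row; the grid size, reward values and training schedule are hypothetical choices made only for the example.

import numpy as np

# Hypothetical 6x6 grid world: 0 = free water, 1 = obstacle.
GRID = np.zeros((6, 6), dtype=int)
GRID[2, 1:4] = 1
START, GOAL = (0, 0), (5, 5)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(state, a):
    """Move one cell; stay in place if the move leaves the grid or hits an obstacle."""
    nr, nc = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if not (0 <= nr < 6 and 0 <= nc < 6) or GRID[nr, nc] == 1:
        nr, nc = state
    reward = 10.0 if (nr, nc) == GOAL else -0.1   # sparse goal reward plus a small step cost
    return (nr, nc), reward, (nr, nc) == GOAL

Q = np.zeros((6, 6, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.2
rng = np.random.default_rng(0)
for episode in range(2000):
    s = START
    for t in range(200):                          # cap episode length
        a = rng.integers(4) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Model-free temporal-difference update: no prior map knowledge is needed.
        Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s][a])
        s = s2
        if done:
            break

# Greedy rollout of the learned Q-table gives the planned path.
path, s = [START], START
while s != GOAL and len(path) < 50:
    s, _, _ = step(s, int(np.argmax(Q[s])))
    path.append(s)
print(path)

The same interaction loop carries over to DRL, where the Q-table is replaced by a neural network mapping raw sensory input directly to action values.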
However, RL and DRL are sample-inefficient and slow to converge because of scarce reward signals (Goecks et al., 2020). It is difficult or even impractical to design an efficient reward function for each task, which makes applying traditional RL and DRL methods directly to path planning of physical AUVs a great challenge (Riedmiller et al., 2018). Sampling in a simulated environment is faster, cheaper and safer than learning directly in the real world, but deploying a policy trained in simulation on a real AUV is difficult and risky because of the gap between simulation and reality (Kober and Peters, 2014).
Many sim-to-real algorithms have been proposed to bridge this gap, such as domain adaptation (Tzeng et al., 2015), inverse dynamics models (Christiano et al., 2016), domain randomization (Tobin et al., 2017) and progressive networks (Shojania and Li, 2007), but there appears to be no work on AUV path planning from this perspective yet (Zhao et al., 2020).
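The idea behind domain randomization, for instance, is simple to state in code: the simulator dynamics are re-sampled before every training episode, so the learned policy must cope with a whole range of plausible real-world conditions. The sketch below is a generic illustration, not tied to any AUV simulator; the parameter names, ranges and the make_env/agent interfaces are purely hypothetical assumptions.

import random

def sample_sim_params():
    """Re-sample hypothetical simulator parameters before every training episode."""
    return {
        "water_density": random.uniform(1020.0, 1030.0),   # kg/m^3
        "current_speed": random.uniform(0.0, 0.8),          # m/s
        "current_heading": random.uniform(0.0, 360.0),      # deg
        "thruster_gain": random.uniform(0.8, 1.2),           # actuator model error
        "sensor_noise_std": random.uniform(0.0, 0.05),
    }

def train_with_domain_randomization(make_env, agent, episodes=1000):
    """Generic loop: the environment is rebuilt each episode with freshly randomized
    dynamics, while the same agent keeps learning across all of them."""
    for _ in range(episodes):
        env = make_env(sample_sim_params())          # assumed simulator factory
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs)                  # assumed agent interface
            next_obs, reward, done = env.step(action)   # assumed 3-tuple interface
            agent.learn(obs, action, reward, next_obs, done)
            obs = next_obs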
On the other hand, some researchers have proposed leveraging human knowledge to speed up the AUV's learning, e.g., by allowing a human trainer to provide demonstrations or evaluative feedback (Li et al., 2019a). For example, Chu et al. (2020) proposed deep imitation reinforcement learning (DIRL) for motion control of unmanned underwater vehicles (UUVs); DIRL performs imitation learning from expert demonstrations and uses the learned policy to initialize the TD3 algorithm (Fujimoto et al., 2018). In addition, Zhang et al. (2020) proposed deep interactive reinforcement learning for the AUV path tracking task, in which a human trainer transfers her knowledge by delivering evaluative feedback on the quality of the AUV's actions. How to make full use of human experience and knowledge to improve AUV path planning would therefore be an interesting research direction.
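The general pattern behind such demonstration-based approaches can be sketched as follows (a simplified illustration, not the exact DIRL implementation): a small neural-network policy is first pretrained on recorded expert state-action pairs by behavior cloning and is then handed to an RL algorithm such as TD3 for fine-tuning. The network size, data shapes and training settings below are hypothetical placeholders.

import torch
import torch.nn as nn

# Hypothetical demonstration data: state features (e.g., sonar + pose) and the
# expert's continuous control commands recorded during teleoperation.
states = torch.randn(2048, 16)     # placeholder for real logged data
actions = torch.randn(2048, 2)     # e.g., [surge thrust, yaw rate]

policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                       nn.Linear(64, 64), nn.ReLU(),
                       nn.Linear(64, 2), nn.Tanh())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Behavior cloning: supervised regression from states to expert actions.
for epoch in range(50):
    loss = nn.functional.mse_loss(policy(states), actions)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The pretrained weights can then initialize the actor of an off-policy RL
# algorithm (e.g., TD3) so that exploration starts from a sensible policy.
torch.save(policy.state_dict(), "pretrained_actor.pt")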
5.3. Combination of different path planning algorithms
The path planning methods surveyed above all have their own advantages and disadvantages in specific application scenarios. It would be immensely useful to combine multiple path planning algorithms so that they complement each other and better handle unknown dynamic obstacles and complex situations. For instance, combining fuzzy logic with reinforcement learning can improve the control accuracy of the system in strong currents (Yang et al., 2009). An adaptive neuro-fuzzy inference system and the particle swarm optimization algorithm can be used together to generate feasible paths in environments full of moving targets (Yan et al., 2018b). Quantum particle swarm optimization with selective differential evolution can significantly shorten the time needed to generate the best path (Lim et al., 2020a). In addition, Yao and Zhao (2018) combined an improved genetic algorithm with the gray wolf optimization algorithm (Mirjalili et al., 2014) to optimize the coefficients of the improved interfered fluid dynamical method (Yao et al., 2015), and verified that combining a path planning algorithm with such mathematical methods can better handle dynamic obstacles.
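The common pattern in several of these hybrids is that one algorithm tunes the free coefficients of another. The sketch below shows this pattern in its simplest form, using a plain evolutionary loop (standing in for a GA, GWO or PSO outer optimizer) to tune the lateral offsets of a parameterized path against a hypothetical cost that penalizes length and obstacle proximity; it is only a schematic illustration, not a reconstruction of any cited method.

import numpy as np

OBSTACLES = np.array([[3.0, 1.0], [6.0, -1.0]])       # hypothetical obstacle centers

def path_cost(coeffs, start=(0.0, 0.0), goal=(10.0, 0.0), n=50):
    """Cost of a path whose intermediate lateral offsets are given by 'coeffs'."""
    xs = np.linspace(start[0], goal[0], n)
    knots_x = np.linspace(start[0], goal[0], len(coeffs) + 2)
    knots_y = np.concatenate([[start[1]], coeffs, [goal[1]]])
    pts = np.stack([xs, np.interp(xs, knots_x, knots_y)], axis=1)
    length = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    # Penalize any waypoint closer than 1 m to an obstacle.
    dists = np.linalg.norm(pts[:, None, :] - OBSTACLES[None, :, :], axis=2)
    penalty = np.sum(np.maximum(0.0, 1.0 - dists))
    return length + 20.0 * penalty

# Simple (1 + lambda) evolutionary loop: mutate the coefficients, keep improvements.
rng = np.random.default_rng(0)
best = np.zeros(5)                                    # coefficients being tuned
best_cost = path_cost(best)
for generation in range(200):
    for _ in range(20):
        candidate = best + rng.normal(0.0, 0.3, size=best.shape)
        cost = path_cost(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
print(best, best_cost)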