B̂ to balance exploration and exploitation. For example, when obstacles are detected, B̂ is set to 1 for pure exploitation. Compared with standard Q-learning, the probability of collision between the AUV and obstacles is reduced.
Based on hierarchical reinforcement learning, Sun et al. (2020) designed a Hierarchical Deep Q Network for AUV path planning. Obstacle avoidance and target approaching are set as two subtasks with different selection strategies. In addition, combining the Hierarchical Deep Q Network with prioritized experience replay improves the learning efficiency of the AUV.
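A minimal sketch of how such a two-subtask value network could be organised is given below; the shared encoder, the two Q-heads and the meta-selector (including the class and variable names) are illustrative assumptions, not the exact architecture reported by Sun et al. (2020).

```python
import torch
import torch.nn as nn

class HierarchicalDQN(nn.Module):
    """Shared state encoder with two subtask Q-heads (obstacle avoidance and
    target approaching) and a meta head that picks which subtask acts."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.q_avoid = nn.Linear(hidden, n_actions)  # obstacle-avoidance Q-values
        self.q_goal = nn.Linear(hidden, n_actions)   # target-approaching Q-values
        self.meta = nn.Linear(hidden, 2)             # selects the active subtask

    def forward(self, state):
        h = self.encoder(state)
        subtask = torch.argmax(self.meta(h), dim=-1, keepdim=True)  # 0 or 1
        q = torch.where(subtask == 0, self.q_avoid(h), self.q_goal(h))
        return q, subtask

# Example: q, subtask = HierarchicalDQN(state_dim=16, n_actions=5)(torch.randn(1, 16))
```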
Moreover, Cao et al. (2020) proposed a potential field hierarchical reinforcement learning approach to improve the cooperation efficiency of multiple AUVs in a target searching task. In their method, the multi-agent cooperative MAXQ algorithm was used for hierarchical reinforcement learning (HRL) (Cheng et al., 2007; Li et al., 2010; Shen et al., 2006), and a potential field was used to automatically adjust the parameters of the HRL. In simulated experiments, the proposed method was shown to enable multiple AUVs to successfully bypass dynamic and static obstacles and find the nearest target point to each AUV.
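For reference, a conventional artificial potential field of the kind typically combined with such planners can be computed as below; the attractive-plus-repulsive form and the gain values follow a generic textbook formulation and are not the specific field or parameter-adjustment rule used by Cao et al. (2020).

```python
import numpy as np

def artificial_potential(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=5.0):
    """Attractive-plus-repulsive potential at `pos` (2D or 3D numpy array).
    Gains k_att, k_rep and influence radius d0 are illustrative values only."""
    u_att = 0.5 * k_att * np.linalg.norm(pos - goal) ** 2
    u_rep = 0.0
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if 0.0 < d < d0:  # repulsion only within the obstacle's influence radius
            u_rep += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u_att + u_rep

# Example: potential felt by an AUV at (1, 2) heading to goal (10, 10)
# with one obstacle at (3, 2).
u = artificial_potential(np.array([1.0, 2.0]), np.array([10.0, 10.0]),
                         [np.array([3.0, 2.0])])
```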
Compared with other mobile robots, applying reinforcement learning to AUV path planning usually means using the output of the learning algorithm to select actions from the action space and directly control the rudder, elevator and propeller of the AUV. When ocean currents are present in the environment, the reinforcement learning algorithm takes the current as one of the state inputs and relies on continued iterative learning to cope with it. Reinforcement learning has been shown to help AUVs complete target search and navigation in environments with unknown obstacles, though mostly in simulations. However, learning an optimal policy in tasks with a large state space and transferring it to a physical AUV platform remain major challenges.
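As a concrete, hypothetical example of this direct-control formulation, the sketch below builds a state vector that includes the measured current and maps a learned action straight to actuator commands; the field names, ranges and scaling factors are assumptions for exposition, not a published AUV control interface.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class AUVState:
    sonar_ranges: np.ndarray      # obstacle distances from sonar beams (m)
    goal_bearing: float           # relative bearing to the target (rad)
    current_velocity: np.ndarray  # measured ocean current, e.g. (vx, vy) in m/s

    def to_vector(self):
        return np.concatenate([self.sonar_ranges,
                               [self.goal_bearing],
                               self.current_velocity])

def action_to_actuators(action):
    # Map a learned 3D action in [-1, 1]^3 directly to actuator commands.
    rudder_deg = 30.0 * action[0]      # rudder angle
    elevator_deg = 20.0 * action[1]    # elevator angle
    thrust = 0.5 * (action[2] + 1.0)   # propeller thrust fraction in [0, 1]
    return rudder_deg, elevator_deg, thrust
```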
4.6. Deep reinforcement learning
Deep reinforcement learning (DRL) combines the perception of deep learning with the decision making of reinforcement learning. The advantage of DRL is that it can use deep learning (e.g., a deep neural network) to automatically learn low-dimensional state features from high-dimensional states, reducing dimensionality through iterative interaction with the environment. It alleviates the limitations of reinforcement learning caused by large state and action spaces (Arulkumaran et al., 2017). Deep reinforcement learning has opened up a new way to learn from complex, nonlinear, high-dimensional sensory input in unknown environments. DRL has been widely used for obstacle avoidance of unmanned surface and aerial vehicles (Singla et al., 2019; Yan et al., 2019), and more and more researchers have started trying to apply it to AUV path planning.
For example, in a target search task, Cao et al. (2019) applied the asynchronous advantage actor–critic (A3C) method (Mnih et al., 2016) to obstacle avoidance of an AUV. In their method, an A3C network structure was used in which each thread independently interacts with the environment. The learning results of all threads were collected into a global actor–critic pair and combined with a dual-stream Q-network (Simonyan and Zisserman, 2014). The network structure is composed of multiple convolutional layers and long short-term memory (Hochreiter and Schmidhuber, 1997). The input
information is passed through the Otsu method (Gupta et al., 2018), a morphological closing operation with a disk-shaped structuring element, coordinate system transformation and rasterization to remove noise points from the original sonar image.
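A sonar-denoising pipeline of this kind can be sketched with OpenCV as below; the Otsu thresholding and disk-shaped closing follow the description above, while the kernel size and the subsequent coordinate transformation and rasterization details are assumptions rather than the exact parameters used by Cao et al. (2019).

```python
import cv2
import numpy as np

def denoise_sonar(sonar_img, kernel_size=5):
    """Binarise a grayscale sonar image with Otsu's threshold, then apply a
    morphological closing with a disk-shaped structuring element to remove
    isolated noise points (kernel_size is an illustrative choice)."""
    _, binary = cv2.threshold(sonar_img, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                     (kernel_size, kernel_size))
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, disk)
    return closed

# The cleaned image would then be transformed into the AUV's coordinate
# frame and rasterized into a grid before being fed to the network.
```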
Their simulated experiments show that an AUV with A3C can effectively avoid obstacles in various environments and complete the target search task efficiently.
In addition, Wu et al. (2019) proposed an end-to-end AUV motion control framework based on the Proximal Policy Optimization algorithm (Schulman et al., 2017), which takes the raw sonar sensory information directly as input and does not need to consider the dynamic characteristics of the AUV. In their experiments, the reward function takes multiple objectives such as waypoint tracking, obstacle avoidance, collision penalty and speed as constraints, making the algorithm more suitable for AUV path planning in dangerous underwater environments full of obstacles. Furthermore, in order to avoid the difficulty and noise of underwater positioning, they proposed a new state encoder and reward-shaping strategy, which enables learning without knowing the position of the AUV. Their detailed comparative experiments show that the AUV can complete the obstacle avoidance task in a 2D environment.
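A multi-term reward of this kind might be composed as in the sketch below; the individual terms mirror the objectives listed above, but the weights and distance measures are illustrative assumptions, not the reward actually used by Wu et al. (2019).

```python
import numpy as np

def shaped_reward(dist_to_waypoint, prev_dist_to_waypoint,
                  min_obstacle_dist, speed, collided,
                  w_track=1.0, w_avoid=0.5, w_speed=0.1,
                  collision_penalty=100.0):
    """Combine waypoint tracking, obstacle avoidance, speed and a collision
    penalty into one scalar reward (weights are illustrative only)."""
    r_track = w_track * (prev_dist_to_waypoint - dist_to_waypoint)  # progress
    r_avoid = -w_avoid * np.exp(-min_obstacle_dist)   # penalise closeness to obstacles
    r_speed = w_speed * speed                         # encourage forward motion
    r_collision = -collision_penalty if collided else 0.0
    return r_track + r_avoid + r_speed + r_collision
```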