• lr: A float value where lr >= 0. The learning rate is a hyperparameter that determines how big a step to take when optimizing the loss function. The paper describes good results with a value of 0.001 (the paper refers to the learning rate as alpha).
• beta_1: A float value where 0 < beta_1 < 1. This is usually a value close to 1; the paper describes good results with a value of 0.9.
• beta_2: A float value where 0 < beta_2 < 1. This is usually a value close to 1; the paper describes good results with a value of 0.999.
• epsilon: A float value where epsilon >= 0. If None, it defaults to K.epsilon(). Epsilon is a small number, given as 10^-8 in the paper, that helps prevent division by zero. A short usage sketch follows this list.
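As a minimal sketch of how these four arguments fit together, the following assumes the standalone Keras API (where epsilon=None falls back to K.epsilon()). The two-layer model is a hypothetical placeholder added only to show the optimizer being used; the argument values are the paper's recommendations quoted above.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# Adam configured with the values the paper reports good results for;
# epsilon=None falls back to K.epsilon(), as described above.
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None)

# Hypothetical two-layer model, just to show the optimizer in compile().
model = Sequential([
    Dense(10, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer=optimizer, loss='binary_crossentropy')

Note that newer tf.keras releases spell the first argument learning_rate rather than lr, but the meaning is the same.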