Optimizers
SGD
torch.optim.SGD()
This is the stochastic gradient descent optimizer, an algorithm that uses the gradients computed during backpropagation to adjust the model's weights. It is commonly used as a training algorithm in a variety of machine learning applications, including neural networks.
This function has several parameters (a short usage sketch follows the list):
• params: An iterable of parameters to optimize, or dictionaries defining parameter groups. This can be something like model.parameters().
• lr: A float value specifying the learning rate.
• momentum: (Optional) A float value specifying the momentum factor. This parameter helps accelerate the optimization steps in the direction of the optimization and helps reduce oscillations when the local minimum is overshot (refer to Chapter 3 to refresh your understanding of how a loss function is optimized). Default = 0.
• weight_decay: An L2 penalty applied to large weights, helping incentivize smaller model weights. Default = 0.
• dampening: The dampening factor for momentum. Default = 0.
• nesterov: A Boolean value determining whether to apply Nesterov momentum. Nesterov momentum is a variation of momentum in which the gradient is computed not at the current position but at a "look-ahead" position that takes the momentum into account. Plain momentum can carry the weights too far forward and overshoot; evaluating the gradient at the look-ahead position helps correct the course so that the momentum does not carry the new weights too far. It essentially produces more accurate weight updates and helps the optimizer converge faster. Default = False.
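As a minimal sketch (the model, loss, and data below are hypothetical placeholders, not part of the API description), constructing the optimizer and running a single training step might look like this:

import torch
import torch.nn as nn

# Hypothetical model and dummy data, purely for illustration.
model = nn.Linear(10, 1)
inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)

# SGD with momentum, Nesterov momentum, and a small weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, nesterov=True,
                            weight_decay=1e-4)
criterion = nn.MSELoss()

optimizer.zero_grad()   # clear gradients left over from the previous step
loss = criterion(model(inputs), targets)
loss.backward()         # backpropagation computes the gradients
optimizer.step()        # SGD uses those gradients to update the weights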
Adam
torch.optim.Adam()
The Adam optimizer is an algorithm that builds upon SGD. It has grown quite popular in deep learning applications in computer vision and in natural language processing.
This function has several parameters (a short usage sketch follows the list):
• params: An iterable of parameters to optimize, or dictionaries defining parameter groups. This can be something like model.parameters().
• lr: A float value specifying the learning rate. Default = 0.001 (or 1e-3).
• betas: (Optional) A tuple of two floats defining the beta values beta_1 and beta_2. The original Adam paper describes good results with (0.9, 0.999), respectively, which is also the default value.
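As a minimal sketch (again with a hypothetical model and dummy data), switching from SGD to Adam only changes how the optimizer is constructed:

import torch
import torch.nn as nn

# Hypothetical model and dummy data, purely for illustration.
model = nn.Linear(10, 1)
inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)

optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3,             # the default learning rate
                             betas=(0.9, 0.999))  # the default beta values
criterion = nn.MSELoss()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()        # Adam applies its adaptive update to the weights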