Optimizers
SGD
torch.optim.SGD()
This is the stochastic gradient descent optimizer, an algorithm that uses the gradients computed during backpropagation to adjust the model's weights. It is commonly used as a training algorithm in a variety of machine learning applications, including neural networks.
This function has several parameters (a short usage sketch follows the list):
• params: An iterable of parameters to optimize, or dictionaries defining parameter groups. This can be something like model.parameters().
• lr: A float value specifying the learning rate.
• momentum: (Optional) A float value specifying the momentum factor. This parameter helps accelerate the optimization steps in the direction of the optimization and helps reduce oscillations when the local minimum is overshot (refer to Chapter 3 to refresh your understanding of how a loss function is optimized). Default = 0.
• weight_decay: An L2 penalty applied to large weights, helping incentivize smaller model weights. Default = 0.
• dampening: The dampening factor for momentum. Default = 0.
• nesterov: A Boolean value determining whether to apply Nesterov momentum. Nesterov momentum is a variation of momentum in which the gradient is computed not at the current position but at a "look-ahead" position that takes the momentum into account. Plain momentum can carry the weights too far forward and overshoot; evaluating the gradient at the look-ahead position helps correct the course so that the momentum does not carry the new weights too far. It essentially produces more accurate weight updates and helps the optimizer converge faster. Default = False.
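As a minimal sketch (the model, loss, and data below are hypothetical placeholders, not part of the API description), constructing the optimizer and running a single training step might look like this:

import torch
import torch.nn as nn

# Hypothetical model and dummy data, purely for illustration.
model = nn.Linear(10, 1)
inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)

# SGD with momentum, Nesterov momentum, and a small weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, nesterov=True,
                            weight_decay=1e-4)
criterion = nn.MSELoss()

optimizer.zero_grad()   # clear gradients left over from the previous step
loss = criterion(model(inputs), targets)
loss.backward()         # backpropagation computes the gradients
optimizer.step()        # SGD uses those gradients to update the weights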
Adam
torch.optim.Adam()
The Adam optimizer is an algorithm that builds upon SGD. It has grown quite popular in deep learning applications in computer vision and in natural language processing.
This function has several parameters (a short usage sketch follows the list):
• params: An iterable of parameters to optimize, or dictionaries defining parameter groups. This can be something like model.parameters().
• lr: A float value specifying the learning rate. Default = 0.001 (or 1e-3).
• betas: (Optional) A tuple of two floats defining the beta values beta_1 and beta_2. The original Adam paper describes good results with (0.9, 0.999), respectively, which is also the default value.
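As a minimal sketch (again with a hypothetical model and dummy data), switching from SGD to Adam only changes how the optimizer is constructed:

import torch
import torch.nn as nn

# Hypothetical model and dummy data, purely for illustration.
model = nn.Linear(10, 1)
inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)

optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3,             # the default learning rate
                             betas=(0.9, 0.999))  # the default beta values
criterion = nn.MSELoss()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()        # Adam applies its adaptive update to the weights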