PyTorch's torch.optim.Adam() implements the Adam optimization algorithm, which is used to update the parameters of a neural network model during training.
Syntax
torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999),
eps=1e-08, weight_decay=0, amsgrad=False, *,
foreach=None, maximize=False, capturable=False,
differentiable=False, fused=None)
Parameters
- params (iterable): Iterable of parameters to optimize, or dicts defining parameter groups, which let different parts of the model use different hyperparameters (see the sketch after this list).
- lr (float, optional): Learning rate (default: 1e-3).
- betas (Tuple[float, float], optional): Coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999)).
- eps (float, optional): Term added to the denominator to improve numerical stability (default: 1e-8).
- weight_decay (float, optional): Weight decay (L2 penalty) (default: 0).
- amsgrad (bool, optional): Whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond" (default: False).
- foreach (bool, optional): Whether the foreach implementation of the optimizer is used. If unspecified (foreach is None), PyTorch will try to use the foreach implementation over the for-loop implementation on CUDA, since it is usually significantly more performant (default: None).
- maximize (bool, optional): Maximize the objective with respect to the params instead of minimizing it (default: False).
- capturable (bool, optional): Whether this instance is safe to capture in a CUDA graph. Passing True can impair ungraphed performance, so if you don't intend to graph capture this instance, leave it False (default: False).
- differentiable (bool, optional): Whether autograd should occur through the optimizer step in training. Otherwise, the step() function runs in a torch.no_grad() context. Setting it to True can impair performance, so leave it False if you don't intend to run autograd through this instance (default: False).
- fused (bool, optional): Whether the fused implementation (CUDA only) is used. Currently, torch.float64, torch.float32, torch.float16, and torch.bfloat16 are supported (default: None).
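For instance, passing a list of dicts instead of a plain iterable lets you give different parts of the model their own hyperparameters, while everything else falls back to the keyword defaults. This is a minimal sketch; the two-layer model and the specific values are arbitrary choices for illustration:
import torch.nn as nn
import torch.optim as optim
net = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
optimizer = optim.Adam(
    [
        # First linear layer: smaller learning rate and a little L2 penalty
        {"params": net[0].parameters(), "lr": 1e-4, "weight_decay": 1e-5},
        # Last linear layer: uses the defaults given below
        {"params": net[2].parameters()},
    ],
    lr=1e-3,             # default learning rate for groups that don't override it
    betas=(0.9, 0.999),  # running-average coefficients for the gradient and its square
    eps=1e-8,            # numerical-stability term
)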
Example
Here's a simple example of how to use torch.optim.Adam() in PyTorch:
import torch
import torch.nn as nn
import torch.optim as optim
# Suppose we have a simple model
model = nn.Sequential(
nn.Linear(10, 5),
nn.ReLU(),
nn.Linear(5, 2),
)
# Suppose our data is a tensor of size (1, 10)
# and target is a tensor of size (1, 2)
data = torch.randn(1, 10)
target = torch.randn(1, 2)
# Define the criterion (loss function)
criterion = nn.MSELoss()
# Initialize the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)
# A single optimization step would look like this:
# Zero the gradients
optimizer.zero_grad()
# Forward pass
output = model(data)
# Calculate the loss
loss = criterion(output, target)
# Backward pass
loss.backward()
# Update the weights
optimizer.step()
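Once step() has run at least once, you can peek at what the optimizer is tracking. The snippet below is just a diagnostic sketch; in current PyTorch releases, Adam's per-parameter state holds the running averages exp_avg and exp_avg_sq plus a step counter:
# Hyperparameters live in optimizer.param_groups
print(optimizer.param_groups[0]["lr"], optimizer.param_groups[0]["betas"])
# Per-parameter state is created lazily by step()
state = optimizer.state[next(model.parameters())]
print(list(state.keys()))  # typically ['step', 'exp_avg', 'exp_avg_sq']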
Explanation
- We first define a simple model, which could be any PyTorch model. The nn.Linear layers are just fully connected layers, and nn.ReLU is a common activation function.
- We create some dummy data and a dummy target.
- We define a loss function, which measures how far the model's predictions are from the target. We use Mean Squared Error (MSE) loss in this case, but this could be any PyTorch loss function.
- We initialize the optimizer. The first argument is the model parameters that should be optimized. The lr argument is the learning rate, which determines how much the weights are updated in each optimization step.
- We perform a single optimization step, which consists of:
  - Zeroing the gradients, which is necessary because PyTorch accumulates gradients on subsequent backward passes.
  - Performing a forward pass, which passes the data through the model to get the output.
  - Calculating the loss, which measures the difference between the output and the target.
  - Performing a backward pass, which computes the loss's gradients with respect to the model parameters.
  - Updating the weights, which is done by the optimizer, as shown in the loop sketch below.
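In practice, you repeat this step many times. Here is a minimal sketch of that loop, reusing the model, data, criterion, and optimizer defined above; the epoch count and print interval are arbitrary choices for illustration:
for epoch in range(100):                 # arbitrary number of iterations
    optimizer.zero_grad()                # clear gradients from the previous step
    output = model(data)                 # forward pass
    loss = criterion(output, target)     # compute the loss
    loss.backward()                      # backward pass: compute gradients
    optimizer.step()                     # update the weights
    if epoch % 20 == 0:
        print(f"epoch {epoch}: loss = {loss.item():.4f}")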
That’s it.
