The PyTorch torch.nn.Conv2d class applies a 2D convolution over an input tensor and is typically used for image-processing layers within a neural network. Its constructor has the following signature:
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
- in_channels: Number of channels in the input tensor (e.g., 3 for an RGB image).
- out_channels: Number of channels in the output tensor (i.e., number of filters to use).
- kernel_size: Size of the filter to use. This can be an integer (for square filters) or a tuple (for rectangular filters).
- stride: The stride of the convolution operation. Can be a single integer or a tuple. Default is 1.
- padding: Padding added to all four sides of the input (zeros by default; see padding_mode). Can be a single integer, a tuple, or the string 'valid' or 'same'. Default is 0.
- dilation: Spacing between kernel elements. Default is 1.
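- groups: Controls the connections between input and output channels. Both in_channels and out_channels must be divisible by groups. Default is 1 (standard convolution). Use groups=in_channels, with out_channels a multiple of in_channels, for depthwise convolution.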
- bias: If True, adds a learnable bias to the output. Default is True.
- padding_mode: Specifies how the padded values are filled: 'zeros', 'reflect', 'replicate', or 'circular'. Default is 'zeros'.
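The effect of groups is easiest to see in the shape of the layer's weight tensor. The following is a minimal sketch, assuming an illustrative layer with 4 input channels and 8 output channels (these sizes are not from the text above); it compares a standard convolution with a depthwise one:

```python
import torch
import torch.nn as nn

# Standard convolution: every output channel sees every input channel.
standard = nn.Conv2d(4, 8, kernel_size=3, bias=False)

# Depthwise convolution: groups == in_channels, so each input channel
# is convolved separately (here with 8 / 4 = 2 filters per channel).
depthwise = nn.Conv2d(4, 8, kernel_size=3, groups=4, bias=False)

x = torch.randn(1, 4, 5, 5)  # illustrative input: (batch, channels, height, width)

# Both layers produce the same output shape...
print(standard(x).shape)
print(depthwise(x).shape)

# ...but the weight shape is (out_channels, in_channels // groups, kH, kW),
# so the depthwise layer holds a quarter of the weights.
print(standard.weight.shape)
print(depthwise.weight.shape)
```

With groups=4, each filter only spans one input channel, which is why the second dimension of the depthwise weight shrinks from 4 to 1.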
Example 1: Basic Example
Here is a simple example where we define a 2D convolutional layer with a single input channel, a single output channel, and a kernel size of 3×3:
    import torch
    import torch.nn as nn

    # Define the Conv2d layer
    conv = nn.Conv2d(1, 1, 3)
    input_tensor = torch.randn(1, 1, 5, 5)

    # Perform the convolution
    output_tensor = conv(input_tensor)
    print(output_tensor)
Example 2: Multiple Channels
In this example, we have three input channels and 16 output channels, a configuration often seen in the first layer of a CNN that processes RGB images:
    import torch
    import torch.nn as nn

    # Define the Conv2d layer
    conv = nn.Conv2d(3, 16, 3)
    input_tensor = torch.randn(1, 3, 5, 5)

    # Perform the convolution
    output_tensor = conv(input_tensor)
    print(output_tensor)
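Printing the full output of a 16-channel layer is hard to read, so it can help to inspect shapes instead. The sketch below reuses the same layer configuration as Example 2 and prints the shapes of the learnable parameters and of the output; the variable names x, y, and n_params are chosen here for illustration:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3)
x = torch.randn(1, 3, 5, 5)
y = conv(x)

# Weight layout: (out_channels, in_channels, kernel_height, kernel_width)
print(conv.weight.shape)

# One bias value per output channel
print(conv.bias.shape)

# Output layout: (batch, out_channels, out_height, out_width)
print(y.shape)

# Total learnable parameters: 16 * 3 * 3 * 3 weights + 16 biases
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)
```

Each of the 16 filters spans all three input channels, so the layer holds 432 weights plus 16 biases.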
Example 3: Stride and Padding
Here, we set stride=2 and padding=1, so the 5×5 input produces a 3×3 output:
    import torch
    import torch.nn as nn

    conv = nn.Conv2d(1, 1, 3, stride=2, padding=1)
    input_tensor = torch.randn(1, 1, 5, 5)

    # Perform the convolution
    output_tensor = conv(input_tensor)
    print(output_tensor)
Sample output (the values differ on every run because the weights and the input are random):

    tensor([[[[ 0.5023,  0.3969,  1.0621],
              [-0.2315,  0.3366,  0.1586],
              [ 0.3768,  0.4520,  0.3447]]]], grad_fn=<ConvolutionBackward0>)
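The size of the output follows from the input size, kernel size, stride, padding, and dilation: along each spatial dimension it is floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1. As a rough sketch, the helper below (conv2d_output_size is a hypothetical name, not part of PyTorch) evaluates that formula and checks it against the layer for a few stride and padding settings on the same 5×5 input:

```python
import torch
import torch.nn as nn


def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output size along one spatial dimension (hypothetical helper)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


x = torch.randn(1, 1, 5, 5)

# (stride, padding) pairs to compare; the last one matches Example 3.
settings = [(1, 0), (1, 1), (2, 0), (2, 1)]

results = []
for stride, padding in settings:
    conv = nn.Conv2d(1, 1, 3, stride=stride, padding=padding)
    actual = conv(x).shape[-1]
    expected = conv2d_output_size(5, 3, stride=stride, padding=padding)
    results.append((stride, padding, expected, actual))
    print(f"stride={stride}, padding={padding}: formula {expected}, layer {actual}")
```

For stride=2 and padding=1 the formula gives (5 + 2 - 2 - 1) // 2 + 1 = 3, which agrees with the 3×3 tensor printed above.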