cyberangles blog

Types of Padding in Convolutional Layers

Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision and other related areas. One of the fundamental operations in a CNN is the convolution, which involves sliding a filter (also known as a kernel) over an input feature map to produce an output feature map. Padding is a crucial technique used in the convolution layer to control the size of the output feature map and to ensure that important information at the edges of the input is not lost. In this blog post, we will explore the different types of padding used in convolutional layers, their properties, and their practical applications.

2026-06

Table of Contents#

  1. What is Padding?
  2. Types of Padding
    • No Padding (Valid Padding)
    • Same Padding
    • Full Padding
    • Other Padding Variants
  3. Common Practices
  4. Best Practices
  5. Example Usage
  6. Conclusion
  7. References

What is Padding?#

Padding is the process of adding extra pixels around the input feature map before performing the convolution operation. The extra pixels can be filled with different values, such as zeros, mirror reflections of the original pixels, or constant values. The main reasons for using padding are:

  • Control the output size: Without padding, the output feature map size is usually smaller than the input feature map size. Padding can be used to control the size of the output feature map, making it possible to keep the output size the same as the input size or even larger.
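  • Preserve information at the edges: Pixels at the edges of the input feature map fall inside fewer filter positions than pixels in the center, so they contribute to fewer output values. For a (5\times5) input and a (3\times3) filter with a stride of (1) and no padding, a corner pixel is covered by one filter position while the center pixel is covered by nine. Padding lets the filter be centered on or near edge pixels, so they are covered more often.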
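
The edge effect can be made concrete by counting how many filter positions touch each pixel. The following is a minimal pure-Python sketch; the helper name `coverage_counts` and the (5\times5) input with a (3\times3) filter are assumptions chosen for illustration, not part of any library.

```python
def coverage_counts(size, kernel, pad=0, stride=1):
    """For each pixel of a size x size input, count how many filter
    windows include it when the input is padded by `pad` on every side."""
    padded = size + 2 * pad
    counts = [[0] * size for _ in range(size)]
    # (top, left) is the upper-left corner of a window in padded coordinates.
    for top in range(0, padded - kernel + 1, stride):
        for left in range(0, padded - kernel + 1, stride):
            for r in range(top, top + kernel):
                for c in range(left, left + kernel):
                    # Map back to original coordinates and skip padded cells.
                    i, j = r - pad, c - pad
                    if 0 <= i < size and 0 <= j < size:
                        counts[i][j] += 1
    return counts


no_pad = coverage_counts(5, 3, pad=0)
with_pad = coverage_counts(5, 3, pad=1)
print("corner / center without padding:", no_pad[0][0], no_pad[2][2])
print("corner / center with 1-pixel padding:", with_pad[0][0], with_pad[2][2])
```

With these sizes, a corner pixel contributes to one output value without padding and to four with one pixel of padding, while the center pixel contributes to nine in both cases.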

Types of Padding#

No Padding (Valid Padding)#

  • Explanation: When using no padding (also known as valid padding), the filter is only applied to positions where it fully overlaps with the input feature map. This means that no extra pixels are added around the input. As a result, the output feature map size is smaller than the input feature map size.
  • Formula for output size: If the input feature map has a size of (W_{in}\times H_{in}), the filter has a size of (F\times F), and the stride is (S), the output feature map size (W_{out}) and (H_{out}) are calculated as follows: [W_{out}=\left\lfloor\frac{W_{in}-F}{S}+1\right\rfloor] [H_{out}=\left\lfloor\frac{H_{in}-F}{S}+1\right\rfloor]
  • Example: Suppose we have an input feature map of size (5\times5) and a filter of size (3\times3) with a stride of (1). Using valid padding, the output feature map size will be (\left\lfloor\frac{5 - 3}{1}+1\right\rfloor=3), so the output feature map will be of size (3\times3).
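
The valid-padding formula can be checked with a few lines of Python. This is a small sketch; `valid_output_size` is a hypothetical helper name, and the extra input sizes are assumptions added to show what happens when the stride does not divide (W_{in}-F) evenly.

```python
def valid_output_size(w_in, f, s=1):
    """Output width (or height) of a convolution with no padding:
    floor((w_in - f) / s) + 1."""
    if f > w_in:
        raise ValueError("filter is larger than the input")
    return (w_in - f) // s + 1


print(valid_output_size(5, 3, 1))  # the 5x5 input, 3x3 filter example
print(valid_output_size(7, 3, 2))
print(valid_output_size(8, 3, 2))
```

The last two calls both give 3: with an input of size 8 and a stride of 2, the final column does not fit a complete filter position and is ignored.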

Same Padding#

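  • Explanation: Same padding adds just enough pixels around the input feature map that, with a stride of (1), the output feature map has the same size as the input. The amount of padding depends on the filter size. With a stride (S>1) the output cannot match the input size; frameworks that offer a same mode for strided convolutions (TensorFlow, for example) instead pad so that the output size is (\left\lceil W_{in}/S\right\rceil).
  • Formula for padding amount: If the filter has a size of (F\times F) with (F) odd and the stride is (1), the amount of padding (P) on each side of the input is calculated as: [P=\frac{F - 1}{2}] For an even (F), the total padding of (F - 1) pixels per dimension cannot be split evenly, so one side receives one more pixel than the other.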
  • Example: For a filter of size (3\times3) with a stride of (1), the amount of padding (P=\frac{3 - 1}{2}=1). So, we add one pixel of padding around the input feature map. If the input feature map is of size (5\times5), after adding padding, the effective input size becomes (7\times7). When we apply the (3\times3) filter with a stride of (1), the output feature map size will be (\left\lfloor\frac{7 - 3}{1}+1\right\rfloor = 5), which is the same as the original input size.
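
For strides other than (1) or for even filter sizes, the padding is usually derived from the desired output size rather than from (\frac{F - 1}{2}). The sketch below follows the convention used by TensorFlow-style same padding, where the output size is (\left\lceil W_{in}/S\right\rceil) and any odd pixel goes on the trailing side; the helper name `same_padding` is hypothetical, and other frameworks may split the padding differently.

```python
import math


def same_padding(w_in, f, s=1):
    """Return (pad_before, pad_after, w_out) along one dimension,
    assuming the output size should be ceil(w_in / s)."""
    w_out = math.ceil(w_in / s)
    # Total padding needed so that the last filter position fits.
    total = max((w_out - 1) * s + f - w_in, 0)
    pad_before = total // 2
    pad_after = total - pad_before  # the extra pixel, if any, goes after
    return pad_before, pad_after, w_out


print(same_padding(5, 3, 1))  # odd filter, stride 1: symmetric padding
print(same_padding(5, 4, 1))  # even filter: asymmetric padding
print(same_padding(6, 3, 2))  # stride 2: output is ceil(6 / 2)
```

For the (5\times5) input and (3\times3) filter above this returns one pixel on each side and an output size of 5, in agreement with (P=\frac{3 - 1}{2}=1).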

Full Padding#

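  • Explanation: Full padding adds (F - 1) pixels on each side of the input feature map, the largest amount for which every filter position still overlaps at least one input pixel. This allows the filter to start and end at positions where it only partially overlaps with the input feature map. As a result, the output feature map size is larger than the input feature map size.
  • Formula for output size: If the input feature map has a size of (W_{in}\times H_{in}), the filter has a size of (F\times F), and the stride is (1), the output feature map size (W_{out}) and (H_{out}) are calculated as follows: [W_{out}=W_{in}+F - 1] [H_{out}=H_{in}+F - 1] For a general stride (S), substituting (P=F - 1) into (\left\lfloor\frac{W_{in}+2P-F}{S}\right\rfloor+1) gives [W_{out}=\left\lfloor\frac{W_{in}+F - 2}{S}\right\rfloor+1] and likewise for (H_{out}).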
  • Example: For an input feature map of size (5\times5) and a filter of size (3\times3) with a stride of (1), the output feature map size using full padding will be (5 + 3 - 1=7), so the output feature map will be of size (7\times7).
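
The three output sizes discussed so far can be compared side by side in one dimension. This sketch assumes NumPy is available and uses `numpy.convolve`, whose `mode` argument accepts `"valid"`, `"same"`, and `"full"`; the signal and kernel values are made up for illustration. Note that `numpy.convolve` flips the kernel (true convolution), which does not affect the output lengths.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # W_in = 5
kernel = np.array([1.0, 1.0, 1.0])            # F = 3

results = {}
for mode in ("valid", "same", "full"):
    # Each mode corresponds to a different amount of implicit zero-padding.
    results[mode] = np.convolve(signal, kernel, mode=mode)
    print(mode, len(results[mode]), results[mode])
```

The output lengths are 3, 5, and 7, matching (W_{in}-F+1), (W_{in}), and (W_{in}+F-1) for a stride of (1).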

Other Padding Variants#

  • Reflective Padding: In reflective padding, the extra pixels are filled with mirror reflections of the original pixels at the edges of the input feature map. This can be useful in cases where we want to preserve the local structure of the input at the edges.
  • Constant Padding: Constant padding fills the extra pixels with a constant value, such as zero. Zero-padding is the most common form of constant padding, and it is widely used in CNNs.
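
The difference between these fill strategies is easiest to see on a single row of pixels. The sketch below assumes NumPy is available and uses `numpy.pad`; the row values are made up for illustration. In NumPy, `"reflect"` mirrors around the edge pixel without repeating it, while `"symmetric"` repeats the edge pixel.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Pad two pixels on each side with three different fill strategies.
zero_padded = np.pad(row, 2, mode="constant", constant_values=0)
reflect_padded = np.pad(row, 2, mode="reflect")
symmetric_padded = np.pad(row, 2, mode="symmetric")

print("constant :", zero_padded)
print("reflect  :", reflect_padded)
print("symmetric:", symmetric_padded)
```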

Common Practices#

  • Same Padding in Early Layers: In the early layers of a CNN, same padding is often used to preserve the spatial size of the input image. This lets several convolutional layers be stacked without the feature maps shrinking and without discarding information at the edges.
  • Valid Padding in Later Layers: As the network progresses, valid padding may be used in later layers to reduce the spatial dimensions of the feature maps. With a stride of (1), each valid (F\times F) layer trims (F - 1) pixels from each dimension. Note that the receptive field of a neuron is determined by the filter sizes, strides, and depth of the network rather than by the padding choice.
  • Zero-padding: Zero-padding is the most commonly used form of padding in CNNs because it is simple to implement and has been shown to work well in practice.

Best Practices#

  • Choose Padding Based on the Task: The choice of padding should be based on the specific task and the architecture of the CNN. For example, if the task requires preserving the spatial size of the input, as in pixel-wise prediction, same padding may be a better choice. If the goal is to reduce the spatial dimensions and to compute every output value from real input pixels only, valid padding may be more appropriate.
  • Experiment with Different Padding Types: It is often a good idea to experiment with different padding types to see which one works best for a particular task. This can help to find the optimal configuration for the CNN.
  • Understand the Impact on Output Size: Make sure to understand how the choice of padding affects the output size of the feature maps. This is important for designing the architecture of the CNN and ensuring that the dimensions of the feature maps are consistent throughout the network.
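
One way to check dimensions before building a network is to trace the feature map size through a list of layer settings. The sketch below uses the general formula (\left\lfloor\frac{W_{in}+2P-F}{S}\right\rfloor+1); the helper name `trace_sizes` and the layer list are hypothetical and chosen only for illustration.

```python
def trace_sizes(w_in, layers):
    """Return the feature map size after each layer.

    `layers` is a list of (filter_size, stride, padding) tuples,
    with `padding` the number of pixels added on each side."""
    sizes = [w_in]
    for f, s, p in layers:
        w_out = (w_in + 2 * p - f) // s + 1
        if w_out < 1:
            raise ValueError(f"layer {(f, s, p)} leaves no output for input size {w_in}")
        sizes.append(w_out)
        w_in = w_out
    return sizes


# Hypothetical stack: same 3x3, strided 3x3, valid 3x3, strided 5x5.
layers = [(3, 1, 1), (3, 2, 1), (3, 1, 0), (5, 2, 2)]
print(trace_sizes(28, layers))
```

For a 28-pixel input this prints `[28, 28, 14, 12, 6]`, which makes it easy to spot where a feature map becomes too small or where two branches would end up with mismatched sizes.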

Example Usage#

Here is an example of using different types of padding in PyTorch:

import torch
import torch.nn as nn
 
# Input feature map of size (1, 1, 5, 5)
input_tensor = torch.randn(1, 1, 5, 5)
 
# Define three 3x3 convolutional layers that differ only in their padding
conv_layer_valid = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)  # Valid padding: P = 0
conv_layer_same = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=1)  # Same padding: P = (F - 1) / 2 = 1
conv_layer_full = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=2)  # Full padding: P = F - 1 = 2
 
# Apply convolution with different padding types
output_valid = conv_layer_valid(input_tensor)
output_same = conv_layer_same(input_tensor)
output_full = conv_layer_full(input_tensor)
 
print("Output size with valid padding:", output_valid.shape)
print("Output size with same padding:", output_same.shape)
print("Output size with full padding:", output_full.shape)

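In this example, we create a (5\times5) input feature map and apply a (3\times3) convolutional layer with each padding type. The printed shapes are torch.Size([1, 1, 3, 3]), torch.Size([1, 1, 5, 5]), and torch.Size([1, 1, 7, 7]) for valid, same, and full padding respectively, matching the (3\times3), (5\times5), and (7\times7) sizes computed in the earlier sections.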
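
PyTorch can also compute the padding amount itself and fill the padded pixels with something other than zeros. The following sketch assumes a reasonably recent PyTorch release in which `nn.Conv2d` accepts the string `padding="same"` (supported only for a stride of 1) and the `padding_mode` argument; the variable names are chosen for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)

# String form: the layer works out the padding that keeps the spatial size.
conv_same = nn.Conv2d(1, 1, kernel_size=3, padding="same")

# One pixel of padding on each side, filled by reflection instead of zeros.
conv_reflect = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode="reflect")

out_same = conv_same(x)
out_reflect = conv_reflect(x)
print("padding='same':", out_same.shape)
print("padding_mode='reflect':", out_reflect.shape)
```

Both layers keep the (5\times5) spatial size; they differ only in the values placed in the padded border.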

Conclusion#

Padding is an important technique in convolutional layers that allows us to control the size of the output feature map and preserve information at the edges of the input. By understanding the different types of padding and their properties, we can choose the most appropriate padding type for a particular task and design more effective CNN architectures. Experimenting with different padding types and understanding their impact on the output size is crucial for achieving good performance in CNNs.

References#

  • Goodfellow, I. J., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  • Chollet, F. (2017). Deep Learning with Python. Manning Publications.
  • PyTorch Documentation: https://pytorch.org/docs/stable/index.html