Table of Contents
- Understanding Strides in NumPy Arrays
- What is stride_tricks.as_strided?
- Creating Non-Overlapping Blocks: Basic Workflow
- Common Issues: Overlap and Edge Cells
- Fixing Overlap: Correct Stride Calculation
- Excluding Edge Cells: Cropping for Clean Blocks
- Advanced Example: 3D Arrays and Real-World Use Cases
- Best Practices and Safety Tips
- Conclusion
- References
1. Understanding Strides in NumPy Arrays
Before diving into as_strided, you need to understand strides. Strides are a tuple of integers that define how many bytes you must skip in memory to move from one element to the next in each dimension of an array.
For example, consider the 2D array arr = np.array([[0, 1, 2], [3, 4, 5]], dtype=np.int64), stored in NumPy's default C (row-major) order:
- The array has shape (2, 3) (2 rows, 3 columns).
- np.int64 elements occupy 8 bytes each.
- To move from arr[0,0] (0) to arr[0,1] (1), you skip 8 bytes (1 element).
- To move from arr[0,0] to arr[1,0] (3), you skip 3 * 8 = 24 bytes (3 elements, the entire first row).
Thus, arr.strides returns (24, 8): (row_stride, column_stride).
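A quick way to confirm these numbers is to ask NumPy directly. The sketch below assumes the same C-contiguous int64 array as the example; it also shows how the byte offset of arr[i, j] follows from the strides, and how transposing swaps the strides without moving any data:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5]], dtype=np.int64)
print(arr.strides)   # (24, 8): 24 bytes to the next row, 8 bytes to the next column
print(arr.itemsize)  # 8 bytes per int64 element

# Byte offset of arr[i, j] from the start of the buffer: i * row_stride + j * col_stride
i, j = 1, 2
offset = i * arr.strides[0] + j * arr.strides[1]
print(offset)                                # 40
print(arr.ravel()[offset // arr.itemsize])   # 5, the same value as arr[i, j]

# Transposing returns a view with swapped strides; no data is copied
print(arr.T.strides)  # (8, 24)
```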
2. What is stride_tricks.as_strided?
numpy.lib.stride_tricks.as_strided creates a new view of an array's memory by redefining its shape and strides, without copying any data. This makes it memory-efficient even for very large arrays.
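The signature is:
np.lib.stride_tricks.as_strided(x, shape=None, strides=None, subok=False, writeable=True)
- x: Original array.
- shape: Shape of the new view (defaults to x.shape).
- strides: Strides of the new view, in bytes per step in each dimension (defaults to x.strides).
- subok: If True, ndarray subclasses are preserved; otherwise a base ndarray is returned.
- writeable: If False, the view is always read-only; otherwise it is writeable if x is. Pass False for safety.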
Warning: as_strided performs no bounds checking. An incorrect shape or strides can make the view read memory outside the original array, leading to crashes or garbage values. Always validate the output!
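To see the API in its simplest form, the sketch below reinterprets a 1D buffer of six int64 values as a 2x3 array by choosing the shape and strides by hand. It also checks that the result shares memory with the input and that writeable=False makes it read-only:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(6, dtype=np.int64)  # [0 1 2 3 4 5], strides (8,)

# Same 48 bytes viewed as 2 rows x 3 columns: 24 bytes per row step, 8 per column step
view = as_strided(x, shape=(2, 3), strides=(24, 8), writeable=False)
print(view)
# [[0 1 2]
#  [3 4 5]]

print(np.shares_memory(view, x))  # True: as_strided made a view, not a copy

try:
    view[0, 0] = 99  # writeable=False turns this assignment into an error
except ValueError as err:
    print("Read-only view:", err)
```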
3. Creating Non-Overlapping Blocks: Basic Workflow
To split an array into non-overlapping blocks, we need to:
- Define block dimensions (e.g., (block_height, block_width) for 2D arrays).
- Compute the new shape of the blocked view.
- Compute the new strides so that blocks don't overlap.
Step 1: Define Block Dimensions
Let’s use a 2D array arr with shape (H, W) and block size (bh, bw) (block height, block width). For non-overlapping blocks, we assume H is divisible by bh and W by bw (we’ll handle edge cases later).
Step 2: Compute New Shape
The blocked view will have 4 dimensions:
(num_blocks_height, num_blocks_width, block_height, block_width)
Where:
- num_blocks_height = H // bh (number of blocks along height).
- num_blocks_width = W // bw (number of blocks along width).
Step 3: Compute New Strides
To ensure non-overlapping blocks, the strides for the "block dimensions" must skip entire blocks. For a 2D array with original strides (row_stride, col_stride):
- Stride for num_blocks_height: bh * row_stride (skip bh rows to start the next block row).
- Stride for num_blocks_width: bw * col_stride (skip bw columns to start the next block column).
- Strides for the inner block dimensions: reuse the original array's strides (row_stride, col_stride).
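Why these strides produce non-overlapping blocks follows from the byte-offset arithmetic. Writing s_r and s_c for row_stride and col_stride (notation introduced here for brevity), element (r, c) of block (i, j) sits at:

```latex
\begin{aligned}
\text{offset}(i, j, r, c) &= i\,(b_h s_r) + j\,(b_w s_c) + r\,s_r + c\,s_c \\
&= (i\,b_h + r)\,s_r + (j\,b_w + c)\,s_c
\end{aligned}
```

This is exactly the offset of arr[i*bh + r, j*bw + c]. Because 0 ≤ r < bh and 0 ≤ c < bw, each original element (row, col) corresponds to exactly one (i, r) = divmod(row, bh) and one (j, c) = divmod(col, bw), so no element appears in two blocks.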
Example: 2D Array Split into 2x2 Blocks
Let’s split a (4, 4) array into (2, 2) blocks:
import numpy as np
# Original array: 4x4 with values 0-15
arr = np.arange(16).reshape(4, 4)
print("Original Array:\n", arr, "\n")
# Block dimensions
bh, bw = 2, 2 # 2x2 blocks
H, W = arr.shape
# New shape: (num_blocks_h, num_blocks_w, bh, bw)
num_blocks_h = H // bh # 4//2 = 2
num_blocks_w = W // bw # 4//2 = 2
new_shape = (num_blocks_h, num_blocks_w, bh, bw)
# New strides: (bh*row_stride, bw*col_stride, row_stride, col_stride)
row_stride, col_stride = arr.strides
new_strides = (bh * row_stride, bw * col_stride, row_stride, col_stride)
# Create blocked view
blocks = np.lib.stride_tricks.as_strided(
arr, shape=new_shape, strides=new_strides, writeable=False
)
print("Non-Overlapping Blocks (shape: {})\n".format(blocks.shape))
for i in range(num_blocks_h):
    for j in range(num_blocks_w):
        print(f"Block ({i}, {j}):\n", blocks[i, j], "\n")
Output:
Original Array:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
Non-Overlapping Blocks (shape: (2, 2, 2, 2))
Block (0, 0):
[[0 1]
[4 5]]
Block (0, 1):
[[2 3]
[6 7]]
Block (1, 0):
[[ 8 9]
[12 13]]
Block (1, 1):
[[10 11]
[14 15]]
Perfect! The array is split into 4 non-overlapping 2x2 blocks.
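As a sanity check, a short loop can confirm the index mapping implied by the stride arithmetic: blocks[i, j, r, c] should equal arr[i * bh + r, j * bw + c] for every index. This sketch rebuilds the same 4x4 example so that it runs on its own:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(16).reshape(4, 4)
bh, bw = 2, 2
H, W = arr.shape
row_stride, col_stride = arr.strides
blocks = as_strided(
    arr,
    shape=(H // bh, W // bw, bh, bw),
    strides=(bh * row_stride, bw * col_stride, row_stride, col_stride),
    writeable=False,
)

# Every element of every block must come from the expected position in arr
for i, j, r, c in np.ndindex(blocks.shape):
    assert blocks[i, j, r, c] == arr[i * bh + r, j * bw + c]

# Non-overlapping blocks cover each element exactly once
print(np.array_equal(np.sort(blocks.ravel()), np.sort(arr.ravel())))  # True
```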
4. Common Issues: Overlap and Edge Cells
The basic workflow assumes H % bh == 0, W % bw == 0, and correctly scaled strides. In practice, two problems come up:
- Overlap occurs if the strides are miscalculated (e.g., not scaled by the block size).
- Edge cells are leftover rows/columns when H or W isn't divisible by bh or bw (e.g., a 5x5 array split into 2x2 blocks leaves 1 row and 1 column of edge cells).
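The number of edge cells is easy to compute up front with divmod. This sketch uses the 5x5 array with 2x2 blocks mentioned in the list above:

```python
H, W = 5, 5    # array shape
bh, bw = 2, 2  # block shape

full_h, rem_h = divmod(H, bh)  # 2 full block rows, 1 leftover row
full_w, rem_w = divmod(W, bw)  # 2 full block columns, 1 leftover column

covered = (full_h * bh) * (full_w * bw)  # cells that fall inside full blocks
edge_cells = H * W - covered
print(full_h, full_w, rem_h, rem_w, edge_cells)  # 2 2 1 1 9
```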
5. Fixing Overlap: Correct Stride Calculation
Overlap happens when the "block strides" (the strides for num_blocks_height and num_blocks_width) are too small, for example when you reuse the original array's strides instead of scaling them by the block size.
Mistake Example: Overlapping Blocks
If we incorrectly set new_strides = (row_stride, col_stride, row_stride, col_stride) (no scaling by bh/bw), the blocks will overlap:
# Bad: Overlapping strides
bad_strides = (row_stride, col_stride, row_stride, col_stride)
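# Reuses arr, row_stride, col_stride from the 4x4 example above.
# With unscaled strides, block (i, j) starts at arr[i, j], so 3 starting points fit per axis.
bad_blocks = np.lib.stride_tricks.as_strided(
    arr, shape=(3, 3, 2, 2), strides=bad_strides, writeable=False
)
print("Overlapping Block (0,0):\n", bad_blocks[0, 0])
print("Overlapping Block (0,1):\n", bad_blocks[0, 1])  # Overlaps with Block (0,0)!
Output: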
Overlapping Block (0,0):
[[0 1]
[4 5]]
Overlapping Block (0,1):
[[1 2]
[5 6]] # Overlaps with Block (0,0) (shares elements 1 and 5)!
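One way to catch this mistake programmatically is to compute the byte offset of every position in the view and look for duplicates: in a set of non-overlapping blocks, no two positions should point at the same memory. The helper name has_aliasing is hypothetical, and the check assumes the strides are multiples of the item size (as they are in every example here):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def has_aliasing(view):
    """Hypothetical helper: True if two positions of `view` start at the same byte offset."""
    # np.indices gives one index grid per dimension; weight each grid by its stride in bytes
    offsets = sum(idx * stride for idx, stride in zip(np.indices(view.shape), view.strides))
    return np.unique(offsets).size < view.size

arr = np.arange(16).reshape(4, 4)
row_stride, col_stride = arr.strides
bh, bw = 2, 2

bad = as_strided(arr, shape=(3, 3, 2, 2),
                 strides=(row_stride, col_stride, row_stride, col_stride), writeable=False)
good = as_strided(arr, shape=(2, 2, 2, 2),
                  strides=(bh * row_stride, bw * col_stride, row_stride, col_stride), writeable=False)

print(has_aliasing(bad))   # True: neighbouring blocks share elements
print(has_aliasing(good))  # False: each element appears in exactly one block
```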
Fix: Scale Strides by Block Size
The fix is to scale the block strides by bh and bw, as in the basic workflow:
new_strides = (bh * row_stride, bw * col_stride, row_stride, col_stride)  # Correct!
6. Excluding Edge Cells: Cropping for Clean Blocks
Edge cells are rows/columns that don’t form full blocks (e.g., a 5x5 array with 2x2 blocks leaves 1 row and 1 column of edge cells). To exclude them, crop the original array to dimensions divisible by bh and bw.
Example: Excluding Edge Cells in a 5x5 Array
Let’s split a 5x5 array into 2x2 blocks by cropping first:
# Original array: 5x5 (edge cells exist)
arr = np.arange(25).reshape(5, 5)
print("Original Array (5x5):\n", arr, "\n")
bh, bw = 2, 2
# Crop to (H_cropped, W_cropped) where H_cropped = (H // bh) * bh
H, W = arr.shape
H_cropped = (H // bh) * bh # 5//2=2 → 2*2=4
W_cropped = (W // bw) * bw # 5//2=2 → 2*2=4
cropped_arr = arr[:H_cropped, :W_cropped] # Keep first 4 rows/columns
# Now split into non-overlapping blocks
num_blocks_h = H_cropped // bh # 4//2=2
num_blocks_w = W_cropped // bw # 4//2=2
new_shape = (num_blocks_h, num_blocks_w, bh, bw)
new_strides = (
    bh * cropped_arr.strides[0],  # skip bh rows of the cropped view
    bw * cropped_arr.strides[1],  # skip bw columns of the cropped view
    cropped_arr.strides[0],
    cropped_arr.strides[1],
)
blocks = np.lib.stride_tricks.as_strided(
    cropped_arr, shape=new_shape, strides=new_strides, writeable=False
)
print("Cropped Array (4x4):\n", cropped_arr, "\n")
print("Non-Overlapping Blocks (shape: {})\n".format(blocks.shape))
for i in range(num_blocks_h):
    for j in range(num_blocks_w):
        print(f"Block ({i}, {j}):\n", blocks[i, j], "\n")
Output:
Original Array (5x5):
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
Cropped Array (4x4):
[[ 0 1 2 3]
[ 5 6 7 8]
[10 11 12 13]
[15 16 17 18]]
Non-Overlapping Blocks (shape: (2, 2, 2, 2))
Block (0, 0):
[[0 1]
[5 6]]
Block (0, 1):
[[2 3]
[7 8]]
Block (1, 0):
[[10 11]
[15 16]]
Block (1, 1):
[[12 13]
[17 18]]
Edge cells (the 5th row/column) are excluded, leaving clean non-overlapping blocks.
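The crop-then-block steps can be wrapped in a small reusable function. The name blockify_2d is hypothetical; this sketch derives the strides from the cropped view rather than assuming a contiguous layout:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def blockify_2d(arr, bh, bw):
    """Hypothetical helper: crop edge cells, then return a read-only (nbh, nbw, bh, bw) block view."""
    if arr.ndim != 2:
        raise ValueError("expected a 2D array")
    H, W = arr.shape
    nbh, nbw = H // bh, W // bw
    if nbh == 0 or nbw == 0:
        raise ValueError("block is larger than the array")
    cropped = arr[: nbh * bh, : nbw * bw]  # drop leftover rows/columns (a view, no copy)
    rs, cs = cropped.strides               # bytes per row / column step of the cropped view
    return as_strided(
        cropped,
        shape=(nbh, nbw, bh, bw),
        strides=(bh * rs, bw * cs, rs, cs),
        writeable=False,
    )

blocks = blockify_2d(np.arange(25).reshape(5, 5), 2, 2)
print(blocks.shape)  # (2, 2, 2, 2)
print(blocks[1, 1])
# [[12 13]
#  [17 18]]
```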
7. Advanced Example: 3D Arrays and Real-World Use Cases
as_strided works for higher-dimensional arrays too. For a 3D array (D, H, W) (e.g., a video with depth/frames, height, width) and block size (bd, bh, bw):
- New shape: (num_blocks_depth, num_blocks_height, num_blocks_width, bd, bh, bw)
- New strides: (bd*D_stride, bh*H_stride, bw*W_stride, D_stride, H_stride, W_stride), where (D_stride, H_stride, W_stride) = arr.strides
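The same pattern extends to any number of dimensions: the outer strides are the original strides scaled by the block size, and the inner strides are the original strides unchanged. Below is a sketch of a general helper (the name block_view_nd is hypothetical) that requires every dimension to be divisible by its block size:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def block_view_nd(arr, block):
    """Hypothetical helper: split `arr` into non-overlapping blocks of shape `block`."""
    if len(block) != arr.ndim:
        raise ValueError("block must have one size per dimension")
    # This sketch assumes edge cells were already cropped away
    if any(n % b != 0 for n, b in zip(arr.shape, block)):
        raise ValueError("crop to dimensions divisible by the block size first")
    outer_shape = tuple(n // b for n, b in zip(arr.shape, block))
    outer_strides = tuple(s * b for s, b in zip(arr.strides, block))
    return as_strided(
        arr,
        shape=outer_shape + tuple(block),
        strides=outer_strides + arr.strides,
        writeable=False,
    )

video = np.arange(4 * 6 * 6).reshape(4, 6, 6)  # (D, H, W)
cubes = block_view_nd(video, (2, 3, 3))
print(cubes.shape)  # (2, 2, 2, 2, 3, 3)
```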
Use Case: Image Processing
Splitting an image into non-overlapping patches for feature extraction:
import numpy as np
from PIL import Image
# Load image (shape: (height, width, channels))
img = Image.open("image.jpg").convert("RGB")
arr = np.array(img) # Shape: (H, W, 3)
bh, bw = 64, 64 # 64x64 patches
H, W, C = arr.shape
# Crop to remove edge cells
H_cropped = (H // bh) * bh
W_cropped = (W // bw) * bw
cropped_arr = arr[:H_cropped, :W_cropped]
# Create patches
num_blocks_h = H_cropped // bh
num_blocks_w = W_cropped // bw
new_shape = (num_blocks_h, num_blocks_w, bh, bw, C)
new_strides = (
    bh * cropped_arr.strides[0],  # jump to the next patch row
    bw * cropped_arr.strides[1],  # jump to the next patch column
    cropped_arr.strides[0],
    cropped_arr.strides[1],
    cropped_arr.strides[2],  # channel stride is unchanged
)
patches = np.lib.stride_tricks.as_strided(
    cropped_arr, shape=new_shape, strides=new_strides, writeable=False
)
print("Image patches shape:", patches.shape)  # (num_patches_h, num_patches_w, 64, 64, 3)
8. Best Practices and Safety Tips
- Validate Shape/Strides: Print the new shape and strides, and compare a few blocks against ordinary slicing (e.g., blocks[i, j] against arr[i*bh:(i+1)*bh, j*bw:(j+1)*bw]) before trusting the view.
- Avoid Modifications: Set writeable=False to prevent accidental data corruption (strided views share memory with the original array).
- Handle Edge Cases: Crop arrays to divisible dimensions before blocking.
- Test with Small Arrays: Use small arrays (e.g., 4x4) to debug before scaling to large data.
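One concrete way to validate a strided block view is to compare it with an independent construction. For a C-contiguous 2D array with divisible dimensions, reshape followed by swapaxes produces the same (nbh, nbw, bh, bw) layout; it is a different technique, used here only as a cross-check:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(6 * 8).reshape(6, 8)
bh, bw = 3, 4
H, W = arr.shape
rs, cs = arr.strides

strided = as_strided(
    arr,
    shape=(H // bh, W // bw, bh, bw),
    strides=(bh * rs, bw * cs, rs, cs),
    writeable=False,
)

# Independent construction: (H, W) -> (nbh, bh, nbw, bw) -> swap the two middle axes
reference = arr.reshape(H // bh, bh, W // bw, bw).swapaxes(1, 2)

print(strided.shape, reference.shape)      # (2, 2, 3, 4) (2, 2, 3, 4)
print(np.array_equal(strided, reference))  # True
```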
9. Conclusion
numpy.lib.stride_tricks.as_strided is a powerful tool for creating non-overlapping array blocks efficiently. By mastering strides, computing correct shape and strides, and handling edge cells via cropping, you can split large arrays into clean blocks for tasks like image processing, data chunking, and machine learning.
Remember: with great power comes great responsibility—always validate your strided views!