Capsule Networks

Key Idea:

In conventional networks such as CNNs, each neuron outputs a single scalar activation, and pooling layers summarize those activations while discarding precise information about where, and in what pose, a feature occurred. This can lead to issues such as:

Translation: Pooling gives tolerance to small shifts, but the exact position of a feature relative to other features is lost.
Scale: Networks may have difficulty recognizing objects at sizes not seen during training.
Rotation: Networks may not perform well on rotated or mirrored images unless they are trained on such examples.
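
One concrete way spatial detail gets lost in a conventional CNN is max pooling. The sketch below is an illustration with made-up 4x4 inputs, not part of any capsule implementation: two different images produce the same pooled output, so later layers cannot tell them apart.

```python
import torch
import torch.nn.functional as F

# Two toy 4x4 single-channel "images" (values invented for illustration).
# The active pixel sits at a different position inside the same 2x2 window.
a = torch.zeros(1, 1, 4, 4)
b = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0  # top-left of the first window
b[0, 0, 1, 1] = 1.0  # bottom-right of the first window

pooled_a = F.max_pool2d(a, kernel_size=2)
pooled_b = F.max_pool2d(b, kernel_size=2)

inputs_differ = not torch.equal(a, b)
pooled_match = torch.equal(pooled_a, pooled_b)
print(inputs_differ, pooled_match)  # True True
```

The tolerance to the shift is useful, but the position inside the window is discarded rather than represented anywhere.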

Capsule Network Solution:

The Capsule Network architecture introduces a new way of representing and processing inputs:

Vectors as capsules: Instead of a single scalar activation, each capsule is a small group of neurons whose output is a vector. The vector's direction encodes properties of a detected entity (e.g., orientation, scale), and in the original dynamic-routing formulation its length represents the probability that the entity is present.
Routed connections: Each capsule sends its output to capsules in the layer above through a dynamic routing algorithm, which strengthens the connections whose predictions agree and weakens the rest.
Hierarchical representation: Capsule Networks build hierarchical representations by combining lower-level capsules (parts) into higher-level ones (wholes).
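
To make the "vector as capsule" idea concrete, here is a minimal sketch. The text above does not name a specific nonlinearity; this assumes the "squash" function commonly used with dynamic routing, which rescales a vector's length into the range [0, 1) without changing its direction. The capsule values are invented for illustration.

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    """Rescale vectors so their length lies in [0, 1), keeping direction."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

# Two hypothetical 4-dimensional capsule outputs: a weak one and a strong one.
raw = torch.tensor([[0.1, 0.0, 0.0, 0.0],
                    [3.0, 4.0, 0.0, 0.0]])
caps = squash(raw)

# Length is read as "how likely is it that the entity is present".
lengths = caps.norm(dim=-1)
print(lengths)
```

The short input vector ends up with a length near 0 and the long one with a length near 1, while the direction of each vector (its encoded properties) is unchanged.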

How it Works:

Here's an example:

Suppose we want to recognize handwritten digits in images (MNIST dataset). We use a Capsule Network with the following architecture:

Input layer: A 28x28 grayscale image.
Primary capsule layer: Convolutional features are grouped into 64 capsules, each representing a pattern found in a region of the image (e.g., a stroke in the top-left corner).
Secondary capsule layer: Ten higher-level capsules, one per digit class, combine information from the primary capsules to represent more abstract features (e.g., the digit and its orientation).

The routing algorithm dynamically selects which lower-level capsules contribute to each higher-level capsule, allowing the network to focus on relevant features.
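
The routing step can be sketched on a toy example. This is a minimal version of routing by agreement under assumed shapes; the function name route, the iteration count, and the prediction vectors are all illustrative choices, not taken from a specific library.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def route(u_hat, num_iters=3):
    """Routing by agreement.

    u_hat: (num_lower, num_higher, dim) prediction vectors, one per
           (lower capsule, higher capsule) pair.
    Returns the higher-capsule outputs (num_higher, dim) and the coupling
    coefficients (num_lower, num_higher) used to compute them.
    """
    logits = torch.zeros(u_hat.shape[:2])
    for _ in range(num_iters):
        coupling = F.softmax(logits, dim=1)              # each lower capsule splits its vote
        s = (coupling.unsqueeze(-1) * u_hat).sum(dim=0)  # weighted sum per higher capsule
        v = squash(s)
        logits = logits + (u_hat * v.unsqueeze(0)).sum(dim=-1)  # agreement as dot product
    return v, coupling

# Three lower capsules, two higher capsules, 2-D vectors (made-up values).
# All three agree on what higher capsule 0 should look like,
# but disagree about higher capsule 1.
u_hat = torch.tensor([[[1.0, 0.0], [1.0, 0.0]],
                      [[1.0, 0.0], [-1.0, 0.0]],
                      [[1.0, 0.0], [0.0, 1.0]]])
v, coupling = route(u_hat)
print(coupling)
print(v.norm(dim=-1))
```

Because the predictions for higher capsule 0 agree, every lower capsule ends up routing most of its output there, and that capsule's output vector is the longer of the two.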

Example Code in PyTorch:

Here's a simplified example of a Capsule Network for MNIST:

python
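import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Scale vector length into [0, 1) while keeping direction
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

class PrimaryCaps(nn.Module):
    def __init__(self, num_capsules=64, caps_dim=8):
        super().__init__()
        self.num_capsules, self.caps_dim = num_capsules, caps_dim
        self.conv = nn.Conv2d(1, 8, kernel_size=9)  # 28x28 input -> 8 maps of 20x20
        self.cap = nn.Linear(8 * 20 * 20, num_capsules * caps_dim)

    def forward(self, x):
        x = F.relu(self.conv(x))
        x = x.flatten(start_dim=1)  # (batch, 3200)
        u = self.cap(x).view(-1, self.num_capsules, self.caps_dim)
        return squash(u)  # (batch, 64, 8)

class DigitCaps(nn.Module):
    def __init__(self, num_capsules=10, in_capsules=64, in_dim=8, out_dim=16, iters=3):
        super().__init__()
        self.iters = iters
        # One transformation matrix per (lower capsule, digit capsule) pair
        self.W = nn.Parameter(0.01 * torch.randn(in_capsules, num_capsules, out_dim, in_dim))

    def forward(self, u):
        u_hat = torch.einsum('ijdk,bik->bijd', self.W, u)  # prediction vectors
        b = torch.zeros(u_hat.shape[:3], device=u.device)  # routing logits
        for _ in range(self.iters):
            c = F.softmax(b, dim=2)  # coupling coefficients over digit capsules
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))  # (batch, 10, 16)
            b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)  # reward agreement
        return v

class CapsNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.primary_caps = PrimaryCaps()
        self.digit_caps = DigitCaps()

    def forward(self, x):
        v = self.digit_caps(self.primary_caps(x))
        return v.norm(dim=-1)  # capsule lengths as class scores, (batch, 10)

model = CapsNet()
# Cross-entropy is kept for simplicity; it treats the capsule lengths as logits
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

This code defines a Capsule Network with two layers: PrimaryCaps and DigitCaps. PrimaryCaps turns convolutional features into 64 capsule vectors. The forward method of DigitCaps implements the routing algorithm, which iteratively adjusts how much each lower-level capsule contributes to each higher-level capsule. The network outputs the length of each digit capsule as the score for that digit.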

Keep in mind that this is a simplified example, and real-world implementations may involve more complex architectures and training procedures.
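
As one example of a different training procedure, capsule networks are often trained with a margin loss on the capsule lengths rather than cross-entropy. The sketch below assumes the commonly cited defaults (margins of 0.9 and 0.1, down-weighting factor 0.5); the function name margin_loss and the sample values are illustrative.

```python
import torch
import torch.nn.functional as F

def margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """lengths: (batch, num_classes) capsule lengths in [0, 1).
    targets: (batch,) integer class ids."""
    t = F.one_hot(targets, num_classes=lengths.shape[1]).float()
    present = t * F.relu(m_pos - lengths) ** 2              # true class should be long
    absent = lam * (1.0 - t) * F.relu(lengths - m_neg) ** 2  # other classes should be short
    return (present + absent).sum(dim=1).mean()

# Made-up capsule lengths for two samples whose true class is 0.
lengths = torch.tensor([[0.95, 0.05, 0.05],   # confident and correct
                        [0.20, 0.80, 0.10]])  # confident and wrong
targets = torch.tensor([0, 0])

loss_good = margin_loss(lengths[:1], targets[:1])
loss_bad = margin_loss(lengths[1:], targets[1:])
print(loss_good.item(), loss_bad.item())
```

The correct, confident prediction incurs no loss, while the wrong one is penalized both for the short true-class capsule and for the long wrong-class capsule.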
