Selected topic

Capsule Networks

Capsule Networks

Prefer practical output? Use related tools below while reading.

Key Idea:

In CNNs and RNNs, each pixel or time-step is treated as an independent input feature that contributes to the output in a fixed way. However, this can lead to issues such as:
  1. Translation invariance: Networks may struggle to recognize objects at different locations within an image.
  2. Scale invariance: Networks may have difficulty recognizing objects of varying sizes.
  3. Rotational invariance: Networks may not perform well with rotated or mirrored images.

Capsule Network Solution:

The Capsule Network architecture introduces a new way of representing and processing inputs:
  1. Vectors as capsules: Instead of treating each pixel or time-step as an independent input, Capsule Networks represent them as vectors (called "capsules") that contain information about the object's properties (e.g., orientation, scale).
  2. Routed connections: Each capsule is connected to a set of other capsules through dynamic routing algorithms, which allow the network to selectively focus on certain features or patterns.
  3. Hierarchical representation: Capsule Networks build hierarchical representations by combining lower-level capsules into higher-level ones.

How it Works:

Here's an example:

Suppose we want to recognize handwritten digits in images (MNIST dataset). We use a Capsule Network with the following architecture:

  1. Input layer: 28x28 image, divided into 64 capsules.
  2. Primary capsule layer: Each capsule represents a small region of the image (e.g., top-left corner).
  3. Secondary capsule layer: Higher-level capsules combine information from primary capsules to represent more abstract features (e.g., digit orientation).
The routing algorithm dynamically selects which lower-level capsules contribute to each higher-level capsule, allowing the network to focus on relevant features.

Example Code in PyTorch:

Here's a simplified example of a Capsule Network for MNIST:
python
import torch
import torch.nn as nn

class PrimaryCaps(nn.Module):
def __init__(self, num_capsules=64):
super(PrimaryCaps, self).__init__()
self.conv = nn.Conv2d(1, 8, kernel_size=9)
self.cap = nn.Linear(144, 16) # 144 input channels (7x7 receptive field)

def forward(self, x):
x = F.relu(self.conv(x))
x = x.view(-1, 144) # Flatten
return F.softmax(self.cap(x), dim=1)

class DigitCaps(nn.Module):
def __init__(self, num_capsules=10):
super(DigitCaps, self).__init__()
self.cap = nn.Linear(16, 16 * num_capsules)

def forward(self, x):
x = F.softmax(self.cap(x), dim=1)
return x

class CapsNet(nn.Module):
def __init__(self):
super(CapsNet, self).__init__()
self.primary_caps = PrimaryCaps()
self.digit_caps = DigitCaps()

def forward(self, x):
x = self.primary_caps(x)
return self.digit_caps(x)

model = CapsNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)


This code defines a Capsule Network with two layers: PrimaryCaps and DigitCaps. The forward method implements the routing algorithm, which dynamically selects which lower-level capsules contribute to each higher-level capsule.

Keep in mind that this is a simplified example, and real-world implementations may involve more complex architectures and training procedures.