Variational Autoencoders

Motivation

VAEs aim to solve the following problems:

Dimensionality reduction: VAEs can compress high-dimensional data into lower-dimensional representations while retaining most of the information.
Generative modeling: VAEs can generate new samples that resemble the training data by decoding latent vectors drawn from the prior distribution.

Architecture

A VAE consists of two main components:

Encoder (E): Maps input data x to a latent representation z.
Decoder (D): Maps the latent representation z back to the original input space.

The encoder and decoder are both neural networks, typically multilayer perceptrons or convolutional networks. A simple fully connected layout looks like this:

Encoder (E(x)): x → z

+ Input layer (e.g., 784 dimensions for MNIST)
+ Hidden layers (e.g., multiple fully connected layers with ReLU activation)
+ Output layer: parameters of the approximate posterior q(z|x) over the latent space, i.e. the mean μ and log variance log σ² of a normal distribution

Decoder (D(z)): z → x

+ Input layer (latent space, e.g., 2 dimensions for a simple example)
+ Hidden layers (e.g., multiple fully connected layers with ReLU activation)
+ Output layer: reconstructed input data
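The layer layout above can be sketched in a few lines of PyTorch to check how tensor shapes flow through the two networks. This is a minimal sketch, not a trained model: the sizes (784, 256, 2) follow the examples in the text, the batch `x` is random placeholder data, and decoding the mean directly is done here only to verify shapes.

```python
import torch
import torch.nn as nn

input_dim, hidden_dim, latent_dim = 784, 256, 2  # illustrative sizes

# Encoder: emits 2 * latent_dim values (mean and log variance of q(z|x))
encoder = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, 2 * latent_dim),
)

# Decoder: maps a latent vector back to the input space
decoder = nn.Sequential(
    nn.Linear(latent_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, input_dim),
    nn.Sigmoid(),  # keeps outputs in [0, 1], like pixel intensities
)

x = torch.rand(8, input_dim)  # placeholder batch, not real MNIST data

# Split the encoder output into its two halves along the feature axis
mean, logvar = encoder(x).chunk(2, dim=-1)
x_hat = decoder(mean)  # decode the mean only to check shapes

print(mean.shape, logvar.shape, x_hat.shape)
```

Emitting the log variance rather than the variance itself is a common design choice: the network output can be any real number, and exponentiating it always yields a positive variance.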

Objective Function

The VAE is trained to maximize the Evidence Lower Bound (ELBO), a lower bound on the log likelihood log p(x) of the data. The ELBO can be written as:

ELBO = E_{q(z|x)}[log p(x|z)] - KL[q(z|x) || p(z)]

where:

E_{q(z|x)}[log p(x|z)]: reconstruction term, the expected log likelihood of the input under the decoder when z is drawn from q(z|x); it measures how well the VAE can reconstruct the input
KL[q(z|x) || p(z)]: Kullback-Leibler divergence between the approximate posterior distribution q(z|x) and the prior distribution p(z), typically a standard normal; it regularizes the latent space by keeping the encoder's output distribution close to the prior
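When q(z|x) is a normal distribution with diagonal covariance and the prior p(z) is a standard normal (both are assumptions here, though common ones), the KL term has a closed form: KL = 0.5 * sum_j (μ_j² + σ_j² - 1 - log σ_j²). The sketch below implements that formula in a helper named `gaussian_kl` (a hypothetical name chosen for illustration) and cross-checks it against PyTorch's built-in `kl_divergence`. The input values are made up.

```python
import torch
from torch.distributions import Normal, kl_divergence


def gaussian_kl(mean, logvar):
    """KL[N(mean, diag(exp(logvar))) || N(0, I)], summed over latent dims."""
    return 0.5 * torch.sum(mean.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1)


# Two made-up posteriors over a 2-dimensional latent space
mean = torch.tensor([[0.0, 0.0], [1.0, -0.5]])
logvar = torch.tensor([[0.0, 0.0], [0.2, -1.0]])

closed_form = gaussian_kl(mean, logvar)

# Cross-check against PyTorch's KL between Normal distributions
q = Normal(mean, torch.exp(0.5 * logvar))
p = Normal(torch.zeros_like(mean), torch.ones_like(mean))
reference = kl_divergence(q, p).sum(dim=-1)

print(closed_form)
print(reference)
```

The first row has zero mean and unit variance, so it coincides with the prior and its KL divergence is zero. The second row departs from the prior and is penalized accordingly.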

Training

To train a VAE, we typically use a stochastic gradient-based optimizer such as SGD or Adam with the following loss function:

Loss = -ELBO

The VAE is trained by minimizing this loss function.
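Minimizing this loss with gradient-based optimization requires gradients to flow through the sampling of z, which a plain random draw would block. A common way to handle this is the reparameterization trick: draw noise ε from N(0, I) and compute z = μ + σ · ε, so that z is a deterministic, differentiable function of μ and log σ². The following is a minimal sketch with made-up parameter values. In a real VAE, `mean` and `logvar` would be encoder outputs rather than standalone tensors.

```python
import torch

torch.manual_seed(0)

# Stand-ins for encoder outputs; requires_grad lets us inspect gradients
mean = torch.tensor([0.5, -1.0], requires_grad=True)
logvar = torch.tensor([0.0, -2.0], requires_grad=True)

# Reparameterization: the randomness lives in eps, not in the parameters
eps = torch.randn(10_000, 2)
std = torch.exp(0.5 * logvar)
z = mean + std * eps

# Any differentiable function of z now has gradients w.r.t. mean and logvar
z.pow(2).mean().backward()

print("sample mean:", z.mean(dim=0).detach())
print("sample std: ", z.std(dim=0).detach())
print("grad wrt mean:  ", mean.grad)
print("grad wrt logvar:", logvar.grad)
```

The sample statistics should land close to the chosen mean and standard deviation, and both gradients are populated, which is what lets the encoder be trained through the sampling step.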

Example

Suppose we have a dataset of 28x28 images from MNIST, each flattened into a 784-dimensional vector. We can implement a simple VAE using PyTorch:

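```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=2):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, latent_dim * 2)  # mean and log variance

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        # Split the output into the mean and log variance of q(z|x)
        mean, logvar = self.fc2(h).chunk(2, dim=-1)
        return mean, logvar


class Decoder(nn.Module):
    def __init__(self, latent_dim=2, hidden_dim=256, output_dim=784):
        super().__init__()
        self.fc1 = nn.Linear(latent_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, z):
        h = torch.relu(self.fc1(z))
        return torch.sigmoid(self.fc2(h))  # pixel intensities in [0, 1]


class VariationalAutoencoder(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        mean, logvar = self.encoder(x)
        # Reparameterization trick: z = mean + std * eps, with eps ~ N(0, I)
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
        return self.decoder(z), mean, logvar

    def loss_function(self, x, x_reconstructed, mean, logvar):
        # Negative ELBO: reconstruction error plus KL[q(z|x) || N(0, I)]
        recon = F.binary_cross_entropy(x_reconstructed, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
        return (recon + kl) / x.size(0)


# Initialize the VAE
vae = VariationalAutoencoder(Encoder(), Decoder())
optimizer = torch.optim.Adam(vae.parameters(), lr=0.001)

# Placeholder batch of flattened images with values in [0, 1];
# replace with real MNIST batches from a DataLoader
x = torch.rand(64, 784)

# Train the VAE by minimizing the negative ELBO with Adam
for epoch in range(100):
    # Forward pass
    x_reconstructed, mean, logvar = vae(x)

    # Compute the loss (negative ELBO)
    loss = vae.loss_function(x, x_reconstructed, mean, logvar)

    # Backward pass and update parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example use case: generate new samples by decoding draws from the prior
with torch.no_grad():
    new_samples = vae.decoder(torch.randn(16, 2))
```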

This example illustrates a basic VAE architecture for dimensionality reduction and generative modeling.
