This guide explains how to use quantization and pruning to further compress your MobileStyleGAN models.
The compression module provides two main techniques:
- Pruning: Removes less important weights from the model
  - Unstructured pruning: Removes individual weights (faster inference, no size reduction)
  - Structured pruning: Removes entire channels/filters (reduces file size)
  - Sparse format: Converts pruned models to sparse storage (reduces file size)
- Quantization: Reduces the precision of model weights
  - Static quantization: Post-training quantization (int8)
  - Dynamic quantization: Weights quantized, activations stay float
  - Quantization-Aware Training (QAT): Best accuracy, requires retraining
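To make the pruning distinction concrete, here is a minimal stand-alone sketch of unstructured magnitude pruning using stock `torch.nn.utils.prune` (the same PyTorch module the compression script builds on; the toy layer is illustrative, not MobileStyleGAN code):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy layer standing in for one convolution in the generator
layer = nn.Conv2d(8, 16, kernel_size=3)

# Unstructured magnitude pruning: zero the 30% smallest-|w| weights
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Make the pruning permanent (drops the mask, bakes zeros into .weight)
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2f}")  # ~0.30
```

Because tensor shapes are unchanged, the checkpoint stays the same size; only the values become zero.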
```bash
# Compress with both pruning and quantization (recommended)
python compress_model.py \
  --cfg configs/mobile_stylegan_ffhq.json \
  --ckpt mobilestylegan_ffhq.ckpt \
  --method both \
  --amount 0.2 \
  --output mobilestylegan_compressed.ckpt
```

Unstructured pruning removes individual weights; it speeds up inference but doesn't reduce the file size:
```bash
python compress_model.py \
  --cfg configs/mobile_stylegan_ffhq.json \
  --ckpt mobilestylegan_ffhq.ckpt \
  --method prune \
  --prune-type unstructured \
  --amount 0.3 \
  --prune-method magnitude \
  --output mobilestylegan_pruned.ckpt
```

Structured pruning removes entire channels/filters and actually reduces the model file size:
```bash
python compress_model.py \
  --cfg configs/mobile_stylegan_ffhq.json \
  --ckpt mobilestylegan_ffhq.ckpt \
  --method prune \
  --prune-type structured \
  --amount 0.3 \
  --prune-dim 0 \
  --output mobilestylegan_structured_pruned.ckpt
```

Options:

- `--prune-dim 0`: Prune output channels (default)
- `--prune-dim 1`: Prune input channels
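As a hedged illustration of what `--prune-dim` corresponds to in plain PyTorch (assuming the script uses `torch.nn.utils.prune.ln_structured` or an equivalent): a conv weight has shape `(out_channels, in_channels, kH, kW)`, so pruning along dim 0 zeroes whole output channels:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(8, 16, kernel_size=3)  # weight shape: (16, 8, 3, 3)

# dim=0: prune 25% of output channels by their L2 norm
prune.ln_structured(conv, name="weight", amount=0.25, n=2, dim=0)
prune.remove(conv, "weight")

# Count output channels that are now entirely zero
zero_out_channels = (conv.weight.abs().sum(dim=(1, 2, 3)) == 0).sum().item()
print(zero_out_channels)  # 4 of 16 channels zeroed
```

Entirely zeroed channels can then be physically removed from the tensor, which is why structured pruning shrinks the file.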
Convert a pruned model to sparse format to reduce file size:
```bash
python compress_model.py \
  --cfg configs/mobile_stylegan_ffhq.json \
  --ckpt mobilestylegan_ffhq.ckpt \
  --method prune \
  --prune-type unstructured \
  --amount 0.4 \
  --sparse-format coo \
  --output mobilestylegan_sparse.ckpt
```

This creates two files:

- `mobilestylegan_sparse.ckpt` - Regular checkpoint (for inference)
- `mobilestylegan_sparse_sparse.ckpt` - Sparse format (smaller file size)
- Use the regular checkpoint (`mobilestylegan_sparse.ckpt`) for inference/generating images
- DO NOT use the sparse checkpoint (`mobilestylegan_sparse_sparse.ckpt`) for inference - it will produce corrupted/grey images
- The sparse checkpoint is only for storage/archival purposes, to save disk space
- Sparse tensors cannot be used directly in model inference and will corrupt the weights
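The storage-only nature of the sparse checkpoint can be seen with a small round-trip sketch in plain PyTorch (illustrative values, not real weights):

```python
import torch

# Dense weight with many zeros after pruning (illustrative values)
dense = torch.tensor([[0.0, 1.5, 0.0],
                      [0.0, 0.0, -2.0]])

# COO format stores only the non-zero values plus their indices
sparse = dense.to_sparse()

# Before any inference use, the tensor must be densified again --
# sparse tensors cannot be fed to the model directly
restored = sparse.to_dense()
print(torch.equal(dense, restored))  # True
```

This is why the `_sparse` file would have to be converted back to dense tensors before its weights could ever be loaded for inference.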
```bash
python compress_model.py \
  --cfg configs/mobile_stylegan_ffhq.json \
  --ckpt mobilestylegan_ffhq.ckpt \
  --method quantize \
  --backend fbgemm \
  --output mobilestylegan_quantized.ckpt
```

- `--amount`: Fraction of weights to prune (0.0 to 1.0)
  - `0.1` = 10% of weights removed
  - `0.2` = 20% of weights removed (recommended starting point)
  - `0.3` = 30% of weights removed
  - Higher values may significantly impact quality
- `--prune-method`: Pruning strategy
  - `magnitude`: Removes the smallest-magnitude weights (recommended)
  - `random`: Randomly removes weights (for comparison)
- `--backend`: Quantization backend
  - `fbgemm`: For CPU inference (recommended for desktop)
  - `qnnpack`: For mobile/ARM devices
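For context on what the backend choice controls, here is a self-contained sketch using stock `torch.ao.quantization` dynamic quantization (generic PyTorch, not the script's exact code path):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))

# Dynamic quantization: weights stored as int8, activations stay float.
# The active backend defaults to the platform's engine; on mobile you
# would select it explicitly, e.g.:
# torch.backends.quantized.engine = "qnnpack"
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(1, 64))
print(out.shape)  # torch.Size([1, 8])
```

The quantized model is a drop-in replacement: same inputs, same output shapes, smaller int8 weight storage.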
After compression, use the compressed model just like the original:
```bash
# Generate images with compressed model
python generate.py \
  --cfg configs/mobile_stylegan_ffhq.json \
  --ckpt mobilestylegan_compressed.ckpt \
  --device cpu \
  --output-path ./outputs \
  --batch-size 5 \
  --n-batches 10
```

You can also use compression programmatically:
```python
from core.distiller import Distiller
from core.utils import load_cfg, load_weights
from core.model_zoo import model_zoo

# Load model
cfg = load_cfg("configs/mobile_stylegan_ffhq.json")
distiller = Distiller(cfg)
ckpt = model_zoo("mobilestylegan_ffhq.ckpt")
load_weights(distiller, ckpt["state_dict"])

# Apply compression
distiller.compress_model(
    compression_type='both',
    prune_amount=0.2,
    prune_method='magnitude',
    backend='fbgemm'
)

# Get model statistics
stats = distiller.get_model_stats()
print(f"Model size: {stats['size_mb']:.2f} MB")
print(f"Sparsity: {stats['sparsity']*100:.2f}%")
```

Light compression:

- Pruning: 10-20%
- Quantization: Static quantization only
- Expected compression: 2-3x
- Quality impact: Minimal
Moderate compression:

- Pruning: 20-30%
- Quantization: Static quantization
- Expected compression: 3-5x
- Quality impact: Small, usually acceptable
Aggressive compression:

- Pruning: 30-50%
- Quantization: Static quantization
- Expected compression: 5-10x
- Quality impact: Noticeable, may require retraining
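These ratios follow from simple arithmetic, sketched below under idealized assumptions (float32 baseline, int8 weights, structured pruning that actually removes parameters; overhead such as biases and quantization scales is ignored):

```python
def estimated_ratio(prune_amount: float, quantize: bool) -> float:
    """Back-of-envelope compression ratio: pruning removes a fraction of
    the parameters, int8 quantization shrinks each remaining weight from
    4 bytes (float32) to 1 byte."""
    bytes_per_weight = 1 if quantize else 4
    return 4.0 / ((1.0 - prune_amount) * bytes_per_weight)

print(round(estimated_ratio(0.2, True), 1))  # 5.0
print(round(estimated_ratio(0.3, True), 1))  # 5.7
```

Real checkpoints land somewhat below these numbers because of unpruned layers and serialization overhead, which is consistent with the ranges above.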
For better results, you can use iterative pruning:
```python
from core.compression import ModelCompressor

compressor = ModelCompressor(model)

# Prune gradually: 10% -> 20% -> 30% -> 40%
compressor.prune_iterative(
    amounts=[0.1, 0.2, 0.3, 0.4],
    method='magnitude',
    retrain_fn=retrain_function  # Optional: retrain between steps
)
```

- Start Small: Begin with 10-20% pruning to assess quality impact
- Test Quality: Always test compressed models on sample images
- Iterative Approach: Use iterative pruning for better results
- Retrain After Pruning: Consider fine-tuning after aggressive pruning
- Backend Selection: Use `fbgemm` for CPU, `qnnpack` for mobile
- Save Original: Keep the original model for comparison
- If static quantization fails, the script will fall back to dynamic quantization
- Some operations may not support quantization (e.g., custom CUDA ops)
- Try quantization alone first, then add pruning
- If model becomes too sparse, quality may degrade significantly
- Try lower pruning amounts or use iterative pruning
- Consider retraining after pruning
- Check if pruning was applied correctly (check sparsity)
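A quick way to verify that pruning was applied is to measure per-layer sparsity directly; this helper is a generic sketch (`report_sparsity` is not part of the repo):

```python
import torch
import torch.nn as nn

def report_sparsity(model: nn.Module) -> float:
    """Print per-layer sparsity and return the global fraction of zero weights."""
    total, zeros = 0, 0
    for name, module in model.named_modules():
        w = getattr(module, "weight", None)
        if isinstance(w, torch.Tensor):
            z = int((w == 0).sum())
            total += w.numel()
            zeros += z
            print(f"{name}: {z / w.numel():.1%} sparse")
    return zeros / total

# Demo: fake-prune half of the first layer's weights
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
with torch.no_grad():
    model[0].weight[:2].zero_()
global_sparsity = report_sparsity(model)
print(f"global: {global_sparsity:.1%}")  # 33.3%
```

If the global sparsity comes back near zero after a prune run, the pruning was not actually applied (or was not made permanent).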
- Quantization may not reduce file size if model is saved in float format
- Use `torch.save` with a compressed format
- Uses PyTorch's `torch.nn.utils.prune` module
- Global unstructured pruning across all layers
- L1 norm for magnitude-based pruning
- Pruning is made permanent (weights set to zero)
- Uses PyTorch's FX Graph Mode quantization
- Calibration with example inputs
- INT8 quantization for weights and activations
- Supports both CPU (fbgemm) and mobile (qnnpack) backends
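The bullets above correspond to the standard FX graph mode workflow; a minimal generic sketch (a toy model, not MobileStyleGAN) looks like:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# 1. Insert observers according to the fbgemm qconfig
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# 2. Calibrate: run representative inputs through the observed model
with torch.no_grad():
    prepared(*example_inputs)

# 3. Convert to an int8 model
quantized = convert_fx(prepared)
out = quantized(*example_inputs)
print(out.shape)  # torch.Size([1, 8, 30, 30])
```

Calibration quality matters: the observers pick int8 scales from whatever inputs you feed in step 2, so representative inputs give representative ranges.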
- PyTorch Pruning Tutorial
- PyTorch Quantization
- MobileStyleGAN Paper: arXiv:2104.04767