QuantEcon Benchmarks

A collection of benchmarks and diagnostic scripts for profiling numerical computing performance across different hardware configurations.

Overview

This repository contains benchmarks and diagnostic tools developed during QuantEcon's work on GPU-accelerated lecture builds. These scripts help identify performance characteristics and potential issues when running numerical code on different hardware (CPU vs GPU).

Repository Structure

benchmarks/
├── jax/                    # JAX-specific benchmarks
│   ├── lax_scan/          # lax.scan performance analysis
│   └── matmul/            # Matrix multiplication benchmarks
├── hardware/              # Hardware detection and general benchmarks
├── notebooks/             # Jupyter notebook benchmarks
└── docs/                  # Documentation and findings

Categories

JAX Benchmarks (jax/)

Benchmarks specific to JAX and its interaction with GPUs.

  • lax.scan: Profiles the known issue where lax.scan with many lightweight iterations performs poorly on GPU due to kernel launch overhead (JAX Issue #2491)

Hardware Benchmarks (hardware/)

General hardware detection and cross-platform benchmarks comparing:

  • Pure Python performance
  • NumPy (CPU)
  • Numba (CPU, with parallelization)
  • JAX (CPU and GPU)
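The actual workloads live in hardware/benchmark_hardware.py; as a hypothetical sketch of how such comparisons are typically structured, a small timing helper can run the same workload across implementations (the helper and workload names below are illustrative, not the repository's API):

```python
import time

def bench(fn, *args, repeats=3):
    """Return the best wall-clock time over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def python_sum(n):
    # Pure-Python baseline: sum of squares via a generator expression.
    return sum(i * i for i in range(n))

t = bench(python_sum, 100_000)
print(f"pure Python: {t * 1e3:.2f} ms")
```

NumPy, Numba, and JAX variants of the same workload would plug into `bench` in the same way, keeping the timing methodology identical across backends.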

Notebook Benchmarks (notebooks/)

Benchmarks that test performance through different execution pathways:

  • Direct Python execution
  • Jupyter notebook execution (nbconvert)
  • Jupyter Book execution

Key Findings

lax.scan GPU Performance Issue

When running lax.scan with millions of lightweight iterations on GPU, performance can be 1000x+ slower than CPU due to kernel launch overhead:

  • Each iteration launches 3 separate GPU kernels (mul, add, dynamic_update_slice)
  • Each kernel launch has ~2-3µs overhead
  • With 10M iterations: 3 kernels × 10M × ~3µs ≈ 90 seconds of overhead
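The back-of-envelope estimate above can be reproduced directly (the per-kernel launch cost is the ~3µs figure assumed in the text, not a measured constant):

```python
# Model of lax.scan GPU overhead: each iteration launches several
# kernels, and each launch costs a few microseconds.
KERNELS_PER_ITER = 3        # mul, add, dynamic_update_slice
LAUNCH_OVERHEAD_US = 3      # assumed microseconds per kernel launch
iterations = 10_000_000

overhead_s = KERNELS_PER_ITER * iterations * LAUNCH_OVERHEAD_US / 1e6
print(f"Estimated launch overhead: {overhead_s:.0f} s")  # ≈ 90 s
```

The overhead is pure bookkeeping: none of those 90 seconds performs useful arithmetic, which is why the same loop finishes in well under a second on CPU.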

Solution: Use device=cpu for sequential scalar operations:

from functools import partial
import jax
from jax import lax

cpu = jax.devices("cpu")[0]

# Note: jax.jit's `device` argument is deprecated in newer JAX releases;
# there, run the call under `with jax.default_device(cpu):` instead.
@partial(jax.jit, static_argnums=(1,), device=cpu)
def sequential_operation(x0, n):
    # Illustrative lightweight scan body; substitute your own step function.
    def step(x, _):
        return 0.9 * x + 0.1, x
    final, xs = lax.scan(step, x0, None, length=n)
    return final, xs

Usage

Running lax.scan Profiler

# Basic timing comparison
python jax/lax_scan/profile_lax_scan.py

# With diagnostic output showing per-iteration overhead
python jax/lax_scan/profile_lax_scan.py --diagnose

# With NVIDIA Nsight Systems profiling
nsys profile -o lax_scan_profile python jax/lax_scan/profile_lax_scan.py --nsys

# With JAX profiler (view with TensorBoard)
python jax/lax_scan/profile_lax_scan.py --jax-profile
tensorboard --logdir=/tmp/jax-trace

Running Hardware Benchmarks

python hardware/benchmark_hardware.py

Requirements

  • Python 3.10+
  • JAX (with CUDA support for GPU benchmarks)
  • NumPy
  • Numba (optional, for Numba benchmarks)

For GPU profiling:

  • NVIDIA Nsight Systems
  • TensorBoard with profile plugin

Contributing

When adding new benchmarks:

  1. Place them in the appropriate category directory
  2. Include clear documentation of what the benchmark measures
  3. Add usage instructions to the script's docstring
  4. Update this README with any significant findings
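The steps above can be sketched as a minimal benchmark-script skeleton (the names and workload here are hypothetical placeholders, not an existing script in this repository):

```python
"""Benchmark: <one line describing what this measures>.

Usage:
    python <category>/my_benchmark.py
"""
import time

def run_benchmark(repeats=5):
    """Return the best wall-clock time for the workload under test."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        sum(i * i for i in range(10_000))  # placeholder workload
        timings.append(time.perf_counter() - start)
    return min(timings)

print(f"best of 5: {run_benchmark() * 1e3:.3f} ms")
```

Keeping the docstring's usage line accurate means `python <script> --help`-style discovery stays cheap, and reporting the best of several repeats reduces noise from warm-up effects.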

References

  • JAX Issue #2491: poor lax.scan performance with many lightweight iterations on GPU
License

BSD-3-Clause (same as QuantEcon)
