CVI-RL: Characteristic Value Iteration for Reinforcement Learning

Frequency-domain reinforcement learning using characteristic functions on tabular MDPs.

Overview

This repository implements Characteristic Value Iteration (CVI), a novel approach to RL that operates in the frequency domain. Instead of computing scalar value functions, CVI represents return distributions via their characteristic functions φ(ω) = E[e^(iωG)], enabling distributional policy evaluation through CF Bellman operators.

Installation

# Clone the repository
cd cvi_rl

# Install dependencies
pip install -r requirements.txt

Experiment Suite (`experiment_suite.ipynb`)

The notebook contains a comprehensive hyperparameter grid search evaluating 360 CVI configurations on Taxi-v3:

Variables tested:
- Grid strategies: uniform, two_density_regions, three_density_regions, four_density_regions, exponential_decay, linear_decay, quadratic_decay
- Frequency ranges (W): 10.0, 20.0
- Grid sizes (K): 128, 256, 512
- Interpolation methods: linear, polar, pchip, lanczos
- Collapse methods: ls, fft, gaussian
Paradigm-Shifting Discovery: Method combination matters FAR more than grid strategy or hyperparameters
- polar + gaussian: 33 of top 40 configs (82.5%) achieve MAE ~10⁻¹⁵ (machine precision)
- Works with ALL grid strategies - even uniform achieves excellent results with this combination
- FFT collapse: Catastrophically bad (MAE = 7-27) regardless of grid strategy or interpolation
- Grid strategy choice is nearly irrelevant when methods are correct
Key Results:
- Best config: four_density_regions + polar + gaussian (MAE = 1.27×10⁻¹⁵)
- All grids work with polar+gaussian: linear_decay (1.99×10⁻¹⁵), quadratic_decay (2.19×10⁻¹⁵), three_density_regions (2.26×10⁻¹⁵), even uniform (3.18×10⁻¹⁴)
- All grids fail with FFT or lanczos+LS: errors of 6-27 regardless of strategy
- Hyperparameters barely matter: K=128 with polar+gaussian outperforms K=512 with FFT by 15 orders of magnitude

Grid Strategy Comparison

The choice of ω-grid has surprisingly minimal impact when proper methods are used. Below shows the distribution of grid points for each strategy:

Key insight: The experiments reveal that method combination (interpolation + collapse) is THE critical factor, not grid strategy. All seven grid strategies achieve MAE ~10⁻¹⁵ when paired with polar interpolation and Gaussian collapse. Conversely, even sophisticated multi-density grids fail catastrophically (MAE ~10+) when paired with FFT collapse. The universal recipe for success: polar + gaussian + any reasonable grid. This combination exploits the natural structure of characteristic functions to achieve results at the numerical precision limit.

Key Methods Implemented

Interpolation (for V(s, γω))

Polar: Magnitude/phase interpolation (best peak performance, appears in all top configs)
Linear: Cartesian interpolation of real/imaginary parts (equally good average performance)
PCHIP: Monotonicity-preserving cubic (equally good average performance)
Lanczos: Windowed sinc interpolation (underperforms despite theoretical advantages)

Collapse (extracting mean from CF)

Gaussian: Phase unwrapping with linear regression (best performer, 96% of excellent configs)
LS: Least-squares quadratic fit around ω=0 (solid alternative)
FFT: Inverse Fourier transform to spatial domain (unreliable)

Documentation

Comprehensive documentation is available in the docs/ directory:

STATE.md: Overall implementation status, design choices, experimental findings, and known limitations
GRIDS.md: Grid construction methods (uniform, two/three/four density regions, exponential/linear/quadratic decay)
INTERPOLATION.md: Interpolation methods for evaluating V(s, γω) at off-grid frequencies
COLLAPSE.md: Collapse methods for extracting scalar Q-values from characteristic functions

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
config		config
cvi_rl		cvi_rl
docs		docs
figures		figures
mila		mila
.gitignore		.gitignore
CVI(NeurIPS2019)(extended).pdf		CVI(NeurIPS2019)(extended).pdf
PLAN_OF_ACTION.md		PLAN_OF_ACTION.md
README.md		README.md
compare_policies.py		compare_policies.py
cvi_gridsearch_taxi.csv		cvi_gridsearch_taxi.csv
experiment_suite.ipynb		experiment_suite.ipynb
grid_comparison.png		grid_comparison.png
gridsearch_plot.py		gridsearch_plot.py
main.py		main.py
process_experiments.py		process_experiments.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run_experiments.ipynb		run_experiments.ipynb
train.py		train.py
uv.lock		uv.lock
visualize_grids.py		visualize_grids.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CVI-RL: Characteristic Value Iteration for Reinforcement Learning

Overview

Installation

Experiment Suite (`experiment_suite.ipynb`)

Grid Strategy Comparison

Key Methods Implemented

Interpolation (for V(s, γω))

Collapse (extracting mean from CF)

Documentation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CVI-RL: Characteristic Value Iteration for Reinforcement Learning

Overview

Installation

Experiment Suite (experiment_suite.ipynb)

Grid Strategy Comparison

Key Methods Implemented

Interpolation (for V(s, γω))

Collapse (extracting mean from CF)

Documentation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Experiment Suite (`experiment_suite.ipynb`)

Packages