# 🦄 Unicorn: A Universal and Collaborative Reinforcement Learning Approach Toward Generalizable Network-Wide Traffic Signal Control
Official implementation of Unicorn, accepted in IEEE Transactions on Intelligent Transportation Systems (T-ITS).
Yifeng Zhang, Yilin Liu, Ping Gong, Peizhuo Li, Mingfeng Fan, Guillaume Sartoretti

MARMot Lab @ National University of Singapore
- Highlights
- Requirements
- Installation
- Project Structure
- Supported Datasets
- Configuration
- Training
- Testing
- Evaluation with Non-RL Baselines
- Citation
- License
## Highlights

- **Unified Traffic Movement Representation**: A traffic movement-based state-action representation that unifies intersection states and signal phases across different intersection topologies.
- **Universal Traffic Representation (UTR) Module**: A decoder-only feature extraction architecture with cross-attention, designed to capture general traffic features across different intersections.
- **Intersection-Specific Representation (ISR) Module**: A feature extraction module combining a Variational Autoencoder (VAE) and contrastive learning to capture intersection-specific characteristics.
- **Collaborative Multi-Intersection Learning**: An attention-based coordination mechanism that adaptively models state-action dependencies among neighboring intersections for scalable network-level signal control.
- **Evaluation on Diverse Traffic Networks**: Experiments conducted on eight traffic datasets in SUMO, including three synthetic traffic networks and five real-world city-scale networks, supporting both single-scenario training and multi-scenario joint training.
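To illustrate the cross-attention pattern the UTR module builds on, here is a toy NumPy sketch: a fixed set of learnable queries attends over a variable number of movement features, producing a fixed-size summary regardless of intersection topology. The shapes and names below are illustrative only, not taken from the repo.

```python
import numpy as np

# Toy cross-attention sketch (illustrative, not the repo's UTR implementation):
# fixed-size queries attend over a variable-length set of movement features.
def cross_attention(Q, K, V):
    """Q: (nq, d), K/V: (nk, d) -> (nq, d) attention summary."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (nq, nk) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
phase_queries = rng.normal(size=(8, 16))     # e.g. 8 query slots (hypothetical)
movement_feats = rng.normal(size=(12, 16))   # 12 movements at this intersection
out = cross_attention(phase_queries, movement_feats, movement_feats)
assert out.shape == (8, 16)  # fixed-size output regardless of movement count
```

The key property for generalization is that the output shape depends only on the number of queries, not on how many movements an intersection has.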
## Requirements

| Dependency | Version |
|---|---|
| Python | ≥ 3.8 |
| SUMO | ≥ 1.16.0 |
| PyTorch | 1.13.0 (CUDA 11.7) |
| Ray | 2.3.1 |
| Gym | 0.26.2 |
| SciPy | 1.10.1 |
| einops | 0.6.0 |
| NumPy | 1.24.2 |
| TensorBoard | 2.13.0 |
> [!NOTE]
> Different PyTorch and CUDA versions may affect training performance and reproducibility. The code is tested with PyTorch 1.13.0 + CUDA 11.7.
## Installation

Clone the repository:

```bash
git clone https://github.com/marmotlab/Unicorn.git
cd Unicorn
```

Create and activate a new conda environment:

```bash
conda create -n unicorn python=3.8 -y
conda activate unicorn
```

Install PyTorch first, selecting the command that matches your CUDA version:

```bash
# CUDA 11.6
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116

# CUDA 11.7
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117

# CPU only
pip install torch==1.13.0+cpu torchvision==0.14.0+cpu torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cpu
```

Then install the remaining dependencies:

```bash
pip install -r requirements.txt
```
Finally, set up SUMO:

1. Download and install SUMO by following the official instructions at: https://sumo.dlr.de/docs/Downloads.php
2. Set the `SUMO_HOME` environment variable. Refer to the SUMO Basic Computer Skills guide for detailed instructions on setting SUMO environment variables.
3. Verify the installation:

```bash
sumo --version
```

## Project Structure

```
Unicorn/
├── driver_unicorn.py          # Main training script (gradient updates & PPO optimization)
├── runner_unicorn.py          # Distributed experience collection via Ray workers
├── evaluator_rl.py            # Evaluation script for RL-based models (Unicorn)
├── evaluator_non_rl.py        # Evaluation script for non-RL baselines (Fixed, Greedy, Pressure)
├── parameters.py              # All training, simulation & experiment configurations
├── utils.py                   # Utility functions
├── requirements.txt           # Python dependencies
│
├── models/
│   └── Unicorn.py             # Unicorn network architecture (Actor-Critic)
│
├── env/
│   ├── matsc.py               # Multi-Agent TSC Gym environment (SUMO interface)
│   └── tls.py                 # Traffic light signal controller module
│
├── maps/                      # SUMO network datasets & configuration files
│   ├── grid_network_5_5/      # Synthetic 5×5 Grid (MA2C)
│   ├── monaco_network_30/     # Real-world Monaco, 30 intersections (MA2C)
│   ├── cologne_network_8/     # Real-world Cologne, 8 intersections (RESCO)
│   ├── ingolstadt_network_21/ # Real-world Ingolstadt, 21 intersections (RESCO)
│   ├── grid_network_4_4/      # Synthetic 4×4 Grid (RESCO)
│   ├── arterial_network_4_4/  # Synthetic 4×4 Arterial (RESCO)
│   ├── shaoxing_network_7/    # Real-world Shaoxing, 7 intersections (GESA)
│   ├── shenzhen_network_29/   # Real-world Shenzhen, 29 intersections (GESA)
│   └── shenzhen_network_55/   # Real-world Shenzhen, 55 intersections (GESA)
│
└── images/
    └── framework.png          # Framework overview figure
```
## Supported Datasets

Unicorn is evaluated on 8 SUMO traffic network scenarios from three benchmark suites:

| Benchmark | Network | # Intersections | Type |
|---|---|---|---|
| MA2C | `grid_network_5_5` | 25 | Synthetic |
| MA2C | `monaco_network_30` | 30 | Real-world |
| RESCO | `cologne_network_8` | 8 | Real-world |
| RESCO | `ingolstadt_network_21` | 21 | Real-world |
| RESCO | `grid_network_4_4` | 16 | Synthetic |
| RESCO | `arterial_network_4_4` | 16 | Synthetic |
| GESA | `shaoxing_network_7` | 7 | Real-world |
| GESA | `shenzhen_network_29` | 29 | Real-world |
References:
- MA2C: Chu, T., Wang, J., CodecΓ , L., & Li, Z. (2020). Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control. IEEE T-ITS.
- RESCO: Ault, J., & Sharon, G. (2021). Reinforcement Learning Benchmarks for Traffic Signal Control. NeurIPS Datasets & Benchmarks.
- GESA: Jiang, H., et al. (2024). A General Scenario-Agnostic Reinforcement Learning for Traffic Signal Control. IEEE T-ITS.
## Configuration

All configurations are centralized in `parameters.py`. Below are the key parameters:

| Parameter | Description | Default |
|---|---|---|
| `MAX_EPISODES` | Total number of training episodes | 3000 |
| `NUM_META_AGENTS` | Number of parallel Ray worker processes | 6 |
| `LOAD_MODEL` | Whether to resume training from a checkpoint | False |
| `EXPERIMENT_PATH` | Path to the checkpoint experiment (for resuming) | None |
| `CO_TRAIN` | Enable multi-scenario co-training | False |
> [!IMPORTANT]
> When switching between single-scenario and multi-scenario training modes, make sure to set `CO_TRAIN` accordingly in `INPUT_PARAMS` and configure the appropriate dataset(s).
## Training

### Single-Scenario Training

In single-scenario mode, the model trains on one specific traffic network at a time. This is the default mode.
Step 1: Configure `parameters.py`:

```python
class INPUT_PARAMS:
    MAX_EPISODES = 3000    # Total training episodes
    NUM_META_AGENTS = 6    # Number of parallel workers
    CO_TRAIN = False       # ⬅️ Set to False for single-scenario

class SUMO_PARAMS:
    NET_NAME = 'grid_network_5_5'    # ⬅️ Choose your target dataset
```

Step 2: Launch training:

```bash
python driver_unicorn.py
```

> [!TIP]
> Recommended configurations by dataset:
| Dataset | Green / Yellow Duration | Teleport Time (s) |
|---|---|---|
| MA2C networks | 10s / 3s | 300 |
| RESCO networks | 15s / 5s | -1 |
| GESA networks | 15s / 5s | 600 |
### Multi-Scenario Co-Training

In multi-scenario co-training mode, the model trains simultaneously across multiple traffic networks, with different workers running different scenarios. This enables cross-domain generalization.

Step 1: Configure `parameters.py`:
```python
class INPUT_PARAMS:
    MAX_EPISODES = 3000
    NUM_META_AGENTS = 6    # Each worker trains on a different scenario
    CO_TRAIN = True        # ⬅️ Set to True for multi-scenario

class SUMO_PARAMS:
    ALL_DATASETS = [       # ⬅️ Define the scenarios to co-train on
        'cologne_network_8',
        'ingolstadt_network_21',
        'arterial_network_4_4',
        'grid_network_4_4',
        'shaoxing_network_7',
        'shenzhen_network_29',
    ]
```

> [!NOTE]
> In co-training mode:
> - Each worker (indexed by `server_number`) is assigned a different dataset from `ALL_DATASETS`.
> - `NUM_META_AGENTS` should match the number of datasets in `ALL_DATASETS`.
> - The observation and action spaces are automatically padded to the maximum dimensions across all scenarios (max movement dim = 36, max phase dim = 8, agent space = 97).
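The padding across scenarios can be pictured with a minimal sketch like the following. This is an illustration of the idea only, not the repo's actual preprocessing code; the function names and the per-movement feature layout are assumptions.

```python
import numpy as np

# Illustrative sketch of cross-scenario padding (not the repo's actual code):
# zero-pad per-intersection observations to the maximum movement dimension,
# and mask out action slots for phases this intersection does not have.
MAX_MOVEMENT_DIM = 36   # max movement dim across all scenarios
MAX_PHASE_DIM = 8       # max number of signal phases

def pad_observation(obs: np.ndarray) -> np.ndarray:
    """Pad a (num_movements, feat) observation to (MAX_MOVEMENT_DIM, feat)."""
    padded = np.zeros((MAX_MOVEMENT_DIM, obs.shape[1]), dtype=obs.dtype)
    padded[:obs.shape[0]] = obs
    return padded

def phase_mask(num_phases: int) -> np.ndarray:
    """Boolean mask over MAX_PHASE_DIM action slots; extra slots are invalid."""
    mask = np.zeros(MAX_PHASE_DIM, dtype=bool)
    mask[:num_phases] = True
    return mask

obs = np.random.rand(12, 4)   # e.g. an intersection with 12 movements, 4 features each
assert pad_observation(obs).shape == (36, 4)
assert phase_mask(4).sum() == 4
```

Padding to fixed maxima is what lets one network process intersections of different topologies in the same batch.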
Step 2: Launch training:

```bash
python driver_unicorn.py
```

Training logs are automatically recorded with TensorBoard:

```bash
tensorboard --logdir ./Train_MATSC/<EXPERIMENT_NAME>/train
```

Key metrics tracked:
- Policy Loss, Value Loss, Entropy Loss
- Actor/Critic VAE Loss & Contrastive Loss
- Episode Reward, Episode Length, Action Change Rate
### Resuming Training

To resume training from a saved checkpoint:
```python
class INPUT_PARAMS:
    LOAD_MODEL = True
    EXPERIMENT_PATH = './Train_MATSC/<YOUR_EXPERIMENT_NAME>'  # ⬅️ Path to saved experiment
```

## Testing

After training, evaluate the trained model on specific test scenarios.
Step 1: Configure the test settings in `evaluator_rl.py`:

```python
# Set the experiment directory and model path
exp_dir = './Test'
agent_name_list = ['UNICORN']
model_path_list = ['./Train_MATSC/<EXPERIMENT_NAME>/model/checkpoint<EPISODE>.pkl']
```

Step 2: Ensure the corresponding map and flow settings in `parameters.py` match the training configuration:

```python
class SUMO_PARAMS:
    NET_NAME = 'grid_network_5_5'    # ⬅️ Must match the map used during training
```

Step 3: Run evaluation:

```bash
python evaluator_rl.py
```

Step 4: After testing, the results (traffic data & trip data) will be saved in:
```
./Test/eval_data/
├── <map_name>_UNICORN_traffic.csv   # Traffic metrics per timestep
└── <map_name>_UNICORN_trip.csv      # Individual vehicle trip info
```
## Evaluation with Non-RL Baselines

Unicorn includes built-in non-RL baseline evaluators for comparison:

| Baseline | Description |
|---|---|
| `FIXED` | Fixed-time signal plan |
| `GREEDY` | Greedy policy based on queue length |
| `PRESSURE` | Max-pressure-based control |
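As a rough illustration of the max-pressure idea behind the `PRESSURE` baseline (a sketch of the general technique, not the repo's evaluator code): each phase's pressure is the sum, over the movements it serves, of upstream queue length minus downstream queue length, and the controller activates the highest-pressure phase.

```python
# Illustrative max-pressure control sketch (not the repo's implementation).
# A movement's pressure is its upstream queue minus its downstream queue;
# the controller picks the phase whose served movements have the largest
# total pressure.
def max_pressure_phase(phases, upstream_queue, downstream_queue):
    """phases: {phase_id: [movement_id, ...]}; queues: {movement_id: int}."""
    def pressure(phase_id):
        return sum(upstream_queue[m] - downstream_queue[m]
                   for m in phases[phase_id])
    return max(phases, key=pressure)

# Toy intersection: phase 0 serves north-south, phase 1 serves east-west.
phases = {0: ['NS', 'SN'], 1: ['EW', 'WE']}
up = {'NS': 8, 'SN': 5, 'EW': 2, 'WE': 1}
down = {'NS': 1, 'SN': 0, 'EW': 0, 'WE': 0}
assert max_pressure_phase(phases, up, down) == 0  # N-S pressure 12 beats E-W's 3
```

The actual baseline in `evaluator_non_rl.py` may differ in how it measures queues and constrains phase switching.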
Configure which baselines to run in `evaluator_non_rl.py`:

```python
agent_name_list = ['FIXED', 'GREEDY', 'PRESSURE']
```

Then run the baseline evaluations:

```bash
python evaluator_non_rl.py
```

## Citation

If you find this code useful in your research, please consider citing our paper:
@ARTICLE{11360985,
author={Zhang, Yifeng and Liu, Yilin and Gong, Ping and Li, Peizhuo and Fan, Mingfeng and Sartoretti, Guillaume},
journal={IEEE Transactions on Intelligent Transportation Systems},
title={Unicorn: A Universal and Collaborative Reinforcement Learning Approach Toward Generalizable Network-Wide Traffic Signal Control},
year={2026},
volume={},
number={},
pages={1-17},
keywords={Collaboration;Topology;Network topology;Feature extraction;Vectors;Urban areas;Real-time systems;Training;Reinforcement learning;Scalability;Generalizable adaptive traffic signal control;multi-agent reinforcement learning;contrastive learning},
doi={10.1109/TITS.2026.3653478}}
You may also find our related work useful:
@INPROCEEDINGS{10801524,
author={Zhang, Yifeng and Li, Peizhuo and Fan, Mingfeng and Sartoretti, Guillaume},
booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
title={HeteroLight: A General and Efficient Learning Approach for Heterogeneous Traffic Signal Control},
year={2024},
volume={},
number={},
pages={1010-1017},
keywords={Measurement;Network topology;Urban areas;Reinforcement learning;Feature extraction;Vectors;Robustness;Topology;Optimization;Intelligent robots},
doi={10.1109/IROS58592.2024.10801524}}
## License

This project is licensed under the MIT License - see the LICENSE file for details.

© 2026 MARMot Lab @ NUS-ME
⭐ If you find this project useful, please consider giving it a star! ⭐
