
🦄 Unicorn: A Universal and Collaborative Reinforcement Learning Approach Toward Generalizable Network-Wide Traffic Signal Control


Official implementation of Unicorn, accepted in IEEE Transactions on Intelligent Transportation Systems (T-ITS).
Yifeng Zhang, Yilin Liu, Ping Gong, Peizhuo Li, Mingfeng Fan, Guillaume Sartoretti
MARMot Lab @ National University of Singapore

![Unicorn Framework](images/framework.png)

Highlights

  • Unified Traffic Movement Representation: A traffic movement-based state-action representation that unifies intersection states and signal phases across different intersection topologies.
  • Universal Traffic Representation (UTR) Module: A decoder-only feature extraction architecture with cross-attention, designed to capture general traffic features across different intersections.
  • Intersection-Specific Representation (ISR) Module: A feature extraction module combining a Variational Autoencoder (VAE) and contrastive learning to capture intersection-specific characteristics.
  • Collaborative Multi-Intersection Learning: An attention-based coordination mechanism that adaptively models state-action dependencies among neighboring intersections for scalable network-level signal control.
  • Evaluation on Diverse Traffic Networks: Experiments conducted on eight traffic datasets in SUMO, including three synthetic traffic networks and five real-world city-scale networks, supporting both single-scenario training and multi-scenario joint training.

Requirements

| Dependency  | Version             |
|-------------|---------------------|
| Python      | ≥ 3.8               |
| SUMO        | ≥ 1.16.0            |
| PyTorch     | 1.13.0 (CUDA 11.7)  |
| Ray         | 2.3.1               |
| Gym         | 0.26.2              |
| SciPy       | 1.10.1              |
| einops      | 0.6.0               |
| NumPy       | 1.24.2              |
| TensorBoard | 2.13.0              |

Note

Different PyTorch and CUDA versions may affect training performance and reproducibility. The code is tested with PyTorch 1.13.0 + CUDA 11.7.


Installation

Step 1: Clone the Repository

```bash
git clone https://github.com/marmotlab/Unicorn.git
cd Unicorn
```

Step 2: Create Conda Environment

```bash
# Create a new conda environment
conda create -n unicorn python=3.8 -y
conda activate unicorn
```

Step 3: Install PyTorch

Install PyTorch first, selecting the command that matches your CUDA version:

```bash
# CUDA 11.6
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116

# CUDA 11.7
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117

# CPU only
pip install torch==1.13.0+cpu torchvision==0.14.0+cpu torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cpu
```

Step 4: Install Other Dependencies

```bash
pip install -r requirements.txt
```

Step 5: Install SUMO Traffic Simulator

  1. Download and install SUMO by following the official instructions at: 👉 https://sumo.dlr.de/docs/Downloads.php

  2. Set the environment variable SUMO_HOME. Refer to the SUMO Basic Computer Skills guide for detailed instructions on setting SUMO environment variables.

  3. Verify the installation:

```bash
sumo --version
```
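
TraCI-based scripts conventionally locate the simulator through the `SUMO_HOME` environment variable set in Step 2. The snippet below is a minimal sanity check for that variable; the function name is illustrative and not part of this repository.

```python
import os

def sumo_home_status(env=os.environ):
    """Report whether SUMO_HOME points at a plausible SUMO installation."""
    home = env.get("SUMO_HOME")
    if not home:
        return "SUMO_HOME is not set"
    # A standard SUMO install ships a tools/ directory (TraCI, sumolib, ...)
    if not os.path.isdir(os.path.join(home, "tools")):
        return "SUMO_HOME is set, but no tools/ directory was found under it"
    return "SUMO_HOME looks valid: " + home

if __name__ == "__main__":
    print(sumo_home_status())
```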

Project Structure

```text
Unicorn/
├── 📄 driver_unicorn.py         # Main training script (gradient updates & PPO optimization)
├── 📄 runner_unicorn.py         # Distributed experience collection via Ray workers
├── 📄 evaluator_rl.py           # Evaluation script for RL-based models (Unicorn)
├── 📄 evaluator_non_rl.py       # Evaluation script for non-RL baselines (Fixed, Greedy, Pressure)
├── 📄 parameters.py             # All training, simulation & experiment configurations
├── 📄 utils.py                  # Utility functions
├── 📄 requirements.txt          # Python dependencies
│
├── 📂 models/
│   └── Unicorn.py               # Unicorn network architecture (Actor-Critic)
│
├── 📂 env/
│   ├── matsc.py                 # Multi-Agent TSC Gym environment (SUMO interface)
│   └── tls.py                   # Traffic light signal controller module
│
├── 📂 maps/                     # SUMO network datasets & configuration files
│   ├── grid_network_5_5/        # Synthetic 5×5 Grid (MA2C)
│   ├── monaco_network_30/       # Real-world Monaco, 30 intersections (MA2C)
│   ├── cologne_network_8/       # Real-world Cologne, 8 intersections (RESCO)
│   ├── ingolstadt_network_21/   # Real-world Ingolstadt, 21 intersections (RESCO)
│   ├── grid_network_4_4/        # Synthetic 4×4 Grid (RESCO)
│   ├── arterial_network_4_4/    # Synthetic 4×4 Arterial (RESCO)
│   ├── shaoxing_network_7/      # Real-world Shaoxing, 7 intersections (GESA)
│   ├── shenzhen_network_29/     # Real-world Shenzhen, 29 intersections (GESA)
│   └── shenzhen_network_55/     # Real-world Shenzhen, 55 intersections (GESA)
│
└── 📂 images/
    └── framework.png            # Framework overview figure
```

Supported Datasets

Unicorn is evaluated on 8 SUMO traffic network scenarios from three benchmark suites:

| Benchmark | Network               | # Intersections | Type       |
|-----------|-----------------------|-----------------|------------|
| MA2C      | grid_network_5_5      | 25              | Synthetic  |
| MA2C      | monaco_network_30     | 30              | Real-world |
| RESCO     | cologne_network_8     | 8               | Real-world |
| RESCO     | ingolstadt_network_21 | 21              | Real-world |
| RESCO     | grid_network_4_4      | 16              | Synthetic  |
| RESCO     | arterial_network_4_4  | 16              | Synthetic  |
| GESA      | shaoxing_network_7    | 7               | Real-world |
| GESA      | shenzhen_network_29   | 29              | Real-world |

References:

  • MA2C: Chu, T., Wang, J., CodecΓ , L., & Li, Z. (2020). Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control. IEEE T-ITS.
  • RESCO: Ault, J., & Sharon, G. (2021). Reinforcement Learning Benchmarks for Traffic Signal Control. NeurIPS Datasets & Benchmarks.
  • GESA: Jiang, H., et al. (2024). A General Scenario-Agnostic Reinforcement Learning for Traffic Signal Control. IEEE T-ITS.

⚙️ Configuration

All configurations are centralized in parameters.py. Below are the key parameters:

🔹 Input Parameters (INPUT_PARAMS)

| Parameter       | Description                                   | Default |
|-----------------|-----------------------------------------------|---------|
| MAX_EPISODES    | Total number of training episodes             | 3000    |
| NUM_META_AGENTS | Number of parallel Ray worker processes       | 6       |
| LOAD_MODEL      | Whether to resume training from a checkpoint  | False   |
| EXPERIMENT_PATH | Path to the checkpoint experiment (for resume)| None    |
| CO_TRAIN        | Enable multi-scenario co-training             | False   |

Important

When switching between single-scenario and multi-scenario training modes, make sure to set CO_TRAIN accordingly in INPUT_PARAMS and configure the appropriate dataset(s).


Training

🔸 Single-Scenario Training

In single-scenario mode, the model trains on one specific traffic network at a time. This is the default mode.

Step 1: Configure parameters.py:

```python
class INPUT_PARAMS:
    MAX_EPISODES    = 3000        # Total training episodes
    NUM_META_AGENTS = 6           # Number of parallel workers
    CO_TRAIN        = False       # ⬅️ Set to False for single-scenario

class SUMO_PARAMS:
    NET_NAME        = 'grid_network_5_5'  # ⬅️ Choose your target dataset
```

Step 2: Launch training:

```bash
python driver_unicorn.py
```

Tip

Recommended configurations by dataset:

| Dataset        | Green / Yellow | Teleport Time (s) |
|----------------|----------------|-------------------|
| MA2C networks  | 10 s / 3 s     | 300               |
| RESCO networks | 15 s / 5 s     | -1 (disabled)     |
| GESA networks  | 15 s / 5 s     | 600               |
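
When scripting experiments across benchmark suites, the recommendations above can be kept in a small lookup table. The dictionary below simply restates the table (in SUMO, a time-to-teleport of -1 disables teleporting); its names are hypothetical and not part of parameters.py.

```python
# Recommended per-benchmark durations, restating the table above.
# These names are illustrative, not the repo's actual configuration keys.
RECOMMENDED = {
    "MA2C":  {"green_s": 10, "yellow_s": 3, "teleport_s": 300},
    "RESCO": {"green_s": 15, "yellow_s": 5, "teleport_s": -1},  # -1 disables teleporting in SUMO
    "GESA":  {"green_s": 15, "yellow_s": 5, "teleport_s": 600},
}

def recommended_settings(benchmark):
    """Look up the recommended green/yellow/teleport durations for a benchmark suite."""
    return RECOMMENDED[benchmark.upper()]
```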

🔸 Multi-Scenario Co-Training

In multi-scenario co-training mode, the model trains simultaneously across multiple traffic networks using different workers for different scenarios. This enables cross-domain generalization.

Step 1: Configure parameters.py:

```python
class INPUT_PARAMS:
    MAX_EPISODES    = 3000
    NUM_META_AGENTS = 6           # Each worker trains on a different scenario
    CO_TRAIN        = True        # ⬅️ Set to True for multi-scenario

class SUMO_PARAMS:
    ALL_DATASETS    = [           # ⬅️ Define the scenarios to co-train on
        'cologne_network_8',
        'ingolstadt_network_21',
        'arterial_network_4_4',
        'grid_network_4_4',
        'shaoxing_network_7',
        'shenzhen_network_29',
    ]
```

Note

In co-training mode:

  • Each worker (indexed by server_number) is assigned a different dataset from ALL_DATASETS.
  • NUM_META_AGENTS should match the number of datasets in ALL_DATASETS.
  • The observation and action spaces are automatically padded to the maximum dimensions across all scenarios (max movement dim = 36, max phase dim = 8, agent space = 97).
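
Conceptually, the padding works by zero-filling each intersection's movement features and masking out the phase slots that do not exist at that intersection. The sketch below illustrates this idea under those assumptions; it is not the repo's actual implementation, though the constants mirror the maxima quoted above.

```python
MAX_MOVEMENT_DIM = 36   # max movement dimension across all scenarios (see note above)
MAX_PHASE_DIM = 8       # max phase dimension across all scenarios

def pad_movements(rows, feat_dim, max_rows=MAX_MOVEMENT_DIM):
    """Zero-pad a list of per-movement feature vectors to a fixed row count.

    Illustrative sketch of the padding described above, not the repo's code.
    """
    if len(rows) > max_rows:
        raise ValueError("more movements than the maximum movement dimension")
    return rows + [[0.0] * feat_dim for _ in range(max_rows - len(rows))]

def phase_mask(num_phases, max_phases=MAX_PHASE_DIM):
    """Boolean mask marking which padded phase slots correspond to real phases."""
    return [i < num_phases for i in range(max_phases)]
```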

Step 2: Launch training:

```bash
python driver_unicorn.py
```

Monitoring Training

Training logs are automatically recorded with TensorBoard:

```bash
tensorboard --logdir ./Train_MATSC/<EXPERIMENT_NAME>/train
```

Key metrics tracked:

  • Policy Loss, Value Loss, Entropy Loss
  • Actor/Critic VAE Loss & Contrastive Loss
  • Episode Reward, Episode Length, Action Change Rate

Resume Training from Checkpoint

To resume training from a saved checkpoint:

```python
class INPUT_PARAMS:
    LOAD_MODEL      = True
    EXPERIMENT_PATH = './Train_MATSC/<YOUR_EXPERIMENT_NAME>'  # ⬅️ Path to saved experiment
```
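
Checkpoints follow the checkpoint&lt;EPISODE&gt;.pkl naming shown in the Testing section below, so the newest one under an experiment's model directory can be found by parsing the episode number out of each filename. This helper is an illustrative sketch, not part of the repo.

```python
import os
import re

def latest_checkpoint(model_dir):
    """Return the path of the highest-episode checkpoint<EP>.pkl file, or None."""
    best_name, best_ep = None, -1
    for name in os.listdir(model_dir):
        # Only accept files matching the checkpoint<episode>.pkl pattern
        m = re.fullmatch(r"checkpoint(\d+)\.pkl", name)
        if m and int(m.group(1)) > best_ep:
            best_name, best_ep = name, int(m.group(1))
    return os.path.join(model_dir, best_name) if best_name else None
```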

Testing

After training, evaluate the trained model on specific test scenarios.

Step 1: Configure the test settings in evaluator_rl.py:

```python
# Set the experiment directory and model path
exp_dir = './Test'
agent_name_list = ['UNICORN']
model_path_list = ['./Train_MATSC/<EXPERIMENT_NAME>/model/checkpoint<EPISODE>.pkl']
```

Step 2: Ensure the corresponding map and flow settings in parameters.py match the training configuration:

```python
class SUMO_PARAMS:
    NET_NAME = 'grid_network_5_5'  # ⬅️ Must match the map used during training
```

Step 3: Run evaluation:

```bash
python evaluator_rl.py
```

Step 4: After testing, the results (traffic data & trip data) will be saved in:

```text
./Test/eval_data/
├── <map_name>_UNICORN_traffic.csv   # Traffic metrics per timestep
└── <map_name>_UNICORN_trip.csv      # Individual vehicle trip info
```
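
Aggregate statistics (e.g. an average trip duration) can then be computed directly from these CSVs. The helper below is a generic sketch: the exact column names in the traffic/trip files are an assumption here, so inspect the headers first.

```python
import csv
import statistics

def mean_column(csv_path, column):
    """Average one numeric column of an evaluation CSV.

    The column name (e.g. a trip 'duration' field) depends on the actual
    CSV schema, which is assumed here rather than taken from the repo.
    """
    with open(csv_path, newline="") as f:
        return statistics.mean(float(row[column]) for row in csv.DictReader(f))
```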

Evaluation with Non-RL Baselines

Unicorn includes built-in non-RL baseline evaluators for comparison:

| Baseline | Description                         |
|----------|-------------------------------------|
| FIXED    | Fixed-time signal plan              |
| GREEDY   | Greedy policy based on queue length |
| PRESSURE | Max-pressure based control          |
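
For reference, max-pressure control picks the phase whose served movements have the largest upstream-minus-downstream queue. The function below is a textbook sketch of that rule, included only to illustrate what the PRESSURE baseline computes; the exact implementation in evaluator_non_rl.py may differ.

```python
def max_pressure_phase(phases, queues):
    """Pick the phase with the largest total pressure.

    phases: {phase_id: [(incoming_lane, outgoing_lane), ...]} movements served per phase
    queues: {lane_id: vehicle count}; missing lanes count as empty
    """
    def pressure(movements):
        # Pressure of a phase = sum over its movements of (upstream - downstream) queue
        return sum(queues.get(i, 0) - queues.get(o, 0) for i, o in movements)

    return max(phases, key=lambda p: pressure(phases[p]))
```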

Run baseline evaluations:

```bash
python evaluator_non_rl.py
```

Configure the baselines in evaluator_non_rl.py:

```python
agent_name_list = ['FIXED', 'GREEDY', 'PRESSURE']
```

Citation

If you find this code useful in your research, please consider citing our paper:

```bibtex
@ARTICLE{11360985,
  author={Zhang, Yifeng and Liu, Yilin and Gong, Ping and Li, Peizhuo and Fan, Mingfeng and Sartoretti, Guillaume},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  title={Unicorn: A Universal and Collaborative Reinforcement Learning Approach Toward Generalizable Network-Wide Traffic Signal Control},
  year={2026},
  volume={},
  number={},
  pages={1-17},
  keywords={Collaboration;Topology;Network topology;Feature extraction;Vectors;Urban areas;Real-time systems;Training;Reinforcement learning;Scalability;Generalizable adaptive traffic signal control;multi-agent reinforcement learning;contrastive learning},
  doi={10.1109/TITS.2026.3653478}}
```

You may also find our related work useful:

```bibtex
@INPROCEEDINGS{10801524,
  author={Zhang, Yifeng and Li, Peizhuo and Fan, Mingfeng and Sartoretti, Guillaume},
  booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  title={HeteroLight: A General and Efficient Learning Approach for Heterogeneous Traffic Signal Control},
  year={2024},
  volume={},
  number={},
  pages={1010-1017},
  keywords={Measurement;Network topology;Urban areas;Reinforcement learning;Feature extraction;Vectors;Robustness;Topology;Optimization;Intelligent robots},
  doi={10.1109/IROS58592.2024.10801524}}
```

License

This project is licensed under the MIT License - see the LICENSE file for details.

Β© 2026 MARMot Lab @ NUS-ME


⭐ If you find this project useful, please consider giving it a star! ⭐
