```bibtex
@article{slam-former,
  title={SLAM-Former: Putting SLAM into One Transformer},
  author={Yijun Yuan and Zhuoguang Chen and Kenan Li and Weibang Wang and Hang Zhao},
  journal={arXiv preprint arXiv:2509.16909},
  year={2025}
}
```

- [May 11, 2026] Added two ConvHead checkpoint variants.
  `V1.1.pth` is recommended with `--target_size 518`; `V1.1-long.pth` is recommended with `--target_size 224`. `V1.1-long.pth` is trained with scaled sequence lengths and supports longer-sequence inference. ConvHead mainly fixes the grid-artifact issue. Thanks to Pi3X for the insight.
- [Mar 11, 2026] Released training code. See the training branch for details.
- [Mar 4, 2026] Released SLAM code with KV pruning available.
- [Feb 26, 2026] Released the training data.
- [Sep 24, 2025] Some good blogs can help you read SLAM-Former: here and here.
- [Sep 23, 2025] Preprint release.
```shell
git clone https://github.com/Tsinghua-MARS-Lab/SLAM-Former.git
cd SLAM-Former
conda create -n SLAM-Former python=3.11
conda activate SLAM-Former
pip install -r requirements.txt
pip install -e .
```

Download checkpoint: v1
Prepare a folder containing your image sequence, then run:
```shell
python slam/demo.py \
    --ckpt_path .ckpt/checkpoint-10.pth.model \
    --image_folder /path/to/your/images/ \
    --output_dir ./output/result \
    --target_size 518 \
    --retention_ratio 0.5
```

For evaluation, please use the full KV cache: `--retention_ratio 1`. See issue #15 for details.
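`--retention_ratio` controls how much of the KV cache survives pruning. The repository's actual pruning criterion lives in the released SLAM code; purely as an illustration of what a retention ratio means (not SLAM-Former's implementation), keeping the highest-scoring fraction of cached tokens might look like:

```python
import numpy as np

def prune_kv_cache(keys, values, scores, retention_ratio):
    """Keep the fraction `retention_ratio` of cached tokens with the
    highest importance scores. Illustrative only -- SLAM-Former's real
    pruning criterion is defined in the released SLAM code."""
    n = keys.shape[0]
    keep = max(1, int(round(n * retention_ratio)))
    # Indices of the `keep` highest-scoring tokens, kept in original order.
    idx = np.sort(np.argsort(scores)[-keep:])
    return keys[idx], values[idx]

rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))
values = rng.normal(size=(8, 4))
scores = np.arange(8, dtype=float)  # later tokens scored as more important

k, v = prune_kv_cache(keys, values, scores, retention_ratio=0.5)
print(k.shape)  # (4, 4): half of the 8 cached tokens survive
```

With `retention_ratio=1` every cached token is kept, which is why full retention is required for faithful evaluation.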
Real-time visualization during inference: add --vis to the command above. The 3D reconstruction process can be viewed interactively in Rerun. This mode is intended for local machines with a desktop session; it is not recommended on remote servers because it depends on launching an interactive viewer during inference.
Static visualization of saved results: first run slam/demo.py without --vis to save `final.ply`, `final_traj.txt`, and `final_pc/`, then start the browser-based viewer:
```shell
python slam/visualize_results.py \
    --result_dir /path/to/output_dir \
    --port 8080
```

The static viewer serves an HTTP page at http://localhost:8080. This mode is recommended for remote servers: forward the port to your local machine, then open the forwarded URL in your browser.
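The saved trajectory can also be inspected without the viewer. A minimal sketch, assuming `final_traj.txt` uses the common TUM layout of one pose per line (`timestamp tx ty tz qx qy qz qw`) — the format is an assumption, so check your actual file first:

```python
def load_trajectory(path):
    """Parse a TUM-style trajectory file: one pose per line,
    `timestamp tx ty tz qx qy qz qw`; lines starting with '#' are comments.
    The format is an assumption -- verify against your final_traj.txt."""
    poses = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            t, tx, ty, tz, qx, qy, qz, qw = map(float, line.split())
            poses.append({"t": t, "pos": (tx, ty, tz), "quat": (qx, qy, qz, qw)})
    return poses

# Self-contained demo on a tiny synthetic trajectory file.
import os, tempfile
sample = "# t tx ty tz qx qy qz qw\n0.0 0 0 0 0 0 0 1\n0.1 0.5 0 0 0 0 0 1\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(sample)
    tmp = f.name
traj = load_trajectory(tmp)
os.unlink(tmp)
print(len(traj), traj[1]["pos"])  # 2 (0.5, 0.0, 0.0)
```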
```shell
ssh -L 8080:localhost:8080 user@remote-server
```

- Links:
- Hugging Face (ARKitScenes, MVS-Synth, ScanNet, ScanNet++, Blended-MVS, MegaDepth)
- Hugging Face (Hypersim)
- v1: recommended with `--target_size 518` for inference.
- V1.1.pth: ConvHead checkpoint, recommended with `--target_size 518` for inference.
- V1.1-long.pth: ConvHead checkpoint, recommended with `--target_size 224` for inference; trained with scaled sequence lengths to support longer-sequence inference.
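The two `--target_size` values trade per-frame token count against resolution, which is why the long-sequence checkpoint pairs with the smaller size. Assuming a ViT-style patch size of 14 (an assumption — verify against the model config), the arithmetic is:

```python
def patch_grid(target_size, patch=14):
    """Tokens per frame for a square input, assuming a ViT patch size of 14
    (an assumption -- verify against the SLAM-Former model config)."""
    assert target_size % patch == 0, "target_size should be a multiple of the patch size"
    side = target_size // patch
    return side, side * side

for size in (518, 224):
    side, tokens = patch_grid(size)
    print(f"target_size={size}: {side}x{side} grid, {tokens} tokens/frame")
# target_size=518: 37x37 grid, 1369 tokens/frame
# target_size=224: 16x16 grid, 256 tokens/frame
```

Under this assumption, 224 yields roughly 5x fewer tokens per frame than 518, leaving room for longer sequences in the same attention budget.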
This project adopts a dual-licensing strategy:
| Component | License | Commercial Use |
|---|---|---|
| Code | BSD 3-Clause | Permitted |
| Model Weights (checkpoints) | CC BY-NC 4.0 | Strictly Non-Commercial |