Tengbo Yu *, Guanxing Lu *, Zaijia Yang *, Haoyuan Deng, Season Si Chen, Jiwen Lu, Wenbo Ding, Guoqiang Hu, Yansong Tang †, Ziwei Wang
Multi-task robotic bimanual manipulation is becoming increasingly popular as it enables sophisticated tasks that require diverse dual-arm collaboration patterns. Compared to unimanual manipulation, bimanual tasks pose challenges in understanding the multi-body spatiotemporal dynamics. An existing method, ManiGaussian, pioneered encoding spatiotemporal dynamics into the visual representation via a Gaussian world model for single-arm settings, but it ignores the interaction between multiple embodiments in dual-arm systems, leading to a significant performance drop. In this paper, we propose ManiGaussian++, an extension of the ManiGaussian framework that improves multi-task bimanual manipulation by digesting multi-body scene dynamics through a hierarchical Gaussian world model. Specifically, we first generate task-oriented Gaussian Splatting from intermediate visual features, which aims to differentiate the acting and stabilizing arms for multi-body spatiotemporal dynamics modeling. We then build a hierarchical Gaussian world model with a leader-follower architecture, where the multi-body spatiotemporal dynamics are mined for the intermediate visual representation via future scene prediction. The leader predicts the Gaussian Splatting deformation caused by motions of the stabilizing arm, from which the follower generates the physical consequences resulting from the movement of the acting arm. As a result, our method significantly outperforms current state-of-the-art bimanual manipulation techniques by 20.2% on 10 simulated tasks, and achieves a 60% average success rate on 9 challenging real-world tasks.
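As a toy illustration of the leader-follower idea described above (not the actual implementation; the networks, feature dimensions, and variable names below are all made up for the sketch), the leader first predicts the Gaussian deformation induced by the stabilizing arm, and the follower then conditions on that deformed scene to model the acting arm's physical consequences:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_net(in_dim, out_dim):
    """A random linear map with tanh, standing in for a learned network (toy only)."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.01
    return lambda x: np.tanh(x @ W)

N, D = 1024, 14           # hypothetical: N Gaussian primitives, D features each
A = 8                     # hypothetical per-arm action dimension

leader = toy_net(D + A, D)    # leader: deformation from the stabilizing arm
follower = toy_net(D + A, D)  # follower: consequence of the acting arm

gaussians = rng.standard_normal((N, D))  # current Gaussian scene features
a_stab = rng.standard_normal(A)          # stabilizing-arm action
a_act = rng.standard_normal(A)           # acting-arm action

# Leader stage: predict the deformation caused by the stabilizing arm.
x = np.concatenate([gaussians, np.tile(a_stab, (N, 1))], axis=1)
gaussians_mid = gaussians + leader(x)

# Follower stage: predict the acting arm's effect on the already-deformed scene,
# giving the predicted future scene used for the reconstruction objective.
y = np.concatenate([gaussians_mid, np.tile(a_act, (N, 1))], axis=1)
gaussians_next = gaussians_mid + follower(y)

print(gaussians_next.shape)  # (1024, 14)
```

The key point is the ordering: the follower never sees the raw scene, only the leader's prediction, so the acting arm's dynamics are conditioned on the stabilizing arm's motion.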
- Jun. 2025: Codebase for simulated experiments is released!
- Jun. 2025: Our paper is accepted by IROS2025!
NOTE: ManiGaussian++ is mainly built upon the Perceiver-Actor^2 repo by Markus Grotz et al.
See INSTALL.md for installation instructions.
The following steps are structured in order.

First, generate the demonstrations:

```bash
bash scripts/gen_demonstrations_nerf.sh
```
We use wandb to log training curves and visualizations. Log in to wandb before running the scripts:

```bash
wandb login
```
To train our ManiGaussian++ without the deformation predictor, task-oriented Gaussian, and hierarchical Gaussian world model, run:

```bash
bash scripts/train_bimanual.sh ManiGaussian_BC2 0,1 12345 ${exp_name}
```
To train our ManiGaussian++ without the task-oriented Gaussian and hierarchical Gaussian world model, run:

```bash
bash scripts/train_bimanual_dyn.sh ManiGaussian_BC2 0,1 12345 ${exp_name}
```
To train our ManiGaussian++ without the hierarchical Gaussian world model, run:

```bash
bash scripts/train_LF_MASK_IN_NERF.sh ManiGaussian_BC2 0,1 12345 ${exp_name}
```
To train the full ManiGaussian++, run:

```bash
bash scripts/train_LF_MASK_IN_NERF_HIER.sh ManiGaussian_BC2 0,1 12345 ${exp_name}
```
To evaluate a checkpoint, run:

```bash
bash scripts/eval.sh ManiGaussian_BC2 ${exp_name} 0
```

This repository is released under the MIT license.
Our code is built upon ManiGaussian, AnyBimanual, Perceiver-Actor^2, PerAct, RLBench, and CLIP. We thank all these authors for their nicely open-sourced code and their great contributions to the community.
If you find this repository helpful, please consider citing:
```bibtex
@misc{yu2025manigaussiangeneralroboticbimanual,
  title={ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model},
  author={Tengbo Yu and Guanxing Lu and Zaijia Yang and Haoyuan Deng and Season Si Chen and Jiwen Lu and Wenbo Ding and Guoqiang Hu and Yansong Tang and Ziwei Wang},
  year={2025},
  eprint={2506.19842},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2506.19842},
}
```
