High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting

This is the official repository of High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting (RA-L). For more information, please visit our project page.

[Website] [Arxiv] [Video]

Platform: Linux · License: MIT

Pipeline

TODO

  • [✅] Release arXiv technical report
  • [✅] Release 3D reconstruction pipeline
  • [✅] Release MLLM-based Articulation Inference and Physics Estimation
  • [✅] Release code for World Coordinate Frame Alignment
  • [✅] Release simulated data generation pipeline
  • [ ] Release code for Camera Pose Alignment

📚 Table of Contents

  1. Overview

  2. Installation & Setup

  3. Scene Reconstruction
    3.1 Background Reconstruction

    3.2 Object Reconstruction

  4. Simulated Data Generation
    4.1 World Coordinate Frame Alignment
    4.2 Camera Pose Alignment
    4.3 Data Generation
    4.4 Holistic Scene Augmentation

  5. Policy Training

  6. Sim2Real Deployment

  7. Citation

  8. License

Installation

The RoboSimGS codebase is built on top of Genesis and LeRobot. We are actively working on this section and will update it soon. 🚧

Scene Reconstruction

Background Reconstruction

We reconstruct the 3D background scene using 3D Gaussian Splatting (3DGS) within the Nerfstudio framework. For more in-depth information, refer to the official Nerfstudio GitHub repository. To achieve World Coordinate Frame Alignment, it is essential to segment the robot arm from the background; we accomplish this by reconstructing the scene with the Feature Splatting method provided by Nerfstudio.

The final output of this process will be a .ply file representing the 3D Gaussian Splatting scene, which can be viewed in compatible real-time renderers.
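A 3DGS .ply stores each Gaussian's position, opacity, scale, rotation, and spherical-harmonic color coefficients as per-vertex properties. As a quick sanity check on an export (a minimal sketch using only the standard library; the property layout referenced here is the de-facto 3DGS convention, not something specific to this repository), the header can be inspected without loading the full file:

```python
def read_ply_header(path):
    """Parse a PLY header and return (vertex_count, property_names).

    Works for both ASCII and binary PLY files, since the header itself
    is always plain text terminated by an 'end_header' line.
    """
    count, props = 0, []
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="replace").strip()
            if line.startswith("element vertex"):
                count = int(line.split()[-1])        # number of Gaussians
            elif line.startswith("property"):
                props.append(line.split()[-1])       # last token is the name
            elif line == "end_header":
                break
    return count, props
```

For a typical 3DGS export you would expect properties such as x, y, z, opacity, scale_0..2, and rot_0..3 to appear in the returned list.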

Object Reconstruction

As mentioned in our paper, the object reconstruction is performed using the 'AR Code' app on an iPhone 16 Pro. By scanning the object with the phone, we can obtain a high-quality mesh.

MLLM-driven Articulation Inference. A comprehensive overview of the architecture is available in the Document.

MLLM-driven Physics Estimation. A comprehensive overview of the architecture is available in the Document.
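Physics estimates produced by an MLLM ultimately have to be handed to the simulator, so it is prudent to validate them first. Below is a minimal sketch of such a check, assuming a hypothetical JSON response format — the field names (`mass`, `static_friction`, `restitution`) and ranges are illustrative, not the repository's actual schema:

```python
import json

# Hypothetical MLLM reply for a scanned object (illustrative only).
RESPONSE = '{"object": "banana", "mass": 0.12, "static_friction": 0.5, "restitution": 0.3}'

# Plausible physical ranges used as a sanity filter on model output.
REQUIRED = {"mass": (0.001, 50.0), "static_friction": (0.0, 2.0), "restitution": (0.0, 1.0)}

def parse_physics(reply: str) -> dict:
    """Parse and range-check MLLM physics estimates before use in simulation."""
    est = json.loads(reply)
    for key, (lo, hi) in REQUIRED.items():
        val = est.get(key)
        if not isinstance(val, (int, float)) or not lo <= val <= hi:
            raise ValueError(f"{key}={val!r} outside plausible range [{lo}, {hi}]")
    return est
```

Rejecting out-of-range values early keeps a single hallucinated parameter from silently destabilizing the simulated scene.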

Simulated Data Generation

```bash
conda create --name robosimgs python=3.10 -y
conda activate robosimgs
pip install -r requirements.txt
```

World Coordinate Frame Alignment

The core of our World Coordinate Frame Alignment procedure is the registration of two geometric representations of the robot arm. First, we generate the ground-truth point cloud of the robot arm directly from the simulation environment and save it to the specified path: sim_robot_path.

Second, during the Background Reconstruction process, the 3D Gaussian Splatting (3DGS) representation of the arm is segmented from the real-world scene. This segmented model is then stored at: real_robot_path.

These two files provide the source and target geometries for the subsequent alignment algorithm. A key challenge in this process is the scale mismatch between the simulated point cloud and the reconstructed 3DGS model. Since the standard ICP algorithm does not inherently solve for scale, a direct registration would fail. To address this, we employ a human-in-the-loop approach: a user interactively adjusts the scale of the reconstructed model, using a real-time visualization window to visually assess the alignment quality.

```bash
conda activate robosimgs
cd utils
python icp.py --lerobot_real real_robot_path --lerobot_sim sim_robot_path --size 0.7
```
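For intuition on why the interactive rescaling step is needed: standard ICP estimates only rotation and translation, but when point correspondences are known, scale can also be recovered in closed form via the Umeyama method. The sketch below is illustrative of that idea and is not the repository's icp.py:

```python
import numpy as np

def umeyama(src, dst):
    """Recover s, R, t such that dst ≈ s * (R @ src_i) + t for corresponding rows."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    sc, dc = src - mu_s, dst - mu_d
    cov = dc.T @ sc / len(src)                    # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                            # guard against reflections
    R = U @ S @ Vt
    var = (sc ** 2).sum() / len(src)              # variance of the source cloud
    scale = (D * np.diag(S)).sum() / var
    t = mu_d - scale * (R @ mu_s)
    return scale, R, t
```

In practice the two robot-arm clouds have no known correspondences, which is why the pipeline instead combines ICP with human-in-the-loop scale adjustment.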

Camera Pose Alignment

We are actively working on this section and will update it soon. 🚧

Data Generation

We provide an example of a pre-aligned 3DGS, which is ready for direct use in data generation. You can download it from our Hugging Face.

```bash
conda activate robosimgs
cd DataGeneration
python pick_banana.py --start 0 --num_steps 5 --use_gs True --data_augmentation True --save_dir collected_data --reset_cam 0.001 --single_view False
```
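Note that the boolean flags above take explicit True/False values, which argparse does not handle natively with `type=bool`. A minimal sketch of how such a CLI might parse them (the actual implementation in pick_banana.py may differ):

```python
import argparse

def str2bool(v: str) -> bool:
    """Interpret explicit 'True'/'False' command-line strings as booleans."""
    if v.lower() in ("true", "1", "yes"):
        return True
    if v.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {v!r}")

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Simulated data generation (illustrative)")
    p.add_argument("--start", type=int, default=0)
    p.add_argument("--num_steps", type=int, default=5)
    p.add_argument("--use_gs", type=str2bool, default=True)
    p.add_argument("--data_augmentation", type=str2bool, default=True)
    p.add_argument("--save_dir", default="collected_data")
    p.add_argument("--reset_cam", type=float, default=0.001)
    p.add_argument("--single_view", type=str2bool, default=False)
    return p
```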

We will provide an example of collected data, which is ready for direct use in policy training.

Holistic Scene Augmentation

To enhance the diversity and robustness of our training data, we have integrated data augmentation directly into our data collection pipeline.
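As an illustration of the kind of photometric augmentation such a pipeline can apply to rendered frames (a generic sketch, not the repository's actual augmentation code), a brightness/contrast jitter might look like:

```python
import numpy as np

def color_jitter(img, rng, brightness=0.2, contrast=0.2):
    """Randomly perturb brightness and contrast of a float image in [0, 1]."""
    b = rng.uniform(-brightness, brightness)          # additive brightness shift
    c = rng.uniform(1.0 - contrast, 1.0 + contrast)   # multiplicative contrast
    out = (img - 0.5) * c + 0.5 + b                   # scale around mid-gray
    return np.clip(out, 0.0, 1.0)
```

Applying a fresh random jitter per rendered frame helps the learned policy tolerate the lighting and exposure differences between simulation and the real camera.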

Policy Training

We utilize the iDP3 codebase for policy training.

Sim2Real Deployment

We utilize LeRobot for our Sim2Real deployment.

Citation

If you find our work useful, please consider citing us!

```bibtex
@article{zhao2025high,
  title={High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting},
  author={Zhao, Haoyu and Zeng, Cheng and Zhuang, Linghao and Zhao, Yaxi and Xue, Shengke and Wang, Hao and Zhao, Xingyue and Li, Zhongyu and Li, Kehan and Huang, Siteng and others},
  journal={arXiv preprint arXiv:2510.10637},
  year={2025}
}
```

License

This project is licensed under the MIT License - see the LICENSE file for details.
