GitHub - shockwaveHe/Robot-Trains-Robot: [CoRL 2025] Real-world RL. Official implementation of "Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids"

Paper | Website | Video | Tweet

Robot-Trains-Robot (RTR) is a novel framework where a robotic arm teacher actively supports and guides a humanoid robot student. The RTR system provides protection, learning schedule, reward, perturbation, failure detection, and automatic resets. It enables efficient long-term real-world humanoid training with minimal human intervention.

Setup

Our overall setup is similar to that of toddlerbot. On Ubuntu, install via the following steps:

Set up the Repo

Run the following commands to clone the repo:

mkdir ~/projects
cd ~/projects
git clone git@github.com:hukz_18/robot_trains_robot.git
cd robot_trains_robot
git submodule update --init --recursive

Install Miniforge

If conda is not installed yet, we recommend installing Miniforge.

Run the following command to determine your system architecture:

uname -m

Based on your system architecture, download the appropriate Miniforge installer. For example, for a Linux machine with arm64 architecture, download Linux aarch64 (arm64) from their website. Do NOT run the install script with sudo. Answer yes to all the options.
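As a rough illustration of the architecture check above, the following sketch maps a `uname -m` value to an installer file name. The file names reflect the Miniforge releases page as we understand it and should be treated as assumptions; `miniforge_installer` and `current_installer` are hypothetical helpers, not part of this repository.

```python
import platform

# Installer file names as listed on the Miniforge releases page (assumption:
# verify them on the website before downloading).
INSTALLERS = {
    "x86_64": "Miniforge3-Linux-x86_64.sh",
    "aarch64": "Miniforge3-Linux-aarch64.sh",
    "arm64": "Miniforge3-Linux-aarch64.sh",  # some systems report arm64 instead of aarch64
}


def miniforge_installer(machine: str) -> str:
    """Return the Linux installer name for a `uname -m` value."""
    if machine not in INSTALLERS:
        raise ValueError(f"no known Linux Miniforge installer for {machine!r}")
    return INSTALLERS[machine]


def current_installer() -> str:
    """Look up the installer for this machine (platform.machine() mirrors `uname -m`)."""
    return miniforge_installer(platform.machine())
```

Calling `current_installer()` on the target machine gives the file to look for; the table only covers the two architectures discussed here.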

Restart your terminal to activate the conda environment.

Set up Conda Environment (Linux)

mamba create --name rtr python=3.10
conda activate rtr
pip install -e toddlerbot/brax
pip install -e toddlerbot/rsl_rl
pip install -e ".[linux]"

Hardware

Toddlerbot: follow the guideline from toddlerbot to set up the robot.

F/T sensor: please follow the manual starting from page 19 to set up the F/T sensor communication, and configure the IP address to 192.168.2.1. (Mount: TODO)

UR5: Download the driver from hardware-interface and build it. Configure the IP address.

Walkthrough

Pretraining Walking Policy

The following commands execute the training procedure presented in Section 3.1 of the paper. Please configure wandb first to log your results. For example, you may need to set the "WANDB_USERNAME" environment variable.
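Before launching a long training run, it can help to check that the wandb-related environment variables are set. The sketch below is one way to do that. `WANDB_USERNAME` comes from the text above; `WANDB_API_KEY` is the variable the wandb client reads for non-interactive login, and whether this repository needs it is an assumption. `missing_wandb_vars` is a hypothetical helper.

```python
import os
from typing import Mapping

# WANDB_USERNAME is mentioned in this README; requiring WANDB_API_KEY as well
# is an assumption about a typical non-interactive wandb setup.
REQUIRED_VARS = ("WANDB_USERNAME", "WANDB_API_KEY")


def missing_wandb_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_wandb_vars()
    if missing:
        print("Set these variables before training:", ", ".join(missing))
    else:
        print("wandb environment looks configured.")
```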

Stage 1 Training

python -m toddlerbot.locomotion.train_mjx --tag <your tag>

This command generates a subfolder under the results folder, with a name like toddlerbot_2xm_walk_ppo_yyyymmdd_hhmmss_<your tag>, containing all results of that stage 1 training run. You can verify that eval/mean_reward rises to around 300 by the end of training (around 5k steps), while eval/mean_reward_zero_z remains low, typically under 50. The latter metric measures the policy's performance when the dynamics latent is set to all zeros, verifying that the policy indeed learns to make use of the latent.
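Stage 2 needs the timestamp from the stage 1 folder name. A minimal sketch of extracting it is shown below, assuming folder names follow exactly the pattern described above; `parse_run_name` and `RunInfo` are hypothetical and not part of the codebase.

```python
import re
from typing import NamedTuple


class RunInfo(NamedTuple):
    prefix: str     # e.g. "toddlerbot_2xm_walk_ppo"
    timestamp: str  # "yyyymmdd_hhmmss", the value passed to --restore
    tag: str        # the value passed to --tag


# The non-greedy prefix makes the first timestamp-like group win, so a tag
# that itself contains digits and underscores is still kept whole.
_RUN_RE = re.compile(r"^(?P<prefix>.+?)_(?P<timestamp>\d{8}_\d{6})_(?P<tag>.+)$")


def parse_run_name(name: str) -> RunInfo:
    """Split a results subfolder name into prefix, timestamp, and tag."""
    match = _RUN_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected run folder name: {name!r}")
    return RunInfo(match["prefix"], match["timestamp"], match["tag"])
```

For example, `parse_run_name("toddlerbot_2xm_walk_ppo_20250101_120000_demo").timestamp` yields the string to pass to `--restore` for a folder created with the made-up tag `demo`.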

Stage 2 Training

python -m toddlerbot.locomotion.train_mjx --tag <the same tag as that above> --optimize-z --restore <yyyymmdd_hhmmss> # timestamp output from stage 1

In this stage, the policy network is frozen and only the dynamics latent is optimized. Because the universal latent is initialized from the average of all environment latents, the initial performance drops to zero, but it increases quickly, reaching around 300 in fewer than 300 steps.
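To illustrate the initialization described above, the sketch below averages a set of per-environment latents element-wise to obtain a starting universal latent. It uses plain Python lists for clarity; the actual training code likely operates on arrays, and the function name, shapes, and values here are assumptions for illustration only.

```python
from statistics import fmean


def average_latent(env_latents: list[list[float]]) -> list[float]:
    """Element-wise mean of per-environment latents (all of equal length)."""
    if not env_latents:
        raise ValueError("need at least one environment latent")
    dim = len(env_latents[0])
    if any(len(z) != dim for z in env_latents):
        raise ValueError("all latents must have the same dimension")
    # zip(*rows) iterates over columns, i.e. one latent dimension at a time.
    return [fmean(column) for column in zip(*env_latents)]


if __name__ == "__main__":
    # Two made-up 2-D latents; stage 2 would then optimize the result while
    # keeping the policy network fixed.
    print(average_latent([[1.0, 2.0], [3.0, 6.0]]))
```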

In addition, we provide our pretrained checkpoint here. You can download the checkpoint and place it under the results folder.

Real-world Training

An example execution process is presented in the launch folder. Please make sure that the robot, computer, and remote learner are on the same network.

Before running the experiments, start the hardware driver by running

cd force_ctl/hardware_interfaces/build/applications/force_control_demo
./force_control_<task> # task could be chosen from [swing, walk]
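If you script the driver launch, a small wrapper can guard against a mistyped task name. The sketch below only builds the path to the binary from the directory and task names given above; `driver_binary` is a hypothetical helper, and the path is relative to wherever the driver was built.

```python
from pathlib import PurePosixPath

# Build directory and task names are taken from the commands above.
DEMO_DIR = PurePosixPath(
    "force_ctl/hardware_interfaces/build/applications/force_control_demo"
)
TASKS = ("swing", "walk")


def driver_binary(task: str) -> PurePosixPath:
    """Return the relative path of the force-control binary for `task`."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    return DEMO_DIR / f"force_control_{task}"
```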

Real-world Adaptation for Walking Policy (Pretrain Needed)

On the computer, run the script to control the arm and treadmill:

python -m toddlerbot.policies.run_policy --policy at_leader --sim finetune --ip <your robot ip>

On the robot, run the finetune script:

python toddlerbot/policies/run_policy.py --sim real --ip <your computer ip> --policy walk_finetune --robot toddlerbot_2xm --ckpt <pretrained checkpoint>

On the remote learner, run the remote learning script:

python toddlerbot/policies/run_policy.py --policy walk
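The three commands above run on three different machines, and each needs the address of its peer. The sketch below assembles them in one place so the IP addresses and checkpoint are filled in consistently. It uses only the flags shown in this section; the helper name, role labels, and example values are hypothetical.

```python
import shlex


def walk_adaptation_commands(robot_ip: str, computer_ip: str, ckpt: str) -> dict[str, str]:
    """Return the shell command to run on each machine, keyed by role."""
    script = ["python", "toddlerbot/policies/run_policy.py"]
    commands = {
        # Arm and treadmill controller; connects to the robot.
        "computer": ["python", "-m", "toddlerbot.policies.run_policy",
                     "--policy", "at_leader", "--sim", "finetune", "--ip", robot_ip],
        # Finetuning client on the humanoid; connects back to the computer.
        "robot": script + ["--sim", "real", "--ip", computer_ip,
                           "--policy", "walk_finetune",
                           "--robot", "toddlerbot_2xm", "--ckpt", ckpt],
        # Remote learner.
        "remote_learner": script + ["--policy", "walk"],
    }
    # shlex.join quotes any argument that needs it for safe copy-pasting.
    return {role: shlex.join(argv) for role, argv in commands.items()}


if __name__ == "__main__":
    # Made-up addresses and checkpoint name, for illustration only.
    for role, cmd in walk_adaptation_commands("192.168.0.2", "192.168.0.3", "my_ckpt").items():
        print(f"[{role}] {cmd}")
```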

Real-world Learning from Scratch for Swing-up Policy (No Pretrain)

On the computer, run the script to control the arm:

python toddlerbot/policies/run_policy.py --policy swing_arm_leader --ip <your robot ip> --robot toddlerbot_2xm

On the robot, run the script to learn the swing-up policy:

python toddlerbot/policies/run_policy.py --sim real --ip <your computer ip> --policy swing --robot toddlerbot_2xm --no-plot

Citation

If you use RTR for published research, please cite:

@inproceedings{hu2025robot,
    title={Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids},
    author={Hu, Kaizhe and Shi, Haochen and He, Yao and Wang, Weizhuo and Liu, C. Karen and Song, Shuran},
    booktitle={Conference on Robot Learning (CoRL)},
    year={2025}
}

License

The RTR codebase (including the documentation) is released under the license given in the LICENSE file.

