I am trying to reproduce the evaluation results with the robotwin-posttrain checkpoint. I ran the evaluation with the provided launch_client.sh and launch_server.sh scripts, and I have not modified any code or configuration files.
However, the success rates I am getting are consistently lower than those reported in the paper. Specifically, under the demo_randomized setting, tasks reported to achieve a 90%+ success rate in the paper only reach around 70% in my local tests.
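A rough binomial check suggests the gap is larger than what sampling noise alone would explain; the sketch below assumes a hypothetical 100 rollouts per task, since I have not confirmed the actual episode count used by the eval scripts:

```python
# Sanity check: is a 90% -> 70% drop explainable by sampling noise?
# The episode count (100) is a hypothetical placeholder -- substitute
# the actual number of rollouts per task from the eval config.
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial success rate."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - margin, center + margin

lo, hi = wilson_interval(70, 100)
print(f"observed 70/100 -> 95% CI [{lo:.2f}, {hi:.2f}]")  # roughly [0.60, 0.78]
```

Under that assumption, the 95% interval around a local 70% rate tops out near 78%, well below the reported 90%+, so the discrepancy looks systematic rather than a matter of too few rollouts.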
Since I am using the default configs and checkpoint, could you help me analyze the potential reasons for this performance drop?