I am trying to reproduce the evaluation results with the robotwin-posttrain checkpoint. I ran the evaluation with the provided launch_client.sh and launch_server.sh scripts, and I have not modified any code or configuration files.
However, the success rates I am getting are consistently lower than those reported in the paper. Specifically, under the demo_randomized setting, tasks reported to achieve a 90%+ success rate in the paper only reach around 70% in my local tests.
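A rough binomial check suggests the gap is larger than what sampling noise alone would explain; the sketch below assumes a hypothetical 100 rollouts per task, since I have not confirmed the actual episode count used by the eval scripts:

```python
# Sanity check: is a 90% -> 70% drop explainable by sampling noise?
# The episode count (100) is a hypothetical placeholder -- substitute
# the actual number of rollouts per task from the eval config.
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial success rate."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - margin, center + margin

lo, hi = wilson_interval(70, 100)
print(f"observed 70/100 -> 95% CI [{lo:.2f}, {hi:.2f}]")  # roughly [0.60, 0.78]
```

Under that assumption, the 95% interval around a local 70% rate tops out near 78%, well below the reported 90%+, so the discrepancy looks systematic rather than a matter of too few rollouts.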
Since I am using the default configs and checkpoint, could you help me analyze the potential reasons for this performance drop?