Many people hit training failures after 100 steps, especially in multi-turn agentic RL.
For example, see AgentR1/Agent-R1#30 (comment).
This kind of problem is very difficult to debug because the tooling for inspecting rollouts is lacking.
The idea in this issue is to log the inputs and outputs of LLM calls and tool calls to an external tracking system such as wandb.
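
A minimal sketch of what such logging could look like, assuming one row per LLM or tool call, buffered during rollout and flushed once per training step. `RolloutTracer`, `TraceEvent`, `wandb_sink`, the column names, and the `rollout_traces` key are all hypothetical names for illustration, not part of any existing trainer. The only wandb calls used are `wandb.Table(columns=..., data=...)` and `wandb.log(...)`, and the sink assumes `wandb.init()` has already been called.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical column layout: one row per LLM call or tool call.
COLUMNS = ["step", "trajectory_id", "turn", "kind", "input_text", "output_text"]


@dataclass
class TraceEvent:
    step: int            # training step the rollout belongs to
    trajectory_id: str   # identifies one multi-turn episode
    turn: int            # turn index within the episode
    kind: str            # "llm" or "tool"
    input_text: str      # prompt sent to the LLM, or arguments sent to the tool
    output_text: str     # LLM completion, or tool result


@dataclass
class RolloutTracer:
    """Buffers LLM and tool call I/O, then hands it to a sink as table rows."""

    events: List[TraceEvent] = field(default_factory=list)

    def record(self, step: int, trajectory_id: str, turn: int,
               kind: str, input_text: str, output_text: str) -> None:
        if kind not in ("llm", "tool"):
            raise ValueError(f"unknown event kind: {kind!r}")
        self.events.append(
            TraceEvent(step, trajectory_id, turn, kind, input_text, output_text)
        )

    def flush(self, sink: Callable[[List[str], List[list]], None]) -> int:
        """Send buffered rows to `sink`, clear the buffer, return the row count."""
        rows = [[getattr(e, c) for c in COLUMNS] for e in self.events]
        if rows:
            sink(COLUMNS, rows)
        self.events.clear()
        return len(rows)


def wandb_sink(columns: List[str], rows: List[list]) -> None:
    """Example sink: upload the rows as a wandb table (assumes wandb.init() ran)."""
    import wandb  # imported lazily so the tracer itself has no wandb dependency

    wandb.log({"rollout_traces": wandb.Table(columns=columns, data=rows)})
```

In a training loop, the rollout code would call `tracer.record(...)` after each LLM generation and each tool call, and the trainer would call `tracer.flush(wandb_sink)` at the end of each step. Keeping the sink as a plain callable means the same tracer could write to a different tracking backend, or to a local file, without changing the rollout code.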