
sglang multiturn problem: the actor died after every first epoch trained #2189

@LH019

Description

I have tried the solutions suggested in similar issues, but none of them work. The script I run is as follows:

set -x
export HYDRA_FULL_ERROR=1
ulimit -n 65535

PROJECT_DIR="$(pwd)"
CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"

python3 -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='gsm8k_multiturn_grpo' \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=128 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.3 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console'] \
    trainer.project_name='gsm8k_async_rl' \
    trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-async-sgl-multi-w-tool-verify-n16-4cards' \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=20 \
    trainer.total_epochs=15 \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=8192 \
    actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=8192 \
    actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=8192 \
    critic.ppo_max_token_len_per_gpu=8192 \
    critic.forward_max_token_len_per_gpu=8192 \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
    actor_rollout_ref.rollout.multi_turn.interaction_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml" \
    actor_rollout_ref.rollout.multi_turn.max_user_turns=1 \
    "$@"

The error is shown below:

(TaskRunner pid=121540) [prompt] system
(TaskRunner pid=121540) You are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the `calc_gsm8k_reward` tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of `#### <answer>`.
(TaskRunner pid=121540) 
(TaskRunner pid=121540) # Tools
(TaskRunner pid=121540) 
(TaskRunner pid=121540) You may call one or more functions to assist with the user query.
(TaskRunner pid=121540) 
(TaskRunner pid=121540) You are provided with function signatures within <tools></tools> XML tags:
(TaskRunner pid=121540) <tools>
(TaskRunner pid=121540) {"type": "function", "function": {"name": "calc_gsm8k_reward", "description": "A tool for calculating the reward of gsm8k. (1.0 if parsed answer is correct, 0.0 if parsed answer is incorrect or not correctly parsed)", "parameters": {"type": "object", "properties": {"answer": {"type": "string", "description": "The model's answer to the GSM8K math problem, must be a digits", "enum": null}}, "required": ["answer"]}, "strict": false}}
(TaskRunner pid=121540) </tools>
(TaskRunner pid=121540) 
(TaskRunner pid=121540) For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
(TaskRunner pid=121540) <tool_call>
(TaskRunner pid=121540) {"name": <function-name>, "arguments": <args-json-object>}
(TaskRunner pid=121540) </tool_call>
(TaskRunner pid=121540) user
(TaskRunner pid=121540) Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market? Let's think step by step and output the final answer after `####`.
(TaskRunner pid=121540) assistant
(TaskRunner pid=121540) 
(TaskRunner pid=121540) [response] To find out how much Janet makes every day at the farmers' market, we need to follow these steps:
(TaskRunner pid=121540) 
(TaskRunner pid=121540) 1. Calculate the total number of eggs laid per day.
(TaskRunner pid=121540) 2. Subtract the number of eggs eaten for breakfast.
(TaskRunner pid=121540) 3. Subtract the number of eggs used for baking.
(TaskRunner pid=121540) 4. The remainder is the number of eggs sold at the market.
(TaskRunner pid=121540) 5. Multiply the number of eggs sold by the price per egg.
(TaskRunner pid=121540) 
(TaskRunner pid=121540) Let's calculate this step by step:
(TaskRunner pid=121540) 
(TaskRunner pid=121540) 1. Total eggs laid per day: 16
(TaskRunner pid=121540) 2. Eggs eaten for breakfast: 3
(TaskRunner pid=121540) 3. Eggs used for baking: 4
(TaskRunner pid=121540) 
(TaskRunner pid=121540) Now, let's calculate the number of eggs sold at the market.
(TaskRunner pid=121540) <tool_call>
(TaskRunner pid=121540) {"name": "calc_gsm8k_reward", "arguments": {"answer": "1"}}
(TaskRunner pid=121540) 
(TaskRunner pid=121540) [ground_truth] 18
(TaskRunner pid=121540) [score] 0.0
(TaskRunner pid=121540) len reward_extra_infos_dict['reward']: 1319
(TaskRunner pid=121540) ("Initial validation metrics: {'val-core/openai/gsm8k/reward/mean@1': "
(TaskRunner pid=121540)  '0.7073540561031084}')
(TaskRunner pid=121540) step:0 - val-core/openai/gsm8k/reward/mean@1:0.707
Training Progress:   0%|          | 0/870 [00:00<?, ?it/s]
(WorkerDict pid=122626) [2025-06-24 22:37:48] Inconsistent training and inference tokenization detected (strict). This may lead to unexpected behavior during training. Please review your chat template to determine if this is intentional. For more information, refer to the multiturn README.md.
(WorkerDict pid=122626) [2025-06-24 22:37:48] Showing 10 characters before and after the diffs for context and better readability.
(WorkerDict pid=122626) [2025-06-24 22:37:48] Found differences:
(WorkerDict pid=122626) idx 1767:1788 -> 1767:1787 | full_prompt_chunk: '>assistant\n\n{"name": ' | current_prompt_chunk: '>assistant\n{"name": '
(WorkerDict pid=122894) /home/hanling.lh/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
(WorkerDict pid=122894)   tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device)
(TaskRunner pid=121540) step:1 - global_seqlen/min:467346.000 - global_seqlen/max:520258.000 - global_seqlen/minmax_diff:52912.000 - global_seqlen/balanced_min:492136.000 - global_seqlen/balanced_max:492137.000 - global_seqlen/mean:492136.750 - actor/entropy:0.370 - actor/kl_loss:0.001 - actor/kl_coef:0.001 - actor/pg_loss:-0.065 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.775 - perf/mfu/actor:0.372 - perf/max_memory_allocated_gb:30.948 - perf/max_memory_reserved_gb:70.426 - perf/cpu_memory_used_gb:24.265 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.694 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.694 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:0.013 - critic/advantages/max:3.750 - critic/advantages/min:-3.750 - critic/returns/mean:0.013 - critic/returns/max:3.750 - critic/returns/min:-3.750 - response_length/mean:595.908 - response_length/max:1024.000 - response_length/min:76.000 - response_length/clip_ratio:0.063 - prompt_length/mean:365.297 - prompt_length/max:430.000 - prompt_length/min:326.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:164.396 - timing_s/reshard:4.166 - timing_s/gen:169.202 - timing_s/reward:0.733 - timing_s/old_log_prob:36.700 - timing_s/ref:29.754 - timing_s/adv:0.043 - timing_s/update_actor:90.720 - timing_s/step:327.221 - timing_per_token_ms/gen:0.139 - timing_per_token_ms/update_actor:0.046 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.015 - perf/total_num_tokens:1968547.000 - perf/time_per_step:327.221 - perf/throughput:1503.990
Training Progress:   0%|          | 1/870 [05:27<79:08:02, 327.83s/it]
(WorkerDict pid=122894) [torch_memory_saver.cpp] CUresult error  result=2 file=csrc/torch_memory_saver.cpp func=cu_mem_create line=104
(WorkerDict pid=122894) [2025-06-24 22:41:37 TP1] Scheduler hit an exception: Traceback (most recent call last):
(WorkerDict pid=122894)   File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 2311, in run_scheduler_process
(WorkerDict pid=122894)     scheduler.event_loop_overlap()
(WorkerDict pid=122894)   File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(WorkerDict pid=122894)     return func(*args, **kwargs)
(WorkerDict pid=122894)   File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 661, in event_loop_overlap
(WorkerDict pid=122894)     recv_reqs = self.recv_requests()
(WorkerDict pid=122894)   File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 872, in recv_requests
(WorkerDict pid=122894)     recv_reqs = broadcast_pyobj(
(WorkerDict pid=122894)   File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/sglang/srt/utils.py", line 950, in broadcast_pyobj
(WorkerDict pid=122894)     dist.broadcast(tensor_size, src=src, group=dist_group)
(WorkerDict pid=122894)   File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
(WorkerDict pid=122894)     return func(*args, **kwargs)
(WorkerDict pid=122894)   File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2730, in broadcast
(WorkerDict pid=122894)     work.wait()
(WorkerDict pid=122894) RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:534] Connection closed by peer [33.103.193.204]:39314
(WorkerDict pid=122894) 
(WorkerDict pid=122894) [2025-06-24 22:41:37] Received sigquit from a child process. It usually means the child failed.
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff9afdd2e3ddd8a5f428b8254501000000 Worker ID: ef40762fa19a756411f289b01976984a04a3e70879e0bdd21f750f2b Node ID: e76e1ea40f1fd2d1aa98c86c76d0c8ea4ed24647fb43bd872e0960f8 Worker IP address: 33.103.193.204 Worker port: 35233 Worker PID: 122894 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
Error executing job with overrides: ['algorithm.adv_estimator=grpo', 'data.train_batch_size=128', 'data.max_prompt_length=1024', 'data.max_response_length=1024', 'data.filter_overlong_prompts=True', 'data.truncation=error', 'data.return_raw_chat=True', 'actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct', 'actor_rollout_ref.actor.optim.lr=1e-6', 'actor_rollout_ref.model.use_remove_padding=True', 'actor_rollout_ref.actor.ppo_mini_batch_size=128', 'actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16', 'actor_rollout_ref.actor.use_kl_loss=True', 'actor_rollout_ref.actor.kl_loss_coef=0.001', 'actor_rollout_ref.actor.kl_loss_type=low_var_kl', 'actor_rollout_ref.actor.entropy_coeff=0', 'actor_rollout_ref.model.enable_gradient_checkpointing=True', 'actor_rollout_ref.actor.fsdp_config.param_offload=False', 'actor_rollout_ref.actor.fsdp_config.optimizer_offload=False', 'actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16', 'actor_rollout_ref.rollout.tensor_model_parallel_size=2', 'actor_rollout_ref.rollout.name=sglang', 'actor_rollout_ref.rollout.gpu_memory_utilization=0.3', 'actor_rollout_ref.rollout.n=16', 'actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16', 'actor_rollout_ref.ref.fsdp_config.param_offload=True', 'algorithm.use_kl_in_reward=False', 'trainer.critic_warmup=0', 'trainer.logger=[console]', 'trainer.project_name=gsm8k_async_rl', 'trainer.experiment_name=qwen2.5-3b_function_rm-gsm8k-async-sgl-multi-w-tool-verify-n16-4cards', 'trainer.n_gpus_per_node=4', 'trainer.nnodes=1', 'trainer.save_freq=-1', 'trainer.test_freq=20', 'trainer.total_epochs=15', 'actor_rollout_ref.actor.ppo_max_token_len_per_gpu=8192', 'actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=8192', 'actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=8192', 'critic.ppo_max_token_len_per_gpu=8192', 'critic.forward_max_token_len_per_gpu=8192', 'data.train_files=/home/hanling.lh/data/gsm8k/train.parquet', 'data.val_files=/home/hanling.lh/data/gsm8k/test.parquet', 
'actor_rollout_ref.rollout.multi_turn.tool_config_path=/home/hanling.lh/verl/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml', 'actor_rollout_ref.rollout.multi_turn.interaction_config_path=/home/hanling.lh/verl/examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml', 'actor_rollout_ref.rollout.multi_turn.max_user_turns=1']
Traceback (most recent call last):
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/hanling.lh/verl/verl/trainer/main_ppo.py", line 262, in <module>
    main()
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/hanling.lh/verl/verl/trainer/main_ppo.py", line 31, in main
    run_ppo(config)
  File "/home/hanling.lh/verl/verl/trainer/main_ppo.py", line 54, in run_ppo
    ray.get(runner.run.remote(config))
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/ray/_private/worker.py", line 2849, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/ray/_private/worker.py", line 937, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ActorDiedError): ray::TaskRunner.run() (pid=121540, ip=33.103.193.204, actor_id=fc6d13134659af4c1c7165ee01000000, repr=<main_ppo.TaskRunner object at 0x7edc6a269660>)
  File "/home/hanling.lh/verl/verl/trainer/main_ppo.py", line 190, in run
    trainer.fit()
  File "/home/hanling.lh/verl/verl/trainer/ppo/ray_trainer.py", line 981, in fit
    gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch)
  File "/home/hanling.lh/verl/verl/single_controller/ray/base.py", line 51, in __call__
    output = ray.get(output)
ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task.
        class_name: create_colocated_worker_cls.<locals>.WorkerDict
        actor_id: 9afdd2e3ddd8a5f428b8254501000000
        pid: 122894
        name: LP5NYYWorkerDict_0:2
        namespace: fbedfb08-6ed7-49dc-8853-532ee2d23ada
        ip: 33.103.193.204
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.

As shown above, the first training step completes successfully, but this error occurs every time before the second step.
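For context on the "Found differences" warning earlier in the log: verl re-tokenizes the full conversation and compares it against the prompt built incrementally during rollout, then reports the first mismatching span with surrounding context. A minimal sketch of that comparison (a hypothetical helper for illustration, not verl's actual implementation) looks like:

```python
def first_diff(full: str, current: str, context: int = 10):
    """Find the first index where two renderings of the same chat disagree,
    returning chunks with surrounding context, similar to verl's warning."""
    n = min(len(full), len(current))
    # First position where the characters differ, or n if one is a prefix.
    idx = next((i for i in range(n) if full[i] != current[i]), n)
    if idx == n and len(full) == len(current):
        return None  # the two renderings are identical
    lo = max(0, idx - context)
    return {
        "idx": idx,
        "full_chunk": full[lo : idx + context],
        "current_chunk": current[lo : idx + context],
    }


# The two chunks reported in the warning above: the re-tokenized prompt has
# an extra newline after ">assistant" that the rollout-built prompt lacks.
full = '>assistant\n\n{"name": '
current = '>assistant\n{"name": '
print(first_diff(full, current))
```

The single extra `\n` matches the one-character length difference reported at idx 1767:1788 vs 1767:1787, which points at the chat template inserting a blank line on re-tokenization that the incremental rollout path does not.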
My package versions are as follows:

Package                                  Version       Editable project location
---------------------------------------- ------------- --------------------------------
accelerate                               1.8.1
aiohappyeyeballs                         2.6.1
aiohttp                                  3.12.13
aiohttp-cors                             0.8.1
aiosignal                                1.3.2
airportsdata                             20250622
annotated-types                          0.7.0
anthropic                                0.55.0
antlr4-python3-runtime                   4.9.3
anyio                                    4.9.0
astor                                    0.8.1
asttokens                                3.0.0
async-timeout                            5.0.1
attrs                                    25.3.0
av                                       14.4.0
blake3                                   1.0.5
blessed                                  1.21.0
cachetools                               5.5.2
certifi                                  2025.6.15
cffi                                     1.17.1
cfgv                                     3.4.0
charset-normalizer                       3.4.2
click                                    8.2.1
cloudpickle                              3.1.1
codetiming                               1.4.0
colorful                                 0.5.6
compressed-tensors                       0.9.3
cuda-bindings                            12.9.0
cuda-python                              12.9.0
cupy-cuda12x                             13.4.1
datasets                                 3.6.0
decorator                                5.2.1
decord                                   0.6.0
Deprecated                               1.2.18
depyf                                    0.18.0
dill                                     0.3.8
diskcache                                5.6.3
distlib                                  0.3.9
distro                                   1.9.0
dnspython                                2.7.0
einops                                   0.8.1
email_validator                          2.2.0
exceptiongroup                           1.3.0
executing                                2.2.0
fastapi                                  0.115.13
fastapi-cli                              0.0.7
fastrlock                                0.8.3
filelock                                 3.18.0
flash_attn                               2.7.4.post1
flashinfer-python                        0.2.5
frozenlist                               1.7.0
fsspec                                   2025.3.0
gguf                                     0.17.1
gitdb                                    4.0.12
GitPython                                3.1.44
google-api-core                          2.25.1
google-auth                              2.40.3
googleapis-common-protos                 1.70.0
gpustat                                  1.1.1
grpcio                                   1.73.0
h11                                      0.16.0
hf_transfer                              0.1.9
hf-xet                                   1.1.5
httpcore                                 1.0.9
httptools                                0.6.4
httpx                                    0.28.1
huggingface-hub                          0.33.0
hydra-core                               1.3.2
identify                                 2.6.12
idna                                     3.10
importlib_metadata                       8.0.0
iniconfig                                2.1.0
interegular                              0.3.3
ipython                                  8.37.0
jedi                                     0.19.2
Jinja2                                   3.1.6
jiter                                    0.10.0
jsonschema                               4.24.0
jsonschema-specifications                2025.4.1
lark                                     1.2.2
liger_kernel                             0.5.10
litellm                                  1.73.0
llguidance                               0.7.30
llvmlite                                 0.44.0
lm-format-enforcer                       0.10.11
markdown-it-py                           3.0.0
MarkupSafe                               3.0.2
mathruler                                0.1.0
matplotlib-inline                        0.1.7
mdurl                                    0.1.2
mistral_common                           1.6.2
modelscope                               1.27.1
mpmath                                   1.3.0
msgpack                                  1.1.1
msgspec                                  0.19.0
multidict                                6.5.0
multiprocess                             0.70.16
nanobind                                 2.7.0
nest-asyncio                             1.6.0
networkx                                 3.4.2
ninja                                    1.11.1.4
nodeenv                                  1.9.1
numba                                    0.61.2
numpy                                    1.26.4
nvidia-cublas-cu12                       12.4.5.8
nvidia-cuda-cupti-cu12                   12.4.127
nvidia-cuda-nvrtc-cu12                   12.4.127
nvidia-cuda-runtime-cu12                 12.4.127
nvidia-cudnn-cu12                        9.1.0.70
nvidia-cufft-cu12                        11.2.1.3
nvidia-curand-cu12                       10.3.5.147
nvidia-cusolver-cu12                     11.6.1.9
nvidia-cusparse-cu12                     12.3.1.170
nvidia-cusparselt-cu12                   0.6.2
nvidia-ml-py                             12.575.51
nvidia-nccl-cu12                         2.21.5
nvidia-nvjitlink-cu12                    12.4.127
nvidia-nvtx-cu12                         12.4.127
omegaconf                                2.3.0
openai                                   1.91.0
opencensus                               0.11.4
opencensus-context                       0.1.3
opencv-fixer                             0.2.5
opencv-python                            4.11.0.86
opencv-python-headless                   4.11.0.86
opentelemetry-api                        1.34.1
opentelemetry-exporter-otlp              1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc   1.26.0
opentelemetry-exporter-otlp-proto-http   1.26.0
opentelemetry-exporter-prometheus        0.55b1
opentelemetry-proto                      1.26.0
opentelemetry-sdk                        1.34.1
opentelemetry-semantic-conventions       0.55b1
opentelemetry-semantic-conventions-ai    0.4.9
optree                                   0.16.0
orjson                                   3.10.18
outlines                                 0.1.11
outlines_core                            0.1.26
packaging                                25.0
pandas                                   2.3.0
parso                                    0.8.4
partial-json-parser                      0.2.1.1.post6
peft                                     0.15.2
pexpect                                  4.9.0
pillow                                   11.2.1
pip                                      25.1
platformdirs                             4.3.8
pluggy                                   1.6.0
pre_commit                               4.2.0
prometheus_client                        0.22.1
prometheus-fastapi-instrumentator        7.1.0
prompt_toolkit                           3.0.51
propcache                                0.3.2
proto-plus                               1.26.1
protobuf                                 4.25.8
psutil                                   7.0.0
ptyprocess                               0.7.0
pure_eval                                0.2.3
py-cpuinfo                               9.0.0
py-spy                                   0.4.0
pyarrow                                  20.0.0
pyasn1                                   0.6.1
pyasn1_modules                           0.4.2
pybind11                                 2.13.6
pycountry                                24.6.1
pycparser                                2.22
pydantic                                 2.11.7
pydantic_core                            2.33.2
pyext                                    0.7
Pygments                                 2.19.2
pylatexenc                               2.10
pynvml                                   12.0.0
pytest                                   8.4.1
python-dateutil                          2.9.0.post0
python-dotenv                            1.1.0
python-json-logger                       3.3.0
python-multipart                         0.0.20
pytz                                     2025.2
PyYAML                                   6.0.2
pyzmq                                    27.0.0
qwen-vl-utils                            0.0.11
ray                                      2.47.1
referencing                              0.36.2
regex                                    2024.11.6
requests                                 2.32.4
rich                                     14.0.0
rich-toolkit                             0.14.7
rpds-py                                  0.25.1
rsa                                      4.9.1
ruff                                     0.12.0
safetensors                              0.5.3
scipy                                    1.15.3
sentencepiece                            0.2.0
sentry-sdk                               2.30.0
setproctitle                             1.3.6
setuptools                               78.1.1
sgl-kernel                               0.1.4
sglang                                   0.4.6.post5
shellingham                              1.5.4
six                                      1.17.0
smart-open                               7.1.0
smmap                                    5.0.2
sniffio                                  1.3.1
soundfile                                0.13.1
stack-data                               0.6.3
starlette                                0.46.2
sympy                                    1.13.1
tensordict                               0.6.2
tiktoken                                 0.9.0
tokenizers                               0.21.1
tomli                                    2.2.1
torch                                    2.6.0
torch_memory_saver                       0.0.8
torchao                                  0.11.0
torchaudio                               2.6.0
torchdata                                0.11.0
torchvision                              0.21.0
tqdm                                     4.67.1
traitlets                                5.14.3
transformers                             4.51.1
triton                                   3.2.0
typer                                    0.16.0
typing_extensions                        4.14.0
typing-inspection                        0.4.1
tzdata                                   2025.2
urllib3                                  2.5.0
uvicorn                                  0.34.3
uvloop                                   0.21.0
verl                                     0.4.0.dev0    /home/hanling.lh/verl/tests/verl
virtualenv                               20.31.2
vllm                                     0.8.5.post1
wandb                                    0.20.1
watchfiles                               1.1.0
wcwidth                                  0.2.13
websockets                               15.0.1
wheel                                    0.45.1
wrapt                                    1.17.2
xformers                                 0.0.29.post2
xgrammar                                 0.1.18
xxhash                                   3.5.0
yarl                                     1.20.1
zipp                                     3.23.0

I am running on 4×A100 GPUs and have tried reducing the batch size, but it did not help. I've been stuck on this for a long time; any help would be greatly appreciated.
