I have tried the solutions from similar issues, but they don't work. The script I run is as follows:
set -x
export HYDRA_FULL_ERROR=1
ulimit -n 65535
PROJECT_DIR="$(pwd)"
CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"
python3 -m verl.trainer.main_ppo \
--config-path="$CONFIG_PATH" \
--config-name='gsm8k_multiturn_grpo' \
algorithm.adv_estimator=grpo \
data.train_batch_size=128 \
data.max_prompt_length=1024 \
data.max_response_length=1024 \
data.filter_overlong_prompts=True \
data.truncation='error' \
data.return_raw_chat=True \
actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.ppo_mini_batch_size=128 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.kl_loss_coef=0.001 \
actor_rollout_ref.actor.kl_loss_type=low_var_kl \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.model.enable_gradient_checkpointing=True \
actor_rollout_ref.actor.fsdp_config.param_offload=False \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.name=sglang \
actor_rollout_ref.rollout.gpu_memory_utilization=0.3 \
actor_rollout_ref.rollout.n=16 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
actor_rollout_ref.ref.fsdp_config.param_offload=True \
algorithm.use_kl_in_reward=False \
trainer.critic_warmup=0 \
trainer.logger=['console'] \
trainer.project_name='gsm8k_async_rl' \
trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-async-sgl-multi-w-tool-verify-n16-4cards' \
trainer.n_gpus_per_node=4 \
trainer.nnodes=1 \
trainer.save_freq=-1 \
trainer.test_freq=20 \
trainer.total_epochs=15 \
actor_rollout_ref.actor.ppo_max_token_len_per_gpu=8192 \
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=8192 \
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=8192 \
critic.ppo_max_token_len_per_gpu=8192 \
critic.forward_max_token_len_per_gpu=8192 \
data.train_files=$HOME/data/gsm8k/train.parquet \
data.val_files=$HOME/data/gsm8k/test.parquet \
actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
actor_rollout_ref.rollout.multi_turn.interaction_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml" \
actor_rollout_ref.rollout.multi_turn.max_user_turns=1 \
"$@"
The error output is as follows:
(TaskRunner pid=121540) [prompt] system
(TaskRunner pid=121540) You are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the `calc_gsm8k_reward` tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of `#### <answer>`.
(TaskRunner pid=121540)
(TaskRunner pid=121540) # Tools
(TaskRunner pid=121540)
(TaskRunner pid=121540) You may call one or more functions to assist with the user query.
(TaskRunner pid=121540)
(TaskRunner pid=121540) You are provided with function signatures within <tools></tools> XML tags:
(TaskRunner pid=121540) <tools>
(TaskRunner pid=121540) {"type": "function", "function": {"name": "calc_gsm8k_reward", "description": "A tool for calculating the reward of gsm8k. (1.0 if parsed answer is correct, 0.0 if parsed answer is incorrect or not correctly parsed)", "parameters": {"type": "object", "properties": {"answer": {"type": "string", "description": "The model's answer to the GSM8K math problem, must be a digits", "enum": null}}, "required": ["answer"]}, "strict": false}}
(TaskRunner pid=121540) </tools>
(TaskRunner pid=121540)
(TaskRunner pid=121540) For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
(TaskRunner pid=121540) <tool_call>
(TaskRunner pid=121540) {"name": <function-name>, "arguments": <args-json-object>}
(TaskRunner pid=121540) </tool_call>
(TaskRunner pid=121540) user
(TaskRunner pid=121540) Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market? Let's think step by step and output the final answer after `####`.
(TaskRunner pid=121540) assistant
(TaskRunner pid=121540)
(TaskRunner pid=121540) [response] To find out how much Janet makes every day at the farmers' market, we need to follow these steps:
(TaskRunner pid=121540)
(TaskRunner pid=121540) 1. Calculate the total number of eggs laid per day.
(TaskRunner pid=121540) 2. Subtract the number of eggs eaten for breakfast.
(TaskRunner pid=121540) 3. Subtract the number of eggs used for baking.
(TaskRunner pid=121540) 4. The remainder is the number of eggs sold at the market.
(TaskRunner pid=121540) 5. Multiply the number of eggs sold by the price per egg.
(TaskRunner pid=121540)
(TaskRunner pid=121540) Let's calculate this step by step:
(TaskRunner pid=121540)
(TaskRunner pid=121540) 1. Total eggs laid per day: 16
(TaskRunner pid=121540) 2. Eggs eaten for breakfast: 3
(TaskRunner pid=121540) 3. Eggs used for baking: 4
(TaskRunner pid=121540)
(TaskRunner pid=121540) Now, let's calculate the number of eggs sold at the market.
(TaskRunner pid=121540) <tool_call>
(TaskRunner pid=121540) {"name": "calc_gsm8k_reward", "arguments": {"answer": "1"}}
(TaskRunner pid=121540)
(TaskRunner pid=121540) [ground_truth] 18
(TaskRunner pid=121540) [score] 0.0
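(As a sanity check on the reward tool itself, the ground truth of 18 follows directly from the problem statement, so the 0.0 score for the submitted answer "1" is correct:)

```python
# Ground-truth arithmetic for the Janet's-ducks GSM8K sample above.
eggs_laid = 16
eggs_eaten = 3      # eaten for breakfast
eggs_baked = 4      # used for muffins
price_per_egg = 2   # dollars per egg at the market

daily_income = (eggs_laid - eggs_eaten - eggs_baked) * price_per_egg
print(daily_income)  # 18, matching [ground_truth] above
```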
(TaskRunner pid=121540) len reward_extra_infos_dict['reward']: 1319
(TaskRunner pid=121540) ("Initial validation metrics: {'val-core/openai/gsm8k/reward/mean@1': "
(TaskRunner pid=121540) '0.7073540561031084}')
(TaskRunner pid=121540) step:0 - val-core/openai/gsm8k/reward/mean@1:0.707
Training Progress: 0%| | 0/870 [00:00<?, ?it/s]
(WorkerDict pid=122626) [2025-06-24 22:37:48] Inconsistent training and inference tokenization detected (strict). This may lead to unexpected behavior during training. Please review your chat template to determine if this is intentional. For more information, refer to the multiturn README.md.
(WorkerDict pid=122626) [2025-06-24 22:37:48] Showing 10 characters before and after the diffs for context and better readability.
(WorkerDict pid=122626) [2025-06-24 22:37:48] Found differences:
(WorkerDict pid=122626) idx 1767:1788 -> 1767:1787 | full_prompt_chunk: '>assistant\n\n{"name": ' | current_prompt_chunk: '>assistant\n{"name": '
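(The warning above reports the first index at which the two renderings of the prompt diverge, plus surrounding context. A minimal sketch of that kind of comparison — a simplification written for illustration, not verl's actual check:)

```python
def first_diff(a: str, b: str, context: int = 10):
    """Return (index, chunk_a, chunk_b) around the first character where
    the two prompt renderings diverge, or None if they are identical."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            break
    else:
        if len(a) == len(b):
            return None
        i = n  # one string is a strict prefix of the other
    start = max(0, i - context)
    return i, a[start:i + context], b[start:i + context]

# The diff from the log: the full prompt has '\n\n' after '>assistant',
# the re-tokenized prompt only '\n' -- a chat-template whitespace mismatch.
print(first_diff('>assistant\n\n{"name": ', '>assistant\n{"name": '))
```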
(WorkerDict pid=122894) /home/hanling.lh/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
(WorkerDict pid=122894) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device)
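(The non-writable-tensor warning above comes from wrapping the read-only view returned by np.frombuffer in torch.ByteTensor. A sketch of the usual fix — copying the buffer before conversion — shown with NumPy alone, since the warning itself is harmless here:)

```python
import numpy as np

serialized_data = b"example payload"  # stand-in for the real serialized bytes

# np.frombuffer returns a read-only view over the immutable bytes object;
# .copy() produces an array that owns writable memory, which is safe to
# hand to torch.ByteTensor / torch.from_numpy without the warning.
view = np.frombuffer(serialized_data, dtype=np.uint8)
writable = view.copy()

print(view.flags.writeable, writable.flags.writeable)  # False True
```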
(TaskRunner pid=121540) step:1 - global_seqlen/min:467346.000 - global_seqlen/max:520258.000 - global_seqlen/minmax_diff:52912.000 - global_seqlen/balanced_min:492136.000 - global_seqlen/balanced_max:492137.000 - global_seqlen/mean:492136.750 - actor/entropy:0.370 - actor/kl_loss:0.001 - actor/kl_coef:0.001 - actor/pg_loss:-0.065 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.775 - perf/mfu/actor:0.372 - perf/max_memory_allocated_gb:30.948 - perf/max_memory_reserved_gb:70.426 - perf/cpu_memory_used_gb:24.265 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.694 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.694 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:0.013 - critic/advantages/max:3.750 - critic/advantages/min:-3.750 - critic/returns/mean:0.013 - critic/returns/max:3.750 - critic/returns/min:-3.750 - response_length/mean:595.908 - response_length/max:1024.000 - response_length/min:76.000 - response_length/clip_ratio:0.063 - prompt_length/mean:365.297 - prompt_length/max:430.000 - prompt_length/min:326.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:164.396 - timing_s/reshard:4.166 - timing_s/gen:169.202 - timing_s/reward:0.733 - timing_s/old_log_prob:36.700 - timing_s/ref:29.754 - timing_s/adv:0.043 - timing_s/update_actor:90.720 - timing_s/step:327.221 - timing_per_token_ms/gen:0.139 - timing_per_token_ms/update_actor:0.046 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.015 - perf/total_num_tokens:1968547.000 - perf/time_per_step:327.221 - perf/throughput:1503.990
Training Progress: 0%| | 1/870 [05:27<79:08:02, 327.83s/it]
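(As a sanity check on the run configuration, the 870 total steps in the progress bar are consistent with GSM8K's 7,473 training examples — assuming drop-last batching, which is my assumption about the dataloader, not confirmed from verl's source:)

```python
train_examples = 7473     # GSM8K train split size
train_batch_size = 128    # data.train_batch_size
total_epochs = 15         # trainer.total_epochs

steps_per_epoch = train_examples // train_batch_size  # 58, assuming drop-last
total_steps = steps_per_epoch * total_epochs
print(total_steps)  # 870, matching the progress bar
```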
(WorkerDict pid=122894) [torch_memory_saver.cpp] CUresult error result=2 file=csrc/torch_memory_saver.cpp func=cu_mem_create line=104
(WorkerDict pid=122894) [2025-06-24 22:41:37 TP1] Scheduler hit an exception: Traceback (most recent call last):
(WorkerDict pid=122894) File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 2311, in run_scheduler_process
(WorkerDict pid=122894) scheduler.event_loop_overlap()
(WorkerDict pid=122894) File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(WorkerDict pid=122894) return func(*args, **kwargs)
(WorkerDict pid=122894) File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 661, in event_loop_overlap
(WorkerDict pid=122894) recv_reqs = self.recv_requests()
(WorkerDict pid=122894) File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 872, in recv_requests
(WorkerDict pid=122894) recv_reqs = broadcast_pyobj(
(WorkerDict pid=122894) File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/sglang/srt/utils.py", line 950, in broadcast_pyobj
(WorkerDict pid=122894) dist.broadcast(tensor_size, src=src, group=dist_group)
(WorkerDict pid=122894) File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
(WorkerDict pid=122894) return func(*args, **kwargs)
(WorkerDict pid=122894) File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2730, in broadcast
(WorkerDict pid=122894) work.wait()
(WorkerDict pid=122894) RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:534] Connection closed by peer [33.103.193.204]:39314
(WorkerDict pid=122894)
(WorkerDict pid=122894) [2025-06-24 22:41:37] Received sigquit from a child process. It usually means the child failed.
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff9afdd2e3ddd8a5f428b8254501000000 Worker ID: ef40762fa19a756411f289b01976984a04a3e70879e0bdd21f750f2b Node ID: e76e1ea40f1fd2d1aa98c86c76d0c8ea4ed24647fb43bd872e0960f8 Worker IP address: 33.103.193.204 Worker port: 35233 Worker PID: 122894 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
Error executing job with overrides: ['algorithm.adv_estimator=grpo', 'data.train_batch_size=128', 'data.max_prompt_length=1024', 'data.max_response_length=1024', 'data.filter_overlong_prompts=True', 'data.truncation=error', 'data.return_raw_chat=True', 'actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct', 'actor_rollout_ref.actor.optim.lr=1e-6', 'actor_rollout_ref.model.use_remove_padding=True', 'actor_rollout_ref.actor.ppo_mini_batch_size=128', 'actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16', 'actor_rollout_ref.actor.use_kl_loss=True', 'actor_rollout_ref.actor.kl_loss_coef=0.001', 'actor_rollout_ref.actor.kl_loss_type=low_var_kl', 'actor_rollout_ref.actor.entropy_coeff=0', 'actor_rollout_ref.model.enable_gradient_checkpointing=True', 'actor_rollout_ref.actor.fsdp_config.param_offload=False', 'actor_rollout_ref.actor.fsdp_config.optimizer_offload=False', 'actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16', 'actor_rollout_ref.rollout.tensor_model_parallel_size=2', 'actor_rollout_ref.rollout.name=sglang', 'actor_rollout_ref.rollout.gpu_memory_utilization=0.3', 'actor_rollout_ref.rollout.n=16', 'actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16', 'actor_rollout_ref.ref.fsdp_config.param_offload=True', 'algorithm.use_kl_in_reward=False', 'trainer.critic_warmup=0', 'trainer.logger=[console]', 'trainer.project_name=gsm8k_async_rl', 'trainer.experiment_name=qwen2.5-3b_function_rm-gsm8k-async-sgl-multi-w-tool-verify-n16-4cards', 'trainer.n_gpus_per_node=4', 'trainer.nnodes=1', 'trainer.save_freq=-1', 'trainer.test_freq=20', 'trainer.total_epochs=15', 'actor_rollout_ref.actor.ppo_max_token_len_per_gpu=8192', 'actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=8192', 'actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=8192', 'critic.ppo_max_token_len_per_gpu=8192', 'critic.forward_max_token_len_per_gpu=8192', 'data.train_files=/home/hanling.lh/data/gsm8k/train.parquet', 'data.val_files=/home/hanling.lh/data/gsm8k/test.parquet', 
'actor_rollout_ref.rollout.multi_turn.tool_config_path=/home/hanling.lh/verl/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml', 'actor_rollout_ref.rollout.multi_turn.interaction_config_path=/home/hanling.lh/verl/examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml', 'actor_rollout_ref.rollout.multi_turn.max_user_turns=1']
Traceback (most recent call last):
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/hanling.lh/verl/verl/trainer/main_ppo.py", line 262, in <module>
main()
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/home/hanling.lh/verl/verl/trainer/main_ppo.py", line 31, in main
run_ppo(config)
File "/home/hanling.lh/verl/verl/trainer/main_ppo.py", line 54, in run_ppo
ray.get(runner.run.remote(config))
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/ray/_private/worker.py", line 2849, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/home/hanling.lh/.conda/envs/verl_py310/lib/python3.10/site-packages/ray/_private/worker.py", line 937, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ActorDiedError): ray::TaskRunner.run() (pid=121540, ip=33.103.193.204, actor_id=fc6d13134659af4c1c7165ee01000000, repr=<main_ppo.TaskRunner object at 0x7edc6a269660>)
File "/home/hanling.lh/verl/verl/trainer/main_ppo.py", line 190, in run
trainer.fit()
File "/home/hanling.lh/verl/verl/trainer/ppo/ray_trainer.py", line 981, in fit
gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch)
File "/home/hanling.lh/verl/verl/single_controller/ray/base.py", line 51, in __call__
output = ray.get(output)
ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task.
class_name: create_colocated_worker_cls.<locals>.WorkerDict
actor_id: 9afdd2e3ddd8a5f428b8254501000000
pid: 122894
name: LP5NYYWorkerDict_0:2
namespace: fbedfb08-6ed7-49dc-8853-532ee2d23ada
ip: 33.103.193.204
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
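For what it's worth, the torch_memory_saver failure above (CUresult 2 from cu_mem_create is CUDA_ERROR_OUT_OF_MEMORY) together with the step-1 metrics suggests the rollout engine fails to re-reserve its memory after the training phase. A back-of-envelope check, assuming 80 GB A100s (the card size is my assumption; the report only says A100):

```python
total_gb = 80.0                     # assumed 80 GB A100
reserved_by_training_gb = 70.426    # perf/max_memory_reserved_gb at step 1
rollout_budget_gb = 0.3 * total_gb  # gpu_memory_utilization=0.3 -> 24 GB for sglang

headroom_gb = total_gb - reserved_by_training_gb
print(f"headroom={headroom_gb:.1f} GB, rollout wants ~{rollout_budget_gb:.0f} GB")
# headroom (~9.6 GB) < rollout budget (24 GB): restoring the KV cache can OOM
```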
The first training step completes successfully, but this error occurs every time before the second step.
My package versions are as follows:
Package Version Editable project location
---------------------------------------- ------------- --------------------------------
accelerate 1.8.1
aiohappyeyeballs 2.6.1
aiohttp 3.12.13
aiohttp-cors 0.8.1
aiosignal 1.3.2
airportsdata 20250622
annotated-types 0.7.0
anthropic 0.55.0
antlr4-python3-runtime 4.9.3
anyio 4.9.0
astor 0.8.1
asttokens 3.0.0
async-timeout 5.0.1
attrs 25.3.0
av 14.4.0
blake3 1.0.5
blessed 1.21.0
cachetools 5.5.2
certifi 2025.6.15
cffi 1.17.1
cfgv 3.4.0
charset-normalizer 3.4.2
click 8.2.1
cloudpickle 3.1.1
codetiming 1.4.0
colorful 0.5.6
compressed-tensors 0.9.3
cuda-bindings 12.9.0
cuda-python 12.9.0
cupy-cuda12x 13.4.1
datasets 3.6.0
decorator 5.2.1
decord 0.6.0
Deprecated 1.2.18
depyf 0.18.0
dill 0.3.8
diskcache 5.6.3
distlib 0.3.9
distro 1.9.0
dnspython 2.7.0
einops 0.8.1
email_validator 2.2.0
exceptiongroup 1.3.0
executing 2.2.0
fastapi 0.115.13
fastapi-cli 0.0.7
fastrlock 0.8.3
filelock 3.18.0
flash_attn 2.7.4.post1
flashinfer-python 0.2.5
frozenlist 1.7.0
fsspec 2025.3.0
gguf 0.17.1
gitdb 4.0.12
GitPython 3.1.44
google-api-core 2.25.1
google-auth 2.40.3
googleapis-common-protos 1.70.0
gpustat 1.1.1
grpcio 1.73.0
h11 0.16.0
hf_transfer 0.1.9
hf-xet 1.1.5
httpcore 1.0.9
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.33.0
hydra-core 1.3.2
identify 2.6.12
idna 3.10
importlib_metadata 8.0.0
iniconfig 2.1.0
interegular 0.3.3
ipython 8.37.0
jedi 0.19.2
Jinja2 3.1.6
jiter 0.10.0
jsonschema 4.24.0
jsonschema-specifications 2025.4.1
lark 1.2.2
liger_kernel 0.5.10
litellm 1.73.0
llguidance 0.7.30
llvmlite 0.44.0
lm-format-enforcer 0.10.11
markdown-it-py 3.0.0
MarkupSafe 3.0.2
mathruler 0.1.0
matplotlib-inline 0.1.7
mdurl 0.1.2
mistral_common 1.6.2
modelscope 1.27.1
mpmath 1.3.0
msgpack 1.1.1
msgspec 0.19.0
multidict 6.5.0
multiprocess 0.70.16
nanobind 2.7.0
nest-asyncio 1.6.0
networkx 3.4.2
ninja 1.11.1.4
nodeenv 1.9.1
numba 0.61.2
numpy 1.26.4
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-ml-py 12.575.51
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
omegaconf 2.3.0
openai 1.91.0
opencensus 0.11.4
opencensus-context 0.1.3
opencv-fixer 0.2.5
opencv-python 4.11.0.86
opencv-python-headless 4.11.0.86
opentelemetry-api 1.34.1
opentelemetry-exporter-otlp 1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc 1.26.0
opentelemetry-exporter-otlp-proto-http 1.26.0
opentelemetry-exporter-prometheus 0.55b1
opentelemetry-proto 1.26.0
opentelemetry-sdk 1.34.1
opentelemetry-semantic-conventions 0.55b1
opentelemetry-semantic-conventions-ai 0.4.9
optree 0.16.0
orjson 3.10.18
outlines 0.1.11
outlines_core 0.1.26
packaging 25.0
pandas 2.3.0
parso 0.8.4
partial-json-parser 0.2.1.1.post6
peft 0.15.2
pexpect 4.9.0
pillow 11.2.1
pip 25.1
platformdirs 4.3.8
pluggy 1.6.0
pre_commit 4.2.0
prometheus_client 0.22.1
prometheus-fastapi-instrumentator 7.1.0
prompt_toolkit 3.0.51
propcache 0.3.2
proto-plus 1.26.1
protobuf 4.25.8
psutil 7.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
py-cpuinfo 9.0.0
py-spy 0.4.0
pyarrow 20.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.2
pybind11 2.13.6
pycountry 24.6.1
pycparser 2.22
pydantic 2.11.7
pydantic_core 2.33.2
pyext 0.7
Pygments 2.19.2
pylatexenc 2.10
pynvml 12.0.0
pytest 8.4.1
python-dateutil 2.9.0.post0
python-dotenv 1.1.0
python-json-logger 3.3.0
python-multipart 0.0.20
pytz 2025.2
PyYAML 6.0.2
pyzmq 27.0.0
qwen-vl-utils 0.0.11
ray 2.47.1
referencing 0.36.2
regex 2024.11.6
requests 2.32.4
rich 14.0.0
rich-toolkit 0.14.7
rpds-py 0.25.1
rsa 4.9.1
ruff 0.12.0
safetensors 0.5.3
scipy 1.15.3
sentencepiece 0.2.0
sentry-sdk 2.30.0
setproctitle 1.3.6
setuptools 78.1.1
sgl-kernel 0.1.4
sglang 0.4.6.post5
shellingham 1.5.4
six 1.17.0
smart-open 7.1.0
smmap 5.0.2
sniffio 1.3.1
soundfile 0.13.1
stack-data 0.6.3
starlette 0.46.2
sympy 1.13.1
tensordict 0.6.2
tiktoken 0.9.0
tokenizers 0.21.1
tomli 2.2.1
torch 2.6.0
torch_memory_saver 0.0.8
torchao 0.11.0
torchaudio 2.6.0
torchdata 0.11.0
torchvision 0.21.0
tqdm 4.67.1
traitlets 5.14.3
transformers 4.51.1
triton 3.2.0
typer 0.16.0
typing_extensions 4.14.0
typing-inspection 0.4.1
tzdata 2025.2
urllib3 2.5.0
uvicorn 0.34.3
uvloop 0.21.0
verl 0.4.0.dev0 /home/hanling.lh/verl/tests/verl
virtualenv 20.31.2
vllm 0.8.5.post1
wandb 0.20.1
watchfiles 1.1.0
wcwidth 0.2.13
websockets 15.0.1
wheel 0.45.1
wrapt 1.17.2
xformers 0.0.29.post2
xgrammar 0.1.18
xxhash 3.5.0
yarl 1.20.1
zipp 3.23.0
I run on 4×A100 GPUs and have tried reducing the batch size, but it did not help. I've been stuck on this for a long time; any help would be greatly appreciated.