(This is a wiki-style issue)
Question: someone sees the following error:
```
[2025-04-01 23:22:06 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1999, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 249, in __init__
    self.tp_worker = TpWorkerClass(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 74, in __init__
    self.model_runner = ModelRunner(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 169, in __init__
    self.initialize(min_per_gpu_memory)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 179, in initialize
    self.load_model()
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 392, in load_model
    self.model = get_model(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 370, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/models/llama.py", line 481, in load_weights
    for name, loaded_weight in weights:
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 343, in _get_all_weights
    yield from self._get_weights_iterator(primary_weights)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 329, in <genexpr>
    return ((source.prefix + name, tensor) for (name, tensor) in weights_iterator)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/weight_utils.py", line 460, in pt_weights_iterator
    torch.cuda.empty_cache()
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/torch/cuda/memory.py", line 192, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: captures_underway.empty() INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":2967, please report a bug to PyTorch.
```
Short answer:
This is not a bug in torch_memory_saver. Either use a safetensors-format checkpoint (instead of the legacy, unsafe pt format), or upgrade to torch 2.6.0.
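For the first workaround, here is a minimal conversion sketch. The file names are illustrative assumptions (not from this issue), and checkpoints with shared-storage tensors may need extra handling:

```python
# Hypothetical sketch: load a legacy .pt/.bin checkpoint and re-save it as
# safetensors, so that the safetensors weight iterator is used at load time.
import torch
from safetensors.torch import save_file

state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
# safetensors forbids tensors that share storage, so clone defensively.
state_dict = {k: v.contiguous().clone() for k, v in state_dict.items()}
save_file(state_dict, "model.safetensors")
```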
Long answer:
- It errors on torch 2.5.1 and works well on torch 2.6.0.
- I quickly hacked SGLang's torch_memory_saver_adapter.py into this gist (https://gist.github.com/fzyzcjy/2a39268c5e4c70c6797e3288b731b85f), completely uninstalled torch_memory_saver, and the error remained the same.
Thus, it seems this is not a bug in torch_memory_saver, but rather a bug in the interaction between torch 2.5.1's MemPool and empty_cache, which is fixed in 2.6.0. A rough sketch of the suspected failure mode follows.
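The sketch below is an untested illustration, not a confirmed minimal reproducer; it assumes the prototype torch.cuda.MemPool / torch.cuda.use_mem_pool APIs that a MemPool-based adapter (like the gist above) would build on:

```python
# Untested sketch of the suspected failure mode: calling empty_cache()
# while allocations are being routed into a MemPool. Expected to hit the
# captures_underway.empty() internal assert on torch 2.5.1 and to run
# cleanly on torch 2.6.0.
import torch

pool = torch.cuda.MemPool()
with torch.cuda.use_mem_pool(pool):
    x = torch.empty(1024, device="cuda")  # allocation lands in the pool
    torch.cuda.empty_cache()  # mirrors what the pt loader does mid-load
```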
As for why this happens when loading some models on 2.5.1 while others work fine: I quickly glanced at the code, and it seems the pt-format checkpoint loader calls empty_cache between shards, while the safetensors loader may not. So only models shipped in the old pt format trigger the bug; a simplified paraphrase of the loader is sketched below.
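This paraphrase is based on the traceback above (pt_weights_iterator in sglang/srt/model_loader/weight_utils.py); the real code differs in details:

```python
# Simplified paraphrase: the pt iterator frees the CUDA cache after
# consuming each .bin shard, whereas safetensors files are memory-mapped
# and need no such call. Under an active MemPool on torch 2.5.1, the
# empty_cache() here triggers the internal assert seen in the traceback.
from typing import Iterable, Iterator, Tuple

import torch

def pt_weights_iterator(paths: Iterable[str]) -> Iterator[Tuple[str, torch.Tensor]]:
    for path in paths:
        state = torch.load(path, map_location="cpu", weights_only=True)
        yield from state.items()
        del state
        torch.cuda.empty_cache()  # <-- the call that asserts on torch 2.5.1
```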