RuntimeError: captures_underway.empty() INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":2967, please report a bug to PyTorch. #5

Description

(This is a wiki-style issue)

Question: someone sees the following error when loading a model:

[2025-04-01 23:22:06 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1999, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 249, in __init__
    self.tp_worker = TpWorkerClass(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 74, in __init__
    self.model_runner = ModelRunner(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 169, in __init__
    self.initialize(min_per_gpu_memory)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 179, in initialize
    self.load_model()
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 392, in load_model
    self.model = get_model(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 370, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/models/llama.py", line 481, in load_weights
    for name, loaded_weight in weights:
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 343, in _get_all_weights
    yield from self._get_weights_iterator(primary_weights)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 329, in <genexpr>
    return ((source.prefix + name, tensor) for (name, tensor) in weights_iterator)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/weight_utils.py", line 460, in pt_weights_iterator
    torch.cuda.empty_cache()
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/torch/cuda/memory.py", line 192, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: captures_underway.empty() INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":2967, please report a bug to PyTorch.

Short answer:

This is not a bug in torch_memory_saver. You can either use a checkpoint in safetensors format (instead of the legacy and unsafe .pt format), or upgrade to torch 2.6.0.

Long answer:

It seems this is not a bug in torch_memory_saver, but rather an issue in torch 2.5.1's handling of MemPool + empty_cache: calling torch.cuda.empty_cache() while a MemPool is in use hits an internal assertion in the CUDA caching allocator. This is fixed in torch 2.6.0.
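For concreteness, here is a hypothetical minimal sketch of that interaction, assuming the prototype torch.cuda.MemPool / torch.cuda.use_mem_pool API available since torch 2.5 (the tensor size is arbitrary; the key point is calling empty_cache() while the pool is active):

import torch

pool = torch.cuda.MemPool()  # custom allocation pool (prototype API)
with torch.cuda.use_mem_pool(pool):
    x = torch.empty(1 << 20, device="cuda")  # allocation is routed into the pool
    # On torch 2.5.1 the next call trips the internal assert
    # "captures_underway.empty() ... CUDACachingAllocator.cpp"; on 2.6.0 it succeeds.
    torch.cuda.empty_cache()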

As for why this happens when loading some models but works fine for others on 2.5.1: from a quick glance at the code, the .pt checkpoint loading path calls torch.cuda.empty_cache() while iterating over weights (see pt_weights_iterator in the traceback above), whereas the safetensors path does not. So only checkpoints in the legacy .pt format trigger the assertion.
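If upgrading torch is not an option, converting the checkpoint to safetensors avoids the .pt loading path entirely. A minimal sketch (file names are placeholders; safetensors requires contiguous, non-shared tensors, hence the contiguous() copies):

import torch
from safetensors.torch import save_file

# Load the legacy .pt/.bin checkpoint on CPU; weights_only=True avoids
# arbitrary-code unpickling (drop it if the checkpoint needs full pickle support).
state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
state_dict = {k: v.contiguous() for k, v in state_dict.items()}
save_file(state_dict, "model.safetensors")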
