RuntimeError: captures_underway.empty() INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":2967, please report a bug to PyTorch. #5

Description

(This is a wiki-style issue)

Question: someone sees the following error when loading a model:

[2025-04-01 23:22:06 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1999, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 249, in __init__
    self.tp_worker = TpWorkerClass(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 74, in __init__
    self.model_runner = ModelRunner(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 169, in __init__
    self.initialize(min_per_gpu_memory)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 179, in initialize
    self.load_model()
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 392, in load_model
    self.model = get_model(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 370, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/models/llama.py", line 481, in load_weights
    for name, loaded_weight in weights:
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 343, in _get_all_weights
    yield from self._get_weights_iterator(primary_weights)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 329, in <genexpr>
    return ((source.prefix + name, tensor) for (name, tensor) in weights_iterator)
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/sglang/srt/model_loader/weight_utils.py", line 460, in pt_weights_iterator
    torch.cuda.empty_cache()
  File "/data/gpu-use/verl-sglang/.venv/lib/python3.10/site-packages/torch/cuda/memory.py", line 192, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: captures_underway.empty() INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":2967, please report a bug to PyTorch.

Short answer:

This is not a bug in torch_memory_saver. You can either use a checkpoint in safetensors format (instead of the legacy and unsafe .pt format), or upgrade to torch 2.6.0.

Long answer:

It seems this is not a bug in torch_memory_saver, but rather an issue in torch 2.5.1's handling of MemPool + empty_cache: calling torch.cuda.empty_cache() while a MemPool is in use hits an internal assertion in the CUDA caching allocator. This is fixed in torch 2.6.0.
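For concreteness, here is a hypothetical minimal sketch of that interaction, assuming the prototype torch.cuda.MemPool / torch.cuda.use_mem_pool API available since torch 2.5 (the tensor size is arbitrary; the key point is calling empty_cache() while the pool is active):

import torch

pool = torch.cuda.MemPool()  # custom allocation pool (prototype API)
with torch.cuda.use_mem_pool(pool):
    x = torch.empty(1 << 20, device="cuda")  # allocation is routed into the pool
    # On torch 2.5.1 the next call trips the internal assert
    # "captures_underway.empty() ... CUDACachingAllocator.cpp"; on 2.6.0 it succeeds.
    torch.cuda.empty_cache()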

As for why this happens when loading some models but works fine for others on 2.5.1: from a quick glance at the code, the .pt checkpoint loading path calls torch.cuda.empty_cache() while iterating over weights (see pt_weights_iterator in the traceback above), whereas the safetensors path does not. So only checkpoints in the legacy .pt format trigger the assertion.
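If upgrading torch is not an option, converting the checkpoint to safetensors avoids the .pt loading path entirely. A minimal sketch (file names are placeholders; safetensors requires contiguous, non-shared tensors, hence the contiguous() copies):

import torch
from safetensors.torch import save_file

# Load the legacy .pt/.bin checkpoint on CPU; weights_only=True avoids
# arbitrary-code unpickling (drop it if the checkpoint needs full pickle support).
state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
state_dict = {k: v.contiguous() for k, v in state_dict.items()}
save_file(state_dict, "model.safetensors")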
