Out of Memory Error on a 12GB GPU #424

@issamu2k

🐛 Describe the bug

My home setup is an RTX 4070 Super (12 GB) with 32 GB of RAM, running Ubuntu 25.04.
The description says the model runs on 12 GB cards.

I even tried killing the GUI to release more VRAM.
I also set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, as the error message suggested.

I was running one of the examples:
python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmo-page-1.pdf
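
For anyone reproducing this, here is a minimal sketch of one way to make sure the allocator setting actually reaches the process tree. One caveat worth checking: in bash, `set NAME=value` does not create an environment variable (it assigns positional parameters), so the variable needs `export` or an explicit environment like the one below. The helper names `build_env`, `build_command`, and `launch` are hypothetical, and the sketch assumes the pipeline's child processes inherit the parent environment, which I have not verified for olmocr.

```python
import os
import subprocess
import sys

# Value suggested by the PyTorch OOM message in this report.
ALLOC_CONF = "expandable_segments:True"


def build_env(base_env):
    """Return a copy of base_env with the allocator setting added."""
    env = dict(base_env)
    env["PYTORCH_CUDA_ALLOC_CONF"] = ALLOC_CONF
    return env


def build_command(python_exe):
    """Same command line as in the report, as an argument list."""
    return [
        python_exe, "-m", "olmocr.pipeline", "./localworkspace",
        "--markdown", "--pdfs", "olmo-page-1.pdf",
    ]


def launch():
    """Run the pipeline with the variable set explicitly (hypothetical helper)."""
    result = subprocess.run(build_command(sys.executable), env=build_env(os.environ))
    return result.returncode
```

Calling `launch()` from the activated olmocr environment runs the same example with the variable set for that one process, without changing the shell's own environment.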

I received a torch.OutOfMemoryError:

2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) return self.forward_static(
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) ^^^^^^^^^^^^^^^^^^^^
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) File "/home/samus/anaconda/envs/olmocr/lib/python3.11/site-packages/vllm/model_executor/layers/layernorm.py", line 169, in forward_static
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) x = x.to(orig_dtype)
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) ^^^^^^^^^^^^^^^^
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB. GPU 0 has a total capacity of 11.61 GiB of which 87.06 MiB is free. Including non-PyTorch memory, this process has 10.88 GiB memory in use. Of the allocated memory 10.48 GiB is allocated by PyTorch, and 163.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
2026-01-17 20:30:51,299 - main - WARNING - Attempt 74: Please wait for vllm server to become ready...
2026-01-17 20:30:51,784 - vllm - INFO - [rank0]:[W117 20:30:51.849560838 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
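
To make the numbers in the error easier to compare, here is a rough sketch that pulls the sizes out of the OOM message and converts them to MiB. The function names (`parse_oom`, `shortfall_mib`) are made up for illustration, and the regular expressions assume the exact wording of the message in this log, so they may not match other PyTorch versions.

```python
import re

# OOM message text copied from the log above (timestamp and prefix stripped).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 160.00 MiB. "
    "GPU 0 has a total capacity of 11.61 GiB of which 87.06 MiB is free. "
    "Including non-PyTorch memory, this process has 10.88 GiB memory in use. "
    "Of the allocated memory 10.48 GiB is allocated by PyTorch, "
    "and 163.53 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# Field name -> pattern capturing (number, unit); wording assumed from this log.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"total capacity of ([\d.]+) (MiB|GiB)",
    "free": r"of which ([\d.]+) (MiB|GiB) is free",
    "process_in_use": r"this process has ([\d.]+) (MiB|GiB) memory in use",
    "allocated": r"([\d.]+) (MiB|GiB) is allocated by PyTorch",
    "reserved_unallocated": r"([\d.]+) (MiB|GiB) is reserved by PyTorch but unallocated",
}


def parse_oom(message):
    """Return each size mentioned in the message, converted to MiB."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        value, unit = match.groups()
        stats[name] = float(value) * UNIT_TO_MIB[unit]
    return stats


def shortfall_mib(stats):
    """How much the failed request exceeded the free memory, in MiB."""
    return max(0.0, stats["requested"] - stats["free"])


stats = parse_oom(OOM_MESSAGE)
for name, mib in stats.items():
    print(f"{name:>22}: {mib:10.2f} MiB")
print(f"{'shortfall':>22}: {shortfall_mib(stats):10.2f} MiB")
```

For this message the failed request (160.00 MiB) exceeds the free memory (87.06 MiB) by about 72.94 MiB. The GiB figures in the log are rounded to two decimals, so any difference computed from them is only accurate to roughly 10 MiB.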

Versions

Python 3.11.14
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiosignal==1.4.0
annotated-doc==0.0.4
annotated-types==0.7.0
anthropic==0.71.0
anyio==4.12.1
apache-tvm-ffi==0.1.8.post2
astor==0.8.1
attrs==25.4.0
beaker-py==2.5.4
beautifulsoup4==4.14.3
blake3==1.0.8
bleach==6.3.0
boto3==1.42.27
botocore==1.42.27
cached_path==1.8.1
cachetools==6.2.4
cbor2==5.8.0
certifi==2026.1.4
cffi==2.0.0
charset-normalizer==3.4.4
click==8.3.1
cloudpickle==3.1.2
compressed-tensors==0.12.2
cryptography==46.0.3
cuda-bindings==13.1.1
cuda-pathfinder==1.3.3
cuda-python==13.1.1
cupy-cuda12x==13.6.0
depyf==0.20.0
dill==0.4.0
diskcache==5.6.3
distro==1.9.0
dnspython==2.8.0
docstring_parser==0.17.0
einops==0.8.1
email-validator==2.3.0
fastapi==0.128.0
fastapi-cli==0.0.20
fastapi-cloud-cli==0.10.1
fastar==0.8.0
fastrlock==0.8.3
filelock==3.20.3
flashinfer-python==0.5.2
frozenlist==1.8.0
fsspec==2026.1.0
ftfy==6.3.1
gguf==0.17.1
google-api-core==2.29.0
google-auth==2.47.0
google-cloud-core==2.5.0
google-cloud-storage==3.8.0
google-crc32c==1.8.0
google-resumable-media==2.8.0
googleapis-common-protos==1.72.0
grpcio==1.76.0
h11==0.16.0
hf-xet==1.2.0
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
huggingface-hub==0.36.0
idna==3.11
interegular==0.3.3
Jinja2==3.1.6
jiter==0.12.0
jmespath==1.0.1
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
lark==1.2.2
lingua-language-detector==2.1.1
llguidance==1.3.0
llvmlite==0.44.0
lm-format-enforcer==0.11.3
loguru==0.7.3
markdown-it-py==4.0.0
markdown2==2.5.4
markdownify==1.2.2
MarkupSafe==3.0.3
mdurl==0.1.2
mistral_common==1.8.8
model-hosting-container-standards==0.1.13
mpmath==1.3.0
msgpack==1.1.2
msgspec==0.20.0
multidict==6.7.0
networkx==3.6.1
ninja==1.13.0
numba==0.61.2
numpy==2.2.6
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cudnn-frontend==1.17.0
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-cutlass-dsl==4.3.5
nvidia-ml-py==13.590.44
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.3.20
nvidia-nvtx-cu12==12.8.90
olmocr==0.4.16
openai==2.15.0
openai-harmony==0.0.8
opencv-python-headless==4.12.0.88
orjson==3.11.5
outlines_core==0.2.11
packaging==25.0
partial-json-parser==0.2.1.1.post7
pillow==12.1.0
prometheus-fastapi-instrumentator==7.1.0
prometheus_client==0.24.1
propcache==0.4.1
proto-plus==1.27.0
protobuf==6.33.4
psutil==7.2.1
py-cpuinfo==9.0.0
pyasn1==0.6.1
pyasn1_modules==0.4.2
pybase64==1.4.3
pycountry==24.6.1
pycparser==2.23
pydantic==2.12.5
pydantic-extra-types==2.11.0
pydantic-settings==2.12.0
pydantic_core==2.41.5
Pygments==2.19.2
pypdf==6.6.0
pypdfium2==5.3.0
python-dateutil==2.9.0.post0
python-dotenv==1.2.1
python-json-logger==4.0.0
python-multipart==0.0.21
PyYAML==6.0.3
pyzmq==27.1.0
ray==2.53.0
referencing==0.37.0
regex==2026.1.14
requests==2.32.5
rich==13.9.4
rich-toolkit==0.17.1
rignore==0.7.6
rpds-py==0.30.0
rsa==4.9.1
s3transfer==0.16.0
safetensors==0.7.0
scipy==1.17.0
sentencepiece==0.2.1
sentry-sdk==2.49.0
setproctitle==1.3.7
shellingham==1.5.4
six==1.17.0
smart_open==7.5.0
sniffio==1.3.1
soupsieve==2.8.1
starlette==0.50.0
supervisor==4.3.0
sympy==1.14.0
tabulate==0.9.0
tiktoken==0.12.0
tokenizers==0.22.2
torch==2.9.0+cu128
torchaudio==2.9.0+cu128
torchvision==0.24.0+cu128
tqdm==4.67.1
transformers==4.57.3
triton==3.5.0
typer==0.21.1
typing-inspection==0.4.2
typing_extensions==4.15.0
urllib3==2.6.3
uvicorn==0.40.0
uvloop==0.22.1
vllm==0.11.2
watchfiles==1.1.1
wcwidth==0.2.14
webencodings==0.5.1
websockets==16.0
wrapt==2.0.1
xformers==0.0.33.post1
xgrammar==0.1.25
yarl==1.22.0
zstandard==0.25.0

Labels

bug: Something isn't working
