Out of Memory Error on a 12GB GPU

### 🐛 Describe the bug

My home setup is an RTX 4070 Super (12 GB VRAM) with 32 GB of RAM, running Ubuntu 25.04.
The description says the model runs on 12 GB cards.

I even tried killing the GUI to free up more VRAM.
I also set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, as the error message suggested.
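
For reference, here is a minimal sketch of one way to make sure the allocator setting reaches the pipeline process. It assumes the variable has to be present in the environment of the launched process and that any child processes inherit it. The helper name `build_launch` is hypothetical, and the command is only constructed here, not executed.

```python
import os
import sys


def build_launch(pdf_path, workspace="./localworkspace"):
    """Return (command, environment) for one pipeline run; nothing is executed here.

    `build_launch` is a hypothetical helper for illustration, not part of olmocr.
    """
    # Copy the current environment so this process's own settings stay untouched.
    env = dict(os.environ)
    # The allocator setting suggested by the PyTorch error message.
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    # Same invocation as the example command in this report.
    cmd = [
        sys.executable, "-m", "olmocr.pipeline",
        workspace, "--markdown", "--pdfs", pdf_path,
    ]
    return cmd, env


cmd, env = build_launch("olmo-page-1.pdf")
print(" ".join(cmd[1:]))
print(env["PYTORCH_CUDA_ALLOC_CONF"])
# To actually launch it: subprocess.run(cmd, env=env, check=True)
```

Passing an explicit `env` removes any doubt about whether the shell exported the variable. In bash, a plain `set NAME=value` does not export anything, so `export` or an inline `NAME=value command` prefix would be needed there.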


I was running one of the examples:
python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmo-page-1.pdf

I received a torch.OutOfMemoryError:

2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131)     return self.forward_static(
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131)            ^^^^^^^^^^^^^^^^^^^^
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131)   File "/home/samus/anaconda/envs/olmocr/lib/python3.11/site-packages/vllm/model_executor/layers/layernorm.py", line 169, in forward_static
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131)     x = x.to(orig_dtype)
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131)         ^^^^^^^^^^^^^^^^
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB. GPU 0 has a total capacity of 11.61 GiB of which 87.06 MiB is free. Including non-PyTorch memory, this process has 10.88 GiB memory in use. Of the allocated memory 10.48 GiB is allocated by PyTorch, and 163.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
2026-01-17 20:30:51,299 - __main__ - WARNING - Attempt 74: Please wait for vllm server to become ready...
2026-01-17 20:30:51,784 - vllm - INFO - [rank0]:[W117 20:30:51.849560838 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
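
The numbers in the error message above can be turned into a quick memory budget. The sketch below parses the sizes quoted in the message and works out how far short the failed allocation was. The regular expressions are written for this exact wording and are an assumption, since the message format may differ between PyTorch versions.

```python
import re

# Error text copied from the log above (timestamps and logger prefix removed).
MESSAGE = (
    "CUDA out of memory. Tried to allocate 160.00 MiB. "
    "GPU 0 has a total capacity of 11.61 GiB of which 87.06 MiB is free. "
    "Including non-PyTorch memory, this process has 10.88 GiB memory in use. "
    "Of the allocated memory 10.48 GiB is allocated by PyTorch, "
    "and 163.53 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# One pattern per quantity; each captures (number, unit).
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"total capacity of ([\d.]+) (MiB|GiB)",
    "free": r"of which ([\d.]+) (MiB|GiB) is free",
    "in_use": r"this process has ([\d.]+) (MiB|GiB) memory in use",
    "reserved_unallocated": r"([\d.]+) (MiB|GiB) is reserved by PyTorch but unallocated",
}


def parse_oom(message):
    """Extract the sizes quoted in the message, all converted to MiB."""
    sizes = {}
    for name, pattern in PATTERNS.items():
        value, unit = re.search(pattern, message).groups()
        sizes[name] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom(MESSAGE)
shortfall = sizes["requested"] - sizes["free"]
print(f"shortfall: {shortfall:.2f} MiB")
print(f"process share of GPU: {sizes['in_use'] / sizes['total']:.1%}")
print(f"reserved but unallocated: {sizes['reserved_unallocated']:.2f} MiB")
```

On these figures the allocation missed by roughly 73 MiB while the process already held about 94% of the card. Only about 164 MiB was reserved but unallocated, so fragmentation may not be the main cause here, which would fit with `expandable_segments:True` not helping.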

### Versions

Python 3.11.14
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiosignal==1.4.0
annotated-doc==0.0.4
annotated-types==0.7.0
anthropic==0.71.0
anyio==4.12.1
apache-tvm-ffi==0.1.8.post2
astor==0.8.1
attrs==25.4.0
beaker-py==2.5.4
beautifulsoup4==4.14.3
blake3==1.0.8
bleach==6.3.0
boto3==1.42.27
botocore==1.42.27
cached_path==1.8.1
cachetools==6.2.4
cbor2==5.8.0
certifi==2026.1.4
cffi==2.0.0
charset-normalizer==3.4.4
click==8.3.1
cloudpickle==3.1.2
compressed-tensors==0.12.2
cryptography==46.0.3
cuda-bindings==13.1.1
cuda-pathfinder==1.3.3
cuda-python==13.1.1
cupy-cuda12x==13.6.0
depyf==0.20.0
dill==0.4.0
diskcache==5.6.3
distro==1.9.0
dnspython==2.8.0
docstring_parser==0.17.0
einops==0.8.1
email-validator==2.3.0
fastapi==0.128.0
fastapi-cli==0.0.20
fastapi-cloud-cli==0.10.1
fastar==0.8.0
fastrlock==0.8.3
filelock==3.20.3
flashinfer-python==0.5.2
frozenlist==1.8.0
fsspec==2026.1.0
ftfy==6.3.1
gguf==0.17.1
google-api-core==2.29.0
google-auth==2.47.0
google-cloud-core==2.5.0
google-cloud-storage==3.8.0
google-crc32c==1.8.0
google-resumable-media==2.8.0
googleapis-common-protos==1.72.0
grpcio==1.76.0
h11==0.16.0
hf-xet==1.2.0
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
huggingface-hub==0.36.0
idna==3.11
interegular==0.3.3
Jinja2==3.1.6
jiter==0.12.0
jmespath==1.0.1
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
lark==1.2.2
lingua-language-detector==2.1.1
llguidance==1.3.0
llvmlite==0.44.0
lm-format-enforcer==0.11.3
loguru==0.7.3
markdown-it-py==4.0.0
markdown2==2.5.4
markdownify==1.2.2
MarkupSafe==3.0.3
mdurl==0.1.2
mistral_common==1.8.8
model-hosting-container-standards==0.1.13
mpmath==1.3.0
msgpack==1.1.2
msgspec==0.20.0
multidict==6.7.0
networkx==3.6.1
ninja==1.13.0
numba==0.61.2
numpy==2.2.6
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cudnn-frontend==1.17.0
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-cutlass-dsl==4.3.5
nvidia-ml-py==13.590.44
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.3.20
nvidia-nvtx-cu12==12.8.90
olmocr==0.4.16
openai==2.15.0
openai-harmony==0.0.8
opencv-python-headless==4.12.0.88
orjson==3.11.5
outlines_core==0.2.11
packaging==25.0
partial-json-parser==0.2.1.1.post7
pillow==12.1.0
prometheus-fastapi-instrumentator==7.1.0
prometheus_client==0.24.1
propcache==0.4.1
proto-plus==1.27.0
protobuf==6.33.4
psutil==7.2.1
py-cpuinfo==9.0.0
pyasn1==0.6.1
pyasn1_modules==0.4.2
pybase64==1.4.3
pycountry==24.6.1
pycparser==2.23
pydantic==2.12.5
pydantic-extra-types==2.11.0
pydantic-settings==2.12.0
pydantic_core==2.41.5
Pygments==2.19.2
pypdf==6.6.0
pypdfium2==5.3.0
python-dateutil==2.9.0.post0
python-dotenv==1.2.1
python-json-logger==4.0.0
python-multipart==0.0.21
PyYAML==6.0.3
pyzmq==27.1.0
ray==2.53.0
referencing==0.37.0
regex==2026.1.14
requests==2.32.5
rich==13.9.4
rich-toolkit==0.17.1
rignore==0.7.6
rpds-py==0.30.0
rsa==4.9.1
s3transfer==0.16.0
safetensors==0.7.0
scipy==1.17.0
sentencepiece==0.2.1
sentry-sdk==2.49.0
setproctitle==1.3.7
shellingham==1.5.4
six==1.17.0
smart_open==7.5.0
sniffio==1.3.1
soupsieve==2.8.1
starlette==0.50.0
supervisor==4.3.0
sympy==1.14.0
tabulate==0.9.0
tiktoken==0.12.0
tokenizers==0.22.2
torch==2.9.0+cu128
torchaudio==2.9.0+cu128
torchvision==0.24.0+cu128
tqdm==4.67.1
transformers==4.57.3
triton==3.5.0
typer==0.21.1
typing-inspection==0.4.2
typing_extensions==4.15.0
urllib3==2.6.3
uvicorn==0.40.0
uvloop==0.22.1
vllm==0.11.2
watchfiles==1.1.1
wcwidth==0.2.14
webencodings==0.5.1
websockets==16.0
wrapt==2.0.1
xformers==0.0.33.post1
xgrammar==0.1.25
yarl==1.22.0
zstandard==0.25.0


Out of Memory Error on a 12GB GPU #424
