Description
🐛 Describe the bug
My home setup is an RTX 4070 Super with 12 GB of VRAM and 32 GB of RAM, running Ubuntu 25.04.
The description says the model runs on 12 GB cards.
I even tried killing the GUI to free up more VRAM.
I also set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, as the error message suggested.
I was running one of the examples:
python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmo-page-1.pdf
I received a torch.OutOfMemoryError:
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) return self.forward_static(
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) ^^^^^^^^^^^^^^^^^^^^
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) File "/home/samus/anaconda/envs/olmocr/lib/python3.11/site-packages/vllm/model_executor/layers/layernorm.py", line 169, in forward_static
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) x = x.to(orig_dtype)
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) ^^^^^^^^^^^^^^^^
2026-01-17 20:30:51,298 - vllm - INFO - (EngineCore_DP0 pid=13131) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB. GPU 0 has a total capacity of 11.61 GiB of which 87.06 MiB is free. Including non-PyTorch memory, this process has 10.88 GiB memory in use. Of the allocated memory 10.48 GiB is allocated by PyTorch, and 163.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
2026-01-17 20:30:51,299 - main - WARNING - Attempt 74: Please wait for vllm server to become ready...
2026-01-17 20:30:51,784 - vllm - INFO - [rank0]:[W117 20:30:51.849560838 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
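To make the numbers in the error message easier to compare, here is a small Python sketch that parses them and converts everything to MiB. The message text is copied from the log above; the names `parse_oom`, `to_mib`, `shortfall` and `outside_process` are made up for illustration, and reading the remainder as "memory held outside this process" is my assumption, not something the log states.

```python
import re

# Text copied from the torch.OutOfMemoryError line in the log above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 160.00 MiB. "
    "GPU 0 has a total capacity of 11.61 GiB of which 87.06 MiB is free. "
    "Including non-PyTorch memory, this process has 10.88 GiB memory in use. "
    "Of the allocated memory 10.48 GiB is allocated by PyTorch, "
    "and 163.53 MiB is reserved by PyTorch but unallocated."
)

MIB_PER_UNIT = {"MiB": 1.0, "GiB": 1024.0}

# One regex per figure; each captures a number and its unit.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"total capacity of ([\d.]+) (MiB|GiB)",
    "free": r"of which ([\d.]+) (MiB|GiB) is free",
    "process": r"this process has ([\d.]+) (MiB|GiB) memory in use",
    "allocated": r"allocated memory ([\d.]+) (MiB|GiB) is allocated by PyTorch",
    "reserved_unallocated": r"([\d.]+) (MiB|GiB) is reserved by PyTorch but unallocated",
}


def to_mib(value: str, unit: str) -> float:
    """Convert a number with a MiB/GiB unit to MiB."""
    return float(value) * MIB_PER_UNIT[unit]


def parse_oom(message: str) -> dict[str, float]:
    """Extract the memory figures from a PyTorch CUDA OOM message, in MiB."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in the message")
        stats[name] = to_mib(match.group(1), match.group(2))
    return stats


stats = parse_oom(OOM_MESSAGE)

# How much more memory the failed allocation needed than was free.
shortfall = stats["requested"] - stats["free"]

# Assumption: whatever is neither free nor used by this process is held
# elsewhere (other processes, driver overhead, and so on).
outside_process = stats["total"] - stats["process"] - stats["free"]

for name, mib in stats.items():
    print(f"{name:>22}: {mib:10.2f} MiB")
print(f"{'shortfall':>22}: {shortfall:10.2f} MiB")
print(f"{'outside_process':>22}: {outside_process:10.2f} MiB")
```

The figures are rounded in the log, so the derived values are only approximate. The sketch only restates what the error message reports and does not query the GPU.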
Versions
Python 3.11.14
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiosignal==1.4.0
annotated-doc==0.0.4
annotated-types==0.7.0
anthropic==0.71.0
anyio==4.12.1
apache-tvm-ffi==0.1.8.post2
astor==0.8.1
attrs==25.4.0
beaker-py==2.5.4
beautifulsoup4==4.14.3
blake3==1.0.8
bleach==6.3.0
boto3==1.42.27
botocore==1.42.27
cached_path==1.8.1
cachetools==6.2.4
cbor2==5.8.0
certifi==2026.1.4
cffi==2.0.0
charset-normalizer==3.4.4
click==8.3.1
cloudpickle==3.1.2
compressed-tensors==0.12.2
cryptography==46.0.3
cuda-bindings==13.1.1
cuda-pathfinder==1.3.3
cuda-python==13.1.1
cupy-cuda12x==13.6.0
depyf==0.20.0
dill==0.4.0
diskcache==5.6.3
distro==1.9.0
dnspython==2.8.0
docstring_parser==0.17.0
einops==0.8.1
email-validator==2.3.0
fastapi==0.128.0
fastapi-cli==0.0.20
fastapi-cloud-cli==0.10.1
fastar==0.8.0
fastrlock==0.8.3
filelock==3.20.3
flashinfer-python==0.5.2
frozenlist==1.8.0
fsspec==2026.1.0
ftfy==6.3.1
gguf==0.17.1
google-api-core==2.29.0
google-auth==2.47.0
google-cloud-core==2.5.0
google-cloud-storage==3.8.0
google-crc32c==1.8.0
google-resumable-media==2.8.0
googleapis-common-protos==1.72.0
grpcio==1.76.0
h11==0.16.0
hf-xet==1.2.0
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
huggingface-hub==0.36.0
idna==3.11
interegular==0.3.3
Jinja2==3.1.6
jiter==0.12.0
jmespath==1.0.1
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
lark==1.2.2
lingua-language-detector==2.1.1
llguidance==1.3.0
llvmlite==0.44.0
lm-format-enforcer==0.11.3
loguru==0.7.3
markdown-it-py==4.0.0
markdown2==2.5.4
markdownify==1.2.2
MarkupSafe==3.0.3
mdurl==0.1.2
mistral_common==1.8.8
model-hosting-container-standards==0.1.13
mpmath==1.3.0
msgpack==1.1.2
msgspec==0.20.0
multidict==6.7.0
networkx==3.6.1
ninja==1.13.0
numba==0.61.2
numpy==2.2.6
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cudnn-frontend==1.17.0
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-cutlass-dsl==4.3.5
nvidia-ml-py==13.590.44
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.3.20
nvidia-nvtx-cu12==12.8.90
olmocr==0.4.16
openai==2.15.0
openai-harmony==0.0.8
opencv-python-headless==4.12.0.88
orjson==3.11.5
outlines_core==0.2.11
packaging==25.0
partial-json-parser==0.2.1.1.post7
pillow==12.1.0
prometheus-fastapi-instrumentator==7.1.0
prometheus_client==0.24.1
propcache==0.4.1
proto-plus==1.27.0
protobuf==6.33.4
psutil==7.2.1
py-cpuinfo==9.0.0
pyasn1==0.6.1
pyasn1_modules==0.4.2
pybase64==1.4.3
pycountry==24.6.1
pycparser==2.23
pydantic==2.12.5
pydantic-extra-types==2.11.0
pydantic-settings==2.12.0
pydantic_core==2.41.5
Pygments==2.19.2
pypdf==6.6.0
pypdfium2==5.3.0
python-dateutil==2.9.0.post0
python-dotenv==1.2.1
python-json-logger==4.0.0
python-multipart==0.0.21
PyYAML==6.0.3
pyzmq==27.1.0
ray==2.53.0
referencing==0.37.0
regex==2026.1.14
requests==2.32.5
rich==13.9.4
rich-toolkit==0.17.1
rignore==0.7.6
rpds-py==0.30.0
rsa==4.9.1
s3transfer==0.16.0
safetensors==0.7.0
scipy==1.17.0
sentencepiece==0.2.1
sentry-sdk==2.49.0
setproctitle==1.3.7
shellingham==1.5.4
six==1.17.0
smart_open==7.5.0
sniffio==1.3.1
soupsieve==2.8.1
starlette==0.50.0
supervisor==4.3.0
sympy==1.14.0
tabulate==0.9.0
tiktoken==0.12.0
tokenizers==0.22.2
torch==2.9.0+cu128
torchaudio==2.9.0+cu128
torchvision==0.24.0+cu128
tqdm==4.67.1
transformers==4.57.3
triton==3.5.0
typer==0.21.1
typing-inspection==0.4.2
typing_extensions==4.15.0
urllib3==2.6.3
uvicorn==0.40.0
uvloop==0.22.1
vllm==0.11.2
watchfiles==1.1.1
wcwidth==0.2.14
webencodings==0.5.1
websockets==16.0
wrapt==2.0.1
xformers==0.0.33.post1
xgrammar==0.1.25
yarl==1.22.0
zstandard==0.25.0