Description
🐛 Describe the bug
The image/diagram/chart coordinates output by the olmocr model seem to be off:
- They are out of bounds for the input image size, which is 1288px on the longest side.
- They are sometimes still out of bounds when segmenting images from the original page rendered at 2048px on the longest side, which matches the training data.
- Even when the coordinates are within bounds, the region is slightly shifted and does not match the diagram in the input.
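To make "out of bounds" concrete, here is a minimal sketch of the check. It assumes the model's values mean (startx, starty, width, height) with the origin at the top-left corner, which is only one possible reading, since the prompt does not define the coordinate system. The names `Box`, `overflow`, and `in_bounds` are hypothetical and not part of olmocr.

```python
from typing import NamedTuple


class Box(NamedTuple):
    # Assumed layout: top-left corner plus size, origin at the top-left of the page.
    startx: float
    starty: float
    width: float
    height: float


def overflow(box: Box, image_width: int, image_height: int) -> tuple[float, float]:
    """Return how far the box extends past the right and bottom edges (0 if inside)."""
    over_x = max(0.0, box.startx + box.width - image_width)
    over_y = max(0.0, box.starty + box.height - image_height)
    return over_x, over_y


def in_bounds(box: Box, image_width: int, image_height: int) -> bool:
    """True if the whole box lies inside the image under the assumed convention."""
    if box.startx < 0 or box.starty < 0:
        return False
    over_x, over_y = overflow(box, image_width, image_height)
    return over_x == 0 and over_y == 0
```

For example, under this reading a box of (1000, 200, 400, 300) overflows a 1288px-wide render by 112px on the right, but fits inside a 2048px-wide render.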
Potential reasons:
- The prompt used for GPT training data generation includes the image width and height, but the olmocr pipeline does not include them in its prompt.
- The silver data prompt does not seem to specify the coordinate system. Where is (0,0)? Which axis is x and which is y? Are width and height supposed to be added to, subtracted from, or centered around startx and starty?
- The olmocr pipeline defaults the longest side to 1288px, while the silver training data is generated at a resolution where TARGET_IMAGE_DIM is 2048px. In practice, olmocr seems to output image coordinates in the range (0, 2048), matching the training data, which is consistent with the first potential reason above. However, even when extracting images from a render of at least 2048px, some of the model outputs are still out of bounds on the shorter side, potentially because the model does not understand the aspect ratio of the images.
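The resolution mismatch described in the last point can be sketched as follows. This is an illustration under stated assumptions, not the olmocr implementation: it assumes coordinates are emitted in a space whose longest side is 2048px and must be scaled to the render actually being cropped. `render_size` and `rescale_box` are hypothetical helper names.

```python
def render_size(page_width: float, page_height: float, longest_side: int) -> tuple[int, int]:
    """Pixel size of a page rendered so that its longest side equals longest_side."""
    scale = longest_side / max(page_width, page_height)
    return round(page_width * scale), round(page_height * scale)


def rescale_box(
    box: tuple[float, float, float, float],
    from_longest: int,
    to_longest: int,
) -> tuple[int, ...]:
    """Scale (startx, starty, width, height) between two render resolutions.

    Assumes both renders share the same aspect ratio and a top-left origin.
    """
    scale = to_longest / from_longest
    return tuple(round(v * scale) for v in box)
```

For a US Letter page (612 x 792 pt), the shorter side is only about 1583px in the 2048px render and about 995px in the 1288px render, so any x coordinate between those values and the longest side is out of bounds even after rescaling. That matches the shorter-side overflow described above.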
Here is the code used to extract images from the JSONL output, which was generated with the default configuration using olmocr==0.4.25 and running the model locally:
postparse.py
Versions
python --version && uv pip freeze
Python 3.12.12
accelerate==1.12.0
aiofiles==24.1.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiosignal==1.4.0
albucore==0.0.24
albumentations==2.0.8
annotated-doc==0.0.4
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.12.1
attrs==25.4.0
av==16.1.0
beaker-py==2.5.4
beautifulsoup4==4.14.3
bleach==6.3.0
boto3==1.42.42
botocore==1.42.42
brotli==1.2.0
cached-path==1.8.9
certifi==2026.1.4
cffi==2.0.0
charset-normalizer==3.4.4
click==8.3.1
coloredlogs==15.0.1
colorlog==6.10.1
contourpy==1.3.3
cryptography==46.0.4
cycler==0.12.1
datasets==4.5.0
dill==0.4.0
distro==1.9.0
doclayout-yolo==0.0.4
fast-langdetect==0.2.5
fastapi==0.128.1
fasttext-predict==0.9.2.4
ffmpy==1.0.0
filelock==3.20.3
flatbuffers==25.12.19
fonttools==4.61.1
frozenlist==1.8.0
fsspec==2025.10.0
ftfy==6.3.1
google-api-core==2.29.0
google-auth==2.48.0
google-cloud-core==2.5.0
google-cloud-storage==3.9.0
google-crc32c==1.8.0
google-resumable-media==2.8.0
googleapis-common-protos==1.72.0
gradio==5.49.1
gradio-client==1.13.3
gradio-pdf==0.0.22
groovy==0.1.2
grpcio==1.76.0
h11==0.16.0
hf-xet==1.2.0
httpcore==1.0.9
httpx==0.28.1
httpx-retries==0.4.5
huggingface-hub==0.36.1
humanfriendly==10.0
idna==3.11
imageio==2.37.2
jinja2==3.1.6
jiter==0.13.0
jmespath==1.1.0
json-repair==0.56.0
kiwisolver==1.4.9
lazy-loader==0.4
lingua-language-detector==2.1.1
loguru==0.7.3
magika==1.0.1
markdown-it-py==4.0.0
markdown2==2.5.4
markdownify==1.2.2
markupsafe==3.0.3
matplotlib==3.10.8
mdurl==0.1.2
mineru==2.7.5
mineru-vl-utils==0.1.22
mlx==0.30.5
mlx-lm==0.29.1
mlx-metal==0.30.5
mlx-vlm==0.3.9
modelscope==1.34.0
mpmath==1.3.0
multidict==6.7.1
multiprocess==0.70.18
networkx==3.6.1
numpy==2.2.6
olmocr==0.4.25
omegaconf==2.3.0
onnxruntime==1.20.1
openai==2.16.0
opencv-python==4.13.0.90
opencv-python-headless==4.13.0.90
orjson==3.11.7
packaging==26.0
pandas==2.3.3
pdfminer-six==20260107
pdftext==0.6.3
pillow==11.3.0
polars==1.38.0
polars-runtime-32==1.38.0
propcache==0.4.1
proto-plus==1.27.1
protobuf==6.33.5
psutil==7.2.2
py-cpuinfo==9.0.0
pyarrow==23.0.0
pyasn1==0.6.2
pyasn1-modules==0.4.2
pyclipper==1.4.0
pycparser==3.0
pydantic==2.11.10
pydantic-core==2.33.2
pydantic-settings==2.12.0
pydub==0.25.1
pygments==2.19.2
pymupdf==1.26.7
pyparsing==3.3.2
pypdf==6.6.2
pypdfium2==4.30.0
python-dateutil==2.9.0.post0
python-dotenv==1.2.1
python-multipart==0.0.22
pytz==2025.2
pyyaml==6.0.3
qwen-vl-utils==0.0.14
regex==2026.1.15
reportlab==4.4.9
requests==2.32.5
rich==13.9.4
robust-downloader==0.0.2
rsa==4.9.1
ruff==0.15.0
s3transfer==0.16.0
safehttpx==0.1.7
safetensors==0.7.0
scikit-image==0.26.0
scipy==1.17.0
seaborn==0.13.2
semantic-version==2.10.0
sentencepiece==0.2.1
setuptools==79.0.1
shapely==2.1.2
shellingham==1.5.4
simsimd==6.5.12
six==1.17.0
smart-open==7.5.0
sniffio==1.3.1
soundfile==0.13.1
soupsieve==2.8.3
starlette==0.50.0
stringzilla==4.6.0
sympy==1.14.0
thop==0.1.1.post2209072238
tifffile==2026.1.28
tokenizers==0.22.2
tomlkit==0.13.3
torch==2.8.0
torchvision==0.23.0
tqdm==4.67.3
transformers==4.57.3
typer==0.21.1
typing-extensions==4.15.0
typing-inspection==0.4.2
tzdata==2025.3
ultralytics==8.4.11
ultralytics-thop==2.0.18
urllib3==2.6.3
uvicorn==0.40.0
wcwidth==0.5.3
webencodings==0.5.1
websockets==15.0.1
wrapt==2.1.1
xxhash==3.6.0
yarl==1.22.0
zstandard==0.25.0