
feat(ocr/azure-di): support Mistral-style pages param via analyze query string#25929

Merged
ishaan-berri merged 4 commits into litellm_internal_staging from litellm_pages_support_for_ocr
Apr 18, 2026

@shivamrawat1 (Collaborator)

Relevant issues

AzureDocumentIntelligenceOCRConfig.get_supported_ocr_params returned [], so LiteLLM dropped pages from OCR requests to azure/doc-intel: transform_ocr_request explicitly ignored it, and get_complete_url never appended Azure DI's pages query param. As a result, callers couldn't limit page ranges on Azure DI through /v1/ocr, even though Azure natively supports it via ?pages=1-3,5,7-9.
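To make the failure mode concrete, here is a minimal sketch assuming a provider config's supported-param list acts as an allow-list for OCR params. `filter_supported_params` is hypothetical and is not LiteLLM's actual filtering code; it only shows why an empty list silently drops pages.

```python
from typing import Any, Dict, List


def filter_supported_params(params: Dict[str, Any], supported: List[str]) -> Dict[str, Any]:
    """Hypothetical: keep only the OCR params a provider config declares as supported."""
    return {key: value for key, value in params.items() if key in supported}


# Before the fix: the Azure DI config advertised no supported params, so pages vanished.
before = filter_supported_params({"pages": [0, 1, 2]}, supported=[])
# After the fix: "pages" is declared supported and survives to the mapping step.
after = filter_supported_params({"pages": [0, 1, 2]}, supported=["pages"])
```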

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Type

🐛 Bug Fix
✅ Test

Changes

  • Declare pages as a supported param.
  • Add map_ocr_params + _normalize_pages_param to translate Mistral-style list[int] (0-based) → Azure's 1-based comma/range string, with passthrough for native strings ("3-9") and list[str] tokens; validate input and raise ValueError on anything invalid.
  • Append &pages=... to the analyze URL in get_complete_url; keep pages out of the JSON body.
  • Add unit tests in tests/ocr_tests/test_ocr_azure_document_intelligence.py (no Azure creds needed) covering param mapping, URL construction, body exclusion, and end-to-end shape.
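A minimal sketch of the pages translation described in the Changes list, assuming the rules stated there: 0-based list[int] is shifted to 1-based, deduplicated, sorted and collapsed into ranges; a str passes through; list[str] tokens are joined with commas. `normalize_pages` and `_ranges_from_sorted` are hypothetical stand-ins, not the actual `_normalize_pages_param` code, and returning None for an empty list (i.e. no page filter) is an assumption.

```python
from typing import List, Optional, Union

PagesParam = Union[str, List[int], List[str]]


def _ranges_from_sorted(nums: List[int]) -> List[str]:
    """Collapse sorted, unique 1-based page numbers into Azure-style tokens."""
    tokens: List[str] = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n == prev + 1:  # still inside a consecutive run
            prev = n
            continue
        tokens.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    tokens.append(str(start) if start == prev else f"{start}-{prev}")
    return tokens


def normalize_pages(pages: PagesParam) -> Optional[str]:
    """Hypothetical stand-in for _normalize_pages_param."""
    if isinstance(pages, str):
        return pages  # assumed to already be native Azure syntax, e.g. "3-9"
    if not isinstance(pages, list):
        raise ValueError(f"pages must be a str or list, got {type(pages).__name__}")
    if not pages:
        return None  # assumption: empty list means "no page filter"
    if all(isinstance(p, str) for p in pages):
        return ",".join(p.strip() for p in pages)  # ["1-3", "5"] -> "1-3,5"
    # bool is a subclass of int, so reject bools explicitly before the int check.
    if any(isinstance(p, bool) for p in pages) or not all(isinstance(p, int) for p in pages):
        raise ValueError("pages list must contain only ints or only strings")
    if any(p < 0 for p in pages):
        raise ValueError("page indices must be >= 0 (Mistral-style, 0-based)")
    one_based = sorted({p + 1 for p in pages})  # dedup, sort, shift to 1-based
    return ",".join(_ranges_from_sorted(one_based))
```

For example, `normalize_pages([0, 1, 2, 4, 6, 7, 8])` produces `"1-3,5,7-9"`, the same range syntax as Azure's `?pages=1-3,5,7-9`.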

@vercel (Bot) commented Apr 17, 2026

Project litellm: deployment Ready (Preview). Updated Apr 17, 2026 2:30am UTC.

@greptile-apps (Contributor, Bot) commented Apr 17, 2026

Greptile Summary

This PR adds support for the pages parameter in Azure Document Intelligence OCR requests by (1) declaring it as a supported param, (2) adding map_ocr_params + _normalize_pages_param to translate Mistral-style 0-based list[int] into Azure's 1-based comma/range query string, and (3) appending &pages=... to the analyze URL in get_complete_url while keeping it out of the JSON body. The implementation is end-to-end wired correctly through litellm/ocr/main.py and is accompanied by comprehensive unit tests that don't require Azure credentials.

Confidence Score: 5/5

Safe to merge — the feature is correctly implemented end-to-end with no blocking issues.

All earlier review concerns (inline import of urllib.parse, all() vs any() bool guard) were resolved in follow-up commits. The pages translation logic is correct, URL encoding uses safe=',-' appropriately, the param is excluded from the request body, and the pipeline in ocr/main.py already calls both map_ocr_params and get_complete_url. Remaining findings are P2 style observations only.

No files require special attention.
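A rough sketch of the URL/body split described in this review: pages goes onto the analyze URL as a query param, encoded with urllib.parse.quote(..., safe=',-') so commas and range dashes stay readable, and is dropped from the JSON body. `build_analyze_request` is hypothetical, and the endpoint path and api-version used in the checks are illustrative assumptions, not copied from the PR.

```python
import json
from typing import Any, Dict, Optional, Tuple
from urllib.parse import quote


def build_analyze_request(
    base_url: str, body_params: Dict[str, Any], pages: Optional[str]
) -> Tuple[str, str]:
    """Hypothetical: return (url, json_body) with pages only in the query string."""
    url = base_url
    if pages:
        # The analyze URL normally already carries ?api-version=..., hence "&".
        sep = "&" if "?" in url else "?"
        url = f"{url}{sep}pages={quote(pages, safe=',-')}"
    # Keep pages out of the JSON body; Azure DI reads it from the query string.
    body = {k: v for k, v in body_params.items() if k != "pages"}
    return url, json.dumps(body)
```

Encoding with an explicit safe set keeps a value like "1-3,5" byte-for-byte intact while still escaping anything unexpected, such as spaces.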

Important Files Changed

  • litellm/llms/azure_ai/ocr/document_intelligence/transformation.py
    Adds pages param support: get_supported_ocr_params now returns ["pages"], map_ocr_params and _normalize_pages_param handle the Mistral→Azure translation, and get_complete_url appends the query string. Earlier review concerns (inline import, all() vs any() bool guard) were addressed in follow-up commits.
  • tests/ocr_tests/test_ocr_azure_document_intelligence.py
    Adds TestAzureDocumentIntelligencePagesParam with 12 pure unit tests covering int-list conversion, dedup/sort, empty-list passthrough, native Azure string passthrough, list-of-string tokens, validation errors, URL construction, body exclusion, and end-to-end shape. No real network calls.

Reviews (3): Last reviewed commit: "Merge branch 'litellm_internal_staging' ..."

Comment thread: litellm/llms/azure_ai/ocr/document_intelligence/transformation.py (outdated)
Comment thread: litellm/llms/azure_ai/ocr/document_intelligence/transformation.py
Use any() instead of all() for the bool check so lists like [True, 1, 2]
raise ValueError; bool is a subclass of int, so an isinstance(..., int) check alone was insufficient.

Made-with: Cursor
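The commit above hinges on `isinstance(True, int)` being `True` in Python. A small sketch comparing an all()-based bool guard with an any()-based one; both guard functions are hypothetical, and the shape of the pre-fix check is an assumption inferred from the commit message, not the PR's actual code.

```python
from typing import List


def rejects_with_all(pages: List[object]) -> bool:
    # Assumed pre-fix shape: reject only when *every* element is a bool.
    not_ints = not all(isinstance(p, int) for p in pages)
    return not_ints or all(isinstance(p, bool) for p in pages)


def rejects_with_any(pages: List[object]) -> bool:
    # Post-fix shape: a single bool anywhere in the list is enough to reject.
    not_ints = not all(isinstance(p, int) for p in pages)
    return not_ints or any(isinstance(p, bool) for p in pages)
```

With `[True, 1, 2]`, every element passes the int check, so only the any() variant rejects the list.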
Move the inline import in get_complete_url to module level; quote is stdlib
with no circular-import risk, per project style.

Made-with: Cursor
@shivamrawat1 (Collaborator, Author)

@greptile review with the two new commits that resolve the raised P1 and P2 findings

@shivamrawat1 temporarily deployed to integration-postgres April 18, 2026 02:29 with GitHub Actions (Inactive)
@ishaan-berri merged commit d042b44 into litellm_internal_staging Apr 18, 2026
94 of 98 checks passed
@ishaan-berri deleted the litellm_pages_support_for_ocr branch April 18, 2026 17:27