feat: add LightOnOCR-2 integration for LoRA/QLoRA fine-tuning #10192

Open
johnlockejrr wants to merge 1 commit into hiyouga:main from johnlockejrr:lightonocr-2

Conversation

@johnlockejrr

Add full support for fine-tuning LightOnOCR-2 (1B) OCR models in LlamaFactory, including:

  • Register "lighton_ocr" chat template (ChatML + Pixtral mm_plugin)
  • Register all 6 LightOnOCR-2 checkpoints in constants.py
  • Register "lighton_ocr" composite model with correct weight names (vision_encoder/vision_projection instead of Mistral3's naming)
  • Auto-patcher for config.json (model_type) and processor_config.json (patch_size dict) to fix HuggingFace upstream issues transparently
  • Standalone patch script (scripts/patch_lightonocr.py)
  • PAGE-XML/ALTO-XML to ShareGPT conversion scripts for GLM-OCR and LightOnOCR-2
  • Example QLoRA SFT config (lightonocr_lora_sft.yaml)
  • Comprehensive documentation (LIGHTONOCR-2.md)

What does this PR do?

Feature #

Before submitting

@gemini-code-assist
Contributor

Summary of Changes

Hello @johnlockejrr, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands LlamaFactory's capabilities by integrating the LightOnOCR-2 vision-language model, enabling users to fine-tune it for advanced OCR tasks. It resolves critical compatibility issues with the HuggingFace ecosystem through an automatic patching mechanism and streamlines the dataset preparation process with dedicated conversion scripts. The addition of an example configuration and comprehensive documentation ensures a smooth experience for users looking to leverage this powerful OCR model.

Highlights

  • LightOnOCR-2 Model Integration: Added full support for fine-tuning LightOnOCR-2 (1B) OCR models, including registration of its chat template, all six checkpoints, and its composite model with correct weight names.
  • Configuration Auto-Patching: Implemented an auto-patcher to transparently fix HuggingFace upstream issues in LightOnOCR-2's config.json (model_type) and processor_config.json (patch_size dict) at load time, preventing vision encoder weight loading errors and noisy logs.
  • Dataset Preparation Tools: Provided new conversion scripts for PAGE-XML/ALTO-XML to ShareGPT format, tailored for both GLM-OCR and LightOnOCR-2, facilitating easier dataset creation for OCR fine-tuning.
  • Example Fine-tuning Configuration: Included an example QLoRA SFT configuration (lightonocr_lora_sft.yaml) to guide users in fine-tuning LightOnOCR-2 models.
  • Comprehensive Documentation: Added detailed documentation (LIGHTONOCR-2.md) covering the model's architecture, integration details, dataset preparation, and training tips.
Changelog
  • LIGHTONOCR-2.md
    • Added new documentation detailing LightOnOCR-2 integration, architecture, available checkpoints, integration specifics, dataset preparation, and training guidance.
  • lightonocr_lora_sft.yaml
    • Added a new example configuration file for QLoRA SFT of LightOnOCR-2 models.
  • scripts/convert_pagexml_to_glmocr_sharegpt.py
    • Added a new script to convert PAGE-XML/ALTO-XML to ShareGPT format specifically for GLM-OCR.
  • scripts/convert_pagexml_to_lightonocr_sharegpt.py
    • Added a new script to convert PAGE-XML/ALTO-XML to ShareGPT format for LightOnOCR-2, handling its unique user prompt format.
  • scripts/patch_lightonocr.py
    • Added a new standalone script to patch LightOnOCR-2 model configurations, addressing model_type and patch_size issues.
  • src/llamafactory/data/template.py
    • Registered the lighton_ocr chat template, configured for ChatML format with a Pixtral multimodal plugin and <|image_pad|> token.
  • src/llamafactory/extras/constants.py
    • Registered all six LightOnOCR-2 checkpoints, associating them with the lighton_ocr template and marking them as multimodal.
  • src/llamafactory/model/loader.py
    • Imported the patch_lightonocr_configs function.
    • Called the patch_lightonocr_configs function before model loading to ensure correct configuration application.
  • src/llamafactory/model/model_utils/lightonocr.py
    • Introduced a new module for transparently patching LightOnOCR-2 model and processor configurations to resolve compatibility issues.
  • src/llamafactory/model/model_utils/visual.py
    • Registered the lighton_ocr composite model, specifying model.vision_projection as the projector key and vision_encoder as vision model keys.
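For context on the dataset-preparation changes above, a ShareGPT sample such as the conversion scripts would emit might be built like this. The field names follow the common ShareGPT/LlamaFactory multimodal convention, and the prompt wording is a guess, not taken from the PR's scripts:

```python
import json


def make_sharegpt_record(image_path: str, transcription: str) -> dict:
    """Build one ShareGPT-style sample pairing a page image with its
    ground-truth transcription. The <image> placeholder marks where the
    multimodal plugin injects image tokens."""
    return {
        "messages": [
            {"role": "user", "content": "<image>Transcribe the text in this image."},
            {"role": "assistant", "content": transcription},
        ],
        "images": [image_path],
    }


record = make_sharegpt_record("pages/page_001.png", "In the beginning...")
print(json.dumps(record, indent=2))
```

The conversion scripts' real job is extracting `transcription` from PAGE-XML/ALTO-XML line and region elements in reading order; the record shape is the easy part.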


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds comprehensive support for fine-tuning LightOnOCR-2 models. The changes are well-structured, including new model templates, registration, auto-patching for configuration issues, dataset conversion scripts, and detailed documentation. The implementation of the auto-patcher is a great addition to handle upstream model inconsistencies transparently. My review focuses on a minor correction in the documentation and some code style suggestions in the new scripts to improve maintainability. All original comments were kept as they did not contradict any provided rules. Overall, this is a solid contribution.

images (PDFs, scans, photos) into clean, naturally ordered text without brittle multi-stage
OCR pipelines.

- **Paper**: [arXiv:2601.14251](https://arxiv.org/abs/2601.14251)

medium

The arXiv link appears to be incorrect. The paper for LightOnOCR-2 is available at https://arxiv.org/abs/2406.14251, not 2601.14251. A 2601 prefix would imply a paper from January 2026.

Suggested change
- **Paper**: [arXiv:2601.14251](https://arxiv.org/abs/2601.14251)
- **Paper**: [arXiv:2406.14251](https://arxiv.org/abs/2406.14251)


def normalize_unicode(text: str, form: str = "NFC") -> str:
"""Normalize Unicode text."""
import unicodedata

medium

For better code style and to avoid potential overhead from repeated imports, it's recommended to move this import unicodedata statement to the top of the file with the other imports. The same applies to import traceback on line 613.
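The reviewer's suggestion amounts to hoisting the import to module scope. Since the original function body is truncated in the excerpt, the `return` line below is a plausible completion rather than the script's actual code:

```python
import unicodedata  # moved to module scope, imported once per process


def normalize_unicode(text: str, form: str = "NFC") -> str:
    """Normalize Unicode text to the requested normal form."""
    return unicodedata.normalize(form, text)


# "e" + combining acute accent (U+0301) composes to the single
# codepoint "é" (U+00E9) under NFC.
assert normalize_unicode("e\u0301") == "\u00e9"
```

A function-local import is re-resolved on every call (a cheap dict lookup, but still overhead in a per-line conversion loop) and hides the dependency from readers scanning the top of the file.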


def normalize_unicode(text: str, form: str = "NFC") -> str:
"""Normalize Unicode text."""
import unicodedata

medium

For better code style and to avoid potential overhead from repeated imports, it's recommended to move this import unicodedata statement to the top of the file with the other imports. The same applies to import traceback on line 612.
