VLM Enhanced Query
Description
This pull request introduces a VLM Enhanced Query mode to RAGAnything, enabling automatic multimodal analysis when documents contain images. The system can now pass images directly to Vision Language Models (VLMs) alongside the text context for comprehensive analysis.
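As a rough usage sketch: the constructor arguments and the exact `aquery` signature below are assumptions based on this description, not a verbatim copy of the project's API.

```python
import asyncio

from raganything import RAGAnything  # constructor arguments below are assumed

async def my_vision_model_func(prompt=None, messages=None, **kwargs):
    # Stub VLM callable; a real implementation would forward to a VLM API.
    return "stub VLM answer"

async def main():
    # Supplying a vision_model_func is assumed to enable VLM Enhanced
    # Query mode automatically for documents that contain images.
    rag = RAGAnything(vision_model_func=my_vision_model_func)

    # vlm_enhanced gives manual control over the mode; omit it to rely
    # on the automatic detection instead.
    answer = await rag.aquery(
        "What trend does the chart on page 3 show?",
        vlm_enhanced=True,
    )
    print(answer)

asyncio.run(main())
```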
Related Issues
N/A
Changes Made
- `vision_model_func` signature: added a `messages` parameter to support the multimodal VLM communication format (see the sketch below)
- `raganything_example.py`: updated to demonstrate the new VLM functionality
- `__init__.py`: updated
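A minimal sketch of what the new `messages` parameter plausibly carries. An OpenAI-style multimodal content list is assumed here, and `call_vlm` and `build_vlm_messages` are hypothetical helpers, so the exact schema in this PR may differ.

```python
import base64

async def call_vlm(messages):
    # Hypothetical VLM client stub; a real one would call an actual API.
    return f"VLM received {len(messages)} message(s)"

def build_vlm_messages(question: str, image_bytes: bytes) -> list:
    # Assemble one multimodal message interleaving text and a base64 image.
    img_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"},
                },
            ],
        }
    ]

async def vision_model_func(prompt=None, messages=None, **kwargs):
    # When `messages` is supplied it already carries the multimodal
    # payload, so it is forwarded unchanged; the plain-prompt path is
    # kept for backward compatibility.
    if messages is not None:
        return await call_vlm(messages)
    return await call_vlm([{"role": "user", "content": prompt}])
```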
Key Features Added:
- Automatic VLM enhancement (enabled when `vision_model_func` is available) and manual control via the `vlm_enhanced` parameter, as sketched below
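The automatic/manual selection described above might reduce to something like the following. This is a hedged sketch under the assumptions stated in this description, not the PR's actual code.

```python
def resolve_vlm_enhanced(vlm_enhanced, vision_model_func):
    # An explicit vlm_enhanced argument wins; otherwise the mode turns on
    # automatically whenever a vision_model_func is configured.
    if vlm_enhanced is not None:
        return bool(vlm_enhanced)
    return vision_model_func is not None
```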
Checklist

Additional Notes
This feature significantly enhances RAGAnything's multimodal capabilities by enabling seamless integration of visual content analysis within the RAG pipeline. Users can now ask questions about charts, diagrams, and other visual elements in documents without additional preprocessing steps. The implementation maintains full backward compatibility with existing functionality while providing powerful new capabilities for multimodal document understanding.