This project is a learning platform for agentic programming with:
- LLM reasoning using Google Gemini with optional Ollama fallback
- LangGraph multi-agent orchestration
- Chest X-ray classifier (Hugging Face/torch) with generic fallback
- BLIP (Hugging Face) vision-language tool
- DuckDuckGo web search tool
- Streamlit UI
It is designed for experimenting with tool calling, shared memory, reflection loops, and dynamic routing.
This is an educational engineering project and not a medical diagnostic system.
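The Gemini-primary / Ollama-fallback behavior listed above can be sketched as a simple try/fallback wrapper. The function and stub names here are illustrative, not the project's actual `llm.py` API:

```python
# Illustrative sketch: call the primary LLM backend, fall back to a
# secondary one only if the primary raises. Names are hypothetical.
def call_with_fallback(prompt, primary, fallback=None):
    """Return primary(prompt); on any exception, try fallback(prompt)."""
    try:
        return primary(prompt)
    except Exception:
        if fallback is None:
            raise
        return fallback(prompt)

# Usage with stubbed backends (no network calls):
def gemini_stub(prompt):
    raise ConnectionError("quota exceeded")  # simulate a Gemini outage

def ollama_stub(prompt):
    return f"[gemma] {prompt}"

print(call_with_fallback("hello", gemini_stub, ollama_stub))
```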
```text
Streamlit UI
      |
      v
LangGraph Orchestration
      |
      +--> Gemini API (primary)
      +--> Ollama Gemma (fallback, optional)
      +--> CNN Tool (medical chest X-ray classifier, in-process)
      +--> VLM Tool (BLIP, in-process)
      +--> Research Tool (DuckDuckGo)
```
```text
PlannerAgent -> ImageDecisionAgent -> (CNNToolNode | VLMToolNode | ResearchAgent) -> CriticAgent -> FinalResponseAgent
```
The graph includes a reflection loop from CriticAgent back to ImageDecisionAgent for retries.
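The retry routing can be sketched framework-free. This is a hypothetical illustration of the CriticAgent's decision, using defaults matching the `CRITIC_CONFIDENCE_THRESHOLD` and `MAX_RETRY_LOOPS` settings; the real logic lives in `graph/workflow.py` and may differ:

```python
# Illustrative sketch of the reflection loop: the critic routes back to
# ImageDecisionAgent until confidence clears the threshold or the retry
# budget is exhausted. Defaults mirror the documented env vars.
CONFIDENCE_THRESHOLD = 0.65
MAX_RETRY_LOOPS = 2

def critic_route(state):
    """Return the name of the next node based on the critic's verdict."""
    if state["confidence"] >= CONFIDENCE_THRESHOLD:
        return "FinalResponseAgent"
    if state["retries"] >= MAX_RETRY_LOOPS:
        return "FinalResponseAgent"   # budget spent: answer anyway
    state["retries"] += 1
    return "ImageDecisionAgent"       # reflection loop: try another tool
```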
```text
MedicalAgent/
  app.py
  requirements.txt
  .env.example
  src/medical_agent/
    config.py
    llm.py
    state.py
    agents/nodes.py
    graph/workflow.py
    tools/cnn_tool.py
    tools/vlm_tool.py
    tools/search_tool.py
```
- Python 3.10+
- A valid Google Gemini API key
- (Optional) Ollama running locally with Gemma for fallback
From the MedicalAgent folder:
```shell
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```

Optional environment variables:

```shell
set GEMINI_API_KEY=your_api_key_here
set GEMINI_MODEL=gemini-2.0-flash
set GEMINI_BASE_URL=https://generativelanguage.googleapis.com
set OLLAMA_FALLBACK_ENABLED=true
set OLLAMA_BASE_URL=http://localhost:11434
set OLLAMA_MODEL=gemma3:4b
set CRITIC_CONFIDENCE_THRESHOLD=0.65
set MAX_RETRY_LOOPS=2
set MEDICAL_AGENT_LOG_LEVEL=INFO
set CHEST_XRAY_MODEL=dima806/chest_xray_pneumonia_detection
set BLIP_CAPTION_MODEL=Salesforce/blip-image-captioning-base
set BLIP_VQA_MODEL=Salesforce/blip-vqa-base
```

Run the app:

```shell
streamlit run app.py
```

Then open the local Streamlit URL in your browser.
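A config module typically reads these variables with sensible defaults. This is an illustrative sketch, not the exact contents of `src/medical_agent/config.py`:

```python
import os

# Illustrative sketch: parse a few of the documented env vars with
# defaults matching the values shown above. Real names may differ.
GEMINI_MODEL = os.getenv("GEMINI_MODEL", "gemini-2.0-flash")
OLLAMA_FALLBACK_ENABLED = os.getenv("OLLAMA_FALLBACK_ENABLED", "false").lower() == "true"
CRITIC_CONFIDENCE_THRESHOLD = float(os.getenv("CRITIC_CONFIDENCE_THRESHOLD", "0.65"))
MAX_RETRY_LOOPS = int(os.getenv("MAX_RETRY_LOOPS", "2"))
```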
- The chest X-ray classifier and BLIP pretrained weights are downloaded from Hugging Face automatically on first use.
- Subsequent runs use local cache and are faster.
- PlannerAgent interprets user goal and sets strategy.
- ImageDecisionAgent dynamically selects the next tool.
- Tool nodes run CNN / VLM / Research as callable capabilities.
- CriticAgent reflects on confidence and can trigger retries.
- FinalResponseAgent synthesizes a single user-facing explanation.
All agents read and write a shared memory state in LangGraph.
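A shared state like this is commonly modeled as a `TypedDict`. The field names below are hypothetical placeholders; the project's actual schema is defined in `src/medical_agent/state.py`:

```python
from typing import Optional, TypedDict

# Hypothetical shape of the shared memory state that every agent node
# reads and writes; field names are illustrative only.
class AgentState(TypedDict, total=False):
    user_goal: str            # set by the user via the Streamlit UI
    plan: str                 # written by PlannerAgent
    image_path: Optional[str] # input image, if any
    tool_results: dict        # accumulated CNN / VLM / search outputs
    confidence: float         # written by CriticAgent
    retries: int              # reflection-loop counter
    final_answer: str         # written by FinalResponseAgent

# Example values (made up for illustration):
state: AgentState = {"user_goal": "Describe this chest X-ray", "retries": 0}
state["tool_results"] = {"cnn": {"label": "pneumonia", "score": 0.91}}
```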