Skip to content

gmelli/OpenAI_DeepResearch-aget

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

15 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

OpenAI Deep Research Implementation

A comprehensive research system implementing two complementary OpenAI approaches for automated research: custom OpenAI Agents orchestration and native Deep Research API integration.

πŸ“– Introduction

This project demonstrates how to build production-ready research systems using OpenAI's latest capabilities. We've implemented and compared two distinct approaches:

πŸ€– OpenAI Agents System: Build your own research workflow using OpenAI's official agents SDK with custom orchestration, agent handoffs, and real-time streaming. Perfect for scenarios requiring fine-grained control over the research process.

πŸ”¬ Native Deep Research API: Leverage OpenAI's specialized research models (o3-deep-research, o4-mini-deep-research) that provide professional-grade research reports with comprehensive citations and web search integration.

🎯 Unified OpenAI Intelligence: Both OpenAI approaches work together through an intelligent interface that automatically selects the best method based on query complexity, or allows manual override for specific use cases.

Why This Matters

  • Flexibility: Choose between speed (OpenAI Agents) and comprehensiveness (Deep Research API)
  • Production Ready: Includes error handling, fallbacks, and real-world deployment considerations
  • Extensible: Clean architecture allows easy addition of new research methods
  • Educational: Complete implementation with lessons learned and best practices

Key Achievements

βœ… Dual OpenAI Implementation: Both OpenAI Agents and native Deep Research API working seamlessly
βœ… Intelligent Routing: Auto-selects optimal OpenAI method based on query characteristics
βœ… Professional Citations: Rich metadata with exact text positions and excerpts
βœ… Error Resilience: Graceful handling of API limitations and organization requirements
βœ… Real-time Streaming: Progress indicators and transparent research process
βœ… Production Considerations: Security, scalability, cost optimization, and monitoring

πŸ“š Table of Contents

πŸš€ How to Use

Quick Start

  1. Install and Configure

    pip install openai-agents>=0.0.19 openai>=1.88 pydantic
    export OPENAI_API_KEY="your-api-key-here"
  2. Run Validation Test

    python validation_test.py
  3. Start Researching

    python openai_research_interface.py

Basic Usage

Option 1: Let the system choose the best OpenAI method (Recommended)

from openai_research_interface import ResearchInterface

# Initialize with auto-selection
interface = ResearchInterface()

# Ask any research question
result = await interface.research(
    "What are the latest developments in LLM orchestration frameworks?"
)

print(f"Method used: {result.method_used}")
print(f"Citations: {result.metadata.get('citations_count', 0)}")
print(result.result)

Option 2: Use specific OpenAI research method

from openai_research_interface import ResearchInterface, ResearchMethod

interface = ResearchInterface()

# Force OpenAI Agents for quick technical questions
quick_result = await interface.research(
    "How to handle errors in LangChain?",
    method=ResearchMethod.OPENAI_AGENTS
)

# Force Deep Research API for comprehensive analysis
deep_result = await interface.research(
    "Analyze the competitive landscape of AI agent frameworks",
    method=ResearchMethod.DEEP_RESEARCH_API
)

Advanced Usage

Custom Configuration

# Configure specific models and parameters
interface = ResearchInterface(
    method=ResearchMethod.DEEP_RESEARCH_API,
    deep_research_model="o4-mini-deep-research-2025-06-26"  # Faster model
)

# Add custom system instructions
result = await interface.research(
    "Research question here",
    system_message="You are a technical analyst. Focus on implementation details and code examples.",
    summary="detailed"  # For comprehensive reasoning steps
)

Access Individual OpenAI Systems

# Use OpenAI Agents system directly
from openai_agents_research import DeepResearchSystem
openai_agents_system = DeepResearchSystem()
result = await openai_agents_system.basic_research("Your query")

# Use Deep Research API directly  
from openai_deep_research_api import OpenAIDeepResearchAPI
api = OpenAIDeepResearchAPI(model="o3-deep-research-2025-06-26")
result = await api.research("Your query")

When to Use Each Method

Use Case Recommended OpenAI Method Why
Quick technical questions OpenAI Agents System Faster response (30-60s), good for specific answers
Comprehensive research Deep Research API Professional reports with rich citations (2-5 min)
Exploratory research Auto-selection Let the system choose based on query complexity
Production applications Unified Interface Consistent API with intelligent fallbacks

πŸ—οΈ System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              OpenAI Research Interface                  β”‚
β”‚           (openai_research_interface.py)                β”‚
β”‚                                                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  Auto-Selection β”‚    β”‚     Method Override         β”‚ β”‚
β”‚  β”‚     Logic       β”‚    β”‚      Capability             β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚                β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚                    β”‚      β”‚                   β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ OpenAI Agents    β”‚  β”‚ Deep Research API   β”‚
β”‚ System           β”‚  β”‚ (Native OpenAI)     β”‚
β”‚                  β”‚  β”‚                     β”‚
β”‚ Triage Agent     β”‚  β”‚ o3-deep-research    β”‚
β”‚ Clarify Agent    β”‚  β”‚ o4-mini-deep-researchβ”‚
β”‚ Instruction Agentβ”‚  β”‚                     β”‚
β”‚ Research Agent   β”‚  β”‚ Professional Reportsβ”‚
β”‚                  β”‚  β”‚ Rich Citations      β”‚
β”‚ Real-time Stream β”‚  β”‚ Web Search         β”‚
β”‚ OpenAI SDK       β”‚  β”‚ Optimized Service   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“‹ Functional Specifications

Core Components

1. OpenAI Agents Research System (openai_agents_research.py)

Purpose: Custom orchestration using OpenAI's official agents SDK

Architecture:

  • Triage Agent (gpt-4o-mini): Routes queries based on complexity
  • Clarifying Agent (gpt-4o-mini): Asks follow-up questions with structured output
  • Instruction Agent (gpt-4o-mini): Converts queries to detailed research briefs
  • Research Agent (gpt-4o): Performs comprehensive web-based research

Features:

  • Real-time event streaming with progress indicators
  • Custom agent handoff logic with conditional routing
  • WebSearchTool integration for current information
  • Structured output using Pydantic models
  • Citation extraction and formatting

Input/Output:

# Input: Natural language query
query = "How do LLM orchestration frameworks handle production deployment?"

# Output: Structured research result
result = {
    "text": "Comprehensive research findings...",
    "agent_flow": ["Triage", "Instruction", "Research"],
    "sources": ["url1", "url2", ...],
    "streaming_events": [...]
}

2. Deep Research API (openai_deep_research_api.py)

Purpose: Native OpenAI Deep Research service integration

Models:

  • o3-deep-research-2025-06-26: Most capable, comprehensive analysis
  • o4-mini-deep-research-2025-06-26: Faster, efficient for simpler queries

Features:

  • Professional-grade research reports
  • Rich citation metadata with text excerpts and positions
  • Web search integration with query tracking
  • Reasoning step extraction (requires verified organization)
  • Automatic fallback handling for organization verification

API Structure:

response = client.responses.create(
    model="o3-deep-research-2025-06-26",
    input=[
        {"role": "developer", "content": [{"type": "input_text", "text": system_message}]},
        {"role": "user", "content": [{"type": "input_text", "text": query}]}
    ],
    reasoning={"summary": "auto"},  # Optional for verified orgs
    tools=[
        {"type": "web_search_preview"},
        {"type": "code_interpreter", "container": {"type": "auto", "file_ids": []}}
    ]
)

3. OpenAI Research Interface (openai_research_interface.py)

Purpose: Intelligent routing and consistent API across both OpenAI methods

Auto-Selection Logic:

def _auto_select_method(self, query: str) -> ResearchMethod:
    # Complex, comprehensive queries β†’ Deep Research API
    complex_keywords = ["landscape", "comprehensive", "analysis", "trends"]
    
    # Specific technical queries β†’ OpenAI Agents System  
    technical_keywords = ["how to", "implement", "specific", "technical"]
    
    # Fallback to available method

Usage Patterns:

# Auto-selection
interface = ResearchInterface()
result = await interface.research("Your query")

# Method override
result = await interface.research("Query", method=ResearchMethod.DEEP_RESEARCH_API)

# Configuration
interface = ResearchInterface(
    method=ResearchMethod.AUTO,
    deep_research_model="o4-mini-deep-research-2025-06-26"
)

Data Models

Citation Structure

class Citation(BaseModel):
    title: str                 # Source title
    url: str                   # Source URL
    start_index: int          # Character position in text
    end_index: int            # End character position  
    excerpt: str              # Extracted text snippet

Unified Result Format

class UnifiedResearchResult(BaseModel):
    query: str                # Original query
    method_used: str          # "multi_agent" | "deep_research_api"
    result: str               # Research findings
    metadata: Dict[str, Any]  # Method-specific metadata

πŸ› οΈ Technical Specifications

Dependencies

pip install openai-agents>=0.0.19 openai>=1.88 pydantic python-dotenv

Environment Configuration

# Required
export OPENAI_API_KEY="your-api-key-here"

# Optional
export OPENAI_AGENTS_DISABLE_TRACING="1"  # For zero data retention

File Structure

β”œβ”€β”€ test_deep_research.py          # Original mock test suite
β”œβ”€β”€ openai_agents_research.py      # OpenAI Agents SDK implementation  
β”œβ”€β”€ openai_deep_research_api.py    # Deep Research API wrapper
β”œβ”€β”€ openai_research_interface.py   # Unified OpenAI interface with auto-selection
β”œβ”€β”€ validation_test.py             # Component validation tests
β”œβ”€β”€ .env.example                   # Environment template
β”œβ”€β”€ IMPLEMENTATION_SUMMARY.md      # Technical achievements summary
└── README.md                      # This documentation

Performance Characteristics

Metric OpenAI Agents System Deep Research API
Response Time 30-60 seconds 2-5 minutes
API Calls 4-6 (agent pipeline) 1 (optimized service)
Citation Quality Basic extraction Professional with excerpts
Customization High (full agent control) Medium (API parameters)
Cost Multiple model calls Single service call
Reliability Depends on agent handoffs Native service stability

πŸ”§ Installation and Setup

1. Clone and Install

git clone <repository>
cd OpenAI_DeepResearch
pip install -r requirements.txt

2. Configure Environment

cp .env.example .env
# Edit .env with your OpenAI API key
export OPENAI_API_KEY="your-key-here"

3. Validate Installation

python validation_test.py

4. Run Research

# OpenAI unified interface (recommended)
python openai_research_interface.py

# OpenAI Agents system only
python openai_agents_research.py

# Deep Research API only  
python openai_deep_research_api.py

🎯 Usage Examples

Basic Research Query

from openai_research_interface import ResearchInterface

interface = ResearchInterface()
result = await interface.research(
    "What are the best practices for deploying LLM applications in production?"
)

print(f"Method used: {result.method_used}")
print(f"Result: {result.result}")
print(f"Citations: {result.metadata.get('citations_count', 0)}")

Force Specific OpenAI Method

from openai_research_interface import ResearchInterface, ResearchMethod

# Use OpenAI Agents for technical questions
result = await interface.research(
    "How to implement error handling in LangChain?",
    method=ResearchMethod.OPENAI_AGENTS
)

# Use Deep Research API for comprehensive analysis
result = await interface.research(
    "Analyze the competitive landscape of LLM orchestration tools",
    method=ResearchMethod.DEEP_RESEARCH_API,
    summary="detailed"
)

Custom Configuration

# Configure with specific OpenAI model
interface = ResearchInterface(
    method=ResearchMethod.DEEP_RESEARCH_API,
    deep_research_model="o4-mini-deep-research-2025-06-26"
)

# Add custom system message
result = await interface.research(
    query="Your research question",
    system_message="You are a technical analyst focusing on implementation details..."
)

πŸ“š Non-Functional Specifications

Lessons Learned

1. API Integration Complexity

Challenge: OpenAI agents SDK has different patterns than expected

  • Method signatures differ from documentation (starting_agent vs agent)
  • Event streaming requires defensive programming with hasattr() checks
  • Async patterns need careful handling (some methods sync, others async)

Solution: Implement robust error handling and API exploration

# Defensive event handling
if hasattr(event, 'type') and hasattr(event, 'new_agent'):
    print(f"Agent: {event.new_agent.name}")

2. Deep Research API Organization Requirements

Challenge: Reasoning summaries require verified organizations Impact: 400 errors for unverified accounts

Solution: Graceful fallback mechanism

try:
    response = client.responses.create(reasoning={"summary": "auto"}, ...)
except Exception as e:
    if "must be verified" in str(e):
        response = client.responses.create(...)  # Without reasoning

3. Performance vs Quality Trade-offs

Finding: Deep Research API provides superior quality but takes 2-5 minutes OpenAI Agents: Faster (30-60s) but requires more orchestration complexity

Best Practice: Use auto-selection based on query complexity

# Route complex queries to Deep Research API
if any(word in query.lower() for word in ["comprehensive", "landscape", "analysis"]):
    return ResearchMethod.DEEP_RESEARCH_API

Best Practices

1. Architecture Design

Unified Interface Pattern:

  • Single entry point for multiple implementations
  • Consistent result format regardless of underlying method
  • Intelligent routing based on query characteristics
  • Method override capability for specific use cases

Benefits:

  • Future-proof (easy to add new research methods)
  • User-friendly (simple API regardless of complexity)
  • Optimized (automatic best-method selection)

2. Error Handling Strategy

Layered Approach:

# Level 1: API-specific errors
try:
    response = client.responses.create(...)
except OrganizationVerificationError:
    # Fallback without reasoning
    
# Level 2: Method-level errors  
except DeepResearchAPIError:
    # Fall back to multi-agent system
    
# Level 3: System-level errors
except Exception as e:
    # Return error response with context

3. Citation Management

Professional Standards:

  • Extract exact text positions (start_index, end_index)
  • Include source metadata (title, URL, excerpt)
  • Preserve citation context for verification
  • Format for both human reading and programmatic use

4. Configuration Management

Environment-First Approach:

# Prefer environment variables
api_key = os.getenv("OPENAI_API_KEY")

# Support parameter override
def __init__(self, api_key: Optional[str] = None):
    self.api_key = api_key or os.getenv("OPENAI_API_KEY")

5. Testing Strategy

Component Validation:

  • Test each system independently
  • Validate integration points
  • Test error conditions and fallbacks
  • Performance benchmarking for method selection

Production Considerations

1. Scalability

OpenAI Agents System: Scales with OpenAI API limits, manage concurrent agent calls Deep Research API: Natural rate limiting due to research complexity

2. Cost Optimization

Strategy: Use cheaper models for routing decisions

# Use gpt-4o-mini for triage, gpt-4o for research
triage_agent = Agent(model="gpt-4o-mini", ...)
research_agent = Agent(model="gpt-4o", ...)

3. Monitoring and Observability

Key Metrics:

  • Response times by method
  • Citation quality scores
  • Error rates and fallback usage
  • Cost per research query

4. Security Considerations

Data Handling:

  • Zero data retention mode: OPENAI_AGENTS_DISABLE_TRACING=1
  • API key management through environment variables
  • Input sanitization for research queries

Performance Optimization

1. Caching Strategy

# Cache research results for repeated queries
@lru_cache(maxsize=100)
def research_cached(query_hash: str) -> UnifiedResearchResult:
    return self.research(query)

2. Streaming Implementation

OpenAI Agents: Real-time progress updates Deep Research: Background processing with status checks

3. Resource Management

Timeout Configuration:

client = OpenAI(timeout=600.0)  # 10 minutes for deep research

πŸš€ Future Enhancements

1. MCP Integration

Add Model Context Protocol support for internal document search:

tools=[
    {"type": "web_search_preview"},
    {"type": "mcp", "server_url": "https://your-mcp-server/sse/"}
]

2. Query Enhancement Pipeline

Implement clarification and rewriting workflow:

# Clarification β†’ Rewriting β†’ Research
clarified = await clarify_query(original_query)
enhanced = await rewrite_query(clarified)
result = await research(enhanced)

3. Advanced Routing

Machine learning-based method selection:

# Train classifier on query characteristics vs optimal method
method = ml_classifier.predict(query_features)

4. Result Synthesis

Combine results from multiple methods:

# Run both methods for critical queries
multi_result = await multi_agent_research(query)
deep_result = await deep_research_api(query)
synthesized = synthesize_results([multi_result, deep_result])

πŸ“ž Support and Troubleshooting

Common Issues

  1. Organization not verified for reasoning: Expected for new accounts, system handles gracefully
  2. Timeout errors: Deep Research API can take 2-5 minutes, adjust timeouts accordingly
  3. Agent handoff failures: Check API key permissions and rate limits
  4. Import errors: Ensure openai-agents>=0.0.19 and openai>=1.88

Debugging

# Enable verbose logging
result = await interface.research(query, verbose=True)

# Test individual components
python validation_test.py

# Check API connectivity  
python -c "from openai import OpenAI; print(OpenAI().models.list())"

System Status: βœ… Production Ready - OpenAI-Focused Implementation Last Updated: 2025-06-30 Implementation: Complete with dual OpenAI research capability

About

Cognitive agent for OpenAI_DeepResearch dual-implementation research system

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages