Distributed Search Service

A production-ready document search service built with FastAPI, Elasticsearch, and Redis. It provides multi-tenant isolation, rate limiting, caching, and comprehensive error handling.

Features

Full-Text Search: Powered by Elasticsearch with support for fuzzy matching and multi-field search
Multi-Tenant Isolation: Complete data isolation between tenants using tenant IDs
Redis Caching: Fast search result caching with automatic invalidation
Rate Limiting: Per-tenant rate limiting to prevent abuse
Pagination: Efficient pagination for large result sets
Error Handling: Comprehensive error handling with retry logic
Health Checks: Real-time monitoring of dependencies
Logging: Structured logging for debugging and monitoring
Input Validation: Pydantic models for request/response validation
Docker Support: Full Docker Compose setup for local development

Architecture

├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application and endpoints
│   ├── config.py            # Configuration management
│   ├── logger.py            # Logging setup
│   ├── models.py            # Pydantic models
│   ├── search_client.py     # Elasticsearch client
│   └── cache.py             # Redis cache and rate limiting
├── docker-compose.yml       # Docker services configuration
├── Dockerfile               # Application container
├── requirements.txt         # Python dependencies
└── .env.example             # Environment variables template

Quick Start

Prerequisites

Docker and Docker Compose
Python 3.11+ (for local development)

Using Docker Compose (Recommended)

Clone the repository, then navigate to the project directory:
```
cd search_service
```
Create environment file:
```
cp .env.example .env
```
Start all services:
```
docker-compose up -d
```
Check service health:
```
curl http://localhost:8000/health
```

Local Development

Install dependencies:
```
pip install -r requirements.txt
```

Start Elasticsearch and Redis:

docker-compose up -d elasticsearch redis

Set environment variables:

export ELASTICSEARCH_URL=http://localhost:9200
export REDIS_URL=redis://localhost:6379

Run the application:
```
uvicorn app.main:app --reload
```

API Documentation

Once running, access the interactive API docs at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

API Endpoints

Health Check

GET /health

Index Document

POST /documents
Headers: X-Tenant-ID: tenant-123
Body: {
  "title": "My Document",
  "content": "Document content here",
  "tags": ["tag1", "tag2"],
  "metadata": {"key": "value"}
}

Search Documents

GET /search?q=query&page=1&size=10&fuzzy=false
Headers: X-Tenant-ID: tenant-123

Query Parameters:

q: Search query (required)
page: Page number (default: 1)
size: Results per page (default: 10, max: 100)
fields: Comma-separated fields to search (default: title,content)
fuzzy: Enable fuzzy matching (default: false)
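
As an illustration of how the query parameters above might map onto an Elasticsearch request, here is a minimal sketch. The function name `build_search_body` is hypothetical, and the query shape (a `multi_match` clause plus a `term` filter on a `tenant_id` keyword field) is an assumption, not necessarily what `search_client.py` does.

```python
from typing import Any


def build_search_body(
    tenant_id: str,
    query: str,
    page: int = 1,
    size: int = 10,
    fields: str = "title,content",
    fuzzy: bool = False,
) -> dict[str, Any]:
    """Translate the documented query parameters into an Elasticsearch body.

    Assumed mapping: `tenant_id` is indexed as a keyword field, so a `term`
    filter restricts every search to the calling tenant.
    """
    if not query:
        raise ValueError("q must not be empty")
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")

    multi_match: dict[str, Any] = {
        "query": query,
        "fields": [f.strip() for f in fields.split(",") if f.strip()],
    }
    if fuzzy:
        # "AUTO" lets Elasticsearch pick the edit distance from the term length.
        multi_match["fuzziness"] = "AUTO"

    return {
        "from": (page - 1) * size,  # zero-based offset of the requested page
        "size": size,
        "query": {
            "bool": {
                "must": [{"multi_match": multi_match}],
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        },
    }
```

Placing the tenant restriction in `filter` rather than `must` keeps it out of relevance scoring, so it only narrows the result set.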

Get Document

GET /documents/{doc_id}
Headers: X-Tenant-ID: tenant-123

Delete Document

DELETE /documents/{doc_id}
Headers: X-Tenant-ID: tenant-123

Configuration

All configuration is managed through environment variables. See .env.example for available options:

| Variable | Default | Description |
| --- | --- | --- |
| ELASTICSEARCH_URL | http://elasticsearch:9200 | Elasticsearch connection URL |
| REDIS_URL | redis://redis:6379 | Redis connection URL |
| RATE_LIMIT_REQUESTS | 100 | Max requests per window |
| RATE_LIMIT_WINDOW | 60 | Rate limit window in seconds |
| REDIS_CACHE_TTL | 300 | Cache TTL in seconds |
| SEARCH_DEFAULT_SIZE | 10 | Default search results per page |
| LOG_LEVEL | INFO | Logging level |
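
For orientation, the variables listed above could be loaded along these lines. This is a standard-library sketch only; the `Settings` class and `load_settings` function are hypothetical names, and the real `config.py` may be organised differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    elasticsearch_url: str
    redis_url: str
    rate_limit_requests: int
    rate_limit_window: int
    redis_cache_ttl: int
    search_default_size: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each documented variable, falling back to its documented default."""
    return Settings(
        elasticsearch_url=env.get("ELASTICSEARCH_URL", "http://elasticsearch:9200"),
        redis_url=env.get("REDIS_URL", "redis://redis:6379"),
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "100")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
        redis_cache_ttl=int(env.get("REDIS_CACHE_TTL", "300")),
        search_default_size=int(env.get("SEARCH_DEFAULT_SIZE", "10")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.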

Security Features

Multi-Tenant Isolation: All operations enforce tenant boundaries
Rate Limiting: Prevents abuse with configurable per-tenant limits
Input Validation: Pydantic models validate all inputs
Error Handling: Secure error messages without information leakage
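
One common way to keep cached search results inside tenant boundaries is to put the tenant ID into every cache key. The sketch below shows the idea; the key layout and the `search_cache_key` name are assumptions for illustration, not the format used by `cache.py`.

```python
import hashlib
import json


def search_cache_key(
    tenant_id: str,
    query: str,
    page: int = 1,
    size: int = 10,
    fields: str = "title,content",
    fuzzy: bool = False,
) -> str:
    """Build a cache key that can never collide across tenants.

    The search parameters are serialised with sorted keys so the same request
    always hashes to the same digest.
    """
    params = {"q": query, "page": page, "size": size, "fields": fields, "fuzzy": fuzzy}
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    return f"search:{tenant_id}:{digest}"
```

A shared `search:{tenant_id}:` prefix also makes it possible to invalidate one tenant's cached results without touching another's.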

Monitoring

Health Check

curl http://localhost:8000/health

Returns status of all dependencies:

{
  "status": "healthy",
  "dependencies": {
    "elasticsearch": "up",
    "redis": "up"
  }
}
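
A response of this shape can be assembled from individual dependency checks roughly as follows. The `"up"` and `"healthy"` values come from the example above; the `"down"` and `"unhealthy"` values and the `build_health_report` name are assumptions.

```python
def build_health_report(checks: dict[str, bool]) -> dict:
    """Combine per-dependency check results into a single health payload.

    `checks` maps a dependency name to True (reachable) or False (not reachable).
    """
    dependencies = {name: ("up" if ok else "down") for name, ok in checks.items()}
    # Assumption: the service is healthy only when every dependency is up.
    status = "healthy" if all(checks.values()) else "unhealthy"
    return {"status": status, "dependencies": dependencies}
```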

Logs

View application logs:

docker-compose logs -f app

Testing

Example API Requests Using curl

1. Health Check

curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "dependencies": {
    "elasticsearch": "up",
    "redis": "up"
  }
}

2. Index a Document

curl -X POST http://localhost:8000/documents \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: tenant-123" \
  -d '{
    "title": "Test Document",
    "content": "This is a test document with some content",
    "tags": ["test", "example"],
    "metadata": {"author": "John Doe", "category": "testing"}
  }'

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Test Document",
  "content": "This is a test document with some content",
  "tags": ["test", "example"],
  "metadata": {"author": "John Doe", "category": "testing"},
  "tenant_id": "tenant-123",
  "created_at": "2026-01-23T10:30:00Z"
}
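
The same request can be issued from Python using only the standard library. This sketch builds the request object; `build_index_request` is a hypothetical helper name, and the base URL assumes the local setup described in Quick Start.

```python
import json
import urllib.request


def build_index_request(base_url: str, tenant_id: str, document: dict) -> urllib.request.Request:
    """Prepare a POST /documents request equivalent to the curl call above."""
    return urllib.request.Request(
        url=f"{base_url}/documents",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Tenant-ID": tenant_id},
        method="POST",
    )


request = build_index_request(
    "http://localhost:8000",
    "tenant-123",
    {
        "title": "Test Document",
        "content": "This is a test document with some content",
        "tags": ["test", "example"],
        "metadata": {"author": "John Doe", "category": "testing"},
    },
)
# To send it against a running service:
#     with urllib.request.urlopen(request) as response:
#         created = json.load(response)
```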

3. Search Documents (Basic)

curl "http://localhost:8000/search?q=test&page=1&size=10" \
  -H "X-Tenant-ID: tenant-123"

Response:

{
  "results": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "score": 2.45,
      "document": {
        "title": "Test Document",
        "content": "This is a test document with some content",
        "tags": ["test", "example"],
        "tenant_id": "tenant-123",
        "created_at": "2026-01-23T10:30:00Z"
      }
    }
  ],
  "total": 1,
  "page": 1,
  "size": 10,
  "total_pages": 1
}
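
The `total_pages` field is presumably derived from `total` and `size` by rounding up. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def total_pages(total: int, size: int) -> int:
    """Number of pages needed to show `total` hits at `size` hits per page."""
    if size < 1:
        raise ValueError("size must be >= 1")
    if total < 0:
        raise ValueError("total must be >= 0")
    # Integer ceiling division avoids floating-point rounding.
    return (total + size - 1) // size
```

With the response above (`total` 1, `size` 10) this yields 1 page; 101 hits at 10 per page would need 11.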

4. Search with Fuzzy Matching

curl "http://localhost:8000/search?q=docment&fuzzy=true&page=1&size=10" \
  -H "X-Tenant-ID: tenant-123"

5. Search Specific Fields

curl "http://localhost:8000/search?q=test&fields=title&page=1&size=10" \
  -H "X-Tenant-ID: tenant-123"

6. Get Specific Document

# Replace the example ID below with the document ID returned when you indexed the document
curl http://localhost:8000/documents/550e8400-e29b-41d4-a716-446655440000 \
  -H "X-Tenant-ID: tenant-123"

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Test Document",
  "content": "This is a test document with some content",
  "tags": ["test", "example"],
  "metadata": {"author": "John Doe", "category": "testing"},
  "tenant_id": "tenant-123",
  "created_at": "2026-01-23T10:30:00Z"
}

7. Delete Document

# Replace the example ID below with the ID of the document you want to delete
curl -X DELETE http://localhost:8000/documents/550e8400-e29b-41d4-a716-446655440000 \
  -H "X-Tenant-ID: tenant-123"

Response:

{
  "status": "deleted",
  "id": "550e8400-e29b-41d4-a716-446655440000"
}

8. Error Examples

Missing Tenant ID:

curl -X POST http://localhost:8000/documents \
  -H "Content-Type: application/json" \
  -d '{"title": "Test", "content": "Content"}'

Response (403):

{
  "detail": "X-Tenant-ID header is required"
}

Invalid Search Query:

curl "http://localhost:8000/search?q=&page=1&size=10" \
  -H "X-Tenant-ID: tenant-123"

Response (422):

{
  "detail": [
    {
      "loc": ["query", "q"],
      "msg": "ensure this value has at least 1 characters",
      "type": "value_error.any_str.min_length"
    }
  ]
}
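
The two error examples use different shapes for `detail`: a plain string in the first and a list of validation entries in the second. A client might normalise both as sketched below; `summarize_errors` is a hypothetical helper, not part of the service.

```python
def summarize_errors(payload: dict) -> list[str]:
    """Flatten an error response body into human-readable messages."""
    detail = payload.get("detail")
    if detail is None:
        return []
    if isinstance(detail, str):
        return [detail]
    # Validation errors: join the location path, e.g. ["query", "q"] -> "query.q".
    return [
        f"{'.'.join(str(part) for part in item['loc'])}: {item['msg']}"
        for item in detail
    ]
```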

Troubleshooting

Elasticsearch connection failed

Ensure Elasticsearch is running: docker-compose ps
Check Elasticsearch logs: docker-compose logs elasticsearch
Wait for Elasticsearch to be ready (can take 30-60 seconds on first start)
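
Because Elasticsearch can take a while to become ready, scripts that depend on it may want to poll rather than fail on the first attempt. The helper below is a generic sketch; `wait_until_ready` and its parameters are hypothetical, and the readiness check itself is supplied by the caller.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `check` until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except OSError:
            # Connection refused or reset while the service is still starting.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

A typical `check` would open `http://localhost:9200` with `urllib.request.urlopen` and return True on a 200 response.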

Redis connection failed

Ensure Redis is running: docker-compose ps
Check Redis logs: docker-compose logs redis

Rate limiting issues

Check rate limit configuration in .env
Disable rate limiting: RATE_LIMIT_ENABLED=false

Performance Tuning

Elasticsearch:
- Adjust heap size: ES_JAVA_OPTS=-Xms1g -Xmx1g
- Increase shards for large datasets
Redis:
- Increase max connections: REDIS_MAX_CONNECTIONS=100
- Adjust cache TTL: REDIS_CACHE_TTL=600
Rate Limiting:
- Adjust limits to each tenant's needs
- Use sliding window for smoother rate limiting
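
To make the sliding-window suggestion concrete, here is an in-process sketch of the technique. It is not the service's Redis-backed limiter; the class name is hypothetical, and the defaults mirror `RATE_LIMIT_REQUESTS` and `RATE_LIMIT_WINDOW`.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_requests` per tenant in any `window_seconds` span."""

    def __init__(
        self,
        max_requests: int = 100,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        hits = self._hits[tenant_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```

Unlike a fixed window, this never admits a burst of twice the limit around a window boundary, at the cost of storing one timestamp per request.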

Production Deployment

For production deployment, refer to the production readiness analysis in the documents listed below.

Documentation

Architecture: ARCHITECTURE.md - Complete technical architecture
Production Readiness: ARCHITECTURE.md Section 9 - Detailed production considerations
Quick Reference: ARCHITECTURE_SUMMARY.md - At-a-glance production summary

Key Production Features

Scalability:

Horizontal scaling to handle 100x growth
Multi-AZ deployment across 3 availability zones
Auto-scaling policies based on CPU and request rate

Resilience:

Circuit breakers and retry mechanisms
Multi-region failover (RTO: 15 min, RPO: 5 min)
Graceful degradation when dependencies fail
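
For readers unfamiliar with the pattern, a circuit breaker can be sketched as below. This is a simplified illustration under assumed thresholds, with a hypothetical class name; it is not the implementation described in ARCHITECTURE.md.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop calling a failing dependency until a cool-down period has passed."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Closed: always allow. Open: allow a trial call only after the timeout."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self._reset_timeout

    def record_success(self) -> None:
        """A successful call closes the circuit and clears the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Open (or re-open) the circuit once the threshold is reached."""
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()
```

A caller checks `allow_request()` before contacting the dependency and falls back (for example, to cached results) when it returns False, which is one way to achieve graceful degradation.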

Security:

JWT-based authentication (OAuth2/OIDC)
End-to-end encryption (TLS 1.3, mTLS)
Compliance ready (GDPR, SOC 2, HIPAA)

Observability:

Full metrics stack (Prometheus + Grafana)
Distributed tracing (OpenTelemetry + Jaeger)
Centralized logging (ELK/CloudWatch)
24/7 alerting with on-call rotation

SLA:

99.95% availability target (~22 min downtime/month)
p95 latency < 500ms
Error rate < 0.1%
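
The downtime figure follows directly from the availability target. A quick check of the arithmetic, assuming a 30-day month:

```python
def allowed_downtime_minutes(availability_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    minutes_in_period = days * 24 * 60
    return (1 - availability_percent / 100) * minutes_in_period


# 99.95% over 30 days: 0.05% of 43,200 minutes, about 21.6 minutes.
budget = allowed_downtime_minutes(99.95)
```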

Deployment Options

Kubernetes (Recommended):

# Deploy with Helm
helm install search-service ./helm-chart \
  --set replicaCount=3 \
  --set elasticsearch.nodes=3 \
  --set redis.sentinel.enabled=true

# Blue-green deployment
kubectl apply -f k8s/blue-green/

# Canary deployment (progressive)
kubectl apply -f k8s/canary/

Docker Compose (Development Only):

docker-compose -f docker-compose.prod.yml up -d

Monitoring

Health Endpoints:

Liveness: GET /health/live
Readiness: GET /health/ready

Metrics Endpoint:

Prometheus: GET /metrics

Key Metrics to Monitor:

Request rate, error rate, latency (RED)
Cache hit rate (target: >70%)
Elasticsearch query latency (target: <200ms p95)
Circuit breaker state
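
The cache hit rate and p95 targets above can be computed from raw counters and latency samples as sketched here. The function names are hypothetical, and the percentile uses the nearest-rank method, which may differ slightly from what Prometheus or Grafana report.

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache (0.0 when there were no lookups)."""
    total = hits + misses
    return hits / total if total else 0.0


def percentile_nearest_rank(samples: list[float], percentile: int) -> float:
    """Smallest sample such that at least `percentile` percent of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of percentile * n / 100, giving a 1-based rank.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

For example, a p95 query latency is `percentile_nearest_rank(latencies_ms, 95)`, to be compared against the 200 ms target.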

Backup and Recovery

Automated Backups:

Elasticsearch snapshots every 6 hours to S3
Redis RDB + AOF persistence
30-day retention policy

Disaster Recovery:

Monthly DR drills
Documented runbooks
Multi-region replication

Cost Estimate

Production deployment (100x scale):

~$21,000/month for infrastructure
See ARCHITECTURE_SUMMARY.md for detailed breakdown

License

MIT License
