A complete open-source AI voice assistant with automatic language detection supporting Hindi, Punjabi, Tamil, Bengali, and English. Built with FastAPI, React, and modern AI models.
- 🗣️ Real-time Voice Interaction - Speak naturally and get instant responses
- 🌐 Multilingual Support - Hindi, Punjabi, Tamil, Bengali, English
- 🤖 Auto Language Detection - Automatically detects the language you're speaking
- 🎧 Advanced Audio Processing - VAD, noise reduction, and audio enhancement
- 💬 Text & Voice Input - Type or speak your messages
- 🔊 Natural TTS - High-quality text-to-speech in multiple languages
- 📱 Modern Web UI - Responsive React interface with real-time updates
- 🐳 Docker Ready - One-command deployment with Docker Compose
- 🔒 Privacy First - Runs completely offline after setup
```bash
# Clone the repository
git clone https://github.com/your-username/AI-Voice-Agent.git
cd AI-Voice-Agent

# Deploy with Docker (recommended)
./deploy.sh
```

That's it! Open http://localhost in your browser and start talking! 🎉
If you prefer manual setup or development:
```bash
# 1. Setup environment
cp .env.example .env

# 2. Start services
docker-compose up -d

# 3. Setup models
docker-compose exec ollama ollama pull llama3.1

# 4. Access the app
open http://localhost
```

- RAM: 8GB minimum, 16GB recommended
- CPU: 4 cores minimum, 8 cores recommended
- Storage: 20GB free space
- GPU: Optional NVIDIA GPU for better performance
- Docker & Docker Compose
- Modern web browser with microphone support
- Internet connection (for initial model downloads)
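The CPU-core and disk minimums above can be sanity-checked before deployment with a small stdlib script. This is a sketch, not part of the project; the thresholds mirror the minimums listed, and the RAM check is omitted because the standard library has no portable memory query.

```python
import os
import shutil

def preflight(min_cores=4, min_disk_gb=20, path="."):
    """Check the host against the minimum requirements (RAM omitted:
    stdlib has no portable RAM query)."""
    cores = os.cpu_count() or 1
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "cores_ok": cores >= min_cores,
        "disk_ok": free_gb >= min_disk_gb,
        "cores": cores,
        "free_gb": round(free_gb, 1),
    }

if __name__ == "__main__":
    print(preflight())
```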
```mermaid
graph TD
    A[Web Browser] --> B[React Frontend]
    B --> C[FastAPI Backend]
    C --> D[Audio Processor]
    C --> E[Language Detector]
    C --> F[STT Service]
    C --> G[LLM Service]
    C --> H[TTS Service]
    F --> I[Whisper Models]
    G --> J[Ollama/LLaMA]
    H --> K[Piper/IndicTTS]
```
| Language | Native | Code | STT | LLM | TTS | Status |
|---|---|---|---|---|---|---|
| English | English | en | ✅ | ✅ | ✅ | Full Support |
| Hindi | हिन्दी | hi | ✅ | ✅ | ✅ | Full Support |
| Tamil | தமிழ் | ta | ✅ | ✅ | ✅ | Full Support |
| Bengali | বাংলা | bn | ✅ | ✅ | ✅ | Full Support |
| Punjabi | ਪੰਜਾਬੀ | pa | ✅ | ✅ | ✅ | Full Support |
- Open http://localhost in your browser
- Allow microphone permissions when prompted
- Select language or use "Auto Detect"
- Click the microphone button
- Speak naturally in any supported language
- Listen to the AI response
- Click "Or type a message" below the microphone
- Type your message in any supported language
- Press Enter or click send
- Receive text and audio responses
- Set language to "Auto" for automatic detection
- System detects language from speech and text
- Responses generated in the same detected language
- Real-time language switching supported
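For text input, the auto-detect behaviour can be approximated with a simple Unicode-script heuristic. This is an illustrative sketch only, not the project's actual detector (which is presumably model-based); the Unicode block ranges are standard, but the fallback-to-English rule is an assumption.

```python
# Unicode block ranges for the supported non-Latin scripts
SCRIPT_RANGES = {
    "hi": (0x0900, 0x097F),  # Devanagari
    "bn": (0x0980, 0x09FF),  # Bengali
    "pa": (0x0A00, 0x0A7F),  # Gurmukhi
    "ta": (0x0B80, 0x0BFF),  # Tamil
}

def detect_text_language(text: str) -> str:
    """Return the code whose script covers the most characters,
    falling back to English for Latin/other text."""
    counts = {code: 0 for code in SCRIPT_RANGES}
    for ch in text:
        cp = ord(ch)
        for code, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi:
                counts[code] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "en"
```

Mixed-script messages resolve to whichever script dominates, which matches the "respond in the detected language" behaviour described above.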
Edit the `.env` file to customize:

```env
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
CORS_ORIGINS=http://localhost:3000,http://localhost

# Model Configuration
OLLAMA_HOST=http://ollama:11434
LLM_MODEL=llama3.1
INDIC_LLM_MODEL=indic-llama

# Audio Settings
SAMPLE_RATE=16000
VAD_AGGRESSIVENESS=2
MAX_FILE_SIZE=10485760
```

Customize models in `config/config.json`:
```json
{
  "models": {
    "whisper": {
      "model_size": "base"
    },
    "ollama": {
      "default_model": "llama3.1",
      "indic_model": "indic-llama"
    }
  }
}
```

```bash
# Backend
cd backend
python -m pip install -r requirements.txt
python main.py

# Frontend
cd frontend
npm install
npm start

# Ollama (separate terminal)
ollama serve
ollama pull llama3.1
```

```
AI-Voice-Agent/
├── backend/              # FastAPI application
│   ├── app/
│   │   ├── services/     # AI services (STT, LLM, TTS)
│   │   ├── models/       # Data models
│   │   └── utils/        # Utilities
│   └── main.py           # Main application
├── frontend/             # React application
│   ├── src/
│   │   ├── components/   # React components
│   │   └── utils/        # Frontend utilities
│   └── package.json
├── docker/               # Docker configurations
├── config/               # Configuration files
├── scripts/              # Setup and utility scripts
├── models/               # AI model storage
└── docker-compose.yml    # Docker orchestration
```
- `GET /health` - Service health check
- `GET /languages` - Supported languages list
- `POST /transcribe` - Audio to text transcription
- `POST /chat` - Text-based chat completion
- `WS /ws/{client_id}` - WebSocket for real-time voice
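The text-chat endpoint can be exercised from a short stdlib client. This is a sketch under assumptions: the port comes from the default `API_PORT=8000`, and the JSON field names (`message`, `language`) are illustrative guesses, not a documented schema.

```python
import json
import urllib.request

API = "http://localhost:8000"  # assumed default from .env

def build_chat_request(message, language="auto"):
    """Build the JSON body for POST /chat (field names are assumptions)."""
    return {"message": message, "language": language}

def chat(message, language="auto"):
    """POST a message to the backend and return the decoded JSON reply."""
    body = json.dumps(build_chat_request(message, language)).encode("utf-8")
    req = urllib.request.Request(
        f"{API}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires the stack to be running (docker-compose up -d)
    print(chat("नमस्ते", language="hi"))
```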
Microphone Not Working

```bash
# Ensure HTTPS in production
# Check browser permissions
# Verify microphone access in browser settings
```

Models Not Loading

```bash
# Check disk space
df -h

# Re-download models
docker-compose exec ollama ollama pull llama3.1

# Check model directory
ls -la models/
```

Connection Issues

```bash
# Check service status
docker-compose ps

# View logs
docker-compose logs backend
docker-compose logs frontend
docker-compose logs ollama
```

Performance Issues

```bash
# Check resource usage
docker stats

# Restart services
docker-compose restart

# Use smaller models for lower-end hardware
# Edit .env: LLM_MODEL=llama3.1:8b
```

```bash
# Backend health
curl http://localhost:8000/health

# Frontend accessibility
curl http://localhost

# Ollama status
curl http://localhost:11434/api/version

# Full system test
./scripts/test_system.sh
```

- Response Time: < 2 seconds average
- Language Detection: < 500ms
- Audio Processing: Real-time streaming
- Memory Usage: 4-8GB depending on models
- CPU Usage: 2-4 cores active processing
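The shell health checks can also be bundled into one stdlib poller, e.g. for a cron job or CI smoke test. A minimal sketch; the URLs assume the default ports used elsewhere in this README.

```python
import urllib.error
import urllib.request

# Assumed default service URLs from the compose setup
CHECKS = {
    "backend": "http://localhost:8000/health",
    "frontend": "http://localhost",
    "ollama": "http://localhost:11434/api/version",
}

def check(url, timeout=3):
    """Return True if the endpoint answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False

def run_checks(checks=CHECKS):
    """Probe every service and report its reachability."""
    return {name: check(url) for name, url in checks.items()}

if __name__ == "__main__":
    for name, ok in run_checks().items():
        print(f"{name}: {'up' if ok else 'DOWN'}")
```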
- Use GPU for faster model inference
- Adjust `VAD_AGGRESSIVENESS` for your environment
- Use smaller models on resource-constrained systems
- Enable Redis caching for better performance
We welcome contributions! Here's how:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow Python PEP 8 style guidelines
- Use TypeScript for new React components
- Add tests for new features
- Update documentation for API changes
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper - Speech recognition
- Ollama - LLM inference
- Piper TTS - Text-to-speech synthesis
- AI4Bharat - Indic language models
- FastAPI - Backend framework
- React - Frontend framework
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📧 Email: support@yourproject.com
- 📚 Documentation: Full Docs
- More Languages: Marathi, Gujarati, Telugu
- Voice Cloning: Custom voice synthesis
- Mobile App: React Native application
- API Keys: External model integration
- Cloud Deploy: One-click cloud deployment
- Multi-turn Conversations: Context awareness
- Function Calling: Tool integration
- Speech Translation: Real-time translation
- Offline Mode: Complete offline operation
- Plugin System: Extensible functionality
Built with ❤️ for multilingual AI interaction