A comprehensive automated exploratory data analysis (EDA) platform designed for MVP investor demos. It provides end-to-end data analysis through 14 sequential phases (0 to 13), plus four intermediate sub-phases (7.5, 9.5, 10.5 and 11.5).
- Backend: FastAPI + Python 3.11+
- Frontend: React 18+ + TypeScript + shadcn/ui
- Infrastructure: Docker + Docker Compose
- Storage: Local filesystem (no database required for MVP)
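Storage is described only as the local filesystem. The sketch below shows one way a database-free store could work, assuming one directory per uploaded dataset. The hypothetical `LocalDatasetStore` writes each upload under a generated ID. The class, its methods and the directory layout are illustrative only and are not the platform's actual code.

```python
from __future__ import annotations

import uuid
from pathlib import Path


class LocalDatasetStore:
    """Hypothetical file-backed store: one directory per uploaded dataset."""

    def __init__(self, base_dir: str | Path) -> None:
        self.base_dir = Path(base_dir).resolve()
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def _dataset_dir(self, dataset_id: str) -> Path:
        # Accept only canonical 32-char lowercase hex IDs, as produced by save();
        # input such as "../other" makes uuid.UUID raise ValueError.
        if uuid.UUID(dataset_id).hex != dataset_id:
            raise ValueError(f"invalid dataset id: {dataset_id!r}")
        return self.base_dir / dataset_id

    def save(self, filename: str, content: bytes) -> str:
        """Write an upload to disk and return its generated dataset ID."""
        # Keep only the final path component so "../x.csv" cannot escape base_dir.
        safe_name = Path(filename).name
        if safe_name in ("", ".", ".."):
            raise ValueError(f"invalid filename: {filename!r}")
        dataset_id = uuid.uuid4().hex
        target_dir = self._dataset_dir(dataset_id)
        target_dir.mkdir()
        (target_dir / safe_name).write_bytes(content)
        return dataset_id

    def path_for(self, dataset_id: str) -> Path:
        """Return the single file stored under dataset_id."""
        files = list(self._dataset_dir(dataset_id).iterdir())
        if len(files) != 1:
            raise FileNotFoundError(dataset_id)
        return files[0]
```

The store keeps only `Path(filename).name` and validates IDs with `uuid.UUID`. This stops user-supplied names from escaping the base directory, which matters once uploads arrive from a browser.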
eda-platform/
├── backend/ # FastAPI backend service
├── frontend/ # React frontend application
├── docker-compose.yml # Multi-container setup
├── .env.example # Environment variables template
├── .gitignore # Git ignore rules
└── README.md # This file
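A small pathlib check can confirm that a checkout matches the layout above. This is a hypothetical sketch: `missing_entries`, `REQUIRED_DIRS` and `REQUIRED_FILES` are illustrative names, and the project's own validation scripts may check more things or different ones.

```python
from __future__ import annotations

from pathlib import Path

# Top-level entries taken from the project tree above (illustrative subset).
REQUIRED_DIRS = ("backend", "frontend")
REQUIRED_FILES = ("docker-compose.yml", ".env.example", ".gitignore", "README.md")


def missing_entries(root: str | Path) -> list[str]:
    """Return the expected top-level entries that are absent under root."""
    base = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (base / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (base / f).is_file()]
    return missing
```

For example, calling `missing_entries(".")` from the repository root should return an empty list.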
- Docker & Docker Compose
- Python 3.11+ (for local development)
- Node.js 18+ (for frontend development)
# Clone repository
git clone <repository-url>
cd eda-platform
# Start all services
docker-compose up --build
# Access applications
# Backend API: http://localhost:8000
# Frontend: http://localhost:3000
# API Docs: http://localhost:8000/api/docs

# Backend
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload
# Frontend (separate terminal)
cd frontend
npm install
npm run dev

| Phase | Name | Description | Status |
|---|---|---|---|
| 0 | Foundation & Architecture | Project setup and infrastructure | ✅ Complete |
| 1 | Goal & KPIs Definition | Define business objectives | ⏳ Pending |
| 2 | Data Ingestion | Upload and validate data files | ⏳ Pending |
| 3 | Schema Discovery | Analyze data structure | ⏳ Pending |
| 4 | Data Profiling | Generate comprehensive statistics | ⏳ Pending |
| 5 | Missing Data Analysis | Identify missing data patterns | ⏳ Pending |
| 6 | Data Standardization | Clean and standardize formats | ⏳ Pending |
| 7 | Feature Engineering | Create derived features | ⏳ Pending |
| 7.5 | Encoding & Scaling | Encode categorical variables and scale numeric features | ⏳ Pending |
| 8 | Data Merging | Combine multiple datasets | ⏳ Pending |
| 9 | Correlation Analysis | Analyze variable relationships | ⏳ Pending |
| 9.5 | Business Validation | Validate against business rules | ⏳ Pending |
| 10 | Data Packaging | Prepare final dataset | ⏳ Pending |
| 10.5 | Train/Test Split | Split data for modeling | ⏳ Pending |
| 11 | Advanced Analytics | Perform advanced statistical analysis | ⏳ Pending |
| 11.5 | Feature Selection | Select optimal features | ⏳ Pending |
| 12 | Text Analysis | NLP analysis for text data | ⏳ Pending |
| 13 | Monitoring & Reporting | Generate reports and monitoring | ⏳ Pending |
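The table implies a strict order in which sub-phases such as 7.5 fit between whole-numbered phases. The sketch below shows that ordering, along with a gate that lets a phase start only after every earlier phase has passed validation. `PHASES`, `ORDER`, `next_phase` and `can_start` are hypothetical names, not the platform's API.

```python
from __future__ import annotations

from decimal import Decimal

# Phase IDs and names copied from the table above.
PHASES = {
    "0": "Foundation & Architecture",
    "1": "Goal & KPIs Definition",
    "2": "Data Ingestion",
    "3": "Schema Discovery",
    "4": "Data Profiling",
    "5": "Missing Data Analysis",
    "6": "Data Standardization",
    "7": "Feature Engineering",
    "7.5": "Encoding & Scaling",
    "8": "Data Merging",
    "9": "Correlation Analysis",
    "9.5": "Business Validation",
    "10": "Data Packaging",
    "10.5": "Train/Test Split",
    "11": "Advanced Analytics",
    "11.5": "Feature Selection",
    "12": "Text Analysis",
    "13": "Monitoring & Reporting",
}

# Sort IDs as exact numbers so "10" follows "9.5" instead of preceding "2".
ORDER = sorted(PHASES, key=Decimal)


def next_phase(phase_id: str) -> str | None:
    """Return the phase after phase_id, or None after the last phase."""
    idx = ORDER.index(phase_id)  # raises ValueError for an unknown ID
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def can_start(phase_id: str, validated: set[str]) -> bool:
    """A phase may start only when every earlier phase has passed validation."""
    earlier = ORDER[: ORDER.index(phase_id)]
    return all(p in validated for p in earlier)
```

Plain string sorting would place "10" before "2", which is why the sketch sorts with `Decimal`.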
- Finance
- Healthcare
- Retail
- Manufacturing
- Technology
- Education
- Government
- General
- CSV
- Excel (XLSX)
- Parquet
- JSON
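Ingestion has to reject anything outside these four formats. The sketch below assumes the file extension is enough to identify the format. It maps extensions to format labels and dispatches to the matching pandas reader. Pandas is itself an assumption here because this README does not name it, and the helper names `detect_format` and `load_dataset` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Extension -> format label for the formats listed above (Excel as .xlsx only).
SUPPORTED_FORMATS = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".parquet": "parquet",
    ".json": "json",
}


def detect_format(filename: str) -> str:
    """Map a filename to a supported format label, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}") from None


def load_dataset(path: str):
    """Read a supported file into a pandas DataFrame (pandas imported lazily)."""
    import pandas as pd

    readers = {
        "csv": pd.read_csv,
        "excel": pd.read_excel,  # .xlsx needs the openpyxl package
        "parquet": pd.read_parquet,  # needs pyarrow or fastparquet
        "json": pd.read_json,
    }
    return readers[detect_format(path)](path)
```

Because validation happens before any reader is called, an unsupported upload fails fast with a clear message.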
- Swagger UI: http://localhost:8000/api/docs
- ReDoc: http://localhost:8000/api/redoc
- Health Check: http://localhost:8000/health
- Full docs: see `docs/README.md` for Getting Started, Architecture, Phases, and more.
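After `docker-compose up --build`, the backend may take a few seconds to start answering requests. The stdlib-only helper below polls the health-check URL listed above. It assumes only that a healthy backend returns HTTP 200, since the response body is not documented here, and `wait_for_health` is a hypothetical name.

```python
from __future__ import annotations

import time
import urllib.request


def wait_for_health(
    url: str = "http://localhost:8000/health",
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll url until it answers HTTP 200 or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (non-2xx), refused connections and socket
            # timeouts are all OSError subclasses: treat them as "not ready yet".
            pass
        time.sleep(interval)
    return False
```

A CI job or demo script could call `wait_for_health()` before opening the frontend. The call returns `False` rather than raising if the backend never comes up.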
- Each phase must be implemented sequentially
- Run the validation script before proceeding to the next phase
- All phases must pass validation checks
- Follow the established project structure
# Validate current phase
python backend/validation_scripts/validate_phase0.py

- Follow the phase-by-phase implementation approach
- Ensure all validation checks pass
- Maintain backward compatibility
- Document all changes
[Add your license here]
For questions or issues, please refer to the project documentation or create an issue in the repository.