An end-to-end MLOps project that trains and deploys a machine learning model to predict diabetes based on patient health data. Built with FastAPI, Docker, and Kubernetes, this project demonstrates a real-world ML pipeline from training to deployment.
- Machine Learning: scikit-learn, pandas, joblib
- API Framework: FastAPI
- Containerization: Docker
- Orchestration: Kubernetes
- Dataset: Pima Indians Diabetes Dataset
The API provides two main endpoints and interactive documentation powered by Swagger UI.
- Health check: a simple endpoint that confirms the API is running.
- /predict: accepts a patient's data as a JSON object and returns a boolean prediction.

The interactive documentation (Swagger UI, served at /docs) lets you test the /predict endpoint directly from your browser.
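You can also call /predict from a script. The sketch below uses only the Python standard library. It rests on three assumptions: the endpoint accepts a POST request, the JSON keys match the five feature names used for training (the real field names are defined in app/schema.py and may differ), and the server runs locally on uvicorn's default port 8000. The helper names `build_payload` and `predict` are hypothetical.

```python
import json
import urllib.request

# Assumption: the API runs locally on uvicorn's default port.
API_URL = "http://127.0.0.1:8000/predict"

# Assumption: the request body uses the training feature names as keys.
FEATURES = ("Pregnancies", "Glucose", "BloodPressure", "BMI", "Age")


def build_payload(**values: float) -> bytes:
    """Encode one patient's features as a JSON request body (hypothetical helper)."""
    missing = [name for name in FEATURES if name not in values]
    if missing:
        raise ValueError(f"missing features: {missing}")
    # Emit the keys in a fixed order; any extra keyword arguments are dropped.
    return json.dumps({name: values[name] for name in FEATURES}).encode("utf-8")


def predict(payload: bytes, url: str = API_URL) -> dict:
    """POST the payload and return the decoded JSON response (assumed to be an object)."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict(build_payload(Pregnancies=2, Glucose=138, BloodPressure=62, BMI=33.6, Age=47))` sends one sample patient and returns whatever JSON the API responds with.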
This project uses a supervised machine learning model to perform a binary classification task (diabetic or not diabetic).
The model is trained on the Pima Indians Diabetes Dataset, a standard dataset from the UCI Machine Learning Repository. It contains health data for female patients of Pima Indian heritage.
To keep the model lightweight and fast for an API, we use only 5 of the dataset's 8 input features:
- Pregnancies: Number of times pregnant
- Glucose: Plasma glucose concentration
- BloodPressure: Diastolic blood pressure (mm Hg)
- BMI: Body mass index (weight in kg/(height in m)^2)
- Age: Age (years)
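A scikit-learn model expects its input columns in the same order it saw during training, so the prediction code has to turn each request into an ordered feature row. The sketch below uses a standard-library dataclass as a stand-in for the Pydantic model in app/schema.py; the class name `PatientData`, its field names, and the assumed training column order are illustrative, not the project's actual code.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PatientData:
    """Hypothetical stand-in for the request schema (field names are assumptions)."""

    Pregnancies: int
    Glucose: float
    BloodPressure: float
    BMI: float
    Age: int

    def to_row(self) -> list[float]:
        """Return feature values in declaration order, assumed to match the training column order."""
        return [float(getattr(self, f.name)) for f in fields(self)]


# scikit-learn's predict() takes a 2-D input: one row per patient.
patient = PatientData(Pregnancies=2, Glucose=138, BloodPressure=62, BMI=33.6, Age=47)
X = [patient.to_row()]
```

With a fitted model loaded (for example via `joblib.load("diabetes_model.pkl")`), `model.predict(X)[0]` would give the predicted class for this patient. If the model was fitted on a pandas DataFrame, wrapping the row in a DataFrame with the same column names avoids scikit-learn's feature-name warning.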
The model is a Random Forest Classifier from scikit-learn.
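A Random Forest is an ensemble learning method. Instead of relying on a single decision tree, it trains many decision trees, each on a bootstrap sample of the training data and each considering a random subset of features at every split. Conceptually, each tree "votes" for a class (diabetic or not) and the forest returns the class with the most votes; scikit-learn's implementation refines this by averaging the trees' predicted class probabilities and choosing the class with the highest average. Because the individual trees make partly independent errors, combining them is typically more accurate than a single tree and reduces overfitting.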
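To make the two aggregation rules concrete, the pure-Python sketch below combines hypothetical per-tree outputs both ways: hard majority voting (the intuitive "each tree votes" picture) and probability averaging (what scikit-learn's RandomForestClassifier does). Each input value is one tree's estimated probability that the patient is diabetic; the numbers are made up so that the two rules disagree.

```python
def hard_vote(tree_probs: list[float]) -> int:
    """Each tree votes for class 1 if its P(diabetic) > 0.5; the majority wins."""
    votes_for_1 = sum(1 for p in tree_probs if p > 0.5)
    votes_for_0 = len(tree_probs) - votes_for_1
    # Ties (possible with an even number of trees) go to class 0 in this sketch.
    return 1 if votes_for_1 > votes_for_0 else 0


def soft_vote(tree_probs: list[float]) -> int:
    """Average P(diabetic) across trees, then pick the more probable class."""
    mean_prob = sum(tree_probs) / len(tree_probs)
    # A mean of exactly 0.5 goes to class 0, mirroring argmax picking the first maximum.
    return 1 if mean_prob > 0.5 else 0


# Two trees lean mildly diabetic, one is certain the patient is not.
probs = [0.6, 0.6, 0.0]
print(hard_vote(probs), soft_vote(probs))
```

Here two of three trees vote "diabetic", so hard voting returns 1, but the average probability is only 0.4, so probability averaging returns 0. Averaging lets a confident tree outweigh several uncertain ones.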
.
├── app
│ ├── __init__.py # Makes 'app' a Python package
│ ├── main.py # FastAPI app logic
│ ├── model.py # Model loading and prediction logic
│ ├── schema.py # Pydantic data models
│ └── training
│ ├── __init__.py # Makes 'training' a sub-package
│ └── trainer.py # Model training script
├── .gitignore
├── Dockerfile # Multi-stage Dockerfile
├── k8s-deploy.yml # Kubernetes deployment manifest
├── README.md
└── requirements.txt # Python dependencies

Clone the repository:
git clone https://github.com/your-username/mlops-diabetes-predictor.git
cd mlops-diabetes-predictor

Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate

Install the dependencies:
pip install -r requirements.txt

Train the model. This runs the training script and saves diabetes_model.pkl to the project root. (The Docker build performs this step automatically.)
python -m app.training.trainer

Start the API server:
uvicorn app.main:app --reload

Now access the interactive API docs at 👉 http://127.0.0.1:8000/docs
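When scripting local tests, it helps to wait until the server actually accepts requests before sending data. A minimal standard-library sketch, assuming the default uvicorn address and treating a 200 response from FastAPI's /docs page as "the app is up"; the function name `wait_for_api` and its defaults are hypothetical.

```python
import time
import urllib.request


def wait_for_api(base_url: str = "http://127.0.0.1:8000", timeout: float = 30.0,
                 interval: float = 1.0) -> bool:
    """Poll <base_url>/docs until it answers 200 or `timeout` seconds pass."""
    url = base_url.rstrip("/") + "/docs"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Each individual request gets its own short timeout.
            with urllib.request.urlopen(url, timeout=2.0) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused or reset connections and socket
            # timeouts are all OSError subclasses: the server is not ready yet.
            pass
        time.sleep(interval)
    return False
```

For example, a test script can start uvicorn in the background and call `wait_for_api()` first; a `False` result means the server never came up within the timeout.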
This section explains how to build, push, and deploy the API in the cloud.
The multi-stage Dockerfile handles both model training and packaging.
# Replace 'your-username' with your Docker Hub username
docker build -t your-username/mlops-diabetes-predictor:latest .

Before deploying to Kubernetes, push the image to your registry:
# Log in (if not already)
docker login
# Push the image
docker push your-username/mlops-diabetes-predictor:latest

Edit k8s-deploy.yml so that the container's image: field points to your image, then apply the manifest:
kubectl apply -f k8s-deploy.yml

Check the external IP of your LoadBalancer service:
kubectl get service diabetes-predictor-service --watch

Wait until the EXTERNAL-IP column changes from <pending> to a real IP address (e.g., 20.123.45.67).
Your API will then be publicly available at: http://<YOUR-EXTERNAL-IP>.
Access the Swagger UI docs at: http://<YOUR-EXTERNAL-IP>/docs
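If you would rather script the wait than watch the table, `kubectl get service diabetes-predictor-service -o json` prints the Service object, and a LoadBalancer's address appears under `status.loadBalancer.ingress` once the cloud provider has provisioned it. Some providers report a `hostname` instead of an `ip`. The helpers below are a hypothetical sketch that extracts whichever is present and returns `None` while the address is still pending; `fetch_service` assumes kubectl is on your PATH with a configured cluster context.

```python
import json
import subprocess
from typing import Optional


def external_address(service: dict) -> Optional[str]:
    """Return the LoadBalancer IP or hostname from a Service object, or None if pending."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        # Providers fill in either "ip" or "hostname".
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None


def fetch_service(name: str = "diabetes-predictor-service") -> dict:
    """Run kubectl and parse the Service manifest it prints as JSON."""
    output = subprocess.run(
        ["kubectl", "get", "service", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(output)
```

Calling `external_address(fetch_service())` in a loop until it returns a value gives you the address to use in http://<YOUR-EXTERNAL-IP>/docs.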


