Souritra Banerjee SouRitra01

Souritra Banerjee

Data Scientist · Supply Chain ML & AI Systems · Amazon

Building production-grade ML and simulation systems for large-scale supply chain operations — not just notebooks.

What I Work On

At Amazon, I own end-to-end systems that drive real planning decisions:

Demand Forecasting — Replaced per-lane time-series models with multivariate XGBoost/LightGBM ensembles across 10+ regions. Features engineered from shipment flows, lead time distributions (mean + P95), and event proximity signals. MAPE improved 5.2% → 3.1%, informing $450K+ in inventory allocation decisions.

THANOS — Network Simulation — Monte Carlo simulation platform modelling stochastic demand, capacity distributions, and routing logic across hundreds of supply chain lanes. Produces risk distributions over network outcomes, rather than point estimates, to stress-test strategic plans. Informed 3 fulfillment center (FC) expansion decisions ($2M+ projected impact).

RAG-Based Operational Intelligence — Built a retrieval-augmented generation system on a corpus of 100K+ heterogeneous operational documents, using hybrid retrieval (dense embeddings + BM25) and hierarchical chunking. Reduced decision error rate by 40%. Designed a three-dimensional evaluation framework (retrieval precision, stakeholder adoption rate, decision velocity), since adopted org-wide as the standard for LLM system validation.

Workforce Spillover Optimization — Reformulated shift staffing as a stochastic allocation problem. Applied asymmetric cost weighting to derive quantile-based targets for high-variance periods.
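
One common way to turn asymmetric costs into a quantile target is the newsvendor critical ratio: if understaffing costs c_u per unit and overstaffing costs c_o per unit, the expected-cost-minimizing target is the c_u / (c_u + c_o) quantile of demand. The sketch below illustrates only that idea. The function names, the cost values, and the log-normal demand sample are assumptions made for illustration, not the production formulation.

```python
import math
import random


def critical_ratio(cost_under: float, cost_over: float) -> float:
    """Quantile level that minimizes expected cost under asymmetric penalties."""
    if cost_under <= 0 or cost_over <= 0:
        raise ValueError("both costs must be positive")
    return cost_under / (cost_under + cost_over)


def empirical_quantile(samples, tau: float) -> float:
    """Smallest sample with at least a fraction tau of samples at or below it."""
    if not 0 < tau <= 1:
        raise ValueError("tau must be in (0, 1]")
    ordered = sorted(samples)
    index = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[index]


def staffing_target(demand_samples, cost_under: float, cost_over: float) -> float:
    """Quantile-based staffing target derived from the two unit costs."""
    tau = critical_ratio(cost_under, cost_over)
    return empirical_quantile(demand_samples, tau)


# Hypothetical high-variance period: simulated demand, understaffing 3x as costly.
rng = random.Random(0)
demand = [rng.lognormvariate(4.0, 0.5) for _ in range(10_000)]
print("tau:", critical_ratio(3.0, 1.0))
print("target:", round(staffing_target(demand, 3.0, 1.0), 1))
```

When understaffing is three times as costly as overstaffing, the target sits at the 75th percentile of demand instead of the median, which is what pushes staffing up in high-variance periods.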

Pinned Projects

email-task-assignment

End-to-end NLP pipeline for task detection and responsible-person assignment in email threads. BERT sentence-pair classifier fine-tuned on the Enron EPA dataset. LR baseline F1=0.925, DistilBERT F1=0.703. Served via FastAPI.
Stack: PyTorch, BERT, HuggingFace, FastAPI, Scikit-learn

middle-mile-forecasting-simulator

Multi-lane demand forecasting simulator comparing XGBoost/LightGBM against baselines, with shipment flow, lead time (mean + P95), and event proximity features. Walk-forward evaluation at daily and weekly (cycle-level) granularity.
Stack: XGBoost, LightGBM, Statsmodels, Python
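
Walk-forward evaluation can be sketched roughly as follows: train on everything up to a cutoff, score the next block, then move the cutoff forward. This is a minimal illustration, not the repository's code. The function names are hypothetical, the expanding window is an assumed design, and a naive last-value forecaster stands in for the XGBoost/LightGBM models.

```python
from typing import Callable, List, Sequence

Forecaster = Callable[[Sequence[float], int], List[float]]


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent, skipping zero actuals."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actuals to score")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def naive_last_value(history: Sequence[float], horizon: int) -> List[float]:
    """Baseline stand-in: repeat the last observed value."""
    return [history[-1]] * horizon


def to_cycles(series: Sequence[float], cycle_len: int) -> List[float]:
    """Aggregate a daily series into complete cycles (e.g. weeks) by summing."""
    n_full = len(series) // cycle_len
    return [sum(series[i * cycle_len:(i + 1) * cycle_len]) for i in range(n_full)]


def walk_forward(series: Sequence[float], forecaster: Forecaster,
                 initial_train: int, horizon: int) -> List[float]:
    """Expanding-window evaluation: one MAPE score per fold."""
    scores = []
    cutoff = initial_train
    while cutoff + horizon <= len(series):
        history = series[:cutoff]                 # only data before the cutoff
        actual = series[cutoff:cutoff + horizon]  # the block being forecast
        scores.append(mape(actual, forecaster(history, horizon)))
        cutoff += horizon
    return scores


# Hypothetical daily series with a weekly pattern, scored at both granularities.
daily = [100 + 10 * (day % 7) for day in range(56)]
daily_scores = walk_forward(daily, naive_last_value, initial_train=28, horizon=7)
weekly_scores = walk_forward(to_cycles(daily, 7), naive_last_value,
                             initial_train=4, horizon=1)
print(len(daily_scores), len(weekly_scores))
```

Because each fold only sees data before its cutoff, the scores avoid the look-ahead leakage that a random train/test split would introduce on time series.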

thanos-network-optimization-demo

Monte Carlo simulation engine for supply chain network planning. Models stochastic demand (log-normal), transit disruptions, and capacity constraints. Compares constrained vs unconstrained scenarios via cost and spillover risk distributions.
Stack: Monte Carlo, NumPy, SciPy, NetworkX
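
The constrained-versus-unconstrained comparison can be illustrated with a single-lane sketch: draw log-normal demand, subtract capacity, and summarize the resulting spillover distribution. This is a simplified illustration that uses only the standard library. The function names and parameter values are assumptions, and transit disruptions and routing are not modelled here.

```python
import math
import random
from typing import Dict, List


def simulate_spillover(mu: float, sigma: float, capacity: float,
                       n_trials: int = 10_000, seed: int = 0) -> List[float]:
    """Spillover (demand above capacity) per trial, with log-normal demand."""
    rng = random.Random(seed)
    return [max(0.0, rng.lognormvariate(mu, sigma) - capacity)
            for _ in range(n_trials)]


def summarize(spill: List[float]) -> Dict[str, float]:
    """Summarize the spillover distribution instead of a single point estimate."""
    ordered = sorted(spill)
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {
        "mean": sum(ordered) / len(ordered),
        "p95": ordered[p95_index],
        "prob_spill": sum(1 for s in ordered if s > 0) / len(ordered),
    }


# Hypothetical lane: same demand draws, with and without a capacity limit.
constrained = summarize(simulate_spillover(mu=6.0, sigma=0.4, capacity=500.0))
unconstrained = summarize(simulate_spillover(mu=6.0, sigma=0.4,
                                             capacity=float("inf")))
print(constrained)
print(unconstrained)
```

Using the same seed for both scenarios keeps the demand draws identical, so any difference between the two summaries comes from the capacity limit alone.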

Tech Stack

Domain	Tools
ML & Modeling	XGBoost, LightGBM, PyTorch, TensorFlow, Scikit-learn
LLMs & AI	RAG Pipelines, LangChain, HuggingFace Transformers, Prompt Engineering
Simulation	Monte Carlo, SciPy, NumPy, NetworkX
MLOps	SageMaker, MLflow, Docker, Model Monitoring
Data	Python, SQL (PostgreSQL, Redshift), A/B Testing, Causal Inference
Cloud	AWS (SageMaker, Lambda, S3, Redshift, EC2)

Impact


$2M+	Projected annual impact from THANOS network planning
$450K+	Inventory allocation decisions informed by forecasting system
40%	Relative MAPE reduction (5.2% → 3.1%) on demand forecasting
40%	Error rate reduction on RAG-based decision support
8+	Production ML models owned end-to-end
10K+	Weekly supply chain events handled via automated NLP triaging

Connect

Building ML systems is easy. Building ML systems people trust is the real work.
