
Souritra Banerjee

Data Scientist · Supply Chain ML & AI Systems · Amazon

Building production-grade ML and simulation systems for large-scale supply chain operations — not just notebooks.


What I Work On

At Amazon, I own end-to-end systems that drive real planning decisions:

Demand Forecasting — Replaced per-lane time-series models with multivariate XGBoost/LightGBM ensembles across 10+ regions. Features engineered from shipment flows, lead time distributions (mean + P95), and event proximity signals. MAPE improved 5.2% → 3.1%, informing $450K+ in inventory allocation decisions.
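
Below is a minimal sketch of two ingredients named above: per-lane lead time features (mean and P95) and the MAPE metric used to report accuracy. The lane names and lead time samples are hypothetical. This is not the production feature pipeline, which the README does not describe. The last line checks that 5.2% → 3.1% is roughly a 40% relative reduction.

```python
import numpy as np

def lead_time_features(lead_times_by_lane):
    """Summarize each lane's lead time distribution as its mean and 95th percentile."""
    features = {}
    for lane, samples in lead_times_by_lane.items():
        arr = np.asarray(samples, dtype=float)
        features[lane] = {
            "lead_time_mean": float(arr.mean()),
            "lead_time_p95": float(np.percentile(arr, 95)),
        }
    return features

def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

# Hypothetical lanes, lead times in days.
feats = lead_time_features({"LANE_A": [2, 3, 3, 4, 9], "LANE_B": [1, 1, 2, 2, 2]})
print(feats["LANE_A"])

# Relative MAPE reduction for the reported 5.2% -> 3.1% change.
print(round((5.2 - 3.1) / 5.2 * 100, 1))  # 40.4
```

The P95 feature captures the slow-shipment tail that a mean alone would hide. Lane A's single 9-day shipment lifts its P95 well above its mean.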

THANOS Network Simulation — Monte Carlo simulation platform modelling stochastic demand, capacity distributions, and routing logic across hundreds of supply chain lanes. Produces risk distributions over network outcomes, rather than point estimates, to stress-test strategic plans. Informed 3 fulfillment center (FC) expansion decisions ($2M+ projected impact).
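
The following small Monte Carlo sketch for a single lane shows why a distribution is more informative than a point estimate. It assumes log-normal demand and normally distributed capacity with made-up parameters. These are illustrative choices, not THANOS's actual model. Planning at median demand and mean capacity shows zero spillover, but sampling both reveals a substantial chance of overflow.

```python
import numpy as np

def simulate_spillover(n_trials, demand_median, demand_sigma, cap_mean, cap_sd, seed=0):
    """Sample spillover = max(demand - capacity, 0) for one hypothetical lane."""
    rng = np.random.default_rng(seed)
    # Log-normal demand: its median is exp(mu), so mu = log(median).
    demand = rng.lognormal(mean=np.log(demand_median), sigma=demand_sigma, size=n_trials)
    # Capacity is uncertain too; clip at zero so it never goes negative.
    capacity = np.clip(rng.normal(cap_mean, cap_sd, size=n_trials), 0, None)
    return np.maximum(demand - capacity, 0.0)

spill = simulate_spillover(100_000, demand_median=1_000, demand_sigma=0.25,
                           cap_mean=1_150, cap_sd=80)

# Deterministic plan at median demand and mean capacity: no spillover at all.
point_estimate = max(1_000 - 1_150, 0)
print("point estimate:", point_estimate)
print("P(spillover > 0):", (spill > 0).mean())
print("P50 / P95 spillover:", np.percentile(spill, [50, 95]))
```

Reporting the probability of any spillover alongside tail percentiles such as P95, rather than a single number, is what makes a plan stress-testable.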

RAG-Based Operational Intelligence — Built a retrieval-augmented generation (RAG) system over 100K+ heterogeneous operational documents, using hybrid retrieval (dense embeddings + BM25) and hierarchical chunking. Reduced decision error rate by 40%. Designed a three-dimensional evaluation framework (retrieval precision, stakeholder adoption rate, decision velocity), adopted org-wide as the standard for LLM system validation.
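
The README does not say how dense and BM25 results are merged. Reciprocal rank fusion (RRF) is one common choice. The sketch below assumes it, with the conventional constant k = 60 and hypothetical document IDs. It only fuses two ranked lists. The dense and BM25 retrievers themselves are out of scope.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense_hits = ["sop_12", "runbook_7", "memo_3"]    # hypothetical dense-embedding ranking
bm25_hits = ["runbook_7", "ticket_44", "sop_12"]  # hypothetical BM25 ranking
print(reciprocal_rank_fusion([dense_hits, bm25_hits]))
# ['runbook_7', 'sop_12', 'ticket_44', 'memo_3']
```

RRF combines ranks rather than raw scores. This sidesteps the fact that cosine similarities and BM25 scores live on different scales.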

Workforce Spillover Optimization — Reformulated shift staffing as a stochastic allocation problem. Applied asymmetric cost weighting to derive quantile-based targets for high-variance periods.
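
The newsvendor critical ratio is one standard way for asymmetric costs to produce a quantile target. Suppose each unit of understaffing costs c_u and each unit of overstaffing costs c_o. Expected cost is then minimized by staffing to the c_u / (c_u + c_o) quantile of demand. The sketch below assumes that framing, with hypothetical costs and simulated demand. The README does not give the actual cost weights or demand model.

```python
import numpy as np

def quantile_staffing_target(demand_samples, cost_under, cost_over):
    """Newsvendor-style target: the c_u / (c_u + c_o) quantile of demand."""
    critical_ratio = cost_under / (cost_under + cost_over)
    return critical_ratio, float(np.quantile(demand_samples, critical_ratio))

rng = np.random.default_rng(7)
# Hypothetical labor-hour demand for a high-variance shift.
demand = rng.normal(loc=400, scale=60, size=50_000)

# Understaffing assumed to be 3x as costly as overstaffing.
ratio, target = quantile_staffing_target(demand, cost_under=3.0, cost_over=1.0)
print(ratio)  # 0.75
print(target)  # near the analytic 75th percentile, 400 + 0.674 * 60
```

The higher the relative cost of understaffing, the further the target moves into the upper tail. This is why high-variance periods get targets well above mean demand.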


Pinned Projects

email-task-assignment: End-to-end NLP pipeline for task detection and responsible-person assignment in email threads. BERT sentence-pair classifier fine-tuned on the Enron EPA dataset. Logistic regression (LR) baseline F1 = 0.925; DistilBERT F1 = 0.703. Served via FastAPI. Stack: PyTorch · BERT · HuggingFace · FastAPI · Scikit-learn
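
A sentence-pair classifier can be applied to assignment by scoring one (task, candidate) pair per person in the thread and choosing the highest score. The sketch below shows only that framing. `score_pair` is a hypothetical stand-in for the fine-tuned BERT model (here a trivial name-mention heuristic). The pair template and the names are also invented for illustration.

```python
from typing import Callable, List, Tuple

def build_pairs(task: str, candidates: List[str]) -> List[Tuple[str, str]]:
    """One (text_a, text_b) pair per candidate, the input shape a sentence-pair model takes."""
    return [(task, f"Is {name} responsible for this task?") for name in candidates]

def assign_owner(task: str, candidates: List[str],
                 score_pair: Callable[[str, str], float]) -> str:
    """Return the candidate whose (task, candidate) pair receives the highest score."""
    pairs = build_pairs(task, candidates)
    scores = [score_pair(a, b) for a, b in pairs]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

def score_pair(text_a: str, text_b: str) -> float:
    """Hypothetical stand-in scorer: 1.0 if the candidate's name appears in the task text.

    Assumes single-token names in the template "Is <name> responsible ...".
    """
    name = text_b.split()[1]
    return 1.0 if name.lower() in text_a.lower() else 0.0

task = "Sara, please send the revised pipeline contract by Friday."
print(assign_owner(task, ["Alex", "Sara", "Priya"], score_pair))  # Sara
```

Swapping in a real model only changes `score_pair`. The pairing and argmax logic stay the same.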

middle-mile-forecasting-simulator: Multi-lane demand forecasting simulator comparing XGBoost/LightGBM against baselines, with shipment flow, lead time (mean + P95), and event proximity features. Walk-forward evaluation at daily and weekly (cycle-level) granularity. Stack: XGBoost · LightGBM · Statsmodels · Python
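
Walk-forward evaluation trains only on data before each test window, then rolls the window forward. Below is a minimal expanding-window split generator. The window sizes are hypothetical. A weekly (cycle-level) evaluation could reuse the same generator on a weekly-aggregated series.

```python
def walk_forward_splits(n_obs, initial_train, horizon, step=None):
    """Yield (train_indices, test_indices) with an expanding training window.

    Each test window starts right after its training window, so no future
    observations leak into training.
    """
    step = step or horizon
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield list(range(train_end)), list(range(train_end, train_end + horizon))
        train_end += step

# Hypothetical: 10 daily observations, at least 6 for training, 2-day forecast horizon.
for train_idx, test_idx in walk_forward_splits(10, initial_train=6, horizon=2):
    print(len(train_idx), test_idx)
# 6 [6, 7]
# 8 [8, 9]
```

A random train/test split would let future shipments into the training data and inflate accuracy. Rolling forward avoids that.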

thanos-network-optimization-demo: Monte Carlo simulation engine for supply chain network planning. Models stochastic demand (log-normal), transit disruptions, and capacity constraints. Compares constrained vs unconstrained scenarios via cost and spillover risk distributions. Stack: Monte Carlo · NumPy · SciPy · NetworkX


Tech Stack

ML & Modeling: XGBoost, LightGBM, PyTorch, TensorFlow, Scikit-learn
LLMs & AI: RAG Pipelines, LangChain, HuggingFace Transformers, Prompt Engineering
Simulation: Monte Carlo, SciPy, NumPy, NetworkX
MLOps: SageMaker, MLflow, Docker, Model Monitoring
Data: Python, SQL (PostgreSQL, Redshift), A/B Testing, Causal Inference
Cloud: AWS (SageMaker, Lambda, S3, Redshift, EC2)

Impact

$2M+: projected annual impact from THANOS network planning
$450K+: inventory allocation decisions informed by the forecasting system
40%: relative MAPE reduction (5.2% → 3.1%) in demand forecasting
40%: decision error rate reduction from RAG-based decision support
8+: production ML models owned end-to-end
10K+: weekly supply chain events handled via automated NLP triage


Building ML systems is easy. Building ML systems people trust is the real work.

Pinned

  1. email-task-assignment

    End-to-end NLP pipeline for assigning responsible persons to tasks detected in emails · BERT-based classifier · PyTorch · FastAPI inference

    Jupyter Notebook

  2. middle-mile-forecasting-simulator

    Multi-lane demand forecasting simulator — XGBoost/LightGBM vs baselines, with shipment flow, lead time, and event signal features

    Python

  3. thanos-network-optimization-demo

    Monte Carlo simulation engine for supply chain network planning — risk distributions over cost, spillover, and SLA across constrained/unconstrained capacity scenarios

    Python

  4. IPL_2026

    End-to-end cricket analytics pipeline — ball-by-ball ML scoring, win probability, LLM narrative generation, and social media content from live match data

    Jupyter Notebook