Machine Learning API Service
Production ML serving stack exposing REST endpoints with model version management, A/B splits, and monitoring hooks. Supports both batch and real-time inference workloads.
FastAPI · TensorFlow · Docker · Kubernetes · PostgreSQL · MLflow

A production-grade serving layer for deploying, versioning, and monitoring machine learning models.
Highlights
- Model registry powered by MLflow with automated promotion gates.
- REST endpoints for both synchronous and batch prediction workflows.
- Traffic splitting rules support A/B testing and gradual rollouts.
- Autoscaling driven by custom metrics and queue depth.
- Drift monitoring hooks emit alerts when performance degrades.
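The traffic-splitting rule mentioned above can be sketched as a deterministic, hash-based bucketer: the same request key always lands in the same variant, which keeps A/B assignments stable across retries. The function name `split_traffic` and the percentage-based config are illustrative assumptions, not the service's actual API.

```python
import hashlib

def split_traffic(request_key: str, splits: dict[str, float]) -> str:
    """Assign a request to a model variant; `splits` maps variant -> percent."""
    if abs(sum(splits.values()) - 100.0) > 1e-9:
        raise ValueError("split percentages must sum to 100")
    # Stable hash so the same key always lands in the same bucket in [0, 100).
    digest = hashlib.sha256(request_key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100.0
    cumulative = 0.0
    for variant, percent in splits.items():
        cumulative += percent
        if bucket < cumulative:
            return variant
    return variant  # guard against floating-point edge exactly at 100.0
```

For a gradual rollout, the caller would simply widen the canary percentage over time, e.g. `split_traffic(user_id, {"stable": 90, "canary": 10})`.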
Technical Stack
- FastAPI exposes the API surface with async inference.
- TensorFlow and ONNX runtimes are dynamically loaded and warmed.
- PostgreSQL stores metadata; Redis caches preprocessed features.
- Kubernetes handles orchestration with KEDA for scaling decisions.
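The "dynamically loaded and warmed" runtime behavior above can be illustrated with a minimal cache: the first request for a model version loads it once under a lock, then runs a warm-up inference so later callers never pay cold-start cost. This is a stdlib-only sketch under assumed names (`ModelCache`, `loader`); the real loader would wrap TensorFlow SavedModel or ONNX Runtime sessions.

```python
import threading
from typing import Any, Callable

class ModelCache:
    """Lazy-loading cache that warms each model version exactly once."""

    def __init__(self, loader: Callable[[str], Any], warm_input: Any):
        self._loader = loader          # hypothetical: wraps the real runtime's load call
        self._warm_input = warm_input  # representative input used for the warm-up pass
        self._models: dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, version: str) -> Any:
        with self._lock:
            if version not in self._models:
                model = self._loader(version)
                model(self._warm_input)  # warm-up call primes graphs/caches
                self._models[version] = model
            return self._models[version]
```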
Workflow
- Package a model artifact with metadata and log it to MLflow.
- Invoke the deployment CLI to promote a version to staging or production.
- Monitor performance dashboards and compare canary vs. stable models.
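The promotion step in the workflow above implies a gate between stages. A hedged sketch of such a check: a candidate passes only if its evaluation metrics beat the current stable model within configured tolerances. The metric names and thresholds here are illustrative assumptions, not the registry's actual promotion criteria.

```python
def passes_promotion_gate(candidate: dict[str, float],
                          stable: dict[str, float],
                          max_latency_regression_ms: float = 5.0,
                          min_accuracy_gain: float = 0.0) -> bool:
    """Return True if the candidate version may be promoted to the next stage."""
    # Candidate must not lose accuracy (or must gain at least the configured margin).
    accuracy_ok = candidate["accuracy"] - stable["accuracy"] >= min_accuracy_gain
    # Candidate may regress latency only within the allowed budget.
    latency_ok = (candidate["p95_latency_ms"] - stable["p95_latency_ms"]
                  <= max_latency_regression_ms)
    return accuracy_ok and latency_ok
```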
Security & Compliance
- Requests are authenticated using API keys and optional JWT claims.
- PII is scrubbed before logging, and audit trails are retained for compliance.
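The PII-scrubbing step can be sketched as a regex-based redaction pass applied before a log line is emitted. This assumes email and US-style phone patterns only; a real deployment would use a vetted PII detector, and this sketch merely illustrates the shape of the hook.

```python
import re

# Illustrative patterns, not an exhaustive PII taxonomy.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub_pii(message: str) -> str:
    """Redact obvious PII patterns from a log line before it is written."""
    message = _EMAIL.sub("[EMAIL]", message)
    message = _PHONE.sub("[PHONE]", message)
    return message
```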
Key Challenge
Reduced inference latency to under 100 ms through async I/O and warmed model caches.
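The async-I/O half of that latency work can be illustrated with a stdlib-only sketch: feature fetches for concurrent requests overlap instead of serializing. `fetch_features` is a hypothetical stand-in for a Redis/Postgres round-trip; real handlers would await the actual client libraries.

```python
import asyncio

async def fetch_features(request_id: int) -> list[float]:
    """Simulated I/O-bound feature lookup (stand-in for a cache/DB call)."""
    await asyncio.sleep(0.01)  # network round-trip placeholder
    return [float(request_id), 1.0]

async def handle_batch(request_ids: list[int]) -> list[list[float]]:
    # gather() issues all feature fetches concurrently, so total wall time
    # approaches one round-trip instead of one per request.
    return await asyncio.gather(*(fetch_features(r) for r in request_ids))
```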