About this course
The gap between an accurate model in a notebook and a reliable model in production is where most ML projects fail. This course is about closing that gap systematically — covering the engineering decisions that determine whether a model ships once or runs reliably for years. Feature stores, training pipelines, serving infrastructure, monitoring, and the economics of inference are all on the table.
What you will learn
- The ML production lifecycle and the specific failure modes teams overlook
- Feature store design: versioning, point-in-time correctness, and low-latency serving
- Experiment tracking and full reproducibility with MLflow and DVC
- Automated training pipelines with Kubeflow Pipelines and Metaflow
- Model serving patterns: REST, gRPC, streaming, and batch inference
- Production monitoring: detecting data drift and concept drift before users notice
- Canary deployments and shadow mode for safe ML model releases
- Cost-aware inference: quantisation, batching, and hardware selection
Your instructor
Ayodele Ajayi
Principal Engineer
Ayodele is based in Kent, UK, with extensive experience across cloud-native security, platform engineering, and distributed systems. He has led engineering teams at scale and writes about what he learns, with a bias towards things that actually work in production.