About this course

The gap between an accurate model in a notebook and a reliable model in production is where most ML projects fail. This course is about closing that gap systematically: it covers the engineering decisions that determine whether a model ships once or runs reliably for years. Topics include feature stores, training pipelines, serving infrastructure, monitoring, and the economics of inference.

What you will learn

  • The ML production lifecycle and the specific failure modes teams overlook
  • Feature store design: versioning, point-in-time correctness, and low-latency serving
  • Experiment tracking and full reproducibility with MLflow and DVC
  • Automated training pipelines with Kubeflow Pipelines and Metaflow
  • Model serving patterns: REST, gRPC, streaming, and batch inference
  • Production monitoring: detecting data drift and concept drift before users notice
  • Canary deployments and shadow mode for safe ML model releases
  • Cost-aware inference: quantisation, batching, and hardware selection

Your instructor

Ayodele Ajayi

Principal Engineer

Ayodele is a Principal Engineer based in Kent, UK, with extensive experience across cloud-native security, platform engineering, and distributed systems. He has led engineering teams at scale and writes about what he learns, with a bias towards things that actually work in production.