Introduction
MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operations (Ops).
MLOps is the natural progression of DevOps in the context of AI… and emphasizes consistent and smooth development of models and their scalability.
In simple words, MLOps refers to applying DevOps principles to ML systems.
Practicing MLOps means advocating automation and monitoring at all steps of ML system construction: integration, testing, releasing, deployment, infrastructure management, etc.
The goal of MLOps is to build an integrated ML system that can continuously operate in production. As summarized by Google, only a small fraction of a real-world ML system is composed of the actual ML code, and the required surrounding elements are vast and complex, as shown below.
MLOps concepts
CI/CD/CT
- CI: Continuous Integration
- CI in ML is no longer only about testing and validating code and components; it also involves testing and validating data, data schemas, and models (see the test sketch after this list).
- CD: Continuous Delivery
- CD in ML is no longer only about a single software package or service, but about a system (an ML training pipeline) that should automatically deploy another service, e.g. a model prediction service.
- CT: Continuous Training
- CT is a new property, unique to ML systems, that is concerned with automatically retraining and serving models as new or updated data arrives.
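For example, the CI stage for an ML repository can validate the data and the model alongside the code. Below is a minimal pytest sketch; `load_training_data` and `train_model` are hypothetical stand-ins for project code, and the expected schema is an assumption for illustration:

```python
# test_ml_ci.py -- illustrative CI checks; load_training_data and train_model
# are hypothetical helpers standing in for real project code.
from my_project.data import load_training_data  # hypothetical module
from my_project.train import train_model        # hypothetical module

EXPECTED_SCHEMA = {"age": "int64", "income": "float64", "label": "int64"}


def test_data_matches_schema():
    """Validate the data, not just the code: columns and dtypes must match."""
    df = load_training_data()
    assert set(df.columns) == set(EXPECTED_SCHEMA)
    for column, dtype in EXPECTED_SCHEMA.items():
        assert str(df[column].dtype) == dtype


def test_model_beats_trivial_baseline():
    """Validate the model: it must outperform a majority-class baseline."""
    df = load_training_data()
    model, metrics = train_model(df)
    majority_rate = df["label"].value_counts(normalize=True).max()
    assert metrics["accuracy"] > majority_rate
```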
Example
Consider the typical steps for training and evaluating an ML model to serve as a prediction service.
After defining the use cases and establishing the success criteria, the process of delivering an ML model to production involves the following steps:
- Data extraction: get relevant data from various sources for the ML task.
- Data analysis: perform EDA to understand the extracted data (e.g. schema, characteristics) and identify the data preparation and feature engineering that are needed.
- Data preparation/preprocessing: preprocess/clean the extracted data. This typically involves splitting the data into training/validation/test sets, applying data transformations, and performing feature engineering. The outputs are the data splits in the prepared format.
- Model training: ML researchers implement algorithms to train various models, perform hyper-parameter tuning, etc. The output is a trained ML model.
- Model evaluation: evaluate the trained model quality on a holdout test set. The output is a set of metrics that assess the model quality.
- Model validation: confirm that the model is adequate for deployment. In our case, this means confirming that its predictive performance is better than a certain baseline.
- Model serving: deploy the validated model to a target environment (e.g. as micro services in a k8s cluster, as an embedded model in an edge device, or as part of a batch prediction system) to serve predictions.
- Model monitoring: monitor the deployed model’s prediction performance and trigger a new iteration of the system when needed.
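As a concrete, greatly simplified illustration, the sketch below walks through the preparation, training, evaluation, and validation steps with scikit-learn; the file paths, column names, and the 0.80 baseline are assumptions made for the example:

```python
# Minimal sketch of extract -> prepare -> train -> evaluate -> validate.
# Paths, the "label" column, and the 0.80 baseline are illustrative assumptions.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data extraction
df = pd.read_csv("data/extracted.csv")

# Data preparation: clean and split (a separate validation split is omitted for brevity)
df = df.dropna()
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
X_train, y_train = train_df.drop(columns=["label"]), train_df["label"]
X_test, y_test = test_df.drop(columns=["label"]), test_df["label"]

# Model training (hyper-parameter tuning omitted for brevity)
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Model evaluation on the holdout test set
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"holdout accuracy: {accuracy:.3f}")

# Model validation: only hand the model off to serving if it beats the baseline
BASELINE_ACCURACY = 0.80
if accuracy > BASELINE_ACCURACY:
    joblib.dump(model, "artifacts/model.joblib")
else:
    raise RuntimeError("Candidate model did not beat the baseline; not deploying.")
```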
The above steps can be completed manually by a single team, split across different teams (e.g. an algorithm team and an operations team), or carried out by an automated pipeline.
We want to bring automation to the process so that we can benefit from shortened development cycles, increased deployment velocity, and dependable releases.
The level of automation of these steps defines the maturity of the ML process and reflects the velocity of model iterations (e.g. triggered by new data or new implementations).
MLOps automation levels
Level 0: manual labor
The picture below shows the typical workflow at this level.
Characteristics:
- Manual, script-driven, and interactive process
- Disconnect between ML and operations
- Infrequent release iterations
- No CI/CD
- Deployment is a single service (e.g. prediction) rather than the entire ML system
- No active performance monitoring
This approach may be sufficient when models are rarely changed or retrained. But real-world environments are dynamic, and models that fail to adapt quickly to change can rapidly lose value.
Level 1: ML pipeline automation
The goal of this level is to enable continuous training of the model by automating the ML pipeline, and thus achieve continuous delivery of model prediction service for users.
This level of automation typically involves:
- Automated data validation
- Automated model validation (a minimal gate is sketched after this list)
- Pipeline triggers for another iteration
- Metadata management (explained later)
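To make automated model validation concrete, a pipeline step might run a gate like the sketch below; the metric name, file location, and tolerance are assumptions for illustration:

```python
# Sketch of an automated model-validation gate inside a pipeline step.
# The metrics file location and the tolerance value are illustrative assumptions.
import json


def validate_candidate(candidate_metrics: dict,
                       production_metrics_path: str = "artifacts/prod_metrics.json",
                       tolerance: float = 0.0) -> bool:
    """Promote the candidate only if it beats the currently deployed model."""
    try:
        with open(production_metrics_path) as f:
            production_metrics = json.load(f)
    except FileNotFoundError:
        # No model in production yet: the candidate is accepted by default.
        return True
    return candidate_metrics["accuracy"] >= production_metrics["accuracy"] + tolerance


# Example usage inside the pipeline:
if validate_candidate({"accuracy": 0.87}):
    print("Candidate validated -> trigger the model deployment step")
else:
    print("Candidate rejected -> keep the current production model")
```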
The following figure is a schematic representation of an automated ML pipeline for CT.
Characteristics:
- Rapid experimentation
- CT of the model in production with fresh data based on live triggers
- Experimental-Operational symmetry (as seen in the above diagram)
- Modularized code for components and pipelines
- CD of models (and thus predictive services)
- Deployment is an ML pipeline rather than only a prediction service
Transition from level 0 to level 1
To transition to level 1, we need to add new components to the architecture:
- Automated data and model validation
- Optional feature store: a centralized repository where we standardize the definition, storage, and access of features for training and serving. This is the data source for experimentation, CT, and online serving.
- Metadata management: we record information about each execution of the pipeline to help with data and artifact lineage, reproducibility, comparisons, debugging, anomaly detection, etc. (a minimal sketch follows this list). Metadata can include:
- versioning: of pipeline, or of individual components in the pipeline
- timing: start/end date, duration time, etc.
- pipeline executor(s)
- parameter arguments passed to the pipeline
- pointer to artifacts produced by each pipeline step (e.g. location of prepared data, computed statistics, etc.)
- pointer to previous trained model (this enables model roll-back)
- model evaluation metrics (can be thought of as part of the produced pipeline artifacts), which enable model comparison and benchmarking
- etc.
- Pipeline triggers: e.g. on data availability, on demand, on schedule, on model-performance degradation, on data drift, etc.
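A minimal way to capture such metadata is to write a structured record per pipeline run. The sketch below is illustrative only; the field names and JSON storage are assumptions, not a specific metadata store:

```python
# Sketch of recording run metadata for lineage, reproducibility, and roll-back.
# Field names and the JSON file format are illustrative assumptions.
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PipelineRunRecord:
    pipeline_version: str
    executed_by: str
    parameters: dict
    started_at: float = field(default_factory=time.time)
    finished_at: Optional[float] = None
    previous_model: Optional[str] = None           # pointer that enables model roll-back
    artifacts: dict = field(default_factory=dict)  # e.g. locations of prepared data, statistics
    metrics: dict = field(default_factory=dict)    # e.g. {"accuracy": 0.87}


record = PipelineRunRecord(
    pipeline_version="1.4.0",
    executed_by="scheduler",
    parameters={"learning_rate": 0.01},
)
# ... run the pipeline, filling in artifacts and metrics along the way ...
record.finished_at = time.time()
with open(f"metadata/run-{int(record.started_at)}.json", "w") as f:
    json.dump(asdict(record), f, indent=2)
```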
This approach is sufficient if new pipeline implementations are rarely deployed and only a few pipelines need to be managed.
The pipeline and its components are usually manually tested and deployed. This is not a good fit if you want to rapidly deploy new models based on new ML ideas, since manual labor is still involved in deploying the pipeline itself, or if you are managing many ML pipelines in production.
You need a CI/CD setup to automate the build, test, and deployment of ML pipelines.
Level 2: CI/CD pipeline automation
A robust automated CI/CD system allows data scientists to rapidly explore new ML ideas around feature engineering, model architecture, hyper-parameters, etc.
Data scientists can implement new ideas and the new pipeline will be automatically built, tested, and deployed to the target environment.
We can see the updated diagram with CI/CD added for the pipeline.
This level typically involves:
- Source control
- Test and build services
- Deployment services
- Model registry
- Feature store
- Metadata management
- Pipeline orchestrator
Characteristics:
Stages for CI/CD automation pipeline:
Manual labor cannot be eliminated from the data analysis and model analysis steps.
CI
The pipeline and its components are built, tested, and packaged when new code is committed or pushed to the VCS.
For example: unit tests for the feature engineering logic and for the different implemented methods; tests that model training converges; tests that each component in the pipeline produces the expected artifacts; etc.
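Two such CI checks might look like the sketch below; `build_features` and `fit_model` are hypothetical stand-ins for the project's feature-engineering and training code:

```python
# Illustrative CI checks; build_features and fit_model are hypothetical
# stand-ins for the project's feature-engineering and training code.
import numpy as np
import pandas as pd

from my_project.features import build_features  # hypothetical module
from my_project.train import fit_model           # hypothetical module


def test_feature_engineering_output():
    """Unit test for the feature engineering logic on a tiny fixture."""
    raw = pd.DataFrame({"income": [1000.0, 2000.0], "age": [30, 45]})
    features = build_features(raw)
    # No NaNs, and the engineered columns relied on downstream are present
    assert not features.isna().any().any()
    assert {"income_log", "age_bucket"} <= set(features.columns)


def test_training_converges():
    """Training should reduce the loss on a small synthetic dataset."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
    history = fit_model(X, y, epochs=5)
    assert history["loss"][-1] < history["loss"][0]
```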
CD
The new pipeline implementation is continuously deployed to the target environment, and in turn delivers new/updated prediction services.
Rapid and reliable pipeline delivery usually involves:
- verification of the model compatibility with the target infrastructure before deployment actually happens
- test the prediction service by calling it with expected inputs and making sure you get the expected response within the expected time
- test the service performance e.g. QPS, latency, etc.
- automated deployment to a test environment
- semi-automated deployment to a pre-production environment
- manual deployment to a production environment after several successful runs of the pipeline on the pre-production environment
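For instance, a smoke test against the deployed prediction service (the second and third items above) might look like the following sketch; the endpoint URL, payload, and latency budget are illustrative assumptions:

```python
# Sketch of a smoke test against a deployed prediction service; the URL,
# payload shape, and latency budget are illustrative assumptions.
import time

import requests

SERVICE_URL = "https://staging.example.com/v1/predict"  # hypothetical endpoint
LATENCY_BUDGET_SECONDS = 0.5


def test_prediction_service_smoke():
    payload = {"instances": [{"age": 30, "income": 1000.0}]}
    start = time.monotonic()
    response = requests.post(SERVICE_URL, json=payload, timeout=5)
    elapsed = time.monotonic() - start

    # Expected response within the expected time
    assert response.status_code == 200
    assert "predictions" in response.json()
    assert elapsed < LATENCY_BUDGET_SECONDS
```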
The following diagram shows the relationship between the CI/CD pipeline and the CT pipeline in an ML system:
Given a new model implementation (e.g. a new ML idea or architecture), a successful CI/CD pipeline deploys a new CT pipeline.
Given new data, a successful CT pipeline should serve a new model prediction service.
Example: Architecture for MLOps using TFX, Kubeflow Pipelines, and Cloud Build
TFX stands for “TensorFlow Extended” and is an integrated ML platform for developing and deploying production ML systems.
A TFX pipeline is a sequence of components that implement an ML system (modeling, training, validation, serving inference, deployment management, etc.).
Key TFX libraries include:
- TFT (TensorFlow Transform): data preparation, feature engineering tasks
- TFDV (TensorFlow Data Validation): data anomaly detection
- TensorFlow Estimators and Keras: model building and training
- TFMA (TensorFlow Model Analysis): model evaluation and analysis
- TFServing (TensorFlow Serving): serve model in the target environment (e.g. as REST and gRPC APIs)
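A minimal pipeline definition gives a feel for how these libraries compose. The sketch below uses the TFX 1.x Python API; the data path, module file, and serving directory are placeholders, and the data validation and model evaluation components are omitted for brevity:

```python
# Sketch of a minimal TFX pipeline (TFX 1.x API); paths and module files are
# placeholder assumptions, and ExampleValidator/Evaluator are omitted for brevity.
from tfx import v1 as tfx


def create_pipeline(pipeline_name: str, pipeline_root: str,
                    data_root: str, module_file: str, serving_dir: str):
    # Ingest raw CSV data as TF Examples
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)

    # Compute statistics and infer a schema (TFDV under the hood)
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs['examples'])
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs['statistics'])

    # Feature engineering with TFT
    transform = tfx.components.Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        module_file=module_file)

    # Model training
    trainer = tfx.components.Trainer(
        module_file=module_file,
        examples=transform.outputs['transformed_examples'],
        transform_graph=transform.outputs['transform_graph'],
        train_args=tfx.proto.TrainArgs(num_steps=1000),
        eval_args=tfx.proto.EvalArgs(num_steps=100))

    # Push the trained model to the serving directory (e.g. for TF Serving)
    pusher = tfx.components.Pusher(
        model=trainer.outputs['model'],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(
                base_directory=serving_dir)))

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen,
                    transform, trainer, pusher])


# Run locally; the same pipeline can also run on an orchestrator such as Kubeflow Pipelines.
tfx.orchestration.LocalDagRunner().run(
    create_pipeline("demo", "pipelines/demo", "data/", "model_code.py", "serving/"))
```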
The following diagram shows the architecture of an integrated ML system built from the various TFX libraries (i.e. the design of a TFX-based integrated ML system).
With the designed architecture, the next question is how to run each component of the system at scale. Commercial cloud platforms like GCP can help us run the system at scale in a reliable fashion with managed cloud services (e.g. Cloud Storage, AI Hub, Dataflow).
With the individual components mapped to managed services in the cloud platform, the next question is how to connect these two pieces together and automate the entire pipeline. An orchestrator performs such tasks, gluing together our high-level architecture and the underlying individual components. It is useful in both the development and production phases, as it facilitates automation and reduces manual labor.
The orchestrator runs the pipeline steps in sequence and automatically moves forward based on defined conditions (e.g. execute the model serving step after model evaluation has finished and the metrics meet predefined thresholds).
Kubeflow is the ML toolkit for Kubernetes. Kubeflow Pipelines is a Kubeflow service that lets you compose, orchestrate, and automate ML systems, where each component of the system can run on various infrastructures (e.g. GCP, local, etc.). Sound familiar? Yes! It is exactly the orchestrator we want.
A Kubeflow pipeline involves:
- A set of containerized tasks/components packaged as Docker images. These components can execute any data-related or compute-related services, e.g. Dataproc for Spark ML jobs, AutoML, etc.
- A sequence of tasks defined by a Python DSL, i.e. the topology of the workflow
- A set of pipeline input parameters
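The sketch below shows what such a DSL-defined pipeline can look like, using the KFP v2 Python SDK; the component bodies and the base image are trivial placeholders:

```python
# Sketch of a Kubeflow Pipelines definition using the KFP Python DSL (KFP v2 SDK);
# the component bodies and base image are simplified placeholders.
from kfp import compiler, dsl


@dsl.component(base_image="python:3.10")
def prepare_data(raw_path: str) -> str:
    # ...download/clean data and return the path of the prepared dataset...
    return raw_path + "/prepared"


@dsl.component(base_image="python:3.10")
def train_model(data_path: str, learning_rate: float) -> str:
    # ...train and return the location of the model artifact...
    return data_path + "/model"


@dsl.component(base_image="python:3.10")
def evaluate_model(model_path: str) -> float:
    # ...evaluate on a holdout set and return a quality metric...
    return 0.9


@dsl.pipeline(name="ct-training-pipeline")
def training_pipeline(raw_path: str, learning_rate: float = 0.01):
    # The topology of the workflow: prepare -> train -> evaluate
    prep = prepare_data(raw_path=raw_path)
    train = train_model(data_path=prep.output, learning_rate=learning_rate)
    evaluate_model(model_path=train.output)


# Compile to a pipeline definition that the Kubeflow Pipelines service can run
compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")
```

Compiling produces a pipeline package that can be uploaded to Kubeflow Pipelines and run with a given set of input parameters.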
The above diagram shows a high-level overview of integrating CI/CD with Kubeflow Pipelines on GCP. At the heart of this architecture is Cloud Build, a managed service that executes your builds on GCP. Essentially, the Cloud Build process performs the required CI/CD for our integrated ML system.
The build can be triggered manually or through automated build triggers.
For a comprehensive Cloud Build example that covers most of these steps, see A Simple CI/CD Example with Kubeflow Pipelines and Cloud Build.