MLOps

Introduction

MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operations (Ops).

MLOps is the natural progression of DevOps in the context of AI… and emphasizes consistent and smooth development of models and their scalability.

In simple words, MLOps refers to applying DevOps principles to ML systems.

Practicing MLOps means advocating automation and monitoring at all steps of ML system construction (integration, testing, releasing, deployment, infrastructure management, etc.).

The goal of MLOps is to build an integrated ML system that can continuously operate in production. As summarized by Google, only a small fraction of a real-world ML system is composed of the actual ML code, and the required surrounding elements are vast and complex, as shown below.

MLOps concepts

CI/CD/CT

Example

Consider the typical steps for training and evaluating an ML model to serve as a prediction service.

After defining the use cases and establishing the success criteria, the process of delivering an ML model to production involves:

  1. Data extraction: get relevant data from various sources for the ML task.
  2. Data analysis: perform exploratory data analysis (EDA) to understand the extracted data (e.g. schema, characteristics) and identify the data preparation and feature engineering that are needed.
  3. Data preparation/preprocessing: preprocess/clean the extracted data. This typically involves splitting the data into training/validation/test sets, applying data transformations, and feature engineering. The output is the data splits in the prepared format.
  4. Model training: ML researchers implement algorithms to train various models, perform hyper-parameter tuning, etc. The output is a trained ML model.
  5. Model evaluation: evaluate the trained model quality on a holdout test set. The output is a set of metrics that assess the model quality.
  6. Model validation: confirm that the model is adequate for deployment. In our case this means confirming that its predictive performance is better than a certain baseline.
  7. Model serving: deploy the validated model to a target environment (e.g. as microservices in a Kubernetes cluster, as an embedded model in an edge device, or as part of a batch prediction system) to serve predictions.
  8. Model monitoring: monitor the deployed model’s prediction performance and trigger a new iteration in the system when needed.
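To make the steps more concrete, here is a minimal sketch of steps 1 and 3 to 6 in plain Python. Everything in it is an assumption made for illustration: the data is synthetic, the "model" is a one-variable least-squares fit, the baseline is a predictor that always returns the mean training target, and all function names are hypothetical. Steps 2, 7, and 8 (analysis, serving, monitoring) are left out because they depend on the target environment.

```python
import random
import statistics


def extract_data(n=200, seed=0):
    """Step 1: stand-in for pulling data from real sources (synthetic here)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 3.0 * x + 2.0 + rng.gauss(0.0, 1.0)
        rows.append((x, y))
    return rows


def prepare_data(rows, train_frac=0.8):
    """Step 3: split the rows into a training set and a holdout test set."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


def train_model(train):
    """Step 4: fit y = a*x + b by ordinary least squares."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def evaluate_model(model, test):
    """Step 5: mean squared error on the holdout test set."""
    a, b = model
    return statistics.fmean([(y - (a * x + b)) ** 2 for x, y in test])


def validate_model(mse, baseline_mse):
    """Step 6: the model is deployable only if it beats the baseline."""
    return mse < baseline_mse


def run_steps():
    rows = extract_data()
    train, test = prepare_data(rows)
    model = train_model(train)
    mse = evaluate_model(model, test)
    # Baseline (an assumption): always predict the mean training target.
    mean_y = statistics.fmean([y for _, y in train])
    baseline_mse = statistics.fmean([(y - mean_y) ** 2 for _, y in test])
    return {
        "model": model,
        "mse": mse,
        "baseline_mse": baseline_mse,
        "deployable": validate_model(mse, baseline_mse),
    }
```

Calling run_steps() returns the fitted coefficients, both error values, and a "deployable" flag, which is the kind of output a serving step would check before deploying.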

The above steps can be completed manually by a single team or split across different teams (e.g. algorithm team, operation team, etc.), or they can be run by an automated pipeline.

We want to bring automation to the process so that we can benefit from shortened development cycles, increased deployment velocity, dependable releases, etc.

The level of automation of these steps defines the maturity of the ML process and reflects the velocity of model iterations (e.g. triggered by new data or new implementations).

MLOps automation levels

Level 0: manual labor

The picture below shows the typical workflow of this level.

Characteristics:

  1. Manual, script-driven, and interactive process
  2. Disconnect between ML and operations
  3. Infrequent release iterations
  4. No CI/CD
  5. Deployment is a single service (e.g. prediction) rather than the entire ML system
  6. No active performance monitoring

This approach may be sufficient when models are rarely changed or re-trained. But real-world environments are dynamic, and models that fail to adapt quickly to changes may rapidly decrease in value.

Level 1: ML pipeline automation

The goal of this level is to enable continuous training (CT) of the model by automating the ML pipeline, and thus achieve continuous delivery of the model prediction service for users.

This level of automation typically involves:

The following figure is a schematic representation of an automated ML pipeline for CT.

Characteristics:

  1. Rapid experimentation
  2. CT of the model in production with fresh data based on live triggers
  3. Experimental-Operational symmetry (as seen in the above diagram)
  4. Modularized code for components and pipelines
  5. CD of models (and thus predictive services)
  6. Deployment is an ML pipeline rather than only a prediction service
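As a rough illustration of what "live triggers" for CT could look like, the sketch below decides whether to start a new training run. The two trigger conditions (enough new data, or a drop in the live metric relative to the metric at deployment time), the threshold values, and all names are assumptions chosen for the example, not something prescribed by this level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerPolicy:
    # Both thresholds are hypothetical values for illustration.
    min_new_rows: int = 1000       # retrain once this much fresh data arrived
    max_metric_drop: float = 0.05  # tolerated drop in the live metric


def retrain_reasons(new_rows, deployed_metric, live_metric, policy=None):
    """Return the list of triggers that fired; an empty list means no retraining."""
    policy = policy or TriggerPolicy()
    reasons = []
    if new_rows >= policy.min_new_rows:
        reasons.append("new_data")
    if deployed_metric - live_metric > policy.max_metric_drop:
        reasons.append("performance_drop")
    return reasons
```

For example, retrain_reasons(1500, 0.90, 0.89) reports only the new-data trigger, while retrain_reasons(10, 0.90, 0.80) reports only the performance drop. A scheduler or monitoring job would call such a function and launch the automated pipeline when the list is non-empty.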

Transition from level 0 to level 1

To transition to level 1, we need to add new components to the architecture:

This approach is sufficient if new pipeline implementations are rarely deployed and only a few pipelines are managed.

The pipeline and its components are usually tested and deployed manually. This is not a good solution if you want to deploy new models based on new ML ideas, since manual work is still involved in deploying the pipeline itself, or if you are managing many ML pipelines in production.

You need a CI/CD setup to automate the build/test/deployment of ML pipelines.

Level 2: CI/CD pipeline automation

A robust automated CI/CD system allows data scientists to rapidly explore new ML ideas around feature engineering, model architecture, hyper-parameters, etc.

Data scientists can implement new ideas and the new pipeline will be automatically built, tested, and deployed to the target environment.

We can see the updated diagram with CI/CD added for the pipeline.

This level typically involves:

Characteristics:

Stages for CI/CD automation pipeline:

Manual work cannot be eliminated from the data analysis and model analysis steps.

CI

The pipeline and its components are built, tested, and packaged when new code is committed or pushed to the VCS.

Examples include unit tests for the feature engineering logic and for the different implemented methods, tests that the model training converges, and tests that each component in the pipeline produces the expected artifacts.
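The kinds of CI tests mentioned above can be sketched with Python's built-in unittest module. The two functions under test are hypothetical stand-ins written for this example (a min-max scaler as the "feature engineering logic" and a toy gradient-descent trainer); a real pipeline would test its own components instead.

```python
import unittest


def scale_min_max(values):
    """Hypothetical feature engineering step: rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_linear(xs, ys, lr=0.01, epochs=200):
    """Toy trainer: gradient descent on y = w*x; returns w and the loss history."""
    w = 0.0
    losses = []
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
        losses.append(sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n)
    return w, losses


class FeatureEngineeringTest(unittest.TestCase):
    def test_values_are_rescaled_to_unit_interval(self):
        self.assertEqual(scale_min_max([3.0, 7.0, 5.0]), [0.0, 1.0, 0.5])

    def test_constant_column_does_not_divide_by_zero(self):
        self.assertEqual(scale_min_max([4.0, 4.0]), [0.0, 0.0])


class TrainingTest(unittest.TestCase):
    def test_training_converges(self):
        w, losses = train_linear([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
        # The loss should shrink and the weight should approach the true slope 2.
        self.assertLess(losses[-1], losses[0])
        self.assertAlmostEqual(w, 2.0, places=3)
```

A CI job could run these with python -m unittest on every commit, so that a broken transformation or a diverging training loop fails the build before the pipeline is packaged.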

CD

The new pipeline implementation is continuously deployed to the target environment, and in turn delivers new/updated prediction services.

Rapid and reliable pipeline delivery usually involves:

The following diagram shows the relationship between the CI/CD pipeline and the CT pipeline in an ML system:

Given a new model implementation (e.g. new ML ideas/architecture), a successful CI/CD pipeline deploys a new CT pipeline.

Given new data, a successful CT pipeline should serve a new model prediction service.

Example: Architecture for MLOps using TFX, Kubeflow Pipelines, and Cloud Build

TFX stands for “TensorFlow Extended” and is an integrated ML platform for developing and deploying production ML systems.

A TFX pipeline is a sequence of components that implement an ML system (modeling, training, validation, serving inference, deployment management, etc.).

Key libraries of TFX include:

The following diagram shows the architecture of an integrated ML system built from the various TFX libraries (i.e. the design of a TFX-based integrated ML system).

With the architecture designed, the next question is how to run each component of the system at scale. Commercial cloud platforms like GCP can help us run the system at scale in a reliable fashion with managed cloud services (e.g. Cloud Storage, AI Hub, Dataflow).

With the individual components mapped to managed services in the cloud platform, the next question is how to connect these pieces and automate the entire pipeline. An orchestrator performs this task, gluing together our high-level architecture and the underlying individual components. It is useful in both the development and production phases, as it facilitates automation and reduces manual work.

The orchestrator runs the pipeline in sequence and automatically moves forward based on the defined conditions (e.g. it executes the model serving step after model evaluation has finished and the metrics meet predefined thresholds).
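The idea of running steps in sequence and moving forward only when a condition holds can be sketched in a few lines of plain Python. This is a toy illustration, not how any particular orchestrator is implemented: the step names, the hard-coded accuracy value, and the helper functions are all assumptions made for the example.

```python
def orchestrate(steps, context):
    """Run (name, action, gate) steps in order.

    Each action receives the shared context and returns new entries for it.
    A gate, if present, must return True for its step to run; the pipeline
    stops at the first gate that fails.
    """
    executed = []
    for name, action, gate in steps:
        if gate is not None and not gate(context):
            break
        context.update(action(context))
        executed.append(name)
    return executed


def build_steps(accuracy_threshold):
    """Hypothetical three-step pipeline with a gate in front of serving."""
    return [
        ("train", lambda ctx: {"model": "model-v2"}, None),
        ("evaluate", lambda ctx: {"accuracy": 0.93}, None),
        (
            "serve",
            lambda ctx: {"served": ctx["model"]},
            lambda ctx: ctx["accuracy"] >= accuracy_threshold,
        ),
    ]
```

With a threshold of 0.90 all three steps run; with a threshold of 0.95 the pipeline stops after evaluation and nothing is served, which mirrors the gating condition described above.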

Kubeflow is the ML Toolkit for Kubernetes. Kubeflow Pipelines is a Kubeflow service that lets you compose, orchestrate, and automate ML systems, where each component of the system can run on various infrastructures (e.g. GCP, local, etc.). Sounds familiar? Yes! It is the orchestrator we want.

A Kubeflow pipeline involves:

The above diagram shows a high-level overview of integrating CI/CD with Kubeflow Pipelines in GCP. At the heart of this architecture is Cloud Build, a managed service that executes your builds on GCP. Essentially, the Cloud Build process performs the required CI/CD for our integrated ML system.

The build can be triggered manually or through automated build triggers.

For a comprehensive Cloud Build example that covers most of these steps, see A Simple CI/CD Example with Kubeflow Pipelines and Cloud Build.

References