Exploring the Essentials of Machine Learning Operations (MLOps)

Chapter 1: Introduction to MLOps

In recent years, I have been involved in deploying Machine Learning systems within real-world settings for clients in the Consumer Packaged Goods (CPG) and Healthcare sectors. While deciphering business needs, designing models, and working with raw data present significant challenges, the most captivating aspect has been the process of industrializing these projects.

A persistent issue we face is deciding which practices to adopt to maintain a machine learning system in a production environment. A recent study by Kreuzberger et al. [1] provides a thorough examination of Machine Learning Operations (MLOps), detailing its foundational principles, architectural components, necessary roles, and potential system architectures. This work also offers valuable academic insights into best practices in the field.

Section 1.1: Core Principles of MLOps

To grasp the essence of MLOps and its objectives, it's vital to recognize several key principles that a project should adhere to:

P1 - CI/CD Automation: This principle facilitates rapid building, testing, and deploying of code, enhancing overall team productivity.
P2 - Workflow Orchestration: Essential for coordinating the various steps in an ML workflow, such as processing raw data, training/testing models, and deployment.
P3 - Reproducibility: It should be possible to rerun a past experiment, with the same code and model, and obtain the same results.
P4 - Versioning: Tracking versions of code, data, and models is imperative for reproducibility.
P5 - Collaboration: Effective communication between technical teams and business stakeholders is essential to align on objectives and expectations.
P6 - Continuous ML Training & Evaluation: The system should facilitate regular retraining and evaluation of models.
P7 - ML Metadata Tracking and Logging: Each model should be associated with metadata, including evaluation metrics and code versions, so that it can be managed effectively in production.
P8 - Continuous Monitoring: Monitoring is vital to assess model performance and determine when retraining or new models are needed.
P9 - Feedback Loops: The process is iterative; insights from later stages, such as monitoring in production, feed back into earlier stages to drive continuous improvement.
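
To make P3, P4, and P7 more concrete, here is a minimal sketch of a metadata record that ties a model to the code, data, and parameters that produced it. It is an illustration under stated assumptions, not a method from Kreuzberger et al.: the names `RunRecord` and `fingerprint` are hypothetical, the code version is assumed to be a commit hash supplied by the caller, and the metric value is made up.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def fingerprint(payload: bytes) -> str:
    """Return a short, deterministic content hash used as a version identifier."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical metadata record for one training run."""
    code_version: str  # e.g. a Git commit hash, assumed to be supplied by the caller
    data_version: str  # content hash of the training data
    params: dict       # hyperparameters used for the run
    metrics: dict      # evaluation metrics of the resulting model

    def run_id(self) -> str:
        # The id depends only on code, data, and parameters, so rerunning the
        # same experiment yields the same id regardless of the metrics obtained.
        key = json.dumps(
            {"code": self.code_version, "data": self.data_version, "params": self.params},
            sort_keys=True,
        )
        return fingerprint(key.encode("utf-8"))


# Illustrative values only.
training_data = b"age,income,label\n34,52000,1\n45,61000,0\n"
record = RunRecord(
    code_version="abc1234",
    data_version=fingerprint(training_data),
    params={"learning_rate": 0.1, "max_depth": 3},
    metrics={"accuracy": 0.91},
)
print(record.run_id())
print(json.dumps(asdict(record), indent=2))
```

In a real project this role is usually played by a dedicated experiment tracker or metadata store; the sketch only shows which pieces of information need to travel together.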

Subsection 1.1.1: Technical Components Supporting MLOps Principles

Each principle is supported by specific technical components:

C1 - CI/CD Component: Enables continuous integration and delivery while supporting ongoing ML training and evaluation.
C2 - Source Code Repository: Essential for collaboration and versioning.
C3 - Workflow Orchestration Component: Facilitates pipeline orchestration, ensuring reproducibility and continuous training.
C4 - Feature Store System: Consists of an offline database used for model training and an online database used for production predictions.
C5 - Model Training Infrastructure: Necessary resources (CPUs, GPUs) for continual model training and evaluation.
C6 - Model Registry: Stores trained models and their associated metadata, aiding in deployment and distribution.
C7 - ML Metadata Stores: Manages the diverse metadata generated from various components.
C8 - Model Serving Component: Responds to requests, typically via REST API, and should be scalable.
C9 - Monitoring Component: Tracks model performance, enabling feedback loops.
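
As a rough illustration of what a workflow orchestration component (C3) does, the sketch below declares the steps of an ML workflow with their dependencies and runs them in a valid order. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The step functions, their toy logic, and the shared `ctx` dictionary are assumptions made for the example; production orchestrators add scheduling, retries, and distributed execution on top of this idea.

```python
from graphlib import TopologicalSorter


# Toy step implementations; each one reads from and writes to a shared context.
def ingest(ctx):
    ctx["raw"] = [3, 1, 2]


def build_features(ctx):
    ctx["features"] = sorted(ctx["raw"])


def train(ctx):
    ctx["model"] = {"mean": sum(ctx["features"]) / len(ctx["features"])}


def evaluate(ctx):
    ctx["metrics"] = {"n_samples": len(ctx["features"])}


def deploy(ctx):
    ctx["deployed"] = True


STEPS = {
    "ingest": ingest,
    "build_features": build_features,
    "train": train,
    "evaluate": evaluate,
    "deploy": deploy,
}

# Each step maps to the set of steps that must finish before it can start.
DEPENDENCIES = {
    "ingest": set(),
    "build_features": {"ingest"},
    "train": {"build_features"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def run_pipeline():
    """Run every step once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        STEPS[name](ctx)
    return order, ctx


order, ctx = run_pipeline()
print(order)
```

Declaring dependencies as data, rather than hard-coding the call sequence, is what lets an orchestrator rerun a single failed step or retrain on a schedule without touching the rest of the workflow.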

Visual representation of MLOps principles and components

Section 1.2: Roles in an MLOps Project

Implementing an MLOps project can be complex, but clearly delineated engineering roles make it easier to work in an agile way, with rapid experimentation and iteration. Key roles include:

R1 - Business Stakeholder: Defines business goals and acts as a liaison with company stakeholders.
R2 - Solution Architect: Determines the technologies to be employed.
R3 - Data Scientist: Converts business requirements into analytical needs and develops ML models.
R4 - Data Engineer: Constructs and manages data and feature pipelines.
R5 - Software Engineer: Applies best practices in software design to the overall project.
R6 - DevOps Engineer: Builds and maintains pipelines, ensuring effective CI/CD automation and workflow orchestration.
R7 - ML / MLOps Engineer: A cross-functional role that automates ML infrastructure, workflows, and model deployment.

The figure below summarizes these roles and their interactions.

Overview of roles involved in an MLOps project

Chapter 2: MLOps Architecture and Workflow

Kreuzberger et al. propose a general, technology-agnostic architecture for MLOps. Its workflow aligns with the Team Data Science Process (TDSP) and comprises several key steps.

A) Initiating an MLOps Project

In the initiation phase, also referred to as Business Understanding in TDSP, the Business Stakeholder defines the project goals. The Solution Architect identifies suitable technologies, while the Data Scientist collaborates with the Product Owner and others to clarify the business problem and data availability.

B) Feature Engineering Pipeline

Kreuzberger et al. describe a Feature Engineering Pipeline in which Data Engineers and Data Scientists work together to identify features and to ingest, preprocess, and transform data. This mirrors the Data Acquisition phase of TDSP.
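
The ingest, preprocess, and transform stages can be sketched as three small functions chained together. This is a minimal illustration, not the pipeline from the paper: the CSV content, the column names, and the derived `yearly_spend` feature are all invented for the example, and a real pipeline would read from the project's actual storage systems.

```python
import csv
import io

# Invented sample data standing in for a raw data source.
RAW_CSV = """customer_id,age,monthly_spend
1,34,120.5
2,,80.0
3,51,200.0
"""


def ingest(source: str) -> list[dict]:
    """Read raw rows; every value is still a string at this point."""
    return list(csv.DictReader(io.StringIO(source)))


def preprocess(rows: list[dict]) -> list[dict]:
    """Drop rows with missing values and cast columns to numeric types."""
    cleaned = []
    for row in rows:
        if any(value == "" for value in row.values()):
            continue  # skip incomplete records
        cleaned.append(
            {
                "customer_id": int(row["customer_id"]),
                "age": int(row["age"]),
                "monthly_spend": float(row["monthly_spend"]),
            }
        )
    return cleaned


def transform(rows: list[dict]) -> list[dict]:
    """Derive a new feature from the cleaned columns."""
    return [{**row, "yearly_spend": row["monthly_spend"] * 12} for row in rows]


features = transform(preprocess(ingest(RAW_CSV)))
for row in features:
    print(row)
```

Keeping each stage as a separate function makes it possible to test and version the stages independently, which is what allows the same feature logic to be reused for both training and prediction.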

C) Experimentation

During this phase, the Data Scientist analyzes and prepares the data, develops a model, and eventually exports it to the model registry.
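
Exporting a model to the registry can be pictured with the in-memory sketch below. It assumes a registry that assigns increasing version numbers per model name and stores metadata next to the serialized artifact. The `ModelRegistry` and `ModelVersion` classes are hypothetical, the "model" is a plain dictionary standing in for a trained estimator, and the metadata values are made up.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int     # increasing version number within one model name
    artifact: bytes  # serialized model
    metadata: dict   # e.g. evaluation metrics and code version


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; real registries persist to storage."""
    _models: dict = field(default_factory=dict)

    def register(self, name: str, model, metadata: dict) -> ModelVersion:
        versions = self._models.setdefault(name, [])
        entry = ModelVersion(
            version=len(versions) + 1,
            artifact=pickle.dumps(model),
            metadata=dict(metadata),
        )
        versions.append(entry)
        return entry

    def load_latest(self, name: str):
        """Return the most recently registered model and its registry entry."""
        entry = self._models[name][-1]
        return pickle.loads(entry.artifact), entry


registry = ModelRegistry()
# A dictionary of weights stands in for a trained model.
registry.register("churn", {"weights": [0.4, 0.6], "bias": 0.1}, {"accuracy": 0.88})
registry.register("churn", {"weights": [0.5, 0.5], "bias": 0.0}, {"accuracy": 0.90})

model, entry = registry.load_latest("churn")
print(entry.version, entry.metadata, model)
```

Only load pickled artifacts from sources you trust; unpickling executes code, so shared registries typically control who may write to them.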

D) Deployment

Once a model is trained, it is deployed to the serving layer.
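
To give an idea of what the serving layer does with an incoming request, here is a framework-free sketch of a request handler: it parses a JSON body, validates it, and returns a status code with a JSON response. The payload shape (`{"features": [...]}`), the linear stand-in model, and the function names are assumptions made for the example; in practice this logic would sit behind a web framework or a managed serving endpoint.

```python
import json

# Stand-in for a model loaded from the registry at startup.
MODEL = {"weights": [0.5, -0.25], "bias": 1.0}


def predict(features: list[float]) -> float:
    """Toy linear model: weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(MODEL["weights"], features)) + MODEL["bias"]


def handle_request(body: str) -> tuple[int, str]:
    """Return an (HTTP-style status code, JSON response body) pair."""
    try:
        payload = json.loads(body)
        features = payload["features"]
        if not isinstance(features, list) or len(features) != len(MODEL["weights"]):
            raise ValueError("wrong number of features")
        features = [float(x) for x in features]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing key, or wrong types all map to a client error.
        return 400, json.dumps({"error": "expected a JSON object with a 'features' list"})
    return 200, json.dumps({"prediction": predict(features)})


print(handle_request('{"features": [2, 4]}'))
print(handle_request("not json"))
```

Validating input at the boundary keeps malformed requests from reaching the model and gives the monitoring component a clean signal about client errors versus model behaviour.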

Putting it all together, we can visualize how these components interact within the architecture.

General architecture of a Machine Learning project

As the project progresses, data sources are shared with the Data Scientist. Connecting to raw data can be challenging because data storage systems vary, which makes this a crucial step in the Data Ingestion / Feature Engineering Pipeline.

Once processed, data is stored in a feature store system for offline training and online predictions. An event-based approach or scheduled retraining can trigger model updates, guided by the monitoring component which assesses model performance in production.
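
The two triggers mentioned above, an event raised by monitoring and a fixed schedule, can be combined in a single decision function. The sketch below is one possible rule under stated assumptions: accuracy as the monitored metric, a tolerated drop of 0.05, and a maximum model age of 30 days are illustrative defaults, not values from the paper.

```python
from datetime import datetime, timedelta


def should_retrain(
    current_accuracy: float,
    baseline_accuracy: float,
    last_trained: datetime,
    now: datetime,
    max_drop: float = 0.05,                  # assumed tolerated accuracy drop
    max_age: timedelta = timedelta(days=30), # assumed retraining schedule
) -> tuple[bool, str]:
    """Decide whether to retrain and report which trigger fired."""
    # Event-based trigger: monitoring reports a drop beyond the tolerance.
    if baseline_accuracy - current_accuracy > max_drop:
        return True, "performance_drop"
    # Scheduled trigger: the model is older than the allowed age.
    if now - last_trained >= max_age:
        return True, "schedule"
    return False, "none"


now = datetime(2024, 3, 1)
print(should_retrain(0.80, 0.90, datetime(2024, 2, 20), now))
print(should_retrain(0.89, 0.90, datetime(2024, 1, 1), now))
print(should_retrain(0.89, 0.90, datetime(2024, 2, 20), now))
```

Returning the reason alongside the decision makes the trigger auditable, which helps when the retraining event is logged as metadata for the new model version.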

Conclusion

This research highlights the importance of understanding how Machine Learning models behave in real-world settings, and it provides a comprehensive overview of what a machine learning system should entail and what MLOps encompasses. It serves as one of the initial toolkits for improving the success rate of Machine Learning projects. However, numerous other factors can lead to project failure, and the organizational changes required to adapt processes can be daunting.

Main Reference:

[1] Dominik Kreuzberger, Niklas Kühl, Sebastian Hirschl, "Machine Learning Operations (MLOps): Overview, Definition, and Architecture"

Further Readings on Leading Cloud Providers:

For those interested in kickstarting an MLOps project, Microsoft offers accelerators to assist in the process.
