Part 4: DevOps for AI/ML

Liming Tsai
Jan 25, 2021 · 5 min read

In this final part of my data science series, we will begin to explore MLOps and how we can use Open Data Hub on the Red Hat OpenShift Container Platform to achieve it.

Image by Markus Winkler from Pixabay

DevOps for AI/ML?

In the previous post, I wrote about how to deploy your trained model onto Kubernetes. But how can we leverage a pipeline to build and train our model in a consistent manner and then deploy it onto Kubernetes?

Developing, deploying, and continuously improving a model is very different from building traditional web or mobile applications. However, you can still take advantage of Continuous Integration and Continuous Deployment, with a few additional steps.

Iterative Process

Challenge #1: Tracking your Experiments

In the data science world, we focus on the quality of what a model can achieve based on certain inputs. We are interested in scores such as F1, accuracy, precision, and recall. These scores help us determine whether a model produces an acceptable result for the provided input range.
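
For instance, here is a minimal sketch of computing these scores with scikit-learn; the labels and predictions below are made up purely for illustration:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground-truth labels and predictions from a binary classifier
y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```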

We may try new techniques or tweak some hyperparameters, run these models against the new set of inputs, and track the results to determine which model is optimal.

We want to track all of these runs, and there are open-source experiment-tracking tools out there, such as MLflow, that can help. They log your results so that we can easily review them as we run experiments in search of the best performing model.

MLflow Experiments
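
As a rough sketch of what that logging could look like with MLflow (the experiment name, hyperparameters, and metric values here are illustrative, not from a real run):

```python
import mlflow

mlflow.set_experiment("churn-classifier")  # hypothetical experiment name

with mlflow.start_run():
    # Hyperparameters we are experimenting with
    mlflow.log_param("max_depth", 8)
    mlflow.log_param("learning_rate", 0.1)

    # ... train and evaluate the model here ...

    # Scores for this run, reviewable later in the MLflow UI
    mlflow.log_metric("accuracy", 0.91)
    mlflow.log_metric("f1", 0.87)
```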

Challenge #2: Model Tracking

You are probably committing your source code into a Git repository (hopefully!). That commit can kick off a pipeline to 1) build, 2) run your experiments, and 3) log the results.

But what about the container image that you have just built and pushed to a registry? How are you going to relate that particular image to your experiment(s), i.e., establish model provenance?

Using MLflow, you could record the image reference and store it together with the experiment(s). That way, you can always be sure that you are pulling the right model, the one that produces the exact result you wanted.

MLflow Run
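
One way to keep that link, assuming your pipeline knows the image reference it just pushed, is to record it on the MLflow run next to the model artifact. The model, registry path, and digest below are placeholders:

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Train a trivial model purely for illustration
model = LogisticRegression().fit([[0], [1], [2], [3]], [0, 0, 1, 1])

with mlflow.start_run():
    # Store the trained model artifact with the run
    mlflow.sklearn.log_model(model, artifact_path="model")

    # Record which container image this run corresponds to (hypothetical reference)
    mlflow.set_tag("container_image", "quay.io/myorg/churn-model@sha256:abc123")
```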

Challenge #3: Observability and Feedback Loop

After all the hard work of finding the optimal model, you have deployed it into production. Congrats!

However, data is not constant; it will change. Events can trigger a major shift in consumer behavior, or data will simply get stale over time. Model accuracy will also change over time. We need to be able to detect that drift, and this is very different from traditional application monitoring.

We could capture inputs and predictions using tools like the ELK stack or Prometheus/Grafana, or store them in a data warehouse. Only by observing the performance of our models can we begin to do A/B testing.
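
Here is a minimal sketch of that capture step with the Python prometheus_client library; the metric names and port are assumptions, not anything prescribed by Open Data Hub:

```python
from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metrics: prediction counts per class and an input feature distribution
PREDICTIONS = Counter("model_predictions_total", "Predictions served", ["predicted_class"])
FEATURE_VALUE = Histogram("model_input_feature", "Distribution of an input feature")

def predict_and_record(model, features):
    """Wrap the model call so every prediction is observable."""
    FEATURE_VALUE.observe(features[0])
    prediction = model.predict([features])[0]
    PREDICTIONS.labels(predicted_class=str(prediction)).inc()
    return prediction

# Expose /metrics for Prometheus to scrape
start_http_server(8000)
```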

With this feedback loop in place, we can then start evaluating new models against newly curated data to keep our model up to date.

Open Data Hub

The Red Hat OpenShift Container Platform makes it possible to provision and manage shared research infrastructure for the discovery phases of the machine learning workflow. Open Data Hub provides a reference architecture for a data lake and a shared discovery environment, built on an open-source stack on OpenShift.

It comprises many open-source tools, such as:

  • Apache Airflow
  • Apache Kafka
  • Apache Spark
  • Apache Superset
  • Argo
  • Grafana
  • JupyterHub
  • Prometheus
  • Seldon

Open Data Hub is packaged as an Operator that can be installed easily from OperatorHub on OpenShift.

Open Data Hub in OperatorHub

Once the Operator is installed, you can deploy the components you need by referencing them in a Kustomize-based manifest, for example:

https://github.com/opendatahub-io/odh-manifests/blob/master/kfdef/kfctl_openshift.yaml

Tools in the Toolbox

  1. JupyterHub: A multi-user hub for Jupyter Notebook, an open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text. Jupyter notebooks can be used to build, train, and test models.
  2. CodeReady Workspaces: As you begin to productize your code, this allows you to quickly develop and test your model within a browser IDE running on OpenShift.
  3. Kubeflow Pipelines, Apache Airflow, Argo Workflows, or Tekton: These allow you to create a machine learning workflow that meets your requirements.
  4. Argo CD: A declarative, GitOps continuous delivery tool for deploying applications to Kubernetes.
  5. Seldon: Helps you deploy your model easily on Kubernetes by wrapping it in a REST/gRPC microservice (a minimal wrapper sketch follows this list).
  6. Prometheus and Grafana: These allow you to monitor your model and plot the results on a dashboard.
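
As a rough sketch of item 5 above: Seldon's Python wrapper expects a class that exposes a predict method. The class name and model path below are placeholders:

```python
import joblib

class ModelServer:
    """Minimal Seldon-style model wrapper (class name and model path are hypothetical)."""

    def __init__(self):
        # Load the trained model that was baked into the container image
        self.model = joblib.load("/mnt/models/model.joblib")

    def predict(self, X, features_names=None):
        # Seldon calls this method for every REST/gRPC prediction request
        return self.model.predict(X)
```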

Summary

Over the last few posts, we explored containers and built our very first training image. We then deployed that image onto Kubernetes, and in this post we looked at how Open Data Hub on OpenShift helps you with MLOps.

Thank you for reading and I hope you now have a better understanding of using containers and Kubernetes for AI/ML.
