Road to MLOps — Level 2: Part 3

This is the last article of our series about MLOps. In the previous articles we went from MLOps level 0 to level 1; if you want to catch up, here are the links to the previous articles:

Introducing MLOps lvl 0:

Introducing MLOps lvl 1:

In this third part, we will cover level 2 of MLOps and present some tools and techniques to achieve it.

What is MLOps lvl 2?

Level 2 goes a step further in terms of what we can do and the kind of system we are now able to operate: large-scale, high-frequency systems. Data scientists can now focus on analyzing data and becoming data-centric, and they can spend more time testing new techniques, algorithms, and analyses.

Your team has developed tools to package and containerize the data scientists' code, so when they are done, the whole pipeline is automatically tested and deployed. This means we have achieved fully automated Continuous Integration and Delivery.

There are multiple triggers for pipeline renewal: since we monitor everything, in many ways and at different stages, any monitored metric can trigger the pipeline. Alternatively, everything can run on a schedule; it's your choice. The few manual steps that remain compulsory are data and error analysis, to ensure that the training data and the models meet our high standards.
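As a sketch of what such a metric-based trigger could look like, here is a minimal Python example; the metric names, the threshold, and the webhook idea are all illustrative assumptions, not a prescribed implementation:

```python
# Illustrative retraining trigger: compare a monitored live metric against
# the score recorded at the last validated training run.
def should_retrain(live_accuracy: float, baseline_accuracy: float,
                   tolerance: float = 0.05) -> bool:
    """Return True when the live metric has drifted below our standard."""
    return live_accuracy < baseline_accuracy - tolerance

# A monitoring job would call this on a schedule or on every metrics batch:
if should_retrain(live_accuracy=0.81, baseline_accuracy=0.90):
    print("kick off the training pipeline")  # e.g. call your CI/CD webhook
```

The same predicate can gate on any monitored signal (latency, data drift scores, prediction distribution shifts); the point is that the trigger is a function of monitoring, not of a human pressing a button.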

Now, let's talk about some tools to help you progress in your path to level 2 (or at least level 1) of MLOps Maturity.

Road to MLOps

We are going to cover some of the most important steps that come to mind when we deal with MLOps.

  • The first step (one of the most important ones) is deployment, because it’s the one that should finally solve your business case or project (at least in the beginning).
  • Then comes monitoring: there is no MLOps without proper ways to monitor the underlying systems.
  • And finally, all the different steps that characterize the different levels of MLOps: the feedback loop, the Continuous Training, and the Continuous Delivery of models.

The last points depend on the company or project requirements, organization, teams, etc.

Cloud Model Deployment

Deployment is one of the most exciting stages of the project, where you can finally see your model in action, study its performance, but also pray that it never breaks!

Here is a diagram of what we consider the bare minimum of configuration and packaging for deploying a model into ‘production’. Anything less complete than this can’t be considered MLOps-ready.

But we are still going to look at the different ways to deploy models and see how they compare to each other.

Deployment Options

Here are, in short, the options for deploying machine learning models today. If you can make predictions using a web-connected system, there are a few options at your disposal.

  • The first one, which you may have already tried, is to deploy a simple web server to handle the requests and make the predictions.
  • The second one, more advanced, is to use complete and powerful engines such as TensorFlow Serving; then, in the Other Options part, we'll look at some cloud provider services.
  • The last one is to deploy a model on the edge. We will not cover it here, as the logic for integrating it into a pipeline is not different from the other options.

Simple Web Server Deployment

Flask Deployment shows you how easy it is to ‘deploy’ a machine learning model on an API endpoint and here is a short working example of a model deployed on a Flask server.

Flask endpoint example
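For readers who can't see the screenshot, a minimal sketch of such an endpoint might look like the following; the route name, the feature format, and the toy model are all placeholder assumptions (in a real project you would load a trained model from disk instead of fitting one inline):

```python
# Minimal Flask serving sketch — illustrative only.
from flask import Flask, request, jsonify
from sklearn.linear_model import LogisticRegression
import numpy as np

app = Flask(__name__)

# In a real deployment the model would be loaded from disk (e.g. joblib.load);
# here we fit a toy model on dummy data so the example is self-contained.
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [0.9]}.
    features = np.array(request.get_json()["features"]).reshape(1, -1)
    return jsonify({"prediction": int(model.predict(features)[0])})

# To serve locally: app.run(host="0.0.0.0", port=5000)
```

A few lines are enough to get predictions over HTTP, which is exactly why this pattern is so tempting, and why its limits (discussed next) are easy to overlook.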

The key issue with this method is that it mixes the model and prediction logic with the API logic, which leads to more complicated stack traces and less clean code. The API can also be inconsistent: you might have to change a route, and you can’t create dynamic routing to new model versions as you add them to the servers. Finally, managing your compute resources this way is inefficient and dangerous.

What if your model has to scale but the web server does not? How can you decouple the performance needs of both sides? How do you mitigate the costs of such a solution?

It can only scale with pain.

TensorFlow Serving — Pros & Cons

Let’s move to a more robust approach to serving: TensorFlow Serving.

First, it solves our earlier problem of mixed logic. Even if it is more complicated than a Flask server, it is still quite easy, and spending time on it is worth it. You can now do batch predictions, which means parallelizing your predictions.

You have an API that doesn’t change, as it is devoted to model management and predictions. In addition, it's quite standardized, and you can use either the REST API or the gRPC API (for example, at Picsellia we use gRPC as it improves the performance and robustness of the APIs, but it comes with a steeper learning curve, so choose what is best for you).

It natively supports multiple versions of the same model, standardizing the way API routes are written (for example, a REST call targets /v1/models/<name>/versions/<version>:predict).

But can TensorFlow Serving only run TensorFlow models?

For now it only runs TensorFlow and Keras models, but some improvements are on the way and, as we will see later, there are workarounds. We can recommend this Medium article by Analytics India on converting PyTorch models to TensorFlow Serving.

TensorFlow Serving — Run

Just so you can see how easy it is to run TensorFlow Serving and deploy a model, we took these screenshots.

TFX Docker example

The first one shows the minimal command to launch the serving engine, and the second shows what it looks like when correctly launched. Beautiful, isn’t it?
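For readers who prefer text over screenshots, the minimal docker launch looks like this; the model path and model name are placeholders to replace with your own:

```shell
# Pull and run the official TensorFlow Serving image, mounting a SavedModel
# directory and exposing the REST API port (8501).
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model -t tensorflow/serving
```

Port 8500 can be published the same way if you want the gRPC API alongside REST.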

TensorFlow Serving — gRPC Example

Here is a minimal example of how to query the TensorFlow Serving server using the gRPC protocol.

gRPC request example in Python 3+

Admittedly, it’s less intuitive than the Flask example, but it’s not even that long, and I can guarantee that it will give you a lot more confidence in your serving engine.

Other Options

  • Graphpipe solves the multi-framework problem: you can deploy TensorFlow models directly, while MXNet and PyTorch models can be converted to the ONNX format and then deployed on the same serving engine. It also comes with clients for many programming languages, so it can be included in nearly any stack.
  • Kubeflow is a heavyweight solution, based on Kubernetes, that lets you orchestrate experiments, serving, and supporting pipelines at scale on any architecture. Even though it is very powerful, it can be complicated to get to grips with and maintain properly.
  • Simple TensorFlow Serving supports nearly all existing frameworks, including PyTorch and MXNet, and still seems quite easy to implement.
  • Finally, we have cloud deployment solutions: the cloud providers' platforms (AWS, GCP, Azure…) and solutions like ours, Picsellia. The great thing about these solutions is that everything is managed, maintained, and monitored for you, making them powerful tools for your ML projects. But they have to be compatible with your existing workflows and policies, and they come with a price.

If you'd like to learn more about our MLOps solution, Picsellia, feel free to claim your 14-day free trial. You'll just have to book a quick intro call and we'll get you started in no time ;)
