Setting up an ML stack involves many tools for analyzing data and training models across the ML pipeline, and setting up the same stack in a multi-cloud environment is even harder. This is where Kubeflow comes into the picture: it makes it easy to develop, deploy, and manage ML pipelines.
In this article, we are going to learn how to install Kubeflow on Kubernetes (GKE), train an ML model on Kubernetes, and publish the results. This introductory guide will be helpful for anyone who wants to understand how to use Kubernetes to run an ML pipeline in a simple, portable, and scalable way.
Kubeflow Installation on GKE
You can install Kubeflow on any Kubernetes cluster, whichever cloud it runs on, but the cluster needs to meet the following minimum requirements:
- 4 CPUs
- 50 GB storage
- 12 GB memory
The recommended Kubernetes version is 1.14 or later.
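As a rough pre-flight check, the minimums above can be encoded in a small Python helper. This is a sketch, not part of Kubeflow: the `ClusterCapacity` type and `check_requirements` function are hypothetical, and it assumes you have already totalled your nodes' allocatable CPU, memory, and disk yourself (for example from `kubectl get nodes -o json`).

```python
from dataclasses import dataclass

# Minimums from the list above; 1.14 is the recommended (not strictly required) version.
MIN_CPU = 4
MIN_STORAGE_GB = 50
MIN_MEMORY_GB = 12
RECOMMENDED_K8S = (1, 14)


@dataclass
class ClusterCapacity:
    """Hypothetical summary of a cluster's total allocatable resources."""
    cpu: float
    storage_gb: float
    memory_gb: float
    k8s_version: str  # e.g. "v1.15.12-gke.2"


def parse_minor_version(version: str) -> tuple:
    """Turn strings like 'v1.15.12-gke.2' or '1.14+' into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    # Keep only the leading digits of the minor part (drops suffixes such as '+').
    digits = ""
    for ch in minor:
        if not ch.isdigit():
            break
        digits += ch
    return int(major), int(digits)


def check_requirements(cluster: ClusterCapacity) -> list:
    """Return human-readable problems; an empty list means the cluster looks suitable."""
    problems = []
    if cluster.cpu < MIN_CPU:
        problems.append(f"CPU: {cluster.cpu} < {MIN_CPU}")
    if cluster.storage_gb < MIN_STORAGE_GB:
        problems.append(f"storage: {cluster.storage_gb} GB < {MIN_STORAGE_GB} GB")
    if cluster.memory_gb < MIN_MEMORY_GB:
        problems.append(f"memory: {cluster.memory_gb} GB < {MIN_MEMORY_GB} GB")
    if parse_minor_version(cluster.k8s_version) < RECOMMENDED_K8S:
        problems.append(f"Kubernetes {cluster.k8s_version} is older than the recommended 1.14")
    return problems


# Hypothetical numbers for a GKE node pool.
print(check_requirements(ClusterCapacity(cpu=8, storage_gb=100, memory_gb=30, k8s_version="v1.15.12-gke.2")))  # → []
print(check_requirements(ClusterCapacity(cpu=2, storage_gb=100, memory_gb=8, k8s_version="1.13.0")))
```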
You need to download kfctl from the Kubeflow website and untar the file:
tar -xvf kfctl_v1.0.2_<platform>.tar.gz -C /home/velotio/kubeflow
Also, install kustomize using these instructions.
Start by exporting the following environment variables:
After we’ve exported these variables, we can run kfctl build to generate the configuration and customize it according to our needs. Run the following command:
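The export and build commands themselves are not reproduced in this copy. As a hedged sketch of what they typically look like, the Python snippet below derives the variables and prints equivalent shell lines without executing anything. Apart from `KF_DIR`, the variable names and the `CONFIG_URI` default are assumptions modelled on the Kubeflow v1.0 kfctl_k8s_istio instructions, so check them against the official docs; `kubeflow_env` and `build_command` are hypothetical helpers.

```python
import os
import shlex

# Assumed default: the v1.0.2 KfDef for existing clusters (verify before use).
DEFAULT_CONFIG_URI = (
    "https://raw.githubusercontent.com/kubeflow/manifests/"
    "v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml"
)


def kubeflow_env(kf_name: str, base_dir: str, config_uri: str = DEFAULT_CONFIG_URI) -> dict:
    """Derive the deployment variables; KF_DIR is where kfctl writes its files."""
    return {
        "KF_NAME": kf_name,
        "BASE_DIR": base_dir,
        "KF_DIR": os.path.join(base_dir, kf_name),
        "CONFIG_URI": config_uri,
    }


def build_command(env: dict) -> list:
    """argv for the build step, meant to be run from inside KF_DIR (not executed here)."""
    return ["kfctl", "build", "-V", "-f", env["CONFIG_URI"]]


# Hypothetical deployment name and base directory.
env = kubeflow_env("my-kubeflow", "/home/user/kf-deployments")
for key, value in env.items():
    print(f"export {key}={shlex.quote(value)}")
kf_dir = shlex.quote(env["KF_DIR"])
print(f"mkdir -p {kf_dir} && cd {kf_dir}")
print(shlex.join(build_command(env)))
```

Printing the commands rather than running them keeps the sketch safe to try; `shlex.quote` (and `shlex.join`, Python 3.8+) only adds quotes when a value contains characters the shell would interpret.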
This downloads the file kfctl_k8s_istio.v1.0.2.yaml and creates a kustomize folder. If you want to expose the UI through a LoadBalancer, edit $KF_DIR/kustomize/istio-install/base/istio-noauth.yaml and change the type of the istio-ingressgateway service from NodePort to LoadBalancer.
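If you prefer to script that edit rather than make it by hand, here is a minimal, text-based Python sketch. It assumes the manifest is a multi-document YAML file separated by `---` lines and that the Service declares its type on a plain `type: NodePort` line; a real YAML parser would be more robust. The function name `expose_with_load_balancer` is hypothetical.

```python
import re

KIND_SERVICE = re.compile(r"^kind:[ \t]*Service[ \t]*$", re.MULTILINE)
NODEPORT_TYPE = re.compile(r"^([ \t]*)type:[ \t]*NodePort[ \t]*$", re.MULTILINE)


def expose_with_load_balancer(manifest: str, service: str = "istio-ingressgateway") -> str:
    """Change the named Service from NodePort to LoadBalancer; other documents are untouched."""
    name_line = re.compile(rf"^[ \t]+name:[ \t]*{re.escape(service)}[ \t]*$", re.MULTILINE)
    docs = manifest.split("\n---\n")
    for i, doc in enumerate(docs):
        # Only edit a Service with the matching name; a Deployment of the same name is skipped.
        if KIND_SERVICE.search(doc) and name_line.search(doc):
            docs[i] = NODEPORT_TYPE.sub(r"\1type: LoadBalancer", doc)
    return "\n---\n".join(docs)


# Hypothetical, trimmed-down Service document.
sample = """apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  type: NodePort
  ports:
  - name: http2
    nodePort: 31380
    port: 80
"""
print(expose_with_load_balancer(sample))
```

To apply it, read the istio-noauth.yaml file with `pathlib.Path.read_text()`, pass the text through the function, and write the result back before running the install step. The per-field `nodePort:` entries are left alone because the pattern only matches a line that starts with `type:`.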
Now you can install Kubeflow using the following commands:
This installs the set of services required to run ML workflows.
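The install commands are likewise missing from this copy. In the Kubeflow v1.0 flow this step is usually `kfctl apply` pointed at the config file that the build step downloaded; the sketch below assumes exactly that, plus the kfctl_k8s_istio.v1.0.2.yaml file name mentioned earlier. `apply_kubeflow` is a hypothetical wrapper: with `dry_run=True` (the default) it only returns the command.

```python
import os
import subprocess


def apply_kubeflow(kf_dir: str, dry_run: bool = True) -> list:
    """Return, and optionally run, the kfctl apply step for an already-built KF_DIR."""
    # Assumes 'kfctl build' left this config file in KF_DIR.
    config_file = os.path.join(kf_dir, "kfctl_k8s_istio.v1.0.2.yaml")
    cmd = ["kfctl", "apply", "-V", "-f", config_file]
    if not dry_run:
        # check=True raises CalledProcessError if kfctl exits with a non-zero status.
        subprocess.run(cmd, check=True, cwd=kf_dir)
    return cmd


# Hypothetical KF_DIR.
print(apply_kubeflow("/home/user/kf-deployments/my-kubeflow"))
```

Once a real run finishes, `kubectl get pods -n kubeflow` should show the Kubeflow components starting up.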
Once the deployment succeeds, you can access the Kubeflow UI dashboard through the istio-ingressgateway service. You can find its IP address using the following command:
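The exact command is not preserved here; one common way to look the address up is `kubectl get svc istio-ingressgateway -n istio-system -o json`, and the sketch below parses that JSON. It assumes a LoadBalancer reports its address under `status.loadBalancer.ingress`, and that in NodePort mode the dashboard is served on the port named `http2` (an assumption about the Istio manifests). The function name and sample values are hypothetical.

```python
import json


def ingress_address(service_json: str) -> str:
    """Return the external IP or hostname, or a node-port address if there is none.

    service_json is assumed to be the output of
    `kubectl get svc istio-ingressgateway -n istio-system -o json`.
    """
    svc = json.loads(service_json)
    ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    if ingress:
        # GKE load balancers report an "ip"; some other providers report a "hostname".
        return ingress[0].get("ip") or ingress[0].get("hostname")
    for port in svc.get("spec", {}).get("ports", []):
        # Assumption: the HTTP port of the ingress gateway is named "http2".
        if port.get("name") == "http2" and "nodePort" in port:
            return f"<node-ip>:{port['nodePort']}"
    raise ValueError("no external address yet; a LoadBalancer may still be provisioning")


node_port_svc = json.dumps({
    "spec": {"type": "NodePort", "ports": [{"name": "http2", "port": 80, "nodePort": 31380}]},
    "status": {"loadBalancer": {}},
})
print(ingress_address(node_port_svc))  # → <node-ip>:31380
```

In the NodePort case, `<node-ip>` stands for the external IP of any node, which `kubectl get nodes -o wide` lists.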
Developing your ML application consists of several stages:
- Gathering and analyzing data
- Researching a model suited to the type of data collected
- Training and testing the model
- Tuning the model
- Deploying the model
These stages apply to almost any ML problem you’re trying to solve, so where does Kubeflow fit in?
Kubeflow provides its own pipelines, Kubeflow Pipelines, to solve this problem. A Kubeflow pipeline describes an ML workflow: the different stages of the workflow and how they combine, in the form of a graph.
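To make the idea of stages combined as a graph concrete, here is a stdlib-only Python illustration using `graphlib` (Python 3.9+). It is not the Kubeflow Pipelines SDK: the stage names are hypothetical, loosely following the list above, and the two parallel training stages are invented to show that independent branches of the graph can run side by side.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (hypothetical names).
PIPELINE = {
    "gather_data": set(),
    "analyze_data": {"gather_data"},
    "train_model_a": {"analyze_data"},
    "train_model_b": {"analyze_data"},
    "evaluate": {"train_model_a", "train_model_b"},
    "tune": {"evaluate"},
    "deploy": {"tune"},
}


def parallel_batches(graph: dict) -> list:
    """Group stages into batches; every stage in a batch only needs earlier batches."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for step, batch in enumerate(parallel_batches(PIPELINE), start=1):
    print(step, batch)
```

The third batch contains both training stages, which is the kind of parallelism a pipeline engine can exploit when it schedules each stage as its own container.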
Kubeflow lets you run your ML pipeline on any infrastructure, be it your laptop, a cloud, or a multi-cloud environment. Wherever you can run Kubernetes, you can run your ML pipeline.
Training your ML Model on Kubeflow
Once you’ve deployed Kubeflow as described above, you should be able to access the Kubeflow UI dashboard.
The first step is to upload your pipeline, but before you can do that, you need to prepare it. We are going to use a financial time series dataset to train our model. You can find the example code here:
The command above builds the Docker images; we also create a bucket to store our data and model artifacts.
Once our image is available in the GCR repository, we can start our training job on Kubernetes. Take a look at the TFJob resource in CPU/tfjob1.yaml and update the image and bucket references.
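For reference, a minimal TFJob can also be generated from Python. This sketch assumes the `kubeflow.org/v1` TFJob API shipped with Kubeflow 1.0 and a single worker. How the training code in CPU/tfjob1.yaml actually receives the bucket is not shown here, so passing it as a `BUCKET` environment variable is a hypothetical choice, as are the job name and image path in the example.

```python
import json


def tfjob_manifest(name: str, image: str, bucket: str, workers: int = 1) -> dict:
    """Minimal TFJob with the training image and bucket reference filled in."""
    container = {
        "name": "tensorflow",  # the TFJob operator looks for a container with this name
        "image": image,
        # Hypothetical: assumes the training code reads its bucket from $BUCKET.
        "env": [{"name": "BUCKET", "value": bucket}],
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


manifest = tfjob_manifest("financial-train", "gcr.io/my-project/cpu-train:v1", "gs://my-bucket")
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output can be saved to a file and submitted with `kubectl apply -f tfjob.json`.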
Kubeflow Pipelines cannot run our Python file directly; the pipeline must first be compiled. The pipeline is written in Python using the Kubeflow Pipelines domain-specific language (DSL), and we compile it with dsl-compile, a tool that comes with the Kubeflow Pipelines Python 3 SDK. So, first, install the SDK:
Next, open ml_pipeline.py and update it with the path of the CPU image you built in the previous steps. Then compile the pipeline using:
This generates a file named ml_pipeline.py.tar.gz, which we can upload to the Kubeflow Pipelines UI.
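The compile step and a quick look inside its output can be scripted with the standard library. This sketch assumes the SDK's `dsl-compile` command accepts `--py` and `--output`, as in the v1 SDK, and that the generated `.tar.gz` package simply wraps the compiled workflow file. The helper names are hypothetical, and the demo builds a stand-in archive rather than calling the real compiler.

```python
import io
import os
import subprocess
import tarfile
import tempfile


def compile_command(py_file: str, output: str) -> list:
    """argv for compiling a pipeline definition with the SDK's dsl-compile tool."""
    return ["dsl-compile", "--py", py_file, "--output", output]


def compile_pipeline(py_file: str, output: str) -> None:
    """Run the compiler; raises CalledProcessError if compilation fails."""
    subprocess.run(compile_command(py_file, output), check=True)


def package_members(package: str) -> list:
    """List the files inside a compiled .tar.gz package before uploading it."""
    with tarfile.open(package, "r:gz") as tar:
        return tar.getnames()


print(compile_command("ml_pipeline.py", "ml_pipeline.py.tar.gz"))

# Demo with a stand-in package (placeholder content, not real compiler output).
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "ml_pipeline.py.tar.gz")
    data = b"# placeholder for the compiled workflow\n"
    with tarfile.open(fake, "w:gz") as tar:
        info = tarfile.TarInfo("pipeline.yaml")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    print(package_members(fake))  # → ['pipeline.yaml']
```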
Once the pipeline is uploaded, you can see its stages displayed as a graph.
Next, click on the pipeline and create a run. For each run, you specify the parameters you want to use. While the pipeline is running, you can inspect the logs:
Run Jupyter Notebook in your ML Pipeline
You can also define your pipeline interactively from a Jupyter notebook:
1. Navigate to Notebook Servers in the Kubeflow UI.
2. Select the namespace and click on “new server.”
3. Give the server a name and provide the Docker image with the TensorFlow version you want to train your model on. I used the TensorFlow 1.15 image.
4. Once the notebook server is available, click on “connect” to connect to it.
5. This opens a new window with a Jupyter terminal.
6. Run the following command: pip install -U kfp
7. Download the notebook using the following command:
8. Now that you have the notebook, replace the values of environment variables such as WORKING_DIR, PROJECT_NAME, and GITHUB_TOKEN. Then you can run the notebook step by step (one cell at a time) by pressing Shift+Enter, or run the whole notebook by choosing the Run All option from the menu.
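Rather than pasting values such as GITHUB_TOKEN directly into notebook cells, one option is to read them from the notebook server's environment in the first cell. The helper below is a hypothetical sketch of that pattern; it only checks that the three variables named above are present and non-empty, and the example values are placeholders.

```python
import os
from collections.abc import Mapping

REQUIRED_SETTINGS = ("WORKING_DIR", "PROJECT_NAME", "GITHUB_TOKEN")


def load_notebook_settings(environ: Mapping = os.environ) -> dict:
    """Return the required settings, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise KeyError("set these before running the notebook: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SETTINGS}


# Example with an explicit mapping instead of the real environment (placeholder values).
settings = load_notebook_settings({
    "WORKING_DIR": "gs://my-bucket/work",
    "PROJECT_NAME": "my-project",
    "GITHUB_TOKEN": "<token>",
})
print(settings["PROJECT_NAME"])  # → my-project
```

Failing in the first cell is cheaper than discovering a missing token halfway through a long Run All.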
The ML world has its own challenges: environments are tightly coupled, and the tools needed to build an ML stack have been extremely hard to set up and configure. This becomes even harder in production environments, where you have to be extremely careful not to break components that are already running.
Kubeflow makes getting started with ML highly accessible. You can run your ML workflows anywhere you can run Kubernetes, including multi-cloud environments, which enables ML engineers to train their models at scale by leveraging the scalability of Kubernetes.