CI/CD Pipeline for Machine Learning (MLOps)

Mohit Agarwal
6 min read · Jun 26, 2020


“Automation applied to an efficient operation will magnify the efficiency.”

The quote above captures the basic motive behind this article: I will show you how to automate hyperparameter tuning in machine learning so that our model reaches the best possible accuracy.

Overview

This article mainly focuses on the practical aspects of integrating deep learning with DevOps tools like Git, GitHub, Jenkins, and Docker. In my case, I've trained a Convolutional Neural Network (CNN) to detect whether a person is suffering from pneumonia. I trained the model on more than 5,000 chest X-ray images, and it predicts correctly about 90% of the time. I am also providing a link to my GitHub repository, where I've uploaded the trained CNN model and all the files required to train it and re-tune its hyperparameters.
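As an illustration of the training setup, loading a chest X-ray dataset with Keras could look like the sketch below. This is a minimal sketch, assuming a chest_xray/train directory with one subfolder per class and a 64x64 input size; it is not the exact code from the repository.

# data.py -- hypothetical sketch: load chest X-ray images for training and validation
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# rescale pixel values to [0, 1] and hold out 20% of the images for validation
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_data = datagen.flow_from_directory(
    "chest_xray/train",                      # assumed layout: one subfolder per class
    target_size=(64, 64), class_mode="binary", subset="training")
val_data = datagen.flow_from_directory(
    "chest_xray/train",
    target_size=(64, 64), class_mode="binary", subset="validation")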

CI/CD Pipeline Overview

We have two Docker images: one with Python plus the traditional machine learning packages installed in it, and the other with Python plus the deep learning packages. This way we can launch the appropriate container depending on the modules used in the main file.

We also use GitHub's webhook feature to notify Jenkins, so that Jenkins can download the updated code from GitHub whenever the developer pushes it to the remote repository.

Jenkins acts as the intermediary here: it downloads the code from GitHub, determines which modules are used in the file, and starts the respective Docker container for training.
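As a rough illustration, a minimal sketch of that check could look like this. The file name main.py, the image names, and the mounted path are my assumptions, not details taken from the actual setup.

# detect_env.py -- hypothetical sketch: choose a container based on the imports in main.py
import subprocess

DL_MODULES = {"keras", "tensorflow"}         # assumed markers of deep learning code
code = open("main.py").read()                # assumed name of the training script

if any(module in code for module in DL_MODULES):
    image = "deep_learning_image"            # image with the deep learning packages
else:
    image = "traditional_ml_image"           # hypothetical image with sklearn, pillow, etc.

# launch a container from the chosen image, mounting the Jenkins workspace into it
subprocess.run(["docker", "run", "-dit", "-v", "/root/workspace:/workspace",
                "--name", "ml_container", image], check=True)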

A Python script also checks the accuracy after training the model; if the accuracy is less than 80%, the script adds one more CRP (convolution, ReLU, pooling) layer, tweaks the hyperparameters accordingly, and re-trains the model to get the best accuracy.
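As a minimal sketch of that loop, assuming the train_data and val_data generators from the data-loading sketch above and the build_model function from the architecture sketch below, the re-tuning could look like this; the epoch count and the cap of four blocks are my own illustrative choices.

# tweak.py -- hypothetical sketch of the "train, check accuracy, add a CRP block" loop
def train_once(num_crp_blocks):
    """Train the CNN with the given number of CRP blocks, return validation accuracy."""
    model = build_model(num_crp_blocks)                  # defined in the sketch below
    history = model.fit(train_data, epochs=5, validation_data=val_data)
    return history.history["val_accuracy"][-1]

blocks = 1
while train_once(blocks) < 0.80 and blocks < 4:          # 80% threshold from the pipeline
    blocks += 1                                          # add one more CRP block, retrain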

CNN Architecture on the Developer's Side

CNN Architecture
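The architecture above appears as a screenshot; as a textual stand-in, here is a minimal Keras sketch of a comparable CNN. The input size, filter counts, and dense layer width are my assumptions, not the exact values from the screenshot.

# model.py -- minimal Keras sketch of a comparable CNN (sizes are assumptions)
from tensorflow.keras import layers, models

def build_model(num_crp_blocks=2, input_shape=(64, 64, 3)):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    filters = 32
    for _ in range(num_crp_blocks):                      # one CRP block: Conv -> ReLU -> Pool
        model.add(layers.Conv2D(filters, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
        filters *= 2
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))     # pneumonia vs. normal
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model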

After preparing this architecture, we have to train the model on chest X-ray images. But first we have to upload the code to GitHub, so that we can use the GitHub webhook. Here I would like to highlight the DevOps side: as soon as Jenkins receives the signal from the GitHub webhook, it downloads the code and starts the respective Docker container for training. In my case the code uses deep learning modules like Keras, so Jenkins starts the container with the Keras libraries installed in it.

Git Use Case in this Project

Git and GitHub act as the intermediary between the developer and Jenkins: GitHub notifies Jenkins whenever there is an update to the code from the developer's side. To give some practical idea of Git and GitHub, I'm providing a few screenshots of my work.

In the image above, I've committed my work locally and pushed it to my remote branch on GitHub. On GitHub, I've created a webhook that notifies Jenkins as soon as the developer pushes code; the following screenshot shows my GitHub webhook.
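To sanity-check the hook, you can simulate GitHub's notification yourself. Below is a minimal sketch, assuming a typical setup where the Jenkins GitHub plugin listens on the /github-webhook/ endpoint; the Jenkins host name is a placeholder.

# webhook_test.py -- hypothetical smoke test for the Jenkins webhook endpoint
import requests

resp = requests.post(
    "http://jenkins.example.com:8080/github-webhook/",   # assumed Jenkins URL
    headers={"X-GitHub-Event": "ping"},                  # GitHub sends "ping" on creation
    json={"zen": "smoke test"})
print(resp.status_code)                                  # 200 means Jenkins accepted it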

Docker Use Case in this Project

Here Docker is mainly used to launch the operating system for training the deep learning CNN model. A Dockerfile is created in order to build the container image. Below I am providing the screenshots of my Dockerfiles.

The other Dockerfile creates the deep learning image.
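Since the screenshots are not reproduced here, below is a rough sketch of what such a deep learning Dockerfile could contain. The base image, package list, and paths are all my assumptions, not the actual file.

# Dockerfile -- hypothetical sketch of the deep learning image
FROM centos:7

# install Python 3 plus the deep learning stack
RUN yum install -y python3 python3-pip && \
    pip3 install --no-cache-dir tensorflow keras numpy pillow

# the Jenkins job mounts its workspace here and runs the training script in it
WORKDIR /workspace
CMD ["python3", "main.py"]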

To build an image from a Dockerfile, use the following command:

docker build -t <image_name> <directory containing the Dockerfile>

For example:

docker build -t deep_learning_image /my_folder/

Jenkins Use Case in this Project

In Jenkins, we have to create five jobs/items; their descriptions follow. For an overview of my Jenkins pipeline, I'm providing a visualization created using the Build Pipeline plugin in Jenkins.

  • Job1 (Git Job1) automatically pulls the code from GitHub and stores it in the Jenkins workspace as soon as the developer pushes updated code to GitHub. I've also used the copy_artifacts plugin to copy data from one job to another; screenshots of this job follow.
  • Job2 (ML_Type_Job) determines the type of packages used in the main Python file. Based on those packages, the respective Docker image is started: if the code uses deep learning packages, Jenkins launches the container that has all the deep learning Python packages installed in it; if it uses traditional machine learning packages like sklearn and pillow, Jenkins launches the container with only those packages.
  • Job3 (Train_ML_Model) starts training our deep learning (CNN) model in the respective container launched by the previous job.
  • Job4 (Tweaker Job) determines the accuracy of the model trained in the previous step. If the accuracy is less than 80%, it automatically changes the hyperparameters in the Python file and pushes it to GitHub again, so the cycle starts over from Job1. If the accuracy is above 80%, it notifies the developer and leaves the hyperparameters unchanged. Job4 runs a Python script that checks the accuracy and does the hyperparameter tuning for us.
  • Job5 (ML_Model_Monitoring_Job) runs on a schedule to check whether the containers are running fine; if my deep_learning_os container goes down, Job5 launches it again. This job starts after Job2 and checks the container every minute; a minimal sketch of such a check follows this list.
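As that sketch, Job5's check could be a few lines of Python around the Docker CLI. The container name deep_learning_os comes from the description above; everything else is my assumption.

# monitor.py -- hypothetical sketch of Job5's container health check
import subprocess

NAME = "deep_learning_os"                    # container name from the job description

# list the names of the currently running containers
running = subprocess.run(["docker", "ps", "--format", "{{.Names}}"],
                         capture_output=True, text=True).stdout.split()

if NAME not in running:
    # the container is down: start it again
    subprocess.run(["docker", "start", NAME], check=True)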

Conclusion

In this article, we've automated the work of hyperparameter tuning using DevOps tools like Docker, Jenkins, and Git, and we've built a model to detect pneumonia. All of the related scripts can be found in my GitHub repository.
