Model Training using Google Cloud AI Platform

Sourabh Jain
5 min read · Jul 18, 2020

This story demonstrates how to use AI Platform to train a simple classification model using the scikit-learn framework.

Before we begin, let’s see which Google Cloud Platform services we will be using in this story:

  • AI Platform is a managed service that enables users to easily build machine learning models. It is a separate service from AI Platform Notebooks.
  • Cloud Storage is a unified object storage for storing any form of data.
  • Cloud SDK is a command-line tool that allows users to interact with Google Cloud services. This notebook uses several gcloud and gsutil commands, which are part of the Cloud SDK. Note that shell commands in a notebook must be prepended with a !, as in the example after this list.
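
For example, a notebook cell can point the Cloud SDK at the project you want to work in (my-gcp-project is a placeholder for your own project ID, not a value used in this story):

!gcloud config set project my-gcp-project
!gcloud config list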

Let’s begin. In this story, we will not focus much on the algorithm used for model training; the focus will be on how to use the AI Platform Jobs service to train the model.

For training data, we will use the Iris dataset and upload it to a Google Cloud Storage (GCS) bucket.

Let’s download the Iris dataset and rename the file with a .csv extension.

!wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
!mv iris.data iris.csv
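
To sanity-check the download, you can peek at the first few rows; note that the file has no header row, which is why the training code later supplies column names:

!head -n 5 iris.csv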

Now let’s create a GCS bucket and upload the iris.csv file to it.

BUCKET_NAME = 'demo-scikit-learn-ai-platform'
!gsutil mb gs://$BUCKET_NAME
!gsutil cp ./iris.csv gs://$BUCKET_NAME
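
Before moving on, you can verify that the upload succeeded:

!gsutil ls gs://$BUCKET_NAME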

First, we will import the libraries that we plan to use in this story and write the complete code for a sample model training run.

# pip install scikit-learn==0.20.3 for this demo to run successfully
# Libraries
import datetime
import os
import subprocess
import sys
import pandas as pd
from sklearn import svm
from sklearn.externals import joblib
from google.cloud import storage
import sklearn
print('sklearn: {}'.format(sklearn.__version__))
# Create a Cloud Storage client to download the data and upload the model
storage_client = storage.Client()
# Download the data
public_bucket = storage_client.bucket('demo-scikit-learn-ai-platform')
blob = public_bucket.blob('iris.csv')
blob.download_to_filename('iris.csv')
# Read the training data from the file
iris_data = pd.read_csv('./iris.csv', sep=',', names=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"])
# Separate the target variable (species) from the features
iris_label = iris_data.pop('species')
# Use an SVC (support vector classifier), a type of support vector machine (SVM)
classifier = svm.SVC(gamma='auto')
# Train the model
classifier.fit(iris_data, iris_label)
# Save the model locally
model_filename = 'model.joblib'
joblib.dump(classifier, model_filename)
# Upload the model to the Cloud Storage bucket
bucket = storage_client.bucket('demo-scikit-learn-ai-platform')
blob = bucket.blob(model_filename)
blob.upload_from_filename(model_filename)

We can run the above code and observe that the model is created and uploaded to the Google Cloud Storage bucket successfully. We can then use that model for either batch or online predictions. The above example is simple and completes very quickly. However, when our workload involves huge data or complex pre-processing and feature engineering steps, model training can take hours to complete. It therefore becomes important to execute the same code via the AI Platform Jobs service, which takes care of the complete execution while we are free to sleep, work, or take a break.
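
Before we move the training to AI Platform, here is a quick local illustration of using the saved model for a prediction (a minimal sketch; the flower measurements below are made up for the example):

from sklearn.externals import joblib

# Load the model saved by the training code above
loaded_model = joblib.load('model.joblib')

# Predict the species for a made-up measurement:
# (sepal_length, sepal_width, petal_length, petal_width)
sample = [[5.1, 3.5, 1.4, 0.2]]
print(loaded_model.predict(sample))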

In order to submit the above code to AI Platform, we just need to arrange it into a Python package structure and submit that package to an AI Platform job for execution. The package consists of an empty initialisation file “__init__.py” and a file “train.py” that will contain the above code, both within a folder. So let’s create a folder “ai_platform_training” and create both files.

!mkdir ai_platform_training
!touch ./ai_platform_training/__init__.py

Now, in order to copy the above code into a file, add the line shown below at the top of the code cell and execute it. You will observe that the code no longer executes; instead, it is saved to the file train.py within the ai_platform_training folder.

%%writefile ./ai_platform_training/train.py
# pip install scikit-learn==0.20.3 for this demo to run successfully
# Libraries
import datetime
import os
import subprocess
import sys
import pandas as pd
from sklearn import svm
from sklearn.externals import joblib
from google.cloud import storage
import sklearn
print('sklearn: {}'.format(sklearn.__version__))
# Create a Cloud Storage client to download the data and upload the model
storage_client = storage.Client()
# Download the data
public_bucket = storage_client.bucket('demo-scikit-learn-ai-platform')
blob = public_bucket.blob('iris.csv')
blob.download_to_filename('iris.csv')
# Read the training data from the file
iris_data = pd.read_csv('./iris.csv', sep=',', names=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"])
# Separate the target variable (species) from the features
iris_label = iris_data.pop('species')
# Use an SVC (support vector classifier), a type of support vector machine (SVM)
classifier = svm.SVC(gamma='auto')
# Train the model
classifier.fit(iris_data, iris_label)
# Save the model locally
model_filename = 'model.joblib'
joblib.dump(classifier, model_filename)
# Upload the model to the Cloud Storage bucket
bucket = storage_client.bucket('demo-scikit-learn-ai-platform')
blob = bucket.blob(model_filename)
blob.upload_from_filename(model_filename)

The folder structure should look like this now.

ai_platform_training
|-__init__.py
|-train.py
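
You can confirm the layout from within the notebook:

!ls -R ai_platform_training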

Now we are ready to submit the package as an AI Platform job. We will generate a job name with a timestamp to make it unique and then execute the command to submit the job.

import time
# Define a timestamped job name
JOB_NAME = "demo_scikit_learn_ai_platform_{}".format(int(time.time()))
# Submit the training job:
!gcloud ai-platform jobs submit training $JOB_NAME \
--job-dir gs://demo-scikit-learn-ai-platform/ai_platform_training \
--package-path ./ai_platform_training \
--module-name ai_platform_training.train \
--region us-central1 \
--runtime-version=1.14 \
--python-version=3.5 \
--scale-tier BASIC

You can monitor the job status via the command below:

!gcloud ai-platform jobs describe $JOB_NAME

It will give you output like the following, where you can see the job state and other details.

createTime: '2020-07-18T17:56:27Z'
endTime: '2020-07-18T18:00:14Z'
etag: xnQTVAjI-vY=
jobId: demo_scikit_learn_ai_platform_1595094985
startTime: '2020-07-18T17:57:11Z'
state: SUCCEEDED
trainingInput:
  jobDir: gs://demo-scikit-learn-ai-platform/ai_platform_training
  packageUris:
  - gs://demo-scikit-learn-ai-platform/ai_platform_training/packages/aacabf6587d47e4f2678f0f9d8368cc862da58021b2eeef7e1ffb7b0443fce53/ai_platform_training-0.0.0.tar.gz
  pythonModule: ai_platform_training.train
  pythonVersion: '3.5'
  region: us-central1
  runtimeVersion: '1.14'
trainingOutput:
  consumedMLUnits: 0.06
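
While the job is running, you can also stream its logs live instead of polling describe:

!gcloud ai-platform jobs stream-logs $JOB_NAME

Once the state shows SUCCEEDED, you can confirm that model.joblib landed in the bucket:

!gsutil ls gs://demo-scikit-learn-ai-platform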

Once the job completes, the model file will be in the bucket. Now let’s understand the parameters used in the submit command. You can find the full list of parameters here

$JOB_NAME → The name of the job. It can be any name that suits your application; here we make it unique with a timestamp.

job-dir → Google Cloud Storage path in which to store training outputs and other data needed for training.

package-path → Path to a Python package to build.

module-name → Name of the module to run.

region → The region in which the job needs to be run.

runtime-version → AI Platform runtime version for this job. Must be specified unless --master-image-uri is specified instead. Google provides pre-built runtimes with specific frameworks installed for job execution. If the frameworks you need are not available in the pre-built runtimes provided by Google Cloud, you will have to build a custom container and pass it via --master-image-uri. You can see the list of pre-built runtime versions here.

python-version → Version of Python used during training. If not set, the default version is 2.7. Python 3.5 is available when --runtime-version is set to 1.4 and above. Python 2.7 works with all supported runtime versions.

scale-tier → Specifies the machine types and the number of replicas for workers and parameter servers. You can read more here.
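
For illustration, here is what a custom-tier submission might look like (a hypothetical variation on the earlier command; n1-standard-8 is an arbitrary machine type chosen for the example):

!gcloud ai-platform jobs submit training $JOB_NAME \
--job-dir gs://demo-scikit-learn-ai-platform/ai_platform_training \
--package-path ./ai_platform_training \
--module-name ai_platform_training.train \
--region us-central1 \
--runtime-version=1.14 \
--python-version=3.5 \
--scale-tier CUSTOM \
--master-machine-type n1-standard-8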

Hope you found this useful. Our next story will cover training a model on custom containers. Till then, happy reading.
