Vertex AI Feature Store

Google Cloud recently launched Vertex AI, and as part of that launch Feature Store was released. As per the official definition, “Vertex Feature Store (Feature Store) provides a centralised repository for organising, storing, and serving ML features.” You can read more here.

Feature Store uses a time series data model to store a series of values for features, enabling Feature Store to maintain feature values as they change over time. Feature Store organizes resources hierarchically in the following order: Featurestore -> EntityType -> Feature. You must define and create these resources before you can ingest data into Feature Store.
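As a sketch of that Featurestore -> EntityType -> Feature hierarchy using the Vertex AI Python SDK (the project, region, and all resource names below are hypothetical, and the calls assume the google-cloud-aiplatform library and a configured GCP project):

```python
from google.cloud import aiplatform

# Hypothetical project and region
aiplatform.init(project="my-project", location="us-central1")

# Featurestore -> EntityType -> Feature
fs = aiplatform.Featurestore.create(featurestore_id="movie_predictions")
users = fs.create_entity_type(entity_type_id="users")
users.create_feature(feature_id="age", value_type="INT64")
users.create_feature(feature_id="gender", value_type="STRING")
```

Only once these resources exist can feature values be ingested against them.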

Let’s take an example and…

BigQuery with Google Cloud Storage

Data Analysts within organisations often have use cases where they receive files (e.g. CSV, Parquet) at a scheduled frequency (e.g. daily or weekly) and are required to analyse them. They often face the challenge of building an ingestion pipeline to load the data into a Data Warehouse and make it available in a tabular format for further analysis.

BigQuery supports querying externally partitioned data available in Google Cloud Storage without having to actually load the data into BigQuery. We will see how easy it is to achieve this.
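As a sketch, a hive-partitioned external table over CSV files in GCS can be defined with BigQuery DDL like the following (the bucket, dataset, and column names here are hypothetical, not from the article):

```sql
CREATE EXTERNAL TABLE mydataset.daily_trades (
  symbol STRING,
  price NUMERIC,
  volume INT64
)
WITH PARTITION COLUMNS (
  -- inferred from paths like gs://my-bucket/trades/trade_date=2021-06-01/
  trade_date DATE
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://my-bucket/trades/*'],
  hive_partition_uri_prefix = 'gs://my-bucket/trades'
);
```

After this, the daily files are queryable as a normal table, with `trade_date` usable as a partition filter, while the data itself stays in GCS.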

Use case: Let’s assume that a stock broker receives daily files for each of…

Hyperparameter Tuning

In the previous articles, we have seen how to use Google Cloud AI Platform to train a model.

In this article, we will look at how to use Google Cloud AI Platform to perform hyperparameter tuning.

Before we see an example of how to perform hyperparameter tuning, let’s understand the fundamentals.

Hyperparameters are the variables that govern the training process itself.

Your training application handles three categories of data as it trains your model:

  • Your input data (also called training data) is a collection of individual records (instances) containing the…
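On AI Platform, a hyperparameter tuning job is typically driven by a small config file passed when submitting the job. A minimal sketch (the metric tag and parameter names are hypothetical) might look like:

```yaml
# hptuning_config.yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: accuracy   # metric the trainer reports
    maxTrials: 10
    maxParallelTrials: 2
    params:
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE
```

The service then launches trials with different `learning_rate` values, passed to the training application as command-line arguments, and searches for the value that maximises the reported metric.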

Google Cloud Platform Dataproc

Dataproc is a managed Apache Spark and Apache Hadoop service that lets users take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps users create clusters quickly, manage them easily, and save money by turning clusters off when they are not needed. With less time and money spent on administration, you can focus on your jobs and your data. You can read more about Dataproc here.

There are 2 modes of cluster management:

  • Long Running Clusters: These are clusters that are up and running 24x7, and jobs are…
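As a sketch of the cost-saving behaviour mentioned above, gcloud's scheduled-deletion flags let a cluster delete itself when idle (the cluster name and region here are hypothetical):

```shell
# Create a cluster that is deleted automatically after 30 minutes of idleness
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --max-idle=30m
```

With `--max-idle`, you stop paying for the cluster as soon as it has had no submitted jobs for the given duration.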

AI Platform Job using Custom Container

In the previous article, we have seen how to use Google Cloud AI Platform to train a model by submitting a job. In that article we used a pre-built runtime provided by Google Cloud Platform. The pre-built runtimes currently support the 3 ML frameworks mentioned below. Please refer to this link for more information.

  • scikit-learn
  • XGBoost
  • TensorFlow

However, there may be scenarios where we need a different framework to train our model, or perhaps a different version compared to the versions available for the above 3 frameworks in the runtime list.

In such scenarios, we will build a custom container…
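As a minimal sketch, a custom training container is just a Docker image with the desired framework and the training code baked in (the base image, framework version, and file names below are hypothetical):

```dockerfile
FROM python:3.7-slim

# Install a framework/version the pre-built runtimes don't offer
RUN pip install --no-cache-dir torch==1.8.1

# Copy the training code and make it the container's entry point
COPY trainer/ /trainer/
ENTRYPOINT ["python", "/trainer/train.py"]
```

The image is then pushed to Container Registry and referenced when submitting the job, e.g. via the `--master-image-uri` flag of `gcloud ai-platform jobs submit training`.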

This story demonstrates how to use AI Platform to train a simple classification model using the scikit-learn framework.

Before we begin, let’s see which Google Cloud Platform services we will be using in this story:

  • AI Platform is a managed service that enables users to easily build machine learning models. It’s a separate service from the AI Notebook service.
  • Cloud Storage is a unified object storage for storing any form of data.
  • Cloud SDK is a command line tool which allows users to interact with Google Cloud services. This notebook introduces several gcloud and gsutil commands, which are part of the…
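As a minimal sketch of the kind of training script such a job runs (the toy dataset and classifier here are illustrative choices, not from the article), the script trains a scikit-learn model and exports the artifact that AI Platform's scikit-learn runtime expects:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import joblib

# Load a toy dataset and split it into train/test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train a simple classifier
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)

# AI Platform's scikit-learn runtime expects the artifact to be
# named model.joblib; the job then copies it to a GCS bucket.
joblib.dump(model, "model.joblib")
```

In a real job, the script would upload `model.joblib` to the Cloud Storage bucket passed in via a job argument.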

Cloud DataProc

In this story, we will see how Google Cloud Platform’s managed service Cloud Dataproc can be leveraged to read and parse an AVRO data file. As a simple use case, we will read an AVRO file available on Google Cloud Storage (GCS), convert it to Parquet format and store it back on Google Cloud Storage (GCS).

Log into the Google Cloud Console at

Start the Cloud Shell environment

We will now create a bucket to host our AVRO file. Execute the below command to create a bucket on GCS, replacing the bucket name appropriately for your environment:

gsutil mb gs://dataproc-spark-convert
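Once a Dataproc cluster exists, the conversion itself is a few lines of PySpark. A sketch of writing and submitting such a job (the cluster name, region, input/output paths, and spark-avro version are assumptions for illustration):

```shell
# Write a small PySpark job that reads AVRO and writes Parquet
cat > convert.py <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avro-to-parquet").getOrCreate()
df = spark.read.format("avro").load("gs://dataproc-spark-convert/input.avro")
df.write.mode("overwrite").parquet("gs://dataproc-spark-convert/output/")
EOF

# Submit it to the cluster; the spark-avro package provides the AVRO reader
gcloud dataproc jobs submit pyspark convert.py \
    --cluster=my-cluster --region=us-central1 \
    --properties=spark.jars.packages=org.apache.spark:spark-avro_2.12:3.1.3
```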

Jupyter Lab Service

In the previous article, we have seen how to install JupyterLab on a virtual machine in Google Cloud Platform. In this article, we will see how we can enable JupyterLab as a service, so that a user doesn’t have to run the command manually to start it.

Login to the Virtual Machine via SSH and execute the below commands:

sudo mkdir -p /opt/jupyterlab/etc/systemd
sudo touch /opt/jupyterlab/etc/systemd/jupyterlab.service

Open the file “/opt/jupyterlab/etc/systemd/jupyterlab.service” and add the below content:

ExecStart=/usr/local/bin/jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
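For reference, a complete unit file around that ExecStart line could look like the following sketch (the description, restart policy, and the assumption that JupyterLab runs as root on all interfaces are illustrative choices):

```ini
[Unit]
Description=JupyterLab
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After linking the file into /etc/systemd/system, `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now jupyterlab` starts the service and makes it come up on boot.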



Google Cloud Platform provides a very easy option to access a JupyterLab instance via Google Cloud’s AI Platform.

Google Cloud Platform Security

In order to access Google Cloud Platform APIs from servers, we need an access token for authorization. In this story, we will see how to generate one.


Objective:

  1. Generate an access token for accessing the Google Cloud Platform APIs.

Steps:

  1. Create a service account with the required role.
  2. Generate the access token using the scope and the service account created above.
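The two steps above can be sketched with gcloud as follows (the service account name, PROJECT_ID placeholder, and the AI Platform Developer role are assumptions for illustration):

```shell
# Step 1: create a service account and grant it the required role
gcloud iam service-accounts create token-demo --display-name="token-demo"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:token-demo@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/ml.developer"

# Step 2: authenticate as the service account and print an access token
gcloud iam service-accounts keys create key.json \
    --iam-account=token-demo@PROJECT_ID.iam.gserviceaccount.com
gcloud auth activate-service-account --key-file=key.json
gcloud auth print-access-token
```

The printed token can then be sent as a `Authorization: Bearer <token>` header on REST calls.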

For our understanding, we will consider a use case where a custom model is uploaded on Google Cloud Platform -> AI Platform and inference needs to be made using the REST API.

Step 1:

  • First we need to…

Sourabh Jain

All views are my own.
