Google Cloud recently launched Vertex AI, and Feature Store was released as part of it. As per the official definition, “Vertex Feature Store (Feature Store) provides a centralised repository for organising, storing, and serving ML features.” You can read more here.
Feature Store uses a time series data model to store a series of values for features, enabling Feature Store to maintain feature values as they change over time. Feature Store organizes resources hierarchically in the following order:
Featurestore -> EntityType -> Feature. You must define and create these resources before you can ingest data into Feature Store.
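The hierarchy and the time series data model described above can be pictured with a small in-memory sketch. This is not the Vertex AI SDK: the class and method names (`create_entity_type`, `create_feature`, `write`, `read`) and the sample data are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Feature:
    name: str
    # entity_id -> list of (timestamp, value), kept sorted by timestamp
    values: Dict[str, List[Tuple[datetime, Any]]] = field(default_factory=dict)

    def write(self, entity_id: str, timestamp: datetime, value: Any) -> None:
        series = self.values.setdefault(entity_id, [])
        series.append((timestamp, value))
        series.sort(key=lambda pair: pair[0])

    def read(self, entity_id: str, as_of: datetime) -> Optional[Any]:
        # Return the most recent value written at or before `as_of`.
        latest = None
        for timestamp, value in self.values.get(entity_id, []):
            if timestamp <= as_of:
                latest = value
        return latest


@dataclass
class EntityType:
    name: str
    features: Dict[str, Feature] = field(default_factory=dict)

    def create_feature(self, name: str) -> Feature:
        return self.features.setdefault(name, Feature(name))


@dataclass
class Featurestore:
    name: str
    entity_types: Dict[str, EntityType] = field(default_factory=dict)

    def create_entity_type(self, name: str) -> EntityType:
        return self.entity_types.setdefault(name, EntityType(name))


# Resources are created top-down before any values are ingested.
store = Featurestore("demo_store")       # hypothetical names throughout
users = store.create_entity_type("users")
age = users.create_feature("age")

age.write("user_1", datetime(2021, 1, 1), 30)
age.write("user_1", datetime(2021, 6, 1), 31)

value_in_march = age.read("user_1", datetime(2021, 3, 1))
value_in_july = age.read("user_1", datetime(2021, 7, 1))
print(value_in_march, value_in_july)
```

Because every value is stored with its timestamp, a read "as of" an earlier date returns the value that was current at that time rather than the latest one.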
Let’s take an example and…
Data analysts within organisations often have use cases where they receive files (e.g. CSV, Parquet) at a scheduled frequency (e.g. daily or weekly) and are required to analyse them. They often face the challenge of building an ingestion pipeline to load the data into a data warehouse and make it available in a tabular format for further analysis.
BigQuery supports querying externally partitioned data available in Google Cloud Storage without having to actually load the data into BigQuery. We will see how easy it is to achieve this.
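For externally partitioned data, BigQuery expects the files in Cloud Storage to follow a Hive-style layout, where the partition keys appear as key=value path segments under a common prefix. A minimal sketch of that layout is shown here; the bucket name, the `dt` partition key, the file name, and both helper functions are hypothetical and only illustrate the naming convention.

```python
from typing import Dict


def partition_uri(prefix: str, partitions: Dict[str, str], filename: str) -> str:
    """Build a Hive-style object URI: <prefix>/<key>=<value>/.../<filename>."""
    segments = "/".join(f"{key}={value}" for key, value in partitions.items())
    return f"{prefix.rstrip('/')}/{segments}/{filename}"


def parse_partitions(uri: str, prefix: str) -> Dict[str, str]:
    """Recover the partition keys encoded in the path below the prefix."""
    relative = uri[len(prefix.rstrip("/")) + 1:]
    keys: Dict[str, str] = {}
    # Every segment except the last one (the file name) is a key=value pair.
    for segment in relative.split("/")[:-1]:
        key, _, value = segment.partition("=")
        keys[key] = value
    return keys


PREFIX = "gs://example-bucket/trades"  # hypothetical common source URI prefix

uri = partition_uri(PREFIX, {"dt": "2021-06-01"}, "trades.csv")
print(uri)
print(parse_partitions(uri, PREFIX))
```

With a layout like this, each daily file lands under its own `dt=...` folder, and the partition key can be used as a column when filtering.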
Use case: Let’s assume that a stock broker receives daily files for each of…
In the previous articles, we saw how to use Google Cloud AI Platform to train a model.
In this article, we will look into how to use Google Cloud AI Platform to perform hyperparameter tuning.
Before we see an example of how to perform hyperparameter tuning, let’s understand its fundamentals.
Hyperparameters contain the data that govern the training process itself.
Your training application handles three categories of data as it trains your model:
Dataproc is a managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don’t need them. With less time and money spent on administration, you can focus on your jobs and your data. You can read more about Dataproc here.
There are 2 modes of cluster management:
In the previous article, we saw how to use Google Cloud AI Platform to train a model by submitting a job. In that article we used a pre-built runtime provided by Google Cloud Platform. The pre-built runtime currently supports 3 ML frameworks, as mentioned below. Please refer to this link for more information.
However, there will be scenarios where we need a different framework to train our model, or a different version from the one available for the above 3 frameworks in the runtime list.
In such scenarios, we will build a custom container…
This story demonstrates how to use AI Platform to train a simple classification model using the scikit-learn framework.
Before we begin, let’s see which Google Cloud Platform services we will be using in this story:
In this story, we will see how Google Cloud Platform’s managed service Cloud Dataproc can be leveraged to read and parse an Avro data file. As a simple use case, we will read an Avro file available on Google Cloud Storage (GCS), convert it to Parquet format, and store it back on GCS.
Log into the Google Cloud Console at https://console.cloud.google.com/
Start the Cloud Shell environment
We will now create a bucket to host our Avro file. Execute the command below to create a bucket on GCS, replacing the bucket name as appropriate for your environment:
gsutil mb gs://dataproc-spark-convert
In the previous article, we saw how to install Jupyter Lab on a virtual machine in Google Cloud Platform. In this article, we will see how to enable Jupyter Lab as a service so that a user doesn’t have to run the command manually to start it.
Log in to the virtual machine via SSH and execute the commands below:
sudo mkdir -p /opt/jupyterlab/etc/systemd
sudo touch /opt/jupyterlab/etc/systemd/jupyterlab.service
Open the file “/opt/jupyterlab/etc/systemd/jupyterlab.service” and add the content below:
ExecStart=/usr/local/bin/jupyter lab --ip 0.0.0.0 --port 8888 --no-browser --allow-root
[Install]
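Only part of the unit file is visible in the excerpt above. As a rough sketch of what a complete unit could look like, the snippet below renders one as text. Only the ExecStart line comes from the excerpt; the Description, After, Type, Restart, and WantedBy values are assumptions chosen as common defaults, not the original file's content.

```python
from typing import Dict

# Taken verbatim from the excerpt above.
EXEC_START = (
    "/usr/local/bin/jupyter lab --ip 0.0.0.0 --port 8888 "
    "--no-browser --allow-root"
)


def render_unit(exec_start: str) -> str:
    """Render a systemd unit file as text, one [Section] per block."""
    sections: Dict[str, Dict[str, str]] = {
        # Everything except ExecStart is an assumed, illustrative value.
        "Unit": {"Description": "Jupyter Lab", "After": "network.target"},
        "Service": {
            "Type": "simple",
            "ExecStart": exec_start,
            "Restart": "on-failure",
        },
        "Install": {"WantedBy": "multi-user.target"},
    }
    lines = []
    for name, options in sections.items():
        lines.append(f"[{name}]")
        lines.extend(f"{key}={value}" for key, value in options.items())
        lines.append("")  # blank line between sections
    return "\n".join(lines)


unit_text = render_unit(EXEC_START)
print(unit_text)
```

The printed text could then be pasted into the jupyterlab.service file created earlier, after adjusting the assumed values to your environment.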
Google Cloud Platform provides a very easy option to access a Jupyter Lab instance via Google Cloud’s AI Platform.
In order to access Google Cloud Platform APIs from servers, we need an access token for authorization. In this story, we will see how to generate one.
To illustrate, we will consider a use case where a custom model is uploaded to Google Cloud Platform -> AI Platform and inference needs to be made using the REST API.
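Once a token is available, it is sent as a Bearer token in the Authorization header of the prediction request. The sketch below only builds such a request and does not send it. The project name, model name, token placeholder, and input instances are all hypothetical, and it assumes the token was obtained separately (for example with `gcloud auth print-access-token`).

```python
import json
import urllib.request
from typing import Any, List


def build_predict_request(
    project: str, model: str, access_token: str, instances: List[Any]
) -> urllib.request.Request:
    """Build (but do not send) an AI Platform online prediction request."""
    url = (
        f"https://ml.googleapis.com/v1/projects/{project}"
        f"/models/{model}:predict"
    )
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The access token authorizes the call.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values; replace them with your own project, model, and token.
request = build_predict_request(
    project="my-project",
    model="my-model",
    access_token="ACCESS_TOKEN_PLACEHOLDER",
    instances=[[1.0, 2.0, 3.0]],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out here because it needs a real project, a deployed model, and a valid token.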
All Views are my own.