Feature Store in Vertex AI - Google Cloud Platform

Sourabh Jain
Jun 13, 2021 · 6 min read
Vertex AI Feature Store

Google Cloud recently launched Vertex AI, and Feature Store was released as part of it. Per the official definition, “Vertex Feature Store (Feature Store) provides a centralised repository for organising, storing, and serving ML features.” You can read more here.

Feature Store uses a time series data model to store a series of values for features, enabling Feature Store to maintain feature values as they change over time. Feature Store organizes resources hierarchically in the following order: Featurestore -> EntityType -> Feature. You must define and create these resources before you can ingest data into Feature Store.
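
For reference, every feature you create ends up with a full resource name that mirrors this hierarchy, as you will also see in the API responses later in this post:

projects/PROJECTID/locations/REGIONID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID/features/FEATURE_ID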

Let’s take an example and see how we can set up a featurestore using Vertex AI. We will see how features can be:

  1. Populated within the featurestore.
  2. Served online from featurestore with low latency.
  3. Batch exported for model training.
  4. Searched within the featurestore.

In this post, we will take the example of a gaming company that has features extracted and calculated for each user. Before we start populating the featurestore, it’s important to identify the entity type, entity, and features. The structure of our source data is as follows:

user_pseudo_id
country
operating_system
language
cnt_user_engagement
cnt_level_start_quickplay
cnt_level_end_quickplay
cnt_level_complete_quickplay
cnt_level_reset_quickplay
cnt_post_score
cnt_spend_virtual_currency
cnt_ad_reward
cnt_challenge_a_friend
cnt_completed_5_levels
cnt_use_extra_steps
user_first_engagement
month
julianday
dayofweek
timestamp

Note: The above data can be populated into BigQuery by referring to this notebook.

In our example, the entityType is users and the entityId is the individual user record identified by user_pseudo_id. The features are all the attributes above except user_pseudo_id and timestamp. For this example, we will load only two features into the featurestore: cnt_user_engagement and cnt_level_start_quickplay.
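
Before ingesting, you can sanity-check the source table with the bq CLI. This is an optional preview step, sketched with the same project and dataset placeholders used throughout this post (train_tab is the table name used in the ingestion request later on):

%%bash
# Preview a few rows of the source table before ingestion (optional sanity check).
bq head -n 5 "<<projectid>>:<<dataset>>.train_tab"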

Let’s start by creating the featurestore.

%%bash
LOCATION="REGIONID"
PROJECT="PROJECTID"
FEATURESTORE_ID="gamingusers"
cat <<EOM > request.json
{
  "online_serving_config": {
    "fixed_node_count": 1
  },
  "labels": {
    "environment": "gamingusers"
  }
}
EOM
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/featurestores?featurestoreId=${FEATURESTORE_ID}"

On successful creation of the featurestore, you will get a response like the one below:

{
  "name": "projects/<<projectnumber>>/locations/us-central1/featurestores/gamingusers/operations/5740181047589470208",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1beta1.CreateFeaturestoreOperationMetadata",
    "genericMetadata": {
      "createTime": "2021-06-13T19:49:41.432926Z",
      "updateTime": "2021-06-13T19:49:41.432926Z"
    }
  }
}
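
Note that the call returns a long-running operation, not the featurestore itself. If you don’t want to wait blindly, you can poll the operation with a GET on the name returned in the response. This is a minimal sketch; substitute the operation name from your own response:

%%bash
LOCATION="REGIONID"
# Operation name copied from the create response above; substitute your own.
OPERATION_NAME="projects/<<projectnumber>>/locations/us-central1/featurestores/gamingusers/operations/5740181047589470208"
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/${OPERATION_NAME}"

The response should include "done": true once the featurestore is ready.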

It takes a few minutes for the featurestore to appear. Now we will create an entityType called users.

%%bash
LOCATION="REGIONID"
PROJECT="PROJECTID"
FEATURESTORE_ID="gamingusers"
ENTITY_TYPE_ID="users"
cat <<EOM > request.json
{
  "description": "Users Entity Type",
  "monitoringConfig": {
    "snapshotAnalysis": {
      "monitoringInterval": "3600s"
    }
  }
}
EOM
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/featurestores/${FEATURESTORE_ID}/entityTypes?entityTypeId=${ENTITY_TYPE_ID}"

On successful creation of the entity type, you will get a response like the one below:

{
  "name": "projects/<<projectid>>/locations/us-central1/featurestores/gamingusers/entityTypes/users/operations/8100067252331610112",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1beta1.CreateEntityTypeOperationMetadata",
    "genericMetadata": {
      "createTime": "2021-06-13T19:56:56.466253Z",
      "updateTime": "2021-06-13T19:56:56.466253Z"
    }
  }
}

Now we will add features to our entityType users via the batchCreate method.

%%bash
LOCATION="REGIONID"
PROJECT="PROJECTID"
FEATURESTORE_ID="gamingusers"
ENTITY_TYPE_ID="users"
cat <<EOM > request.json
{
  "requests": [
    {
      "feature": {
        "description": "User Engagement",
        "valueType": "INT64",
        "monitoringConfig": {
          "snapshotAnalysis": {
            "monitoringInterval": "3600s"
          }
        }
      },
      "featureId": "cnt_user_engagement"
    },
    {
      "feature": {
        "description": "Level start quickplay",
        "valueType": "INT64",
        "monitoringConfig": {
          "snapshotAnalysis": {
            "monitoringInterval": "3600s"
          }
        }
      },
      "featureId": "cnt_level_start_quickplay"
    }
  ]
}
EOM
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/featurestores/${FEATURESTORE_ID}/entityTypes/${ENTITY_TYPE_ID}/features:batchCreate"

On successful creation of the features, you will get a response like the one below:

{
  "name": "projects/<<projectid>>/locations/us-central1/featurestores/gamingusers/operations/6888598952568946688",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1beta1.BatchCreateFeaturesOperationMetadata",
    "genericMetadata": {
      "createTime": "2021-06-13T20:07:08.755864Z",
      "updateTime": "2021-06-13T20:07:08.755864Z"
    }
  }
}

Now let’s ingest data from our BigQuery table. An important thing to note below is that we use user_pseudo_id as the entityIdField and timestamp as the featureTimeField.

%%bash
LOCATION="REGIONID"
PROJECT="PROJECTID"
FEATURESTORE_ID="gamingusers"
ENTITY_TYPE_ID="users"
cat <<EOM > request.json
{
  "entityIdField": "user_pseudo_id",
  "featureTimeField": "timestamp",
  "bigquerySource": {
    "inputUri": "bq://<<projectid>>.<<dataset>>.train_tab"
  },
  "featureSpecs": [
    {
      "id": "cnt_level_start_quickplay",
      "sourceField": "cnt_level_start_quickplay"
    },
    {
      "id": "cnt_user_engagement",
      "sourceField": "cnt_user_engagement"
    }
  ],
  "workerCount": 1
}
EOM
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/featurestores/${FEATURESTORE_ID}/entityTypes/${ENTITY_TYPE_ID}:importFeatureValues"

This will start an ingestion job that pulls the data from BigQuery into the featurestore. Log into the Google Cloud Console, go to Vertex AI, and click on View Ingestion Jobs. You will see something like the following:

Moving features from BigQuery to Featurestore

Now let’s serve features from this featurestore. We will use one of the user_pseudo_id values to fetch the features from the featurestore.

%%bash
LOCATION="REGIONID"
PROJECT="PROJECTID"
FEATURESTORE_ID="gamingusers"
ENTITY_TYPE_ID="users"
cat <<EOM > request.json
{
  "entityId": "4AEEE533D8FAED4AD0A9227618CD296B",
  "featureSelector": {
    "idMatcher": {
      "ids": ["cnt_level_start_quickplay", "cnt_user_engagement"]
    }
  }
}
EOM
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/featurestores/${FEATURESTORE_ID}/entityTypes/${ENTITY_TYPE_ID}:readFeatureValues"

On successful execution, you will see the response below, where both of the requested feature values are served:

{
  "header": {
    "entityType": "projects/<<projectid>>/locations/us-central1/featurestores/gamingusers/entityTypes/users",
    "featureDescriptors": [
      {
        "id": "cnt_level_start_quickplay"
      },
      {
        "id": "cnt_user_engagement"
      }
    ]
  },
  "entityView": {
    "entityId": "4AEEE533D8FAED4AD0A9227618CD296B",
    "data": [
      {
        "value": {
          "int64Value": "111",
          "metadata": {
            "generateTime": "2021-06-08T20:40:03.847Z"
          }
        }
      },
      {
        "value": {
          "int64Value": "294",
          "metadata": {
            "generateTime": "2021-06-08T20:40:03.847Z"
          }
        }
      }
    ]
  }
}
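
If you have jq installed (an assumption on my part, it is not part of the Vertex AI tooling), you can extract just the served values by re-running the same readFeatureValues request and piping the output through a filter based on the response shape above. A small sketch:

%%bash
LOCATION="REGIONID"
PROJECT="PROJECTID"
FEATURESTORE_ID="gamingusers"
ENTITY_TYPE_ID="users"
# Same readFeatureValues call as above, piped through jq to print only the values.
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/featurestores/${FEATURESTORE_ID}/entityTypes/${ENTITY_TYPE_ID}:readFeatureValues" \
  | jq -r '.entityView.data[].value.int64Value'

For the entity above, this should print 111 and 294.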

Now let’s see how this data can be extracted from the featurestore back to BigQuery for machine learning model training.

We need to create a CSV file listing the entity IDs for which we want to retrieve features, along with the point-in-time timestamp at which each feature value should be read. The column headers should be the entity type name and timestamp.

We create a sample CSV file with the user_pseudo_id values for which we want to fetch the features.

%%bash
cat <<EOM > csvReadInstances.csv
users,timestamp
"4AEEE533D8FAED4AD0A9227618CD296B",2021-07-15T08:28:14Z
"1BE4F29852B390FC94D2A4E7382CCEBD",2021-07-15T08:28:14Z
EOM

Now we upload it to a Google Cloud Storage bucket:

!gsutil cp csvReadInstances.csv gs://<<bucket>>/vertex-ai-pipelines/featurestore/readcsvinstances/

Now let’s extract the data from the featurestore to BigQuery. We need to provide the path of the BigQuery table to which the data will be exported as the destination, and the GCS path of the CSV file we created above as csvReadInstances.

%%bash
LOCATION="REGIONID"
PROJECT="PROJECTID"
FEATURESTORE_ID="gamingusers"
ENTITY_TYPE_ID="users"
cat <<EOM > request.json
{
  "destination": {
    "bigqueryDestination": {
      "outputUri": "bq://<<projectid>>.<<dataset>>.export_features"
    }
  },
  "csvReadInstances": {
    "gcsSource": {
      "uris": ["gs://<<bucket>>/vertex-ai-pipelines/featurestore/readcsvinstances/csvReadInstances.csv"]
    }
  },
  "entityTypeSpecs": [
    {
      "entityTypeId": "users",
      "featureSelector": {
        "idMatcher": {
          "ids": ["cnt_level_start_quickplay", "cnt_user_engagement"]
        }
      }
    }
  ]
}
EOM
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/featurestores/${FEATURESTORE_ID}:batchReadFeatureValues"

On successful execution, this will create a BigQuery table in the respective dataset as shown below:

Exported features.
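
You can also inspect the exported table from the command line with the bq CLI; a quick sketch using the same placeholders as the export request:

%%bash
# Inspect the exported feature values in BigQuery.
bq query --use_legacy_sql=false \
  'SELECT * FROM `<<projectid>>.<<dataset>>.export_features` LIMIT 10'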

Over time the featurestore will grow, and data scientists will need to search the features. Now let’s search for features in our featurestore. In the example below, we are searching for features that have “cnt” in their name and are of type INT64.

%%bash
LOCATION="REGIONID"
PROJECT="PROJECTID"
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/featurestores:searchFeatures?query=featureId:cnt%20AND%20valueType=INT64"

On successful execution, you will see output like the following:

{
  "features": [
    {
      "name": "projects/<<projectid>>/locations/us-central1/featurestores/gamingusers/entityTypes/users/features/cnt_level_start_quickplay",
      "description": "Level start quickplay",
      "createTime": "2021-06-13T20:07:09.206472Z",
      "updateTime": "2021-06-13T21:15:39.596002Z"
    },
    {
      "name": "projects/<<projectid>>/locations/us-central1/featurestores/gamingusers/entityTypes/users/features/cnt_user_engagement",
      "description": "User Engagement",
      "createTime": "2021-06-13T20:07:09.205050Z",
      "updateTime": "2021-06-13T21:15:39.594700Z"
    }
  ]
}
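
Besides search, you can also list all features registered under an entity type with a plain GET; a sketch against the same v1beta1 API:

%%bash
LOCATION="REGIONID"
PROJECT="PROJECTID"
FEATURESTORE_ID="gamingusers"
ENTITY_TYPE_ID="users"
# List every feature registered under the users entity type.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT}/locations/${LOCATION}/featurestores/${FEATURESTORE_ID}/entityTypes/${ENTITY_TYPE_ID}/features"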

That’s it for this story. Stay tuned for more on Vertex AI features.
