Dynamic JSON Update with BigQuery

Sourabh Jain
5 min read · Jan 12, 2022

In the previous story we saw how semi-structured data, i.e. JSON, can be loaded into BigQuery, and also how you can append to nested arrays. You can visit the story here.

In this story, we will extend this to a common use case: we already have data in BigQuery, but we receive JSON with arbitrary attributes that need to be applied to the main table. The challenge is that each incoming row may carry a different subset of attributes.

Let’s start by loading some sample data, as shown below:

{"id":1,"name":"Sourabh","region":"Mumbai","salary":100}
{"id":2,"name":"Rajesh","region":"Mumbai","salary":200}
{"id":3,"name":"Nishit","region":"Mumbai","salary":300}
{"id":4,"name":"Rakesh","region":"Mumbai","salary":400}
{"id":5,"name":"Vivek","region":"Mumbai","salary":500}

We will upload the above data to a GCS bucket and then load it into BigQuery. The file is uploaded to gs://bigquery_json_processing/target_table.json

!bq load --source_format NEWLINE_DELIMITED_JSON --autodetect \
--ignore_unknown_values sourabhjainceanalytics:demo.target_table gs://bigquery_json_processing/target_table.json

Let’s validate the loaded data:
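
A simple SELECT, just like the validation cells used later in this story, does the job:

%%bigquery
SELECT * FROM `sourabhjainceanalytics.demo.target_table`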

Sample data

Now let us say that we receive JSON records with updates to region and salary. Not every record will carry all the fields that need updating. Say we receive the JSON below, where each row holds only the changes to be applied: for id=1 only the salary has changed, for id=2 only the region has changed, and for id=3 both attributes have changed.

{"id":1,"salary":1000}
{"id":2,"region":"Ahmedabad"}
{"id":3,"region":"Chennai","salary":3000}

Let us first load this data into BigQuery and perform the update via static SQL. We will then look at how to make it dynamic.

!bq load --source_format NEWLINE_DELIMITED_JSON --autodetect \
--ignore_unknown_values sourabhjainceanalytics:demo.change_table gs://bigquery_json_processing/change_table.json

Let’s validate the change_table output:

%%bigquery
SELECT * FROM `sourabhjainceanalytics.demo.change_table`
Data to be updated

Now let’s create a MERGE statement in BigQuery to update our records in the main table, target_table. You will observe that we set both fields, but wrap each in a CASE check on whether change_table actually carries a value: if the incoming value is NULL, the column keeps its current value; otherwise it gets updated.

%%bigquery
MERGE demo.target_table T
USING demo.change_table S
ON T.id = S.id
WHEN MATCHED THEN
UPDATE SET T.region = CASE WHEN S.region IS NULL THEN T.region ELSE S.region END
,T.salary = CASE WHEN S.salary IS NULL THEN T.salary ELSE S.salary END
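
As a side note on the design: each CASE expression above is just a null-check, so an equivalent and more compact form uses COALESCE, which reads better once many columns are involved. A cell like the following would behave identically:

%%bigquery
MERGE demo.target_table T
USING demo.change_table S
ON T.id = S.id
WHEN MATCHED THEN
UPDATE SET T.region = COALESCE(S.region, T.region)
,T.salary = COALESCE(S.salary, T.salary)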

Let’s confirm the same in our BigQuery output:

%%bigquery
SELECT * FROM `sourabhjainceanalytics.demo.target_table`
Updated data.
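
For reference, the merged rows should now read as follows: only the attributes present in the change JSON were overwritten, and ids 4 and 5 are untouched.

id  name     region     salary
1   Sourabh  Mumbai     1000
2   Rajesh   Ahmedabad  200
3   Nishit   Chennai    3000
4   Rakesh   Mumbai     400
5   Vivek    Mumbai     500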

Now we have seen that the values are updated appropriately. In practice, however, we cannot be sure which columns will arrive in the JSON file, so we need to make the clause below dynamic, based on the attributes actually present in the file.

T.region = CASE WHEN S.region IS NULL THEN T.region ELSE S.region END
,T.salary = CASE WHEN S.salary IS NULL THEN T.salary ELSE S.salary END

BigQuery provides metadata about its tables through INFORMATION_SCHEMA, and we can leverage that to find the columns. In the query below we supply the name of the table whose columns we want, and exclude the key column from the output. We can use this information in a custom script to prepare the clause dynamically.

%%bigquery
SELECT column_name
FROM sourabhjainceanalytics.demo.INFORMATION_SCHEMA.COLUMNS
WHERE table_name IN ('change_table')
AND column_name NOT IN ('id')

The output will be all the columns that have come in the incoming JSON: in this case, region and salary.

columns with updated values

Now let’s write a script in Node.js to create this clause dynamically and execute the query. You can use any supported language of your choice.

'use strict';

function main() {
  // Initialise the BigQuery client with default credentials.
  const {BigQuery} = require('@google-cloud/bigquery');
  const bigquery = new BigQuery();

  async function query() {
    // 1. Fetch the columns of the table that holds the incremental data,
    // excluding the key attributes used for the join, which must not
    // appear in the UPDATE clause.
    const query = `SELECT column_name
      FROM sourabhjainceanalytics.demo.INFORMATION_SCHEMA.COLUMNS
      WHERE table_name IN ('change_table')
      AND column_name NOT IN ('id')`;

    // Prepare the options for submitting the job to BigQuery.
    const options = {
      query: query,
      location: 'US',
    };

    // Run the query as a job and wait for it to finish.
    const [job] = await bigquery.createQueryJob(options);
    const [rows] = await job.getQueryResults();

    // 2. Loop through the column list to build the dynamic UPDATE clause.
    let update_clause = '';
    for (let i = 0; i < rows.length; i++) {
      update_clause += 'T.' + rows[i].column_name + ' = CASE WHEN S.' + rows[i].column_name +
        ' IS NULL THEN T.' + rows[i].column_name + ' ELSE S.' + rows[i].column_name + ' END,';
    }
    // Remove the extra separator at the end.
    update_clause = update_clause.substring(0, update_clause.length - 1);

    // 3. Prepare the statement that needs to be fired at BigQuery.
    const merge_stmt = `
      MERGE demo.target_table T
      USING demo.change_table S
      ON T.id = S.id
      WHEN MATCHED THEN
      UPDATE SET ` + update_clause;

    // Prepare the options for submitting the merge job to BigQuery.
    const merge_options = {
      query: merge_stmt,
      location: 'US',
    };

    // Run the merge as a job and wait for it to finish.
    const [merge_job] = await bigquery.createQueryJob(merge_options);
    await merge_job.getQueryResults();
  }

  query();
}

main(...process.argv.slice(2));

In the above script, three parts do the important work:

  1. We fire a query to get the column information for change_table, which yields the list of columns that need to be updated.
  2. We loop through that list to build the dynamic UPDATE clause.
  3. We concatenate the UPDATE clause onto the MERGE statement and fire this dynamic MERGE at BigQuery.
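
If you would rather stay entirely inside BigQuery, the same idea can also be expressed with BigQuery scripting: STRING_AGG assembles the clause from INFORMATION_SCHEMA, and EXECUTE IMMEDIATE fires the generated MERGE. Here is a minimal sketch against the same tables:

%%bigquery
DECLARE update_clause STRING;

-- Build "T.col = CASE WHEN S.col IS NULL THEN T.col ELSE S.col END" per column,
-- skipping the join key.
SET update_clause = (
  SELECT STRING_AGG(FORMAT(
    'T.%s = CASE WHEN S.%s IS NULL THEN T.%s ELSE S.%s END',
    column_name, column_name, column_name, column_name), ', ')
  FROM sourabhjainceanalytics.demo.INFORMATION_SCHEMA.COLUMNS
  WHERE table_name = 'change_table'
  AND column_name NOT IN ('id')
);

-- Fire the dynamically assembled MERGE.
EXECUTE IMMEDIATE FORMAT('''
MERGE demo.target_table T
USING demo.change_table S
ON T.id = S.id
WHEN MATCHED THEN
UPDATE SET %s
''', update_clause);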

Hope this was helpful in understanding the power of BigQuery and how you can generate dynamic updates for execution. Stay tuned, and drop a comment for any feedback or issues you are facing with JSON data processing in BigQuery!
