Nested Record Processing with BigQuery

Sourabh Jain
3 min readFeb 2, 2022

BigQuery DataWarehouse

Welcome to the third story of the semi-structured processing with BigQuery Data Warehouse. Please visit the below stories to get an understanding of

  1. Processing semi-structured data with BigQuery : Loading Dynamic JSONs into BigQuery
  2. Dynamic JSON Update with BigQuery : In this, we saw how dynamic updates can be generated from incremental changes received.

In this story we will see how incremental nested data can be merged into the existing tables.

To start with, let’s assume we have the below data in our master table:

Base Data

Now, we have received the incremental data as below:

Incremental Data

In this story, we would look at below 3 scenarios:

  1. The record in the incremental data with id = 8 is completely new record and should be inserted into the base table.
  2. The record in the incremental data with id = 7 and contact_details.phone=1818181818 and 2020202020 are new rows and its phone and type value should be inserted into the base table.
  3. The record in the incremental data with id = 7 and contact_details.phone=1717171717 is an existing row and its type value should be updated from “fax” to “home”.

The approach to perform this ideally would be a single MERGE process that BigQuery supports but I found some limitations in child nested updates and hence we need to break it down in three steps:

  1. Populate a change interim table that identifies if the record is meant to be inserted , insert as a nested row or update as a nested row. Please see the status column below.
Tagged Interim Table
DROP TABLE `sourabhjainceanalytics.demo.customer_incremental_change_interim`;CREATE TABLE `sourabhjainceanalytics.demo.customer_incremental_change_interim`
(
id INT64,
name STRING,
company STRING,
status STRING,
contact_details ARRAY<STRUCT<
type STRING,
phone INT64
>>
);
-----------------------------
INSERT INTO `sourabhjainceanalytics.demo.customer_incremental_change_interim` (id,name,company,status,contact_details)
WITH incremental AS(
SELECT S.id,
S.name,
S.company,
cd.type,
cd.phone
FROM
`sourabhjainceanalytics.demo.customer_incremental_change` AS S
, UNNEST(contact_details) AS cd
),
master AS(
SELECT M.id,
M.name,
M.company,
cd.type,
cd.phone
FROM `sourabhjainceanalytics.demo.customer_master` AS M
, UNNEST(contact_details) AS cd
)
SELECT i.id,
i.name,
i.company,
IF(f.id IS NULL,'I','U') as status,
ARRAY_AGG(STRUCT(i.type,i.phone)) AS contact_details,
FROM incremental AS i
LEFT OUTER JOIN master AS f
ON i.id = f.id AND i.phone = f.phone
GROUP BY id,name,company,status;

2. Execute the merge statement to perform the insert operations i.e. inserting any new parent rows and also inserting child records for the existing records.

MERGE `sourabhjainceanalytics.demo.customer_master` T
USING `sourabhjainceanalytics.demo.customer_incremental_change_interim` S
ON T.id = S.id
WHEN NOT MATCHED THEN
INSERT VALUES (S.id,S.name,S.company,S.contact_details)
WHEN MATCHED AND S.status = 'I' THEN
UPDATE SET T.contact_details = ARRAY_CONCAT(T.contact_details, S.contact_details);
Inserted records.

3. Execute the Update statement to update the child nested records.

UPDATE `sourabhjainceanalytics.demo.customer_master` m
SET contact_details = ARRAY
(
SELECT AS STRUCT IF(mcd.phone=scd.phone,scd.type,mcd.type) AS type,mcd.phone
FROM UNNEST(m.contact_details) mcd
)
FROM `sourabhjainceanalytics.demo.customer_incremental_change_interim` s, UNNEST(s.contact_details) scd
WHERE m.id = s.id
AND s.status = 'U';
Updated records.

Hope this was helpful to understand the power of BigQuery to perform merge operations and how to insert/update nested records. The code to run can be found here. Please let me know if you are facing any challenges.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Sourabh Jain
Sourabh Jain

Responses (1)

Write a response

Good one Sourabh!