Dynamic Listing of tables in BigQuery

Sourabh Jain
3 min readDec 4, 2022

BigQuery

As the workloads grow in an organisation, one of the common challenge is to find out how many entities have been created and metadata like # of rows and size of the data.

If you are using BigQuery, it provides this details in the __TABLES__ metadata. However the information is available at the dataset level i.e. you need to associate the dataset with every query i.e.

SELECT * FROM `dataset-id.__TABLES__`

The problem with this is that you will have to UNION all such queries for your datasets and that becomes a tedious task.

However you can leverage BigQuery scripting and dynamically generate the above queries. Once the dynamic query is generated, we can leverage BigQuery’s EXECUTE IMMEDIATE capability to fire the dynamic statement. Please refer the script below:

DECLARE datasets ARRAY<string>;
DECLARE query string;
DECLARE counter INT64 DEFAULT 0;
DECLARE totalDatasets INT64;

SET datasets = ARRAY(select schema_name from `<<project-id>>.region-<<region-id>>.INFORMATION_SCHEMA.SCHEMATA`);
SET query = "SELECT * FROM (";
SET totalDatasets = ARRAY_LENGTH(datasets);

WHILE counter < totalDatasets - 1 DO
SET query = CONCAT(query, "SELECT '", datasets[OFFSET(counter)], "', table_ID, row_count, size_bytes from `on-prem-project-337210.", datasets[OFFSET(counter)], '.__TABLES__` UNION ALL ');
SET counter = counter + 1;
END WHILE;

SET query = CONCAT(query, "SELECT '", datasets[ORDINAL(totalDatasets)], "', table_ID, row_count, size_bytes from `on-prem-project-337210.", datasets[ORDINAL(totalDatasets)], '.__TABLES__` )');

EXECUTE IMMEDIATE query;

NOTE: You need to replace the project-id and region-id in the above queries appropriately.

Lets understand the breakdown:

SET datasets = ARRAY(select schema_name from `<<project-id>>.region-<<region-id>>.INFORMATION_SCHEMA.SCHEMATA`);

In the above code, we are fetching all the dataset names into the datasets.

SET query = "SELECT * FROM (";

We build a query variable where we will build and store the entire dynamic query that needs to be executed.

SET totalDatasets = ARRAY_LENGTH(datasets);

Above, we find the total datasets count.

WHILE counter < totalDatasets - 1 DO
SET query = CONCAT(query, "SELECT '", datasets[OFFSET(counter)], "', table_ID, row_count, size_bytes from `on-prem-project-337210.", datasets[OFFSET(counter)], '.__TABLES__` UNION ALL ');
SET counter = counter + 1;
END WHILE;

Now , we loop through the entire datasets array and build a dynamic query for all the datasets and keep concatenating the same to query variable.

SET query = CONCAT(query, "SELECT '", datasets[ORDINAL(totalDatasets)], "', table_ID, row_count, size_bytes from `on-prem-project-337210.", datasets[ORDINAL(totalDatasets)], '.__TABLES__` )');

We prepare the final query that needs to be executed by appending the necessary columns that needs to be presented to the output.

EXECUTE IMMEDIATE query;

Finally, we execute the dynamically created query.

Once you run the above script, you will find the output like below. Click on the VIEW RESULT for the 2nd row to see the desired output.

Output of Dynamic Script
Output

Hope this helps! Happy building applications on BigQuery!

Please provide feedback if you want more of such assets.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Sourabh Jain
Sourabh Jain

Responses (1)

Write a response