GA4 to BigQuery: Optimizing Your Looker Studio Reports for Speed and Cost Efficiency
Have you ever felt limited by what you can see in the Google Analytics 4 interface? You're not alone. While GA4 provides excellent visualization tools, the real magic happens when you can directly query your raw event data in BigQuery.
In this guide, we'll focus specifically on writing effective SQL queries for your GA4 data in BigQuery. I'll walk you through practical examples that demystify the complex nested structure of GA4 tables and show you how to extract meaningful insights from your raw analytics data.
Let's dive into the world of GA4 querying and unlock the full potential of your analytics data!
Getting Started with Basic Queries
Understanding the Table Structure
Before writing our first query, let's orient ourselves in the BigQuery environment. When you look at your GA4 export in BigQuery, you'll notice a table named something like events_(741) rather than a simple table name.
This structure represents a sharded table - essentially a collection of daily tables grouped together for convenience. Each individual table follows the naming pattern events_YYYYMMDD (for example, events_20230101).
Your First Query: Viewing Raw Data
Let's start with the most basic query - retrieving some raw event data:
SELECT * FROM `your_project.analytics_XXXXX.events_20230101` LIMIT 10;
This query selects all columns from a specific day's events table and limits the results to 10 rows. It gives you a glimpse of the raw data structure, including the nested fields we'll need to work with.
Unnesting Event Parameters: The Core Technique
GA4's data structure in BigQuery uses nested arrays for event parameters, which means we can't directly access parameters like page_location or ga_session_id with simple column references.
The UNNEST Function: Your Secret Weapon
The key to working with GA4 data is the UNNEST() function, which transforms an array into a table structure we can query. Let's see it in action:
SELECT event_date, event_timestamp, event_name, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location FROM `your_project.analytics_XXXXX.events_20230101` LIMIT 1000;
This query extracts the page_location parameter from the nested event_params array and presents it as its own column. The subquery is saying: "Look inside the event_params array, find the parameter with key 'page_location', and give me its string value."
Working with Multiple Parameter Types
GA4 stores different types of values in different fields within each parameter. For example:
String values are in
value.string_valueInteger values are in
value.int_valueFloat values are in
value.double_value
Let's expand our query to include the session ID, which is stored as an integer:
SELECT event_date, event_timestamp, event_name, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id FROM `your_project.analytics_XXXXX.events_20230101` LIMIT 1000;
Querying Across Date Ranges
Using Wildcards for Date Ranges
Rather than querying a single day's table, you can use a wildcard to query across multiple days:
SELECT event_date, event_name, COUNT(*) AS event_count FROM `your_project.analytics_XXXXX.events_*` -- This queries all available daily tables WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230131' -- Limit to January 2023 GROUP BY event_date, event_name ORDER BY event_date DESC, event_count DESC;
This query counts events by date and name across the entire month of January 2023.
Converting Date Formats
GA4 stores dates as strings in the format 'YYYYMMDD', which isn't ideal for visualization. Let's convert this to a proper date format:
SELECT
PARSE_DATE('%Y%m%d', event_date) AS formatted_date,
event_name,
COUNT(*) AS event_count
FROM
`your_project.analytics_XXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY
formatted_date, event_name
ORDER BY
formatted_date DESC, event_count DESC;
This query uses the PARSE_DATE() function to convert the string date into a proper DATE data type that Looker Studio can interpret correctly.
Advanced Querying Techniques
Creating Unique Session IDs
A common challenge with GA4 data is that the ga_session_id is actually just a timestamp of when the session started. For high-traffic sites, multiple users might have sessions starting at the exact same microsecond.
To create truly unique session IDs, we can concatenate the user ID with the session ID:
SELECT
event_date,
event_name,
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS raw_session_id,
CONCAT(
user_pseudo_id,
'-',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
) AS unique_session_id
FROM
`your_project.analytics_XXXXX.events_*`
LIMIT 1000;
This creates a composite key that uniquely identifies each session, even on high-traffic sites.
Using Window Functions for User Analysis
Window functions allow us to perform calculations across rows related to the current row. Let's use them to identify a user's first interaction timestamp:
SELECT
event_date,
event_timestamp,
event_name,
user_pseudo_id,
MIN(event_timestamp) OVER (PARTITION BY user_pseudo_id) AS first_touch_timestamp,
-- If the current event timestamp equals the first touch timestamp, this is the user's first event
CASE
WHEN event_timestamp = MIN(event_timestamp) OVER (PARTITION BY user_pseudo_id)
THEN 'Yes'
ELSE 'No'
END AS is_first_event
FROM
`your_project.analytics_XXXXX.events_*`
LIMIT 1000;
This query identifies the earliest timestamp for each user and adds a flag indicating whether the current event was their first interaction with your site.
Source/Medium Analysis
Let's extract traffic source information for session start events:
SELECT
PARSE_DATE('%Y%m%d', event_date) AS date,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
COUNT(DISTINCT CONCAT(user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))
) AS sessions
FROM
`your_project.analytics_XXXXX.events_*`
WHERE
event_name = 'session_start'
AND _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY
date, source, medium
ORDER BY
date DESC, sessions DESC;
This query shows the number of sessions by source and medium over time, giving you insight into which traffic sources are driving the most engagement.
Cross-joining for Complete Parameter Exploration
Sometimes you might want to see all event parameters flattened out for easier exploration. We can use a cross join to unnest all parameters:
SELECT event_date, event_name, ep.key AS parameter_name, ep.value.string_value AS string_value, ep.value.int_value AS int_value, ep.value.double_value AS double_value FROM `your_project.analytics_XXXXX.events_20230101` e, UNNEST(e.event_params) AS ep LIMIT 1000;
This approach "explodes" the nested parameters into separate rows, allowing you to see all parameters for all events in a more approachable tabular format.
Common Challenges and Solutions
Dealing with NULL Values
You might notice NULL values in your session ID field or other parameters. This can happen for several reasons:
Not all events include all parameters (for example, page_view events have page parameters, but other events might not)
Users might have denied consent for analytics cookies
Implementation issues with your tracking setup
Always check for NULL values and handle them appropriately in your queries:
SELECT
event_date,
event_name,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
-- Count events with missing session IDs
CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') IS NULL
THEN 'Missing Session ID'
ELSE 'Has Session ID'
END AS session_id_status
FROM
`your_project.analytics_XXXXX.events_20230101`
GROUP BY
event_date, event_name, session_id, session_id_status
ORDER BY
COUNT(*) DESC;
This helps you identify which events tend to have missing values and assess whether it's a significant issue.
Optimizing for Performance
BigQuery charges based on the amount of data processed, so optimizing your queries is important. Some tips:
Always include a date range using
_TABLE_SUFFIXto limit the number of sharded tables scannedSelect only the columns you need rather than using
SELECT *Pre-filter your data with WHERE clauses before performing expensive operations
Consider using a WITH clause to break complex queries into more manageable parts
WITH session_events AS (
SELECT
PARSE_DATE('%Y%m%d', event_date) AS date,
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM
`your_project.analytics_XXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
AND event_name = 'session_start'
)
SELECT
date,
COUNT(DISTINCT CONCAT(user_pseudo_id, '-', session_id)) AS unique_sessions
FROM
session_events
GROUP BY
date
ORDER BY
date;
This query uses a WITH clause to first filter down to just session start events before performing the aggregation, potentially saving on data processing costs.
Frequently Asked Questions (FAQs)
Why are some of my ga_session_id values NULL?
This typically happens for one of three reasons:
The event occurred outside of a session context
The user declined analytics cookies (if you have consent mode enabled)
There might be an issue with your tracking implementation
Check your implementation and consent settings if you're seeing a large number of NULL session IDs.
How can I filter data to only show specific pages or events?
You can add WHERE clauses with subqueries to filter on specific parameters:
SELECT
event_date,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
FROM
`your_project.analytics_XXXXX.events_*`
WHERE
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') LIKE '%/blog/%'
-- Only show events from blog pages
LIMIT 1000;How do I count unique users or sessions over time?
Use COUNT(DISTINCT) with the appropriate identifier:
SELECT
PARSE_DATE('%Y%m%d', event_date) AS date,
COUNT(DISTINCT user_pseudo_id) AS unique_users,
COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))
) AS unique_sessions
FROM
`your_project.analytics_XXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY
date
ORDER BY
date;This shows daily unique users and sessions over a month.
Is it better to use a CROSS JOIN or subqueries for unnesting parameters?
It depends on your needs:
Subqueries (as shown in most examples) are more efficient when you only need a few specific parameters
CROSS JOINs are better when you need to explore all parameters or don't know exactly which ones you need
If you're just getting started, try the CROSS JOIN approach to explore your data, then switch to subqueries for your final queries.
Mastering GA4 queries in BigQuery unlocks a new level of analytics power. You can move beyond the limitations of the GA4 interface and access every single piece of data you've collected.
Remember these key points as you build your own queries:
Use
UNNEST()to access nested parametersPay attention to value types (string_value, int_value, etc.)
Create truly unique session IDs by concatenating user_pseudo_id with ga_session_id
Convert dates and timestamps to proper formats for analysis
Use window functions for advanced user journey analysis
With these techniques, you'll be able to answer virtually any analytics question by writing the right query. And as you become more comfortable, you can create scheduled queries that automatically refresh optimized tables for your Looker Studio reports.
Happy querying!
Note:
This post is based on a subject covered in the Looker Studio Masterclass Program. To learn more about Looker Studio Masterclass, click here.