GA4 to BigQuery: Optimizing Your Looker Studio Reports for Speed and Cost Efficiency

Have you ever felt limited by what you can see in the Google Analytics 4 interface? You're not alone. While GA4 provides excellent visualization tools, the real magic happens when you can directly query your raw event data in BigQuery.

In this guide, we'll focus specifically on writing effective SQL queries for your GA4 data in BigQuery. I'll walk you through practical examples that demystify the complex nested structure of GA4 tables and show you how to extract meaningful insights from your raw analytics data.

Let's dive into the world of GA4 querying and unlock the full potential of your analytics data!

Getting Started with Basic Queries

Understanding the Table Structure

Before writing our first query, let's orient ourselves in the BigQuery environment. When you look at your GA4 export in BigQuery, you'll notice a table named something like events_(741) rather than a simple table name.

This structure represents a sharded table - essentially a collection of daily tables grouped together for convenience. Each individual table follows the naming pattern events_YYYYMMDD (for example, events_20230101).

Your First Query: Viewing Raw Data

Let's start with the most basic query - retrieving some raw event data:

SELECT *
FROM `your_project.analytics_XXXXX.events_20230101`
LIMIT 10;

This query selects all columns from a specific day's events table and limits the results to 10 rows. It gives you a glimpse of the raw data structure, including the nested fields we'll need to work with.

Unnesting Event Parameters: The Core Technique

GA4's data structure in BigQuery uses nested arrays for event parameters, which means we can't directly access parameters like page_location or ga_session_id with simple column references.

The UNNEST Function: Your Secret Weapon

The key to working with GA4 data is the UNNEST() function, which transforms an array into a table structure we can query. Let's see it in action:

SELECT
  event_date,
  event_timestamp,
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
FROM
  `your_project.analytics_XXXXX.events_20230101`
LIMIT 1000;

This query extracts the page_location parameter from the nested event_params array and presents it as its own column. The subquery is saying: "Look inside the event_params array, find the parameter with key 'page_location', and give me its string value."

Working with Multiple Parameter Types

GA4 stores different types of values in different fields within each parameter. For example:

  • String values are in value.string_value

  • Integer values are in value.int_value

  • Float values are in value.double_value

Let's expand our query to include the session ID, which is stored as an integer:

SELECT
  event_date,
  event_timestamp,
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM
  `your_project.analytics_XXXXX.events_20230101`
LIMIT 1000;

Querying Across Date Ranges

Using Wildcards for Date Ranges

Rather than querying a single day's table, you can use a wildcard to query across multiple days:

SELECT
  event_date,
  event_name,
  COUNT(*) AS event_count
FROM
  `your_project.analytics_XXXXX.events_*` -- This queries all available daily tables
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230131' -- Limit to January 2023
GROUP BY
  event_date, event_name
ORDER BY
  event_date DESC, event_count DESC;

This query counts events by date and name across the entire month of January 2023.

Converting Date Formats

GA4 stores dates as strings in the format 'YYYYMMDD', which isn't ideal for visualization. Let's convert this to a proper date format:

SELECT
  PARSE_DATE('%Y%m%d', event_date) AS formatted_date,
  event_name,
  COUNT(*) AS event_count
FROM
  `your_project.analytics_XXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY
  formatted_date, event_name
ORDER BY
  formatted_date DESC, event_count DESC;

This query uses the PARSE_DATE() function to convert the string date into a proper DATE data type that Looker Studio can interpret correctly.

Advanced Querying Techniques

Creating Unique Session IDs

A common challenge with GA4 data is that the ga_session_id is actually just a timestamp of when the session started. For high-traffic sites, multiple users might have sessions starting at the exact same microsecond.

To create truly unique session IDs, we can concatenate the user ID with the session ID:

SELECT
  event_date,
  event_name,
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS raw_session_id,
  CONCAT(
    user_pseudo_id, 
    '-', 
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
  ) AS unique_session_id
FROM
  `your_project.analytics_XXXXX.events_*`
LIMIT 1000;

This creates a composite key that uniquely identifies each session, even on high-traffic sites.

Using Window Functions for User Analysis

Window functions allow us to perform calculations across rows related to the current row. Let's use them to identify a user's first interaction timestamp:

SELECT
  event_date,
  event_timestamp,
  event_name,
  user_pseudo_id,
  MIN(event_timestamp) OVER (PARTITION BY user_pseudo_id) AS first_touch_timestamp,
  -- If the current event timestamp equals the first touch timestamp, this is the user's first event
  CASE 
    WHEN event_timestamp = MIN(event_timestamp) OVER (PARTITION BY user_pseudo_id) 
    THEN 'Yes' 
    ELSE 'No' 
  END AS is_first_event
FROM
  `your_project.analytics_XXXXX.events_*`
LIMIT 1000;

This query identifies the earliest timestamp for each user and adds a flag indicating whether the current event was their first interaction with your site.

Source/Medium Analysis

Let's extract traffic source information for session start events:

SELECT
  PARSE_DATE('%Y%m%d', event_date) AS date,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
  COUNT(DISTINCT CONCAT(user_pseudo_id, 
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))
  ) AS sessions
FROM
  `your_project.analytics_XXXXX.events_*`
WHERE
  event_name = 'session_start'
  AND _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY
  date, source, medium
ORDER BY
  date DESC, sessions DESC;

This query shows the number of sessions by source and medium over time, giving you insight into which traffic sources are driving the most engagement.

Cross-joining for Complete Parameter Exploration

Sometimes you might want to see all event parameters flattened out for easier exploration. We can use a cross join to unnest all parameters:

SELECT
  event_date,
  event_name,
  ep.key AS parameter_name,
  ep.value.string_value AS string_value,
  ep.value.int_value AS int_value,
  ep.value.double_value AS double_value
FROM
  `your_project.analytics_XXXXX.events_20230101` e,
  UNNEST(e.event_params) AS ep
LIMIT 1000;

This approach "explodes" the nested parameters into separate rows, allowing you to see all parameters for all events in a more approachable tabular format.

Common Challenges and Solutions

Dealing with NULL Values

You might notice NULL values in your session ID field or other parameters. This can happen for several reasons:

  1. Not all events include all parameters (for example, page_view events have page parameters, but other events might not)

  2. Users might have denied consent for analytics cookies

  3. Implementation issues with your tracking setup

Always check for NULL values and handle them appropriately in your queries:

SELECT
  event_date,
  event_name,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
  -- Count events with missing session IDs
  CASE 
    WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') IS NULL 
    THEN 'Missing Session ID' 
    ELSE 'Has Session ID' 
  END AS session_id_status
FROM
  `your_project.analytics_XXXXX.events_20230101`
GROUP BY
  event_date, event_name, session_id, session_id_status
ORDER BY
  COUNT(*) DESC;

This helps you identify which events tend to have missing values and assess whether it's a significant issue.

Optimizing for Performance

BigQuery charges based on the amount of data processed, so optimizing your queries is important. Some tips:

  1. Always include a date range using _TABLE_SUFFIX to limit the number of sharded tables scanned

  2. Select only the columns you need rather than using SELECT *

  3. Pre-filter your data with WHERE clauses before performing expensive operations

  4. Consider using a WITH clause to break complex queries into more manageable parts

WITH session_events AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS date,
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
  FROM
    `your_project.analytics_XXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
    AND event_name = 'session_start'
)

SELECT
  date,
  COUNT(DISTINCT CONCAT(user_pseudo_id, '-', session_id)) AS unique_sessions
FROM
  session_events
GROUP BY
  date
ORDER BY
  date;

This query uses a WITH clause to first filter down to just session start events before performing the aggregation, potentially saving on data processing costs.

Frequently Asked Questions (FAQs)

Why are some of my ga_session_id values NULL?

This typically happens for one of three reasons:

  1. The event occurred outside of a session context

  2. The user declined analytics cookies (if you have consent mode enabled)

  3. There might be an issue with your tracking implementation

Check your implementation and consent settings if you're seeing a large number of NULL session IDs.

How can I filter data to only show specific pages or events?

You can add WHERE clauses with subqueries to filter on specific parameters:

SELECT
  event_date,
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
FROM
  `your_project.analytics_XXXXX.events_*`
WHERE
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') LIKE '%/blog/%'
  -- Only show events from blog pages
LIMIT 1000;

How do I count unique users or sessions over time?

Use COUNT(DISTINCT) with the appropriate identifier:

SELECT
  PARSE_DATE('%Y%m%d', event_date) AS date,
  COUNT(DISTINCT user_pseudo_id) AS unique_users,
  COUNT(DISTINCT CONCAT(user_pseudo_id, '-', 
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))
  ) AS unique_sessions
FROM
  `your_project.analytics_XXXXX.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY
  date
ORDER BY
  date;

This shows daily unique users and sessions over a month.

Is it better to use a CROSS JOIN or subqueries for unnesting parameters?

It depends on your needs:

  • Subqueries (as shown in most examples) are more efficient when you only need a few specific parameters

  • CROSS JOINs are better when you need to explore all parameters or don't know exactly which ones you need

If you're just getting started, try the CROSS JOIN approach to explore your data, then switch to subqueries for your final queries.


Mastering GA4 queries in BigQuery unlocks a new level of analytics power. You can move beyond the limitations of the GA4 interface and access every single piece of data you've collected.

Remember these key points as you build your own queries:

  • Use UNNEST() to access nested parameters

  • Pay attention to value types (string_value, int_value, etc.)

  • Create truly unique session IDs by concatenating user_pseudo_id with ga_session_id

  • Convert dates and timestamps to proper formats for analysis

  • Use window functions for advanced user journey analysis

With these techniques, you'll be able to answer virtually any analytics question by writing the right query. And as you become more comfortable, you can create scheduled queries that automatically refresh optimized tables for your Looker Studio reports.

Happy querying!


Note:

This post is based on a subject covered in the Looker Studio Masterclass Program. To learn more about Looker Studio Masterclass, click here.

Previous
Previous

Advanced Strategies for GA4 Data in BigQuery: Tips & Resources

Next
Next

Decoding GA4's BigQuery Schema: A Complete Guide to Unnesting & Querying Analytics Data