Chapter 17: BigQuery: Hands-on Workshop
17.10. SQL Join in BigQuery
Setting Up the Query
In data modeling in the BigQuery lesson, we covered preparing data and joining tables. Now we will dive deeper into joining two tables: Google Search Console as the left table and Google Analytics data as the right table.
First, we'll create clean versions of both tables using common table expressions (CTE). CTEs, which start with WITH
, help us define subqueries that we can use later in our main query.
WITH
GA AS (
SELECT ...
),
GoogleSearchConsole AS (
SELECT ...
)
Now we'll join these two tables based on their landing pages:
SELECT *
FROM GoogleSearchConsole
LEFT JOIN GA
ON GoogleSearchConsole.landing_page = GA.landing_page;
Before running the join, let's see how many rows we have in our initial dataset - it should be 266.
Troubleshooting the Join
After running the join, we get 488 rows instead of 266. Let's figure out why this is happening. It turns out that there are more search queries for each landing page in our dataset.
To fix this issue, we need to create another intermediary table called GA_agg
to aggregate sessions, goal completions, and goal values by landing page:
WITH
GA_agg AS (
SELECT
landing_page,
SUM(sessions) AS sessions,
SUM(goal_completions) AS goal_completions,
SUM(goal_value) AS goal_value
FROM GA
GROUP BY landing_page
)
Now let's update our main query to use GA_agg
instead of GA
:
SELECT *
FROM GoogleSearchConsole
LEFT JOIN GA_agg
ON GoogleSearchConsole.landing_page = GA_agg.landing_page;
After running this updated query, we get 266 rows as expected.
Selecting Specific Columns
Our current query selects all columns from both tables, but we only want specific columns from the Google Analytics table. We can update our SELECT
statement to include only the desired columns:
SELECT
GoogleSearchConsole.*,
GA_agg.sessions,
GA_agg.goal_completions,
GA_agg.goal_value
FROM GoogleSearchConsole
LEFT JOIN GA_agg
ON GoogleSearchConsole.landing_page = GA_agg.landing_page;
Now our query returns a cleaner table with landing page impressions, clicks, sessions, goal completions, and goal value for each search query - just what we wanted!
📩 Receive my weekly Looker Studio tips
🖇 Connect with me on LinkedIn