17.1. BigQuery & Looker Studio: A Hands-on Workshop
In our last session, we explored BigQuery and its various applications. Today, we're diving in headfirst and creating an end-to-end data pipeline for analysis, data extraction, and reporting from scratch in under 90 minutes.
Our goal is to perform search query analysis using data from both Google Search Console and Google Analytics. We will then combine this information to assess the impact of search queries on conversions and value creation on our website.
Search Query Data
While Google doesn't provide us with keyword-based traffic information for each session in Google Analytics, we can obtain search query data from Google Search Console. This includes metrics like the number of impressions and clicks for each query.
Combining Data Points
To gain more insights from this search query data, we need to combine it with landing page sessions, conversions, and goal conversion values from Google Analytics. By doing so, we'll be able to see how much each search query contributes to traffic acquisition, goal conversion, and goal value.
Calculating Performance Rank
Once we have all the necessary data points combined, it's time to calculate a performance rank for each search query. To do this, we'll create a metric called "value per thousand impressions." This metric will show us how much value (as defined by goal values in Google Analytics) is generated by every 1000 impressions of a particular search query in the search result pages.
The final result within Looker Studio should display:
- Search Query
- Number of Impressions (from Google Search Console)
- Number of Clicks (from Google Search Console)
- Goal Completions
- Goal Value (from Google Analytics)
We will also calculate the "value created per thousand impressions" by dividing the goal value for each search query by the number of impressions.
Setting Values & Goals in Google Analytics
For this exercise, we'll be using a personal website account with three goals and no transactions. Here's a brief overview of the goals and their values:
- Page Views Goal: 3 or more pages viewed per session, valued at $0.10.
- Affiliate Link Click Goal: Clicking on an affiliate outbound link, valued at $1 per click (due to expected conversions and commissions).
- Session Duration Goal: Staying on the site for more than 5 minutes during a session, valued at $0.10.
With these values in place, we can better understand the performance of different search queries and improve our overall strategy accordingly.
17.2. Planning the Data Pipeline
In this lesson, we'll create a data pipeline from scratch to extract data from Google Analytics and Google Search Console, process it in BigQuery, and visualize it in Looker Studio.
First, we want to grab data such as session, landing page, number of sessions, and poll completions from Google Analytics. We also want to extract search query, landing page, impressions, and clicks data from Google Search Console. To do this, we'll use Supermetrics for Sheets to connect to both APIs and pull the data into separate Google Sheets.
Moving Data to BigQuery
Instead of performing all the data modeling in Sheets like we used to do traditionally, we will bring each Google Sheet into BigQuery as separate tables. So we will have two BigQuery tables representing the extracted data from both sources.
Processing Data in BigQuery
Once our tables are created in BigQuery, we'll perform all necessary operations like cleaning up, processing, joining, merging and calculations there.
Connecting Looker Studio
Finally, after processing our data in BigQuery, we'll connect it directly to Looker Studio for visualization purposes.
Understanding The Data Relationship
It's important to note that multiple search queries can point to one landing page. There isn't a one-to-one relationship between queries and landing pages. All these queries with their impressions and clicks are creating sessions in Google Analytics.
We need a way to calculate the share that each query has in creating these sessions based on their clicks' contribution because the value generated cannot be directly attributed entirely to each query.
Calculating Attribution Shares
To calculate attribution shares for each query:
- Calculate the share of total depending on clicks.
- Multiply the total number of sessions by that share of total.
This calculation will give us an accurate distribution of attribution for each query in terms of sessions, conversions, and value.
After all the calculations and data analysis are done in BigQuery, we'll have a comprehensive view of the data in Looker Studio, making it easier to understand and visualize the relationship between queries, clicks, sessions, conversions, and generated value.
17.3. Getting Data into BigQuery
In this lesson, we'll go over how to create a data pipeline, review raw data from Google Sheets, and import that data into BigQuery. Although we won't have time to cover everything in depth, this example is designed to demonstrate as many features in BigQuery as possible that you'll likely use when working with Looker Studio.
Reviewing Data from Google Sheets
We have two sets of data in Google Sheets obtained using Supermetrics:
- Google Search Console Data: Search queries, landing pages, impressions, and clicks.
- Google Analytics Data: Landing page data source medium with sessions, goal completions, and total goal value.
Our goal is to import these datasets into BigQuery.
Setting up a BigQuery Project
First things first - navigate to the BigQuery console. If you don't have an account yet, follow the prompts to create one and enter your billing information.
Once you're logged in, you'll notice that BigQuery is structured around projects. Each project governs access to different resources within Google Cloud (including BigQuery) and handles billing.
To create a new project:
- Click on the project dropdown at the top of the page.
- Select New Project.
- Give your project a name (e.g., Profitable Dashboard BQ).
- Choose your organization (this will inherit billing settings from it).
Wait for a moment while BigQuery creates your new project.
Accessing BigQuery within Your Project
Now that your project has been created:
- Select it from the list of projects.
- On the right-hand side of the screen, find BigQuery under one of the tools available in Google Cloud Platform.
- Pin it at the top for quick access.
Most of your time in BigQuery will be spent within the SQL Workspace.
Exploring the SQL Workspace
Now that we're in the SQL Workspace, we can start building our data pipeline and importing the data from Google Sheets. In future lessons, we'll dive deeper into using BigQuery and Looker Studio to manipulate and visualize this data.
17.4. SQL Workspace in BigQuery
Welcome to the SQL Workspace in BigQuery, where we'll explore how to create projects, datasets, and tables within Looker Studio.
Hierarchy in BigQuery
BigQuery has a hierarchy similar to Google Sheets. In Google Sheets, you have an account with multiple spreadsheets nested inside. Similarly, in BigQuery:
- Projects act like your account
- Datasets are contained within projects
- Tables and views are housed within datasets
Creating a Dataset
Since our project is currently empty, let's create a dataset:
- Click on your project name (e.g., profitable_dashboards_bq)
- Click the
+icon to create a new dataset
- Give your dataset an ID (e.g., query_analysis)
- Choose the data location (US or EU servers)
Now you have both a project and a dataset.
There are various naming conventions for datasets, tables, and fields in BigQuery and SQL:
- Snake case: all lowercase characters with underscores separating words (e.g., query_analysis)
- Camel case: starts with lowercase; each new word begins with an uppercase letter (e.g., queryAnalysis)
- Pascal case: every word starts with an uppercase letter (e.g., QueryAnalysis)
I recommend using snake case since it's widely used by Google and others in advanced data processing scenarios like GA4 to BigQuery data transfer.
Now that we have our dataset created, it's time to bring some data into it!
17.5. Bringing a Google Sheet into BigQuery
In this lesson, we're going to explore how to import your Google Sheet data into BigQuery as a table. We will cover two different methods, starting with the simplest one: importing a CSV file.
Importing GA Data as a CSV
First, let's export the GA data in CSV format:
- Go to your Google sheet.
- Click on File > Download > Comma-separated values (CSV).
- Save the file to your desired location.
Now that you have downloaded the GA data as a CSV file, let's bring it into BigQuery.
Creating a Table in BigQuery
Follow these steps to create a table from the downloaded CSV file:
- Go to BigQuery and navigate to your dataset (make sure it's empty).
- Right-click on the dataset and click Create Table.
- You will be presented with options for creating your table.
Selecting File Source
Choose these options:
- Create table from: Select "File".
- Click Browse and select the previously downloaded CSV file (sample_data_ga_data.csv).
Now you need to set up where this new table should be created:
- Choose your project and dataset.
- Name your new table (e.g., "ga" for GA data).
- For Table type, choose "Native Table".
Next, you'll decide on the schema for the new table:
- Choose either "Auto detect" or "Define schema manually".
- If defining manually, input each field name and its corresponding data type.
Once you've set up your schema, click on Create Table.
Your Google Sheet data has now been imported into Looker Studio as a native table within BigQuery!
17.6. Table Schema in BigQuery
Understanding schema is essential when working with data in Looker Studio and BigQuery. In this lesson, we will explore what a schema is and how to define it when importing data.
What is Schema?
A schema represents the structure of your data, including column names and their respective data types. For example, a table might have five columns:
total_value. These columns could have different data types such as strings, integers, or floats.
Defining Schema in BigQuery
When importing data into BigQuery, you can either define the schema manually or rely on BigQuery to auto-detect it for you. Auto-detection can save time by analyzing the columns and determining their appropriate data types.
Under advanced options during import, you can specify whether an existing table should be overwritten or appended to if it already exists. Additionally, you can specify the number of header rows to skip during import; this ensures that column names are not mistakenly imported as row values.
Verifying Your Table Schema
Once your table has been created in BigQuery, you can check its schema by clicking on the table within its dataset. You'll see three tabs: Schema, Details, and Preview.
- The Schema tab displays your table's fields (columns), including their names and data types.
- The Details tab provides information about the table ID, path to the table, size of stored data, creation time, etc.
- The Preview tab allows you to view a sample of your imported rows without loading all available rows since tables in BigQuery can contain millions or even billions of records.
In conclusion, understanding and defining schemas in Looker Studio and BigQuery are crucial aspects of managing your data. By using the auto-detection feature and verifying your table schema, you can ensure that your data is correctly structured and ready for analysis.
17.7. External Tables in BigQuery
In this lesson, we'll explore how to create an external table in BigQuery that reads data directly from a Google Sheet.
Connecting Google Sheets with BigQuery
BigQuery can read data directly from a Google Sheet without having to download it. This is because both products are part of the Google ecosystem, making it easy to integrate them. By connecting the sheet as an external table, we can ensure that any changes made in the sheet will be reflected in our queries.
First, copy the URL of your Google Sheet as you'll need it later. Then, follow these steps:
- Click on the ellipses (three dots) next to your dataset.
- Select "Create table".
- Choose "Google Drive" as the source of your data.
- Paste the URL of your Google Sheet.
- Select "Google Sheets" as the file format.
- Define your sheet's range (e.g.,
Defining Destination and Schema
After defining your source, you'll need to specify your destination and schema:
- Choose the same project and dataset as before.
- Name your table (e.g., "Google_Search_Console").
- Set the table type to "External Table".
- Allow BigQuery to auto-detect the schema by selecting "Auto detect".
Before creating the table, ensure that you've set up header skipping correctly by specifying how many rows should be skipped (usually one row for headers). If not done correctly, field names may not be properly detected.
Deleting and Recreating Tables
If you've made a mistake while creating your external table, you can delete it and create another one:
- Click on the ellipses next to your table.
- Select "Delete".
- Type "DELETE" when prompted.
Remember that deleting a table is a permanent action and cannot be undone.
Working with External Tables
Once your external table is created, you'll notice some differences compared to native tables:
- The "Details" tab will show zero bytes, as the data is still stored in the Google Sheet, not in BigQuery.
- There's no "Preview" tab available for external tables.
To access the data within an external table, you need to run a query. Keep in mind that every time you want to view the data, you'll have to query it since BigQuery doesn't store the actual data from Google Sheets.
17.8. Writing SQL in BigQuery
In this lesson, we will explore writing SQL in BigQuery and understand the differences between native and external tables when working with Looker Studio.
Split Tab View in BigQuery
When working with BigQuery, you can use the split tab view to see the schema of a table or details of a table as you write queries on the side. This allows you to work with different queries side by side and see how they interact.
For example, we can select everything from a table within our project:
SELECT * FROM `project_name.dataset_name.table_name`
Comparing Native and External Tables
To demonstrate the differences between native and external tables, we will run simple queries on both types of tables. One table is from Google Search Console (external), while the other is from Google Analytics (native).
After running both queries, we can compare their execution details to see how long it took for BigQuery to perform each calculation. The native table query is significantly faster than the external one.
- Data is stored within BigQuery
- Faster query execution
- Static data (not live)
- Data accessed through an API (e.g., Google Sheets)
- Slower query execution due to API requests
- Live data that can change
Working with Live Data
With external tables like Google Search Console data, changes made in the source (e.g., Google Sheets) are reflected immediately when running queries in BigQuery. This makes it ideal for scenarios where live data updates are crucial.
However, if you change the name or range of your source data sheet, it might cause errors when querying that table in BigQuery because it relies on specific references.
In Practice: Data Modeling with Native and External Tables
Now that we have an understanding of native and external tables' key differences, we can begin working with our Google Search Console and Google Analytics data to perform more advanced data modeling in Looker Studio.
17.9. Data Modeling in BigQuery
In this lesson, we'll walk through data modeling in BigQuery using Google Search Console and Google Analytics data as examples.
Preparing the Data
First, let's prepare our data. We have two datasets: Google Search Console and Google Analytics. We'll start by removing rows with zero clicks from the Google Search Console dataset, as they don't provide any valuable insights for our analysis.
SELECT * FROM search_console_data WHERE clicks > 0
Next, we need to filter the Google Analytics dataset to only include traffic from Google Organic:
SELECT * FROM google_analytics_data WHERE source_medium = 'google/organic'
Standardizing Landing Pages
In order to join these two tables together, we need to standardize the format of the landing pages. For example, we need to remove
https:// and ensure that both formats have a trailing slash at the end.
For Google Search Console data:
SELECT search_query AS search_query, REPLACE(landing_page, 'https://', '') AS landing_page, impressions AS impressions, clicks AS clicks FROM search_console_data WHERE clicks > 0
For Google Analytics data:
SELECT REGEXP_EXTRACT(landing_page_path, r'^[^?]*') AS landing_page, source_medium AS source_medium, sessions AS sessions, goal_completions_all_goals AS goal_completions, total_goal_value_all_goals AS goal_value FROM google_analytics_data_cleaned_up WHERE source_medium = 'google/organic'
Now that we've standardized the landing pages in both datasets, we can proceed with joining them.
We want to join these tables using a left join, with the Google Search Console dataset as the left table and the Google Analytics dataset as the right. We'll use the
landing_page field as our join key.
SELECT search_query, landing_page, impressions, clicks, sessions, goal_completions, goal_value FROM search_console_data_cleaned_up LEFT JOIN google_analytics_data_cleaned_up ON search_console_data_cleaned_up.landing_page = google_analytics_data_cleaned_up.landing_page
With this data, we can now analyze the impact of search queries on site traffic and conversions. Remember to save your queries so you can easily access them later!
17.10. SQL Join in BigQuery
Setting Up the Query
We're going to join two tables: Google Search Console as the left table and Google Analytics data as the right table.
First, we'll create clean versions of both tables using common table expressions (CTE). CTEs, which start with
WITH, help us define subqueries that we can use later in our main query.
WITH GA AS ( SELECT ... ), GoogleSearchConsole AS ( SELECT ... )
Now we'll join these two tables based on their landing pages:
SELECT * FROM GoogleSearchConsole LEFT JOIN GA ON GoogleSearchConsole.landing_page = GA.landing_page;
Before running the join, let's see how many rows we have in our initial dataset - it should be 266.
Troubleshooting the Join
After running the join, we get 488 rows instead of 266. Let's figure out why this is happening. It turns out that there are more search queries for each landing page in our dataset.
To fix this issue, we need to create another intermediary table called
GA_agg to aggregate sessions, goal completions, and goal values by landing page:
WITH GA_agg AS ( SELECT landing_page, SUM(sessions) AS sessions, SUM(goal_completions) AS goal_completions, SUM(goal_value) AS goal_value FROM GA GROUP BY landing_page )
Now let's update our main query to use
GA_agg instead of
SELECT * FROM GoogleSearchConsole LEFT JOIN GA_agg ON GoogleSearchConsole.landing_page = GA_agg.landing_page;
After running this updated query, we get 266 rows as expected.
Selecting Specific Columns
Our current query selects all columns from both tables, but we only want specific columns from the Google Analytics table. We can update our
SELECT statement to include only the desired columns:
SELECT GoogleSearchConsole.*, GA_agg.sessions, GA_agg.goal_completions, GA_agg.goal_value FROM GoogleSearchConsole LEFT JOIN GA_agg ON GoogleSearchConsole.landing_page = GA_agg.landing_page;
Now our query returns a cleaner table with landing page impressions, clicks, sessions, goal completions, and goal value for each search query - just what we wanted!
17.11. Calculating % Share of Total using SQL
Scenario: Multiple queries sending traffic to a single landing page
In this lesson, we have a scenario where multiple queries send traffic to a single landing page. Let's see if it's the case in our dataset and figure out how we can calculate the share of each keyword. To do this, we need to sort by landing page and order the results accordingly.
Adding a new column for sum of clicks
We want to calculate the sum of clicks for each cluster of landing pages. To do this, we can use the
SUM function in SQL and partition our table by landing page:
SUM(clicks) OVER (PARTITION BY GSC_landing_page) AS sum_LP_clicks
This will create a new column with the sum of clicks for each landing page group.
Calculating query share of total clicks
Next, we want to calculate the share of total clicks for each query. We can do this by dividing the number of clicks by the sum_LP_clicks calculation:
clicks / sum_LP_clicks AS query_share_of_total
Now that we have calculated the query share of total clicks, let's check if there are any duplicate search queries or queries sending data to multiple landing pages.
Checking for duplicate search queries
To check for duplicate search queries, we can use
SELECT DISTINCT search_query FROM shares;
If there are duplicate search queries, their count will be less than 266 (the number of rows in our dataset).
Aggregating results for unique search queries
Now that we have identified duplicate search queries, let's aggregate their results together so that they're combined into one row:
SELECT search_query, SUM(impressions) AS impressions, SUM(clicks) AS clicks, SUM(sessions * query_share_of_total) AS sessions, SUM(goal_completions * query_share_of_total) AS goal_completions, SUM(goal_value * query_share_of_total) AS goal_value FROM shares GROUP BY search_query;
This will sum the number of impressions, clicks, sessions, goal completions, and goal value for each unique search query.
We have successfully calculated the share of each keyword in sending traffic to a landing page and aggregated results for unique search queries. This information can be used to calculate our final rank for a keyword based on its value per thousand impressions.
17.12. Calculating Keyword Ranking in BigQuery SQL
In this advanced Looker Studio lesson, you'll learn how to calculate the value each keyword brings to your website per thousand impressions, using BigQuery SQL.
Creating the Calculation
First, let's create a new calculation by dividing the value by the number of impressions. To get the value per thousand impressions, multiply the entire calculation by 1000:
(value / impressions) * 1000 AS value
This calculation gives you the value each query brings for a thousand impressions from sales.
Sorting by Value
To make your analysis more interesting and relevant, sort your data by value in descending order. This way, the highest values will appear at the top of your results.
ORDER BY value DESC
You might notice some outliers with high values per thousand impressions. These could be due to implementation issues, session timeouts, or even bots.
Saving and Analyzing Your Query
Once you're satisfied with your query and calculations, save it for future reference. Keep in mind that this example is designed to demonstrate an end-to-end data pipeline rather than provide specific optimization advice for SEO or website improvements.
Remember that results may be temporary and subject to change as you refine your calculations or update your data sources over time.
17.13. Saving Queries in BigQuery and Connecting to Looker Studio
Save Query Results
When running a query in BigQuery, you might want to save the results for visualization or later use. You have several options for saving your query results, such as:
- CSV to Google Drive
- CSV to your computer
- Google Sheets (static)
- Copying to clipboard
- BigQuery Table
The most effective approach in this case is defining a BigQuery table within your project's dataset.
Create a New Table
- Run your query.
- Choose the option to save the result.
- Select "BigQuery table."
- Choose your project and dataset.
- Name your table (e.g., "query_analysis_underlying_table").
Now you have a new table in your dataset.
Connect Looker Studio with BigQuery
- Go to Looker Studio and create a blank report.
- Connect it to a data source.
- Choose BigQuery as the data source.
- Select the project and dataset containing your newly created table.
After connecting, you can visualize and analyze your data in Looker Studio using various dimensions and metrics.
Acceleration with BigQuery BI Engine
Your report will benefit from the BigQuery BI Engine, which caches queries, tables, and data in RAM for sub-second querying and analysis capabilities.
However, there's an issue with this setup: if data changes in Google Search Console or Google Sheets, this method won't update the analysis every hour or day as needed.
Creating Live Connections
To address this issue:
- Ensure both tables are external and connected live to Google Sheets so that any changes in Google Sheets will be reflected instantly in BigQuery tables.
- Schedule Supermetrics queries (if possible) to run every hour so that live data is accessed more frequently.
This way, even though your final result isn't live, you'll be able to access updated data more frequently and have more accurate insights in Looker Studio.
17.14. Creating Views in BigQuery
In this lesson, we'll explore how to save a query as a view in BigQuery and understand the differences between views and tables. Then, we'll see how to use these views in Looker Studio.
Saving a Query as a View
When working with queries in BigQuery, you have the option to save your query as a view instead of just saving the result or the query itself. A view is different from a table and will be represented by a different icon within your dataset.
To create a view, simply choose your data center project dataset and use the same name for both the query analysis and underlying table. After saving it, you'll notice that another entity with a distinct icon is added to your dataset.
Clicking on the new view will show you its schema but not its preview. This indicates that it's not an ordinary dataset but rather a definition of data - essentially, it's that specific query running. In Looker Studio, you can connect to both tables and views.
To identify whether you're connecting to a view or table in Looker Studio, it's helpful to add "underlying table" or "underlying view" when naming them since there's no icon available for differentiation at this stage.
Using Views in Looker Studio
When adding your new view to Looker Studio, it will behave similarly to any other dataset. However, keep in mind that unlike tables, views cannot be previewed like extended tables; they must be queried.
If external tables are involved, BigQuery will request live data from sources such as Google Sheets when executing the view query. The result is live data calculated on-the-fly at the time of request.
In Looker Studio, if you duplicate your existing table and connect it to your new view dataset instead of using an actual table source, you should see the same information – with the exception that it's not accelerated by BigQuery. This is because there's no hard data stored in BigQuery for views, only a set of different definitions.
Comparing Views and Tables
The main difference between views and tables is that views provide live data, while tables do not. However, tables can be accelerated by BigQuery Engine, while views cannot. This results in faster execution times for tables compared to views.
If you wish to have live or near-live data with acceleration, one solution is to use scheduled queries in BigQuery. This feature allows you to balance the need for updated information with faster performance.
17.15. Scheduled Queries in BigQuery
In this lesson, we'll explore how to schedule queries in BigQuery to fetch near-real-time data and cache it for faster access in Looker Studio.
Scheduled queries allow us to run a query at preset intervals, such as every hour, 30 minutes, daily or weekly, and save the result in a native table. This way, we can cache the results of our views and make them available for faster access with the BI engine.
Enabling Scheduled Queries
If you're using scheduled queries for the first time in your project or account, you need to enable them by clicking Enable API. This is a one-time setup that takes only a few seconds.
Creating a New Scheduled Query
To create a new scheduled query:
- Start from the BigQuery screen with your desired query already written.
- Click on Schedule Query.
- Give your scheduled query a name (e.g., "query_analysis") and add "_sq" to identify it as a scheduled query.
- Choose how often you want the query to run (daily, hourly etc.) and set start and end dates if needed.
- Set a destination table for your query results by selecting the dataset (e.g., "query_analysis") and table ID (using the same name as your scheduled query).
Configuring Table Settings
When configuring your destination table settings:
- Decide whether you want each run's results appended to existing data or if you want to override existing data with new results.
- Choose where you want your data stored; however, this is optional.
Save your settings and wait for the first run of your scheduled query.
Connecting Looker Studio to Your Scheduled Query
Once your scheduled query has started running:
- Go back to Looker Studio's edit mode.
- Edit your data source and rename it to include "Scheduled Query."
- Edit the connection and switch from connecting to the table to connecting to your scheduled query.
- Reconnect and refresh your data.
Now, you should see that your Looker Studio report is fetching data from the scheduled query, which is accelerated by the BI engine and cached on an hourly basis. This provides a faster, more efficient way of accessing near-real-time data in Looker Studio.