Chapter 9: Data Blending in Looker Studio: The Foundations
Data blending is a powerful technique that helps you merge tables and combine data from multiple sources into a single, unified dataset. In this chapter, we'll explore the foundations of data blending and why it matters for your Looker Studio (Google Data Studio) projects.
9.1. Introduction to Data Blending in Looker Studio
Why Data Blending Matters
Understanding the "why" behind data blending can help you appreciate its potential for your data modeling projects. Let's dive into some use cases to see the different end results you can achieve with data blending:
1. Combine Fields from Different Data Sources
One of the primary benefits of data blending is the ability to combine fields from different data sources on a single chart. For instance, you might want to show costs from Facebook Ads, Google Ads, and Microsoft Ads on one chart to get a comprehensive view of your marketing spend across platforms.
2. Aggregate Data from the Same Marketing Tool
You can also use data blending to aggregate data from the same marketing tool but for different accounts. For example, you might want to see a roll-up chart displaying summarized data from multiple Google Ads accounts for all your clients.
3. De-Aggregate and Re-Aggregate Data
Data blending allows you to de-aggregate and re-aggregate data, which is often the only way to achieve the desired results. We'll see an example of this later in the session.
4. Blend Data for Calculated Fields
Data blending enables you to create calculated fields using data from different sources. For example, you might want to calculate the conversion rate using metrics from multiple data sources.
5. Enrich and Widen Data
Data enrichment and widening are useful techniques for adding more context to your dataset. For instance, you can enrich your data by looking up the profit margin for each SKU on an e-commerce site or by adding population data for a city. In both cases, you're "widening" your dataset by adding more columns of information.
9.2. One Chart, One Data Source
The fundamental concept in Looker Studio is that one chart can only connect to a single data source. This rule still holds true even when working with data blending.
Data Blending to the Rescue
Data blending comes to the rescue by allowing you to merge multiple data sources into a single blended data source. For example, if you want to add five data sources to a chart from five different accounts, you can blend those five data sources into one. Now you have a single data source that combines them all, which you can connect to a single chart.
Data Blending Types
Initially, Looker Studio's data blending feature only supported the "left outer join" method, a concept borrowed from SQL. While left outer join proved to be useful in most use cases, there were still scenarios where it couldn't help.
Fortunately, Looker Studio has since added support for more join types, including full outer join, right join, inner join, and cross join. These join types offer more flexibility when blending data from different sources.
For the purpose of this lesson, we'll focus on the foundations of data blending using the left join method. Once you understand the basics, you'll be able to apply the same principles to the other join types in Looker Studio.
To learn more about different join types, you can search for articles or videos about SQL joins, as the concepts are similar.
9.3. Data Blending: Behind the Scenes - Theory
Data blending combines data from different data sources to generate a single, unified data set. In this lesson, we'll focus on the left join method, which retains all the records from the left data source and combines them with matching records from the right data source based on a join key or condition.
Imagine two circles, each representing a data source. The left circle represents the left data source, and the right circle represents the right data source. Data blending using left join works by taking all the records from the left data source and combining them with matching records from the right data source based on the join key or condition.
If there is data in the right data source that doesn't match anything in the left data source, it will be left out of the blended data set.
Consider the following example: you have two tables, one with dates and clicks from an ad platform (left data source) and another with dates and offline sales from a CRM (right data source).
To blend the data, you'll use the "date" field as the join key. The blended data will retain all the records from the left data source (ad platform) and combine them with the matching records from the right data source (CRM). If a date from the left data source has no matching date in the right data source, the sales value will be unknown, but the clicks data will still be retained in the blended data set.
- Data blending with left join retains all records from the left data source and combines them with matching records from the right data source based on a join key or condition.
- If there is data in the right data source that doesn't match anything in the left data source, it will be left out of the blended data set.
- Data blending is similar to performing a lookup for join keys and returning matching values.
Key Concepts about Data Blending
- Join keys and join conditions must match exactly: To successfully blend data, the join keys in both data sources must match exactly. This includes matching formats, capitalization, and the absence of trailing spaces. If the join keys do not match, the blended data will return null values for those records.
- All matching rows from the right data source will be returned for every row in the left data source: Data blending works by looking up the join key in both data sources and returning all matching rows. This can cause duplicate data and incorrect totals if there are multiple matches for a single join key.
Example: Data Blending with Multiple Matches
Consider the example where you have a table with colors and the number of searches for each color (left data source) and another table with translations of the colors into Italian (right data source). In the Italian table, there are two translations for "red" and "yellow" but only one for "blue".
When blending the data using the color as the join key, Data Studio will return both translations for "red" and "yellow", resulting in duplicate rows for these colors. This will cause the total number of searches to be incorrect in the blended data set.
To avoid this issue, you can try the following:
- Eliminate duplicates from the right data source before blending.
- Establish a one-to-one relationship between the tables that you're trying to blend with each other.
When troubleshooting data blending issues, it's helpful to create the tables side by side and analyze them to understand what could be causing duplicate or incorrect totals in the blended data source.
Blending Data with Different Granularity
When blending data from different sources, it's important to ensure that the granularity of the data is consistent. If one data source has a higher level of granularity, such as sales per day, and the other has a lower level of granularity, such as sales per SKU, blending the data can cause problems such as duplicated rows and incorrect totals.
Example: Sales and Clicks Data
Consider a scenario where you have two data sources: one with the number of clicks per day from an ad platform and another with sales data broken down by date and SKU. When blending these data sources, the clicks data will be duplicated for each SKU, causing incorrect totals for clicks.
Fixing Granularity Issues
There are a few ways to fix this problem:
- Aggregate the sales data by day to match the granularity of the clicks data. This would remove the SKU breakdown and make the data sources more compatible for blending.
- Obtain the clicks data with the same level of granularity as the sales data, breaking it down by SKU. This would allow for a more accurate blend of the data.
Mental Checklist for Data Blending in Looker Studio
When working with large amounts of data, it's essential to verify that the blended data is accurate. Here's a mental checklist to ensure data is preserved correctly during the blending process:
- Ensure a one-to-one relationship between join keys: Make sure that the join keys are unique in both data sources to avoid duplicate rows.
- Check the total number of rows before and after blending: The total number of rows should remain consistent before and after blending. If the number of rows increases after blending, it's likely that there are duplicates in the data.
- Clean up data before blending: Go through the data sources and eliminate any inconsistencies or duplicates that may cause issues during the blending process.
When working with large datasets, it's helpful to double-check the data and perform any necessary cleanup before blending to ensure accurate results. If you notice discrepancies in the blended data, such as an increase in the number of rows or incorrect totals, it's crucial to investigate the cause and resolve it before proceeding with data analysis and visualization.
9.4. Data Blending: The Simple Blend
Let's dive into Looker Studio and explore the simplest form of data blending. Imagine we have sales data from a Google Sheet and another data point such as the number of clicks, impressions, or cost from another data source like a GA demo account. Although these are not tables, they're simple numbers that can be blended together to provide valuable insights.
The Need for Blending Metrics
We want to access both of these metrics in a single data source. But why do we need sales and clicks in a single data source? We may want to calculate conversion rates, ratios, or even add or subtract these metrics from each other. However, we can't do that because they exist in separate data sources, and one chart cannot be connected to more than one data source.
Blending Data in Looker Studio
To blend these data points, we'll select the two scorecards from the different data sources that we want to combine. By right-clicking and selecting "blend data," a new scorecard will appear with the blended data. This new data source is called "blended data number one."
In the blended data source, we only have access to two fields: clicks and sales. Looker Studio tries to guess what kind of calculations we want to make, such as clicks divided by sales. However, we may want to divide sales by clicks instead. We can click on the metric to either show the number of clicks directly or create a field to do the opposite calculation, such as the sum of sales divided by the sum of clicks.
Remember, when blending data, it's essential to ensure that the join keys in both data sources have a one-to-one relationship, meaning that each key appears only once in each data source.
Data Blending: The Simple Self Blend
Sometimes, you might want to blend two subsets of metrics from the same data source. This process is called a simple self blend. In this lesson, we'll learn how to blend two metrics from the same data source with different filters, segments, or date ranges applied.
Working with Subsets of Metrics
Imagine we have Google Analytics as a data source, and we want to blend two subsets of the total events metric with different filters applied. For example, we want to calculate the conversion rate of a contact form by dividing the number of submissions by the number of impressions.
However, we cannot use a subset of a metric with a specific filter applied in a calculation directly. Instead, we can blend the data source with itself to create a self blend.
Blending Subsets of Metrics
To blend the two subsets of metrics, we'll select the two scorecards with the specific filters applied. In our example, both scorecards show the total events metric but with different filters applied for contact form submissions and contact form impressions.
When we blend the data, Looker Studio automatically tries to figure out the calculation we want to perform. In our example, it divides the number of contact form submissions by the number of contact form impressions.
Data blending allows us to define subsets of metrics, either by segmenting, filtering, or applying a specific date range as a single metric, and then use them in calculations.
Applying Date Ranges
When blending data with different date ranges, you might need to reapply the date ranges in the blended data source to ensure the calculation is accurate. In our example, we had to reapply the date ranges for contact form submissions in 2021 and 2022 to calculate the correct conversion rate.
Use Cases for Simple Self Blends
Simple self blends can be particularly useful for funnel tracking. If you have a funnel with multiple steps, you can filter page views or events by each step in the funnel and blend them together. This process allows you to report on the number of users across the funnel and use the segmented metrics in calculations.
9.5. Data Blending: Behind The Scenes - Live Demo
Understanding Data Blending with a Live Example
We previously explored data blending in slides. Now, let's examine how it works with real tables in Looker Studio.
Imagine we have two tables: one with dates and clicks from the 1st to the 6th of March 2021, and another with dates and sales for some of those dates. However, we don't have the number of sales for all of the dates.
When data blending is applied, it will return a table with all the dates from the left data source, clicks from the left data source, and sales where it can find it from the right data source. When it cannot find matching data, it will show "null" values.
Key Takeaways from the Live Example
- Joint key: In this example, the joint key is "Date." It is essential to have one or more joint keys to blend data.
- Left join: In this scenario, we use a left join with one condition – that the date in Table 1 should exactly match the date in Table 2 for data from Table 2 to be written.
- Nulls: Null values are not zeros or empty strings. They essentially mean "nothing" and don't have a field type. It's crucial to differentiate null values from zero, especially when dealing with missing data.
- Missing data representation: In Looker Studio, you can choose how to represent missing data in tables. Options include showing "null" as it is, a dash, "no data," zero, or blank. However, zero and blank can be confusing, as they may not accurately represent the data.
Remember that null values might indicate gaps in your data that need to be fixed. It's good practice to show "null" unless you're certain that missing data means zero.
Dealing with Null Values and Related Issues
When calculating averages, nulls can affect the results. Let's explore how Looker Studio handles nulls and the potential problems with replacing null values with zeros.
Example: Averaging Sales with Null Values
Suppose you want to calculate the average of sales data that includes null values. When Looker Studio calculates the average, it correctly ignores the null values. However, if you replace the null values with zeros, the calculated average might appear incorrect, especially to someone manually calculating it. This is why showing null values instead of zeros is a better practice.
Common Issues in Blending Data
In some cases, you may encounter data application issues when blending data. For example, imagine you have data on clicks and sales for different states on different days. When blending this data, you may end up with duplicate click values, which can lead to incorrect results.
To fix data application issues, you can either:
- Aggregate the right dataset: Remove the unnecessary field (e.g., state) from the right dataset, causing Looker Studio to aggregate the sales data by date. This will result in one sales value for each day, ensuring correct blending.
- Break down the data: Break down the data by both date and state (e.g., clicks from New York and clicks from California). This will allow you to join the data based on two different values (date and state), which will ensure accurate reporting.
Keeping Blended Data Simple
When creating blended data sources in Looker Studio, it's easy to get carried away and include numerous metrics and dimensions. However, this can lead to overcomplicated and confusing data sources, which might result in incorrect numbers or duplicate values.
To avoid these issues, it's best to keep your blended data sources as simple as possible. Focus on including only the necessary metrics and dimensions for the resulting chart or visualization you want to create. This will make it easier to manage, understand, and troubleshoot your data blends.
Remember, a simple and streamlined approach to data blending can save you time and prevent potential errors in your reports.
9.6. Data Blending: Calculated Fields & Dealing with Nulls
When working with calculated fields in blended data sources, handling null values is essential. In some cases, null values might indicate an issue with the incoming dataset that needs to be fixed. However, if you know that null values are expected and represent a specific meaning, such as zero sales, you can use functions like IFNULL, NULLIF, or COALESCE to replace null values with a default value.
Handling Nulls in Calculated Fields
- Understand the reason for nulls: Before attempting to replace nulls, make sure you understand why they are present in your data. They might indicate a data gap or an issue with the incoming data that needs to be fixed.
- Use IFNULL, NULLIF, or COALESCE functions: These functions can be used to replace null values with a default value. For example, you can use IFNULL to replace a null sales value with zero, allowing you to calculate the sales conversion rate accurately.
- Be mindful of potential issues: Replacing null values with default values can cause problems in your calculations if not handled correctly. For instance, if you replace a null value with zero and use it as a divisor, you might encounter division by zero errors.
- Choose an appropriate default value: Depending on the use case and the type of data you are working with, you might want to replace null values with different default values. For example, you can replace null text values with "Not Set" or null numerical values with zero or one, depending on the context.
Remember, always investigate the cause of null values before attempting to replace them with default values. Ensuring the accuracy and integrity of your data is critical for reliable calculations and reporting.
Benefits of Calculated Fields at the Data Source Level
Creating calculated fields at the data source level, particularly within blended data sources, has two main benefits:
- Accessibility: When sharing a report with a team member who only has view access to the data source but can edit the report, they might not be able to create calculated fields at the data source level. However, if a blended data source is created within the report, they have ownership of that blended data source and can create calculated fields within it.
- Joint Key Formatting: Sometimes, joint keys might not match due to differences in formatting, case sensitivity, or other issues. Creating calculated fields at the blended data source level allows you to modify and process joint keys to ensure they match and function correctly in the blending process.
These benefits make it easier for team members to collaborate on reports and help ensure that data blending works correctly, even when dealing with joint keys that require special formatting or processing.