6.6. Cleaning Data with CASE Function
The CASE function can be utilized to clean inconsistent or messy data. In this example, we'll explore how to use the CASE function to correct inconsistencies in source medium data. We'll merge some sources, lower case their values, and replace certain values with others.
Suppose you have source medium data with the following issues:
- "direct" and "none" are represented as "not set."
- The values have mixed casing, and you want everything to be lowercase.
- There are multiple instances of "Google" in different formats, and you want to report them all as simply "Google."
Here's how you can use the CASE function to clean this data:
CASE WHEN (source = 'not set' AND medium = 'not set') THEN 'direct none' WHEN (source CONTAINS 'Google') THEN CONCAT('google ', medium) ELSE LOWER(source, medium) END
This function performs the following actions:
- When the source is "not set" and the medium is "not set," it returns "direct none."
- When the source contains the word "Google," it concatenates "google" with the value of the medium.
- Otherwise, it lowercases the source and medium values.
By using the CASE function, you can clean up your data, merge sources, and modify values as needed to create a cleaner and more useful dataset for your viewers or clients.