BigQuery SQL Essentials: Data Analysis Examples and AI Query Generation Tips

In today's data-driven world, mastering SQL fundamentals and knowing how to efficiently use AI for query generation can dramatically improve your analytics workflow. This blog post explores practical data analysis examples using SQL in BigQuery and demonstrates how to effectively leverage AI tools to write complex queries.

From Basic SQL to Real-World Data Analysis

Once you've grasped the fundamentals of SQL (SELECT, FROM, WHERE, ORDER BY, and other core commands), it's time to apply those skills to real business problems. Let's explore how to transform raw order data into actionable customer insights using BigQuery.

Building a Customer Analytics Table

Imagine you're an analyst at an e-commerce company. Your marketing team wants to understand customer behavior to improve targeting and retention. You have access to an orders table, but you need to create a customer-centric view of the data.

Here's how we can transform order-level data into valuable customer insights:

SELECT
  customer_id,
  CONCAT(INITCAP(first_name), ' ', INITCAP(last_name)) AS customer_name,
  state AS customer_state,
  city AS customer_city,
  zip_code AS customer_zip,
  
  -- Lifetime Value Metrics
  SUM(revenue) AS customer_ltv,
  AVG(revenue) AS customer_aov,
  COUNT(DISTINCT order_id) AS frequency,
  
  -- Timeline Metrics
  MIN(order_date) AS first_order_date,
  MAX(order_date) AS last_order_date,
  DATE_DIFF(CURRENT_DATE(), MIN(order_date), DAY) AS customer_age_days,
  DATE_DIFF(CURRENT_DATE(), MAX(order_date), DAY) AS recency_days
FROM
  `looker-studio-masterclass.LSM_2024.online-store-orders`
GROUP BY
  1, 2, 3, 4, 5
ORDER BY
  frequency DESC

This query:

  1. Cleans and formats customer names using text functions

  2. Calculates key metrics like lifetime value (LTV) and purchase frequency

  3. Determines when the customer first and last purchased

  4. Computes metrics for customer age and recency

The power of this approach is that we've gone from transactional data (orders) to relationship data (customers). Now we can easily identify:

  • High-value customers by LTV

  • Most loyal customers by frequency

  • At-risk customers by examining recency
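For instance, the at-risk segment can be pulled out with a simple filter on top of the query above. Note that the 90-day cutoff and the customer_metrics view name below are illustrative assumptions, not part of the original query:

-- Assumes the customer analytics query above has been saved
-- as a view named customer_metrics (hypothetical name)
SELECT
  customer_id,
  customer_name,
  recency_days,
  customer_ltv
FROM
  customer_metrics
WHERE
  recency_days > 90  -- illustrative "at-risk" threshold
ORDER BY
  customer_ltv DESC

Sorting by LTV surfaces the lapsed customers who are most worth winning back first.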

Visualizing Results Directly in Looker Studio

One of BigQuery's strengths is its seamless integration with Looker Studio. After running your analysis, click "Explore Data" to instantly visualize your results:

  1. For city-level analysis, create a bar chart showing average customer LTV

  2. To visualize geographic distribution, create a bubble map with:

    • Location: Concatenate city, state, and zip code

    • Bubble size: Customer LTV

    • Color: Purchase frequency

Visualizing customer concentration and value by location helps identify geographic targeting opportunities.

This visualization makes it immediately apparent where your high-value customers are concentrated, providing actionable insights for regional marketing campaigns or expansion planning.

Leveraging AI for SQL Query Writing

Writing complex SQL queries can be time-consuming, especially when dealing with intricate business questions or unfamiliar database structures. AI tools like ChatGPT can significantly accelerate this process, but only if you use them effectively.

The Three-Part Formula for Effective AI Query Generation

To get reliable SQL from AI assistants, I follow a structured approach:

1. Provide Detailed Dataset Context

Before asking for a specific query, thoroughly explain your data structure:

I have a table in BigQuery with information about online store orders. 
The table contains customer details (ID, name, location), order information 
(date, ID, channel), product details, and revenue figures.

2. Share Sample Data

AI works best when it can "see" your actual data patterns. Use a LIMIT query to grab a small sample:
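A simple preview query like this one does the job (adjust the LIMIT to taste):

SELECT
  *
FROM
  `looker-studio-masterclass.LSM_2024.online-store-orders`
LIMIT 10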

Then copy the results into your AI prompt. This helps the AI understand data formats, values, and patterns.

3. Include the Table Schema

BigQuery lets you copy the JSON schema of your table, which precisely defines column names and data types:

[
  {"name": "order_date", "type": "DATE"},
  {"name": "customer_id", "type": "INTEGER"},
  {"name": "first_name", "type": "STRING"},
  ...
]
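If you'd rather query the schema than copy it from the console, BigQuery also exposes it through INFORMATION_SCHEMA. This sketch uses the example dataset from above:

SELECT
  column_name,
  data_type
FROM
  `looker-studio-masterclass.LSM_2024.INFORMATION_SCHEMA.COLUMNS`
WHERE
  table_name = 'online-store-orders'

The result is the same column-name-and-type information as the JSON schema, in tabular form.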

Including this in your prompt dramatically reduces the chance of syntax errors in the generated SQL.

RFM Analysis Example

Let's see this in action by asking for an RFM (Recency, Frequency, Monetary) analysis, a common customer segmentation technique.

After providing the three pieces of context above, I asked:

Please give me the full SQL to run in BigQuery for an RFM analysis of this customer data.
I want to segment customers based on recency, frequency, and monetary value.

The AI generated a comprehensive query using the NTILE function (which divides data into equal groups):

WITH rfm_scores AS (
  SELECT
    customer_id,
    customer_name,
    -- Recency Score (higher = more recent)
    NTILE(5) OVER (ORDER BY DATE_DIFF(CURRENT_DATE(), MAX(order_date), DAY) DESC) AS recency_score,
    -- Frequency Score (higher = more frequent)
    NTILE(5) OVER (ORDER BY COUNT(DISTINCT order_id)) AS frequency_score,
    -- Monetary Score (higher = more valuable)
    NTILE(5) OVER (ORDER BY SUM(revenue)) AS monetary_score,
    -- Additional data for analysis
    DATE_DIFF(CURRENT_DATE(), MAX(order_date), DAY) AS recency,
    COUNT(DISTINCT order_id) AS frequency,
    SUM(revenue) AS monetary
  FROM
    `looker-studio-masterclass.LSM_2024.online-store-orders`
  GROUP BY
    customer_id, customer_name
)
SELECT
  *,
  -- Calculate combined RFM score (scale of 3-15)
  recency_score + frequency_score + monetary_score AS rfm_score,
  -- Segment customers based on combined score
  CASE
    WHEN recency_score + frequency_score + monetary_score >= 13 THEN 'Champions'
    WHEN recency_score + frequency_score + monetary_score >= 10 THEN 'Loyal Customers'
    WHEN recency_score + frequency_score + monetary_score >= 7 THEN 'Potential Loyalists'
    WHEN recency_score + frequency_score + monetary_score >= 5 THEN 'At Risk'
    ELSE 'Needs Attention'
  END AS customer_segment
FROM
  rfm_scores
ORDER BY
  rfm_score DESC

This query introduces several advanced SQL concepts:

  1. Common Table Expressions (CTEs) - The WITH clause creates a temporary result set

  2. Window Functions - NTILE() and OVER() divide customers into equal segments

  3. Case Statements - Creating meaningful segments based on score ranges

Verifying AI-Generated SQL

While AI can generate impressive queries, always verify what it produces. When the AI used NTILE() (a function I wasn't familiar with), I asked for an explanation:

"What is NTILE? Could you explain how it works in this query?"

The AI explained that NTILE is a window function that divides rows into approximately equal ranked groups. In our case, it creates quintiles (5 groups) for each RFM dimension, with 1 being the lowest and 5 being the highest.
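You can verify this behavior yourself on a small inline dataset. With ten values and five buckets, NTILE assigns exactly two rows to each quintile:

SELECT
  revenue,
  NTILE(5) OVER (ORDER BY revenue) AS quintile
FROM
  UNNEST([10, 25, 40, 55, 70, 85, 100, 120, 150, 200]) AS revenue
ORDER BY
  revenue

The two smallest values land in quintile 1 and the two largest in quintile 5, which is exactly how the RFM query ranks customers on each dimension.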

This verification step is critical. Never use AI-generated code you don't understand. View AI as a collaborative partner that drafts code and explains concepts, not as a replacement for your judgment.

Making SQL and AI Work Together in Your Analytics Workflow

The real power comes from combining traditional SQL skills with AI assistance. Here's how to integrate them effectively:

When to Use AI for SQL Generation

AI excels at:

  • Drafting complex queries when you know what you want but aren't sure of the syntax

  • Explaining unfamiliar SQL concepts or functions

  • Troubleshooting errors in existing queries

  • Optimizing query performance by suggesting improvements

However, AI should complement, not replace, your SQL knowledge. Understanding the fundamentals ensures you can verify the AI's output and make necessary adjustments.

Best Practices for AI-Assisted SQL Development

  1. Start with a clear question: The more specific your request, the better the AI's response

  2. Always provide context: Include sample data and schema information

  3. Build incrementally: Start with simple queries and add complexity gradually

  4. Verify and understand: Never use code you don't understand

  5. Learn from the AI: Use AI explanations to expand your SQL knowledge

By following these practices, you'll create a virtuous cycle where each AI interaction improves both your immediate productivity and your long-term SQL proficiency.

Frequently Asked Questions (FAQs)

Can I trust AI-generated SQL for production use?

AI-generated SQL should always be reviewed before production use. While AI tools have improved dramatically, they can still make mistakes, especially with complex database schemas or edge cases. Consider AI a "first draft" that needs your expert review.

What if I don't know enough SQL to verify the AI's output?

Focus on learning SQL fundamentals first. Once you understand basic query structure and logic, you can more effectively use AI as a learning tool. Ask the AI to explain its code in detail, and gradually build your verification skills.

How do I handle when the AI generates SQL with syntax or logical errors?

When errors occur, provide the error message back to the AI and ask for corrections. This iterative process often results in working code and helps you understand common pitfalls. For persistent issues, break down your request into smaller parts.

Do different AI models produce different quality SQL?

Yes, more advanced models like GPT-4 typically produce more accurate and optimized SQL than earlier versions. Specialized SQL-focused AI tools may also outperform general-purpose AI in certain scenarios. If possible, test multiple options to find what works best for your needs.

Is it better to write SQL manually or use AI?

The best approach combines both. Write simple queries manually to maintain your skills, and use AI for complex scenarios or when exploring unfamiliar territory. This hybrid approach maximizes productivity while ensuring you maintain control over your data analysis process.


By mastering SQL fundamentals and strategically leveraging AI assistance, you can dramatically accelerate your data analysis workflows. The combination gives you the best of both worlds: the precision and control of hand-crafted SQL with the efficiency and innovation potential of AI collaboration.

Remember that tools like BigQuery and AI assistants are meant to augment your analytical capabilities, not replace your critical thinking. The most powerful analytics comes from combining technological tools with human insight and domain expertise.


Note:

This post is based on a subject covered in the Looker Studio Masterclass Program. To learn more about Looker Studio Masterclass, click here.
