Mar 2025 · 10 min read

Page 7: Experiment Analysis — Testing Hypotheses with SQL 🧪

Driptanil Datta, Software Developer

A/B testing is how modern companies make decisions. While statisticians use specialized tools, the core data for experiment analysis is almost always calculated using SQL.

References & Disclaimer

This content is adapted from Mastering System Design from Basics to Cracking Interviews (Udemy). It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.

Strengths and Limits of SQL for Experiments

SQL is the "Engine" of experimentation. Use it to:

Assign users to variants.
Calculate conversion rates.
Aggregate metrics per variant.
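
A minimal sketch of those three steps, run through Python's built-in sqlite3 module so it is self-contained. The table names (users, conversions, assignments) and the parity-based split are assumptions made for illustration; production systems typically assign variants with a hash of the user ID and an experiment key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE conversions (user_id INTEGER);
INSERT INTO users VALUES (1),(2),(3),(4),(5),(6),(7),(8);
INSERT INTO conversions VALUES (1),(2),(3),(3);  -- user 3 converted twice

-- Assumption: a simple deterministic split on user_id parity.
CREATE TABLE assignments AS
SELECT user_id,
       CASE WHEN user_id % 2 = 0 THEN 'control' ELSE 'treatment' END AS variant
FROM users;
""")

CONVERSION_SQL = """
SELECT a.variant,
       COUNT(*) AS users,
       SUM(CASE WHEN c.user_id IS NOT NULL THEN 1 ELSE 0 END) AS converted,
       1.0 * SUM(CASE WHEN c.user_id IS NOT NULL THEN 1 ELSE 0 END) / COUNT(*)
           AS conversion_rate
FROM assignments a
-- DISTINCT keeps a user who converted twice from being counted twice.
LEFT JOIN (SELECT DISTINCT user_id FROM conversions) c
       ON c.user_id = a.user_id
GROUP BY a.variant
ORDER BY a.variant;
"""

rows = conn.execute(CONVERSION_SQL).fetchall()
for variant, users, converted, rate in rows:
    print(variant, users, converted, rate)
```

The LEFT JOIN matters here: an INNER JOIN would drop users who never converted and inflate every conversion rate to 100%.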

Testing for Significance

SQL for the Chi-Squared Test (Binary Outcomes)

Use this when checking whether a categorical change (e.g., Button Color) affected a binary outcome (e.g., Click vs. No Click). You can calculate the observed and expected counts for each variant and outcome directly in SQL, then feed them into the chi-squared statistic.
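
One way this could look, assuming a hypothetical exposures(user_id, variant, clicked) table with made-up data. The SQL produces the observed and expected counts; the statistic and p-value are then computed in Python. The closed-form p-value below is only valid for a 2x2 table (one degree of freedom).

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exposures (user_id INTEGER PRIMARY KEY, variant TEXT, clicked INTEGER)"
)
data = []
for i in range(100):
    # Illustrative data: 10 of 100 control users click, 20 of 100 treatment users click.
    data.append((i, "control", 1 if i < 10 else 0))
    data.append((100 + i, "treatment", 1 if i < 20 else 0))
conn.executemany("INSERT INTO exposures VALUES (?, ?, ?)", data)

CHI_SQUARED_SQL = """
WITH observed AS (
    SELECT variant, clicked, COUNT(*) AS observed
    FROM exposures
    GROUP BY variant, clicked
),
row_totals AS (SELECT variant, SUM(observed) AS row_total FROM observed GROUP BY variant),
col_totals AS (SELECT clicked, SUM(observed) AS col_total FROM observed GROUP BY clicked),
grand AS (SELECT SUM(observed) AS n FROM observed)
SELECT o.variant, o.clicked, o.observed,
       -- Expected count under independence: row total * column total / grand total.
       1.0 * r.row_total * c.col_total / g.n AS expected
FROM observed o
JOIN row_totals r ON r.variant = o.variant
JOIN col_totals c ON c.clicked = o.clicked
CROSS JOIN grand g
ORDER BY o.variant, o.clicked;
"""

rows = conn.execute(CHI_SQUARED_SQL).fetchall()

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected.
chi2 = sum((obs - exp) ** 2 / exp for _, _, obs, exp in rows)

# With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(rows, chi2, p_value)
```

Note that a variant/outcome cell with zero rows would be missing from the observed CTE, so a real query would need to generate all four cells explicitly.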

SQL for the t-Test (Continuous Outcomes)

Used when checking metrics like "Average Revenue per User." You need the Mean, Variance, and Sample Size (N) for both Control and Treatment groups.
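
A sketch of gathering those three inputs, assuming a hypothetical user_revenue(user_id, variant, revenue) table with made-up values. SQLite has no built-in variance aggregate, so the sample variance is computed by joining each row back to its group mean. The statistic shown is Welch's t (which does not assume equal variances); that choice is an assumption here, since the text above does not say which t-test variant is intended.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_revenue (user_id INTEGER PRIMARY KEY, variant TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO user_revenue VALUES (?, ?, ?)",
    [
        (1, "control", 10.0), (2, "control", 12.0), (3, "control", 8.0), (4, "control", 10.0),
        (5, "treatment", 14.0), (6, "treatment", 16.0), (7, "treatment", 12.0), (8, "treatment", 14.0),
    ],
)

SUMMARY_SQL = """
WITH means AS (
    SELECT variant, AVG(revenue) AS mean, COUNT(*) AS n
    FROM user_revenue
    GROUP BY variant
)
SELECT m.variant, m.n, m.mean,
       -- Sample variance: squared deviations from the group mean, divided by n - 1.
       SUM((u.revenue - m.mean) * (u.revenue - m.mean)) / (m.n - 1) AS variance
FROM user_revenue u
JOIN means m ON m.variant = u.variant
GROUP BY m.variant, m.n, m.mean
ORDER BY m.variant;
"""

stats = {variant: (n, mean, var) for variant, n, mean, var in conn.execute(SUMMARY_SQL)}


def welch_t(control, treatment):
    """Return (t statistic, degrees of freedom) from (n, mean, variance) tuples."""
    n_c, mean_c, var_c = control
    n_t, mean_t, var_t = treatment
    a, b = var_c / n_c, var_t / n_t
    t = (mean_t - mean_c) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n_c - 1) + b ** 2 / (n_t - 1))
    return t, df


t_stat, dof = welch_t(stats["control"], stats["treatment"])
print(stats, t_stat, dof)
```

Turning the statistic into a p-value needs the t distribution, which is typically where a statistics library takes over from SQL.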

Common Experiment Pitfalls

Variant Assignment: Ensure users stay in their assigned "bucket" for the whole experiment.
Outliers: A single whale can ruin a "Revenue per User" experiment. Be ready to trim your data.
Time Boxing: Only count actions that happened after the user entered the experiment.
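
The sketch below illustrates all three guards under assumed table names (assignments, purchases) and made-up data. A primary key on user_id keeps each user in a single bucket, the join condition time-boxes purchases to those made on or after assignment, and per-user revenue is capped at an arbitrary 100.0. Capping (winsorizing) is used here in place of trimming; dropping the outlier rows entirely is the other common option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- PRIMARY KEY on user_id: a user can only ever hold one variant.
CREATE TABLE assignments (user_id INTEGER PRIMARY KEY, variant TEXT, assigned_at TEXT);
CREATE TABLE purchases (user_id INTEGER, amount REAL, purchased_at TEXT);

INSERT INTO assignments VALUES
    (1, 'control',   '2025-03-10'),
    (2, 'control',   '2025-03-10'),
    (3, 'treatment', '2025-03-10'),
    (4, 'treatment', '2025-03-10');

INSERT INTO purchases VALUES
    (1,   20.0, '2025-03-05'),  -- before assignment: must be excluded
    (1,   10.0, '2025-03-11'),
    (2,   30.0, '2025-03-12'),
    (3,   20.0, '2025-03-11'),
    (4, 5000.0, '2025-03-12');  -- the whale
""")

REVENUE_SQL = """
WITH per_user AS (
    SELECT a.user_id, a.variant, COALESCE(SUM(p.amount), 0) AS revenue
    FROM assignments a
    LEFT JOIN purchases p
           ON p.user_id = a.user_id
          -- Time boxing: ISO-8601 date strings compare correctly as text.
          AND p.purchased_at >= a.assigned_at
    GROUP BY a.user_id, a.variant
)
SELECT variant,
       COUNT(*) AS users,
       AVG(revenue) AS raw_mean,
       AVG(CASE WHEN revenue > :cap THEN :cap ELSE revenue END) AS capped_mean
FROM per_user
GROUP BY variant
ORDER BY variant;
"""

rows = conn.execute(REVENUE_SQL, {"cap": 100.0}).fetchall()
for row in rows:
    print(row)
```

Comparing raw_mean with capped_mean side by side shows how much of the apparent lift comes from a single user.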

Repeated Exposure

Some users see the experiment multiple times. Should you count "First Seen" or "Total Touches"? Pick one definition and apply it to every variant and metric. Consistency is key.
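
Both definitions can come out of the same query, which makes it easier to pick one and stick with it. This is a sketch over a hypothetical exposures(user_id, variant, exposed_at) log with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exposures (user_id INTEGER, variant TEXT, exposed_at TEXT);
INSERT INTO exposures VALUES
    (1, 'control',   '2025-03-03'),
    (1, 'control',   '2025-03-01'),
    (2, 'treatment', '2025-03-02');
""")

EXPOSURE_SQL = """
SELECT user_id,
       variant,
       MIN(exposed_at) AS first_seen,     -- "First Seen" definition
       COUNT(*)        AS total_touches   -- "Total Touches" definition
FROM exposures
GROUP BY user_id, variant
ORDER BY user_id;
"""

rows = conn.execute(EXPOSURE_SQL).fetchall()
for row in rows:
    print(row)
```

If a user ever appears under two variants, this query returns two rows for that user, which is a quick way to spot broken assignment.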

When Controlled Experiments Aren't Possible

Sometimes you can't split the traffic. Use these SQL-based alternatives:

Pre/Post Analysis: Comparing the 2 weeks before a change to the 2 weeks after. (Note: Highly susceptible to seasonality).
Natural Experiments: Comparing two similar regions where only one got the update (e.g., California vs. Texas).
Threshold Population Analysis: Comparing users just above a certain threshold to those just below it.
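
As an illustration of the first alternative, here is a pre/post comparison over a hypothetical daily_signups(day, signups) table. The launch date and the signup numbers are invented; the 14-day windows follow the "2 weeks before, 2 weeks after" framing above.

```python
import sqlite3
from datetime import date, timedelta

launch = date(2025, 3, 15)  # hypothetical launch date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_signups (day TEXT PRIMARY KEY, signups INTEGER)")
conn.executemany(
    "INSERT INTO daily_signups VALUES (?, ?)",
    [
        # Invented data: 100 signups per day before launch, 120 per day after.
        ((launch + timedelta(days=offset)).isoformat(), 100 if offset < 0 else 120)
        for offset in range(-20, 20)
    ],
)

PRE_POST_SQL = """
SELECT CASE WHEN day < :launch THEN 'pre' ELSE 'post' END AS period,
       COUNT(*) AS days,
       AVG(signups) AS avg_daily_signups
FROM daily_signups
WHERE day >= date(:launch, '-14 days')   -- 14 days before launch
  AND day <  date(:launch, '+14 days')   -- launch day plus the next 13 days
GROUP BY period;
"""

result = {
    period: (days, avg)
    for period, days, avg in conn.execute(PRE_POST_SQL, {"launch": launch.isoformat()})
}
print(result)
```

Checking that both periods contain the same number of days (and the same mix of weekdays) is a cheap guard against some of the seasonality problems noted above.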

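Statistical Power: These alternatives show correlation, and correlation does not equal causation, so treat their results as weaker evidence than a randomized test. In every case, ensure your sample size is large enough to detect the effect you care about before drawing conclusions.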
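
For a rough sense of what "large enough" means for a conversion-rate test, the sketch below uses the standard normal-approximation formula for comparing two proportions. The function name, the default alpha of 0.05, and the default power of 0.80 are assumptions chosen for illustration, not values taken from this article.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical scenario: detecting a lift from 10% to 12% conversion.
n_small_lift = sample_size_per_variant(0.10, 0.12)
# A larger lift (10% to 15%) needs far fewer users.
n_large_lift = sample_size_per_variant(0.10, 0.15)
print(n_small_lift, n_large_lift)
```

The required sample grows with the inverse square of the effect size, so halving the lift you want to detect roughly quadruples the users you need.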
