SQL for Data Analytics
Advanced SQL patterns for data analysis and reporting.

Page 4: Cohort Analysis β Understanding Groups Over Time π₯
A cohort is a group of users who share a common characteristic over a specified period. The most common cohort is based on Acquisition Date (e.g., "Users who joined in January").
This content is adapted from Mastering System Design from Basics to Cracking Interviews (Udemy). It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.
Why Cohorts Matter
Raw metrics like "Total Active Users" can be misleading. A massive influx of new users can hide the fact that your old users are leaving. Cohort analysis reveals the true health of your retention and product-market fit.
Building a Retention Curve
How to calculate retention in SQL
- Define the Cohort: Find the first activity date for every user.
- Calculate Activity Age: Subtract the cohort date from the activity date for every subsequent action.
- Count Unique Users: Group by Cohort Month and Month Number (Month 0, Month 1, etc.).
- Percentage: Divide the count of Month N by the count of Month 0.
Handling Sparse Cohorts
In many systems, users don't perform actions every single period. If you only look at "Month 1 Retention," you might miss users who returned in "Month 2."
- Returnship: Looking for any activity after a certain gap.
- Repeat Purchase Behavior: Tracking the frequency and recency of repeat conversions.
Survivorship and Churn
- Survivorship: What % of the original cohort is still active after X months?
- Churn Rate: The inverse of retentionβhow many users are we losing?
Cumulative Cohort Calculations
Tracking the LTV (Lifetime Value) of a cohort:
SUM(revenue) OVER(PARTITION BY cohort_id ORDER BY month_age)This tells you how much value a specific group of users has generated desde joining.
Cross-Sectional Analysis
Comparing different cohorts against each other at the same "age" (e.g., comparing "Month 3 Retention" for the January cohort vs. the February cohort). This helps determine if product updates or marketing campaigns are improving long-term stickiness.
