πŸš€
4. Cohort Analysis
Languages

SQL for Data Analytics

Advanced SQL patterns for data analysis and reporting.

Mar 202514 min read
SQL for Data Analytics

Page 4: Cohort Analysis β€” Understanding Groups Over Time πŸ‘₯

A cohort is a group of users who share a common characteristic over a specified period. The most common cohort is based on Acquisition Date (e.g., "Users who joined in January").

🌍
References & Disclaimer

This content is adapted from Mastering System Design from Basics to Cracking Interviews (Udemy). It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.


Why Cohorts Matter

Raw metrics like "Total Active Users" can be misleading. A massive influx of new users can hide the fact that your old users are leaving. Cohort analysis reveals the true health of your retention and product-market fit.

Building a Retention Curve

How to calculate retention in SQL
  1. Define the Cohort: Find the first activity date for every user.
  2. Calculate Activity Age: Subtract the cohort date from the activity date for every subsequent action.
  3. Count Unique Users: Group by Cohort Month and Month Number (Month 0, Month 1, etc.).
  4. Percentage: Divide the count of Month N by the count of Month 0.

Handling Sparse Cohorts

In many systems, users don't perform actions every single period. If you only look at "Month 1 Retention," you might miss users who returned in "Month 2."

  • Returnship: Looking for any activity after a certain gap.
  • Repeat Purchase Behavior: Tracking the frequency and recency of repeat conversions.

Survivorship and Churn

  • Survivorship: What % of the original cohort is still active after X months?
  • Churn Rate: The inverse of retentionβ€”how many users are we losing?

Cumulative Cohort Calculations

Tracking the LTV (Lifetime Value) of a cohort:

SUM(revenue) OVER(PARTITION BY cohort_id ORDER BY month_age)

This tells you how much value a specific group of users has generated desde joining.

Cross-Sectional Analysis

Comparing different cohorts against each other at the same "age" (e.g., comparing "Month 3 Retention" for the January cohort vs. the February cohort). This helps determine if product updates or marketing campaigns are improving long-term stickiness.

Β© 2026 Driptanil Datta. All rights reserved.

Software Developer & Engineer

Disclaimer:The content provided on this blog is for educational and informational purposes only. While I strive for accuracy, all information is provided "as is" without any warranties of completeness, reliability, or accuracy. Any action you take upon the information found on this website is strictly at your own risk.

Copyright & IP:Certain technical content, interview questions, and datasets are curated from external educational sources to provide a centralized learning resource. Respect for original authorship is maintained; no copyright infringement is intended. All trademarks, logos, and brand names are the property of their respective owners.

System Operational

Built with Love ❀️ | Last updated: Mar 16 2026