SQL for Data Analytics
Advanced SQL patterns for data analysis and reporting.

Page 2: Preparing Your Data β The Foundation of Good Analysis ποΈ
The quality of your analysis is directly proportional to the quality of your data. Data preparation, often called "data munging," is where analysts spend 80% of their time.
This content is adapted from Mastering System Design from Basics to Cracking Interviews (Udemy). It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.
Understanding Data Types
Before querying, you must understand what you're working with:
- Structured Data: Fixed format, like rows in a SQL table.
- Unstructured Data: No predefined model (e.g., PDFs, images, raw text).
- Quantitative: Numerical values (e.g., Revenue, Page views).
- Qualitative: Categorical descriptions (e.g., User Status, Product Category).
First-, Second-, and Third-Party Data
- First-Party: Data you collect directly (e.g., your app's logs).
- Second-Party: Another company's first-party data shared with you.
- Third-Party: Data purchased from aggregation services (e.g., Nielsen).
SQL Query Structure Refresher
A well-structured query follows this logical order:
SELECT(What)FROM(Where)JOIN(Connect)WHERE(Filter rows)GROUP BY(Aggregate)HAVING(Filter groups)ORDER BY(Sort)
Data Profiling Techniques
Profiling helps you understand the shape and health of your data:
- Histograms: Distribution of values.
- Frequencies: How often categories appear.
- Binning: Grouping continuous values into discrete ranges.
- N-tiles: Splitting data into buckets (quartiles, deciles).
Cleaning and Transforming Data
Detecting Duplicates
Use GROUP BY and COUNT(*) > 1 to find records that shouldn't repeat.
Handling Nulls
Use COALESCE(field, 'Replacement') to provide defaults and NULLIF(a, b) to prevent division by zero errors.
Type Casting
Use CAST(field AS TYPE) or field::TYPE to ensure your numbers act like numbers and dates act like dates.
Logic with CASE
Use CASE WHEN to create new flags or categories dynamically based on multiple conditions.
Pivoting and Unpivoting Data
Reshaping data is crucial for reporting:
- Pivoting: Turning long format (rows) into wide format (columns).
- Unpivoting: Turning wide format into long format for better normalization and analysis.
