Design a Notification System 🔔

A notification system delivers critical updates, such as messages, shipping alerts, or security codes, across multiple channels (Email, SMS, Push, In-App).

🌍 References & Disclaimer

This content is adapted from Mastering System Design from Basics to Cracking Interviews (Udemy). It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.

🚀 Introduction

Building a notification system for 10M+ DAU requires more than just calling an API. It necessitates a distributed, resilient architecture capable of handling bursty events, managing user preferences, and ensuring high delivery rates across global providers.

[Figure: Notification Events]

Engineering

Mar 2025 · 10 min read

Driptanil Datta, Software Developer

📋 Requirements

Functional Requirements

Multi-Channel Support: Support for Email, SMS, Push Notifications, and In-App alerts.
User Preferences: Allow users to opt-in/out of specific event types and channels.
Templating: Localized, template-based message generation.
Retry & DLQ: Automatic retries for failed provider calls with dead-letter queueing.
API Access: Unified APIs for triggering notifications and fetching in-app history.
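The templating requirement can be illustrated with a minimal sketch. It assumes templates are keyed by event type and locale, with a fallback to English when a locale is missing. The `TEMPLATES` table, the `render` helper, and the `order_shipped` event name are hypothetical and not part of the original design.

```python
from string import Template

# Hypothetical template store: (event_type, locale) -> message template.
TEMPLATES = {
    ("order_shipped", "en"): Template("Hi $name, your order $order_id has shipped."),
    ("order_shipped", "es"): Template("Hola $name, tu pedido $order_id ha sido enviado."),
}

DEFAULT_LOCALE = "en"  # assumed fallback locale


def render(event_type: str, locale: str, params: dict) -> str:
    """Render a localized message, falling back to the default locale."""
    template = TEMPLATES.get((event_type, locale)) or TEMPLATES[(event_type, DEFAULT_LOCALE)]
    # substitute() raises KeyError when a placeholder is missing, which surfaces
    # bad payloads early instead of sending a half-filled message.
    return template.substitute(params)


print(render("order_shipped", "es", {"name": "Ana", "order_id": "A1"}))
```

Keeping templates outside the worker code means copy changes and new locales would not need a redeploy, although a real system would likely load them from a store rather than a module-level dictionary.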

Non-Functional Requirements

At-Least-Once Delivery: Prioritize reliability to ensure critical alerts are not lost.
Low Latency: Near real-time delivery (< 10 seconds for most channels).
High Scalability: Sustain 100M+ notifications/day and absorb flash-sale spikes of roughly 3x the average rate.
Observability: End-to-end traceability for every notification, from event ingestion to provider delivery.

📊 Scale Estimation

DAU: 10 Million users.
Average Load: 5 events/user/day with a 2x channel fan-out = 100 Million notifications/day.
Peak Load: 100M/day is about 1,160 notifications/second on average; a 3x multiplier during incidents or sales gives ~3,500 notifications/second.
Storage: Preferences for 10M users + massive log volumes for delivery auditing.
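The throughput figures above follow from simple arithmetic. The short script below reproduces them using only the inputs stated in this section; the variable names are illustrative.

```python
DAU = 10_000_000
EVENTS_PER_USER_PER_DAY = 5
CHANNEL_FAN_OUT = 2
PEAK_MULTIPLIER = 3
SECONDS_PER_DAY = 24 * 60 * 60

notifications_per_day = DAU * EVENTS_PER_USER_PER_DAY * CHANNEL_FAN_OUT
average_per_second = notifications_per_day / SECONDS_PER_DAY
peak_per_second = average_per_second * PEAK_MULTIPLIER

print(f"{notifications_per_day:,} notifications/day")  # 100,000,000 notifications/day
print(f"{average_per_second:,.0f}/s average")          # 1,157/s average
print(f"{peak_per_second:,.0f}/s peak")                # 3,472/s peak
```

The peak value of about 3,472/s is what the estimate rounds to ~3,500 notifications/second.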

📐 High-Level Architecture

The system is decoupled using a message broker to isolate ingestion from heavy delivery processing.
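As a rough illustration of this decoupling, the sketch below uses in-memory `queue.Queue` objects as a stand-in for broker topics. The `ingest` and `drain` functions, the message fields, and the channel names are assumptions made for the example; a real deployment would use an actual broker and long-running workers.

```python
import queue

# One in-memory queue per channel stands in for broker topics (assumption).
CHANNEL_QUEUES = {ch: queue.Queue() for ch in ("email", "sms", "push", "in_app")}


def ingest(event: dict) -> int:
    """Accept an event and enqueue one message per requested channel.

    Returns the number of messages enqueued; no provider is called here.
    """
    enqueued = 0
    for channel in event["channels"]:
        CHANNEL_QUEUES[channel].put({
            "event_id": event["event_id"],
            "user_id": event["user_id"],
            "channel": channel,
            "body": event["body"],
        })
        enqueued += 1
    return enqueued


def drain(channel: str, send) -> int:
    """Worker step: deliver everything currently queued for one channel."""
    delivered = 0
    q = CHANNEL_QUEUES[channel]
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return delivered
        send(message)  # the slow provider call happens here, off the ingest path
        delivered += 1
```

Because `ingest` only enqueues, a slow or failing provider delays `drain` for that channel without slowing down event acceptance.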

🏗️ The Final Design - Notification System

A comprehensive view of the entire system, from ingestion to external provider integration.

[Figure: Notification Final Design]

🛠️ Bottlenecks & Strategic Decisions

Third-Party Rate Limits: Providers like Twilio or SendGrid have strict throttles. Use Channel-Specific Queues to buffer traffic and implement circuit breakers to avoid cascading failures.
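Preference Lookup Overhead: Querying the main SQL database for every notification adds latency and load at peak traffic. Use Redis to cache user notification settings with a write-through strategy, so every preference update is written to the database and the cache together.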
Idempotency: Retrying provider calls can lead to "double-pinging." Use a unique event_id or deduplication_key at the worker level to ensure a specific alert is sent only once per user.
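Two of the decisions above, the write-through preference cache and worker-level deduplication, can be sketched as follows. Plain dictionaries and a set stand in for the SQL database and Redis, and the class names, method names, and the choice to include the channel in the deduplication key are assumptions for illustration only.

```python
class PreferenceStore:
    """Write-through cache: every update hits the database and the cache together."""

    def __init__(self, db: dict, cache: dict):
        self.db = db        # stand-in for the SQL database
        self.cache = cache  # stand-in for Redis

    def update(self, user_id: str, prefs: dict) -> None:
        self.db[user_id] = prefs     # write the source of truth first
        self.cache[user_id] = prefs  # then refresh the cache in the same call

    def get(self, user_id: str) -> dict:
        if user_id in self.cache:
            return self.cache[user_id]
        prefs = self.db.get(user_id, {})  # cache miss: fall back to the database
        self.cache[user_id] = prefs
        return prefs


class DedupSender:
    """Skips a send when the same (event, user, channel) key was already delivered."""

    def __init__(self, send):
        self.send = send
        self.seen = set()  # a real deployment would likely need a shared store with a TTL

    def deliver(self, event_id: str, user_id: str, channel: str, body: str) -> bool:
        key = (event_id, user_id, channel)
        if key in self.seen:
            return False  # duplicate caused by a retry or redelivery
        self.send(channel, user_id, body)
        self.seen.add(key)  # recorded only after a successful send
        return True
```

Recording the key only after a successful send keeps the at-least-once guarantee: a crash between the send and the record can still produce a duplicate, but a failed send is never mistaken for a delivered one.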

💡 Top Interview Questions

Q: How do you handle "Quiet Hours"? The Orchestrator queries the Preference Service for the user's time zone and quiet-hours window, then converts the current time to the user's local time. If it falls within quiet hours, the notification is either queued for later or dropped, depending on the severity level (e.g., OTPs bypass quiet hours).
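A minimal sketch of that check is shown below. The severity labels (`critical`, `normal`, `low`), the preference field names, and the `decide` helper are hypothetical. The time zone is passed as a `tzinfo` object, so a `zoneinfo.ZoneInfo` instance built from the user's stored zone name could be supplied in practice.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def in_quiet_hours(now_utc: datetime, tz: tzinfo, start: time, end: time) -> bool:
    """Return True if the user's local time falls inside [start, end)."""
    local = now_utc.astimezone(tz).time()
    if start <= end:
        return start <= local < end
    # The window wraps past midnight, e.g. 22:00 to 07:00.
    return local >= start or local < end


def decide(now_utc: datetime, prefs: dict, severity: str) -> str:
    """Return 'send', 'defer', or 'drop' for one notification."""
    if severity == "critical":  # e.g. OTPs bypass quiet hours
        return "send"
    if not in_quiet_hours(now_utc, prefs["tz"], prefs["quiet_start"], prefs["quiet_end"]):
        return "send"
    return "defer" if severity == "normal" else "drop"


# Example: a user at UTC+05:30 with quiet hours from 22:00 to 07:00.
prefs = {
    "tz": timezone(timedelta(hours=5, minutes=30)),
    "quiet_start": time(22, 0),
    "quiet_end": time(7, 0),
}
now = datetime(2025, 3, 1, 18, 0, tzinfo=timezone.utc)  # 23:30 local time
print(decide(now, prefs, "normal"))    # defer
print(decide(now, prefs, "critical"))  # send
```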

Q: What is a Dead-Letter Queue (DLQ) used for here? If a worker still fails to deliver after multiple retries (e.g., because of an invalid phone number), the message is moved to a DLQ for manual audit or further automated analysis, so it does not block the main worker threads.
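The retry-then-DLQ flow can be sketched as below. The exponential backoff schedule, the `deliver_with_retry` name, its parameters, and the use of a plain list as the DLQ are assumptions for illustration; the original design does not specify a retry policy.

```python
import time


def deliver_with_retry(message, send, dlq, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try to send; back off exponentially on failure; park the message in the DLQ."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return True
        except Exception as exc:
            if attempt == max_attempts:
                # Retries exhausted: record the failure so the worker can move on.
                dlq.append({"message": message, "error": str(exc), "attempts": attempt})
                return False
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, then 1s, then 2s, ...
    return False  # only reached when max_attempts < 1
```

The `sleep` parameter is injectable so the backoff can be tested without waiting. A production worker would more likely requeue the message with a delay than block a thread while sleeping.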