Design a Chat Application (WhatsApp)
Designing a real-time chat application requires handling millions of persistent connections, ensuring sub-second message delivery, and maintaining data consistency across multiple devices, all while keeping conversations private with end-to-end encryption.
This content is adapted from Mastering System Design from Basics to Cracking Interviews (Udemy). It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.
Introduction
A modern chat system like WhatsApp must handle massive scale while feeling instant and personal. The architecture centers around persistent WebSocket connections and a distributed presence management system.

Requirements
Functional Requirements
- Messaging: 1-to-1 and group chat support.
- Statuses: Typing indicators and real-time online status.
- Media: Secure upload and delivery of images, videos, and documents.
- Receipts: Delivery and read acknowledgments.
- Sync: Multi-device support with message history synchronization.
Non-Functional Requirements
- Low Latency: Sub-second message delivery globally.
- High Availability: 99.99% uptime; users expect the service to always be "on."
- Encryption: End-to-end encryption (E2EE) where the server has zero visibility into content.
- Scalability: Support for 100M+ DAU and billions of messages daily.
Scale Estimation
- DAU: 100 Million users.
- Messages/day: ~5 Billion.
- Peak Load: 3x multiplier during major global events.
- Concurrent Connections: 20 to 30 million persistent connections at peak.
- Storage: Billions of messages per day requiring partitioned, write-optimized NoSQL stores.
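The estimates above can be turned into per-second figures with a little arithmetic. The sketch below uses the DAU, message volume, and peak multiplier from the list; the 1 KB average message size is an assumption added purely for illustration and does not come from the source.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures taken from the scale estimation above.
dau = 100_000_000
messages_per_day = 5_000_000_000
peak_multiplier = 3

avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
peak_msgs_per_sec = avg_msgs_per_sec * peak_multiplier
msgs_per_user_per_day = messages_per_day / dau

# Assumption (not from the source): about 1 KB stored per message,
# covering the encrypted payload plus metadata.
assumed_bytes_per_message = 1_000
daily_storage_tb = messages_per_day * assumed_bytes_per_message / 1e12

print(f"Average throughput: {avg_msgs_per_sec:,.0f} msgs/sec")
print(f"Peak throughput:    {peak_msgs_per_sec:,.0f} msgs/sec")
print(f"Messages per user:  {msgs_per_user_per_day:.0f} per day")
print(f"Daily storage:      {daily_storage_tb:.1f} TB (under the 1 KB assumption)")
```

Roughly 58,000 writes per second on average, tripling at peak, is the kind of load that motivates a partitioned, write-optimized store.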
High-Level Architecture
The core of the system is the Connection Manager, which maintains long-lived WebSocket connections for every active device.

Message Flow & Sequence
Understanding how a message travels from Sender to Receiver in both 1-to-1 and Group scenarios.

The WebSocket Advantage
Chat applications rely on WebSockets (rather than standard HTTP) because:
- Bi-directional: The server can push messages to the client instantly.
- Lower Overhead: No need to resend headers for every message after the initial handshake.
- Real-time: Essential for typing indicators and presence updates.
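To make the "lower overhead" point concrete, the sketch below computes the per-frame header size defined by the WebSocket framing rules in RFC 6455 and compares it with an HTTP request's headers. The 500-byte HTTP header figure is an assumed, illustrative value; real header sizes vary widely with cookies and user agents.

```python
def ws_frame_header_bytes(payload_len: int, masked: bool) -> int:
    """Header size of a single WebSocket frame under RFC 6455 framing."""
    header = 2  # FIN/opcode byte + mask bit and 7-bit length byte
    if payload_len > 65535:
        header += 8  # 64-bit extended payload length
    elif payload_len > 125:
        header += 2  # 16-bit extended payload length
    if masked:
        header += 4  # masking key, required on client-to-server frames
    return header


# Assumption for illustration only: a typical HTTP request carries
# about 500 bytes of headers.
ASSUMED_HTTP_HEADER_BYTES = 500

payload_len = 100  # a short chat message
ws_overhead = ws_frame_header_bytes(payload_len, masked=True)

print(f"WebSocket frame overhead: {ws_overhead} bytes")
print(f"Assumed HTTP overhead:    {ASSUMED_HTTP_HEADER_BYTES} bytes")
```

For short chat messages the framing cost is a handful of bytes, whereas a request-per-message design would pay the full header cost every time.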
The Final Design
A comprehensive view of the entire chat ecosystem, including media handling and push fallback.

Bottlenecks & Strategic Decisions
- Presence Updates: "Last seen" and typing indicators are high-frequency events. Use a Pub/Sub pattern with Redis to broadcast these efficiently without hammering the main database.
- Storage Write Pressure: 5 billion messages/day averages roughly 58,000 writes per second. Use Cassandra or DynamoDB because they are optimized for high-volume append writes and can be partitioned by conversation_id.
- Connection Management: Handling 30M concurrent WebSockets means spreading connections across many server instances. Deploy a dedicated Connection Registry using Redis to track which user ID is connected to which WebSocket server instance.
- Encryption (E2EE): Implement key exchange (e.g., Signal Protocol) on the client. The server only stores encrypted blobs and metadata, never the raw message.
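As a rough illustration of the Connection Registry idea, the sketch below keeps the user-to-server mapping in an in-memory dictionary standing in for Redis. The class name, method names, and server IDs such as "ws-1" are hypothetical; in a real deployment the same shape might map onto one Redis hash per user, but that is an assumption rather than something the source specifies.

```python
class ConnectionRegistry:
    """Tracks which WebSocket server holds each device's connection.

    In-memory stand-in for a Redis-backed registry (hypothetical API).
    """

    def __init__(self):
        # user_id -> {device_id: server_id}
        self._by_user = {}

    def register(self, user_id, device_id, server_id):
        """Record that a device connected to a given server instance."""
        self._by_user.setdefault(user_id, {})[device_id] = server_id

    def unregister(self, user_id, device_id):
        """Remove a device on disconnect; drop the user when none remain."""
        devices = self._by_user.get(user_id, {})
        devices.pop(device_id, None)
        if not devices:
            self._by_user.pop(user_id, None)

    def servers_for(self, user_id):
        """Return the set of servers to route this user's messages to."""
        return set(self._by_user.get(user_id, {}).values())


registry = ConnectionRegistry()
registry.register("alice", "phone", "ws-1")
registry.register("alice", "laptop", "ws-7")
online_servers = registry.servers_for("alice")

registry.unregister("alice", "phone")
after_disconnect = registry.servers_for("alice")
offline_lookup = registry.servers_for("bob")  # never connected
```

Keying by device as well as by user lets the same lookup serve multi-device sync: a message for one user is routed to every server that holds one of their live connections. An empty result signals that the recipient is offline.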
Top Interview Questions
Q: How do you handle group messages with 500+ members?
Instead of the Chat Service fanning out 500 messages, it places a single message in a queue. A pool of Fan-out Workers then iterates through the member list and pushes the message to each member's active WebSocket connection.
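The fan-out pattern in this answer can be sketched with a standard-library queue. Everything here is a simplified assumption: a plain list stands in for each member's live WebSocket connection, `group_members` and `connections` are hypothetical in-memory tables, and a single worker call replaces a pool of workers.

```python
from queue import Queue

# Hypothetical in-memory state for illustration.
group_members = {"g1": ["alice", "bob", "carol"]}
# user -> outbox list standing in for an active WebSocket; carol is offline.
connections = {"alice": [], "bob": []}


def enqueue_group_message(q, group_id, sender, payload):
    """Chat Service side: place ONE message on the queue, no fan-out here."""
    q.put({"group_id": group_id, "sender": sender, "payload": payload})


def fan_out_worker(q, members, conns):
    """Worker side: expand each queued message to every group member."""
    undelivered = []
    while not q.empty():
        msg = q.get()
        for user in members[msg["group_id"]]:
            if user == msg["sender"]:
                continue  # the sender already has the message
            outbox = conns.get(user)
            if outbox is None:
                undelivered.append((user, msg))  # no live connection
            else:
                outbox.append(msg)  # push over the member's connection
        q.task_done()
    return undelivered


queue = Queue()
enqueue_group_message(queue, "g1", "alice", "<encrypted blob>")
undelivered = fan_out_worker(queue, group_members, connections)
```

The Chat Service does constant work per group message, and the per-member cost moves to workers that can be scaled independently. Members without a live connection come back in `undelivered` for offline handling.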
Q: What happens if the receiver is offline?
If the Connection Manager detects no active WebSocket for the recipient, the message is stored in the DB as "Pending" and a Push Notification is triggered via FCM/APNs to alert the user.
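A minimal sketch of this offline path follows, assuming an in-memory dictionary in place of the database and a caller-supplied `send_push` callback in place of FCM/APNs. All function and field names are hypothetical. The reconnect step that drains the backlog is an added assumption about how pending messages would eventually be delivered; the source does not describe it.

```python
def deliver(message, connections, pending_store, send_push):
    """Deliver over a live connection, or store as pending and notify."""
    outbox = connections.get(message["to"])
    if outbox is not None:
        outbox.append(message)
        return "delivered"
    # No active WebSocket: persist with a pending status.
    pending = {**message, "status": "pending"}
    pending_store.setdefault(message["to"], []).append(pending)
    # The push carries no message content, only a wake-up alert.
    send_push(message["to"])
    return "pending"


def on_reconnect(user, connections, pending_store):
    """Assumed behavior: flush the stored backlog when the user returns."""
    outbox = connections.setdefault(user, [])
    backlog = pending_store.pop(user, [])
    outbox.extend(backlog)
    return len(backlog)


connections = {"alice": []}  # bob is offline
pending_store = {}
pushes = []

status = deliver(
    {"to": "bob", "from": "alice", "payload": "<encrypted blob>"},
    connections,
    pending_store,
    pushes.append,
)
flushed = on_reconnect("bob", connections, pending_store)
```

Keeping the push notification free of message content is consistent with the E2EE requirement, since the server never holds the plaintext.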