System Performance: Speed, Capacity & Efficiency
Performance is a multi-dimensional goal. It's not just about "how fast" a system responds, but how efficiently it meets its functional requirements under varying loads.
This content is adapted from Mastering System Design from Basics to Cracking Interviews (Udemy). It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.
Key Dimensions of Performance
- Speed: The time taken to complete an operation (Latency).
- Capacity: The volume of work a system can handle (Throughput).
- Efficiency: How much of the system's resources (CPU, RAM, Disk) are consumed to perform the work.
> [!NOTE]
> Performance is a feature, not an afterthought. Poor performance leads to high bounce rates, loss of revenue, and system instability.
Latency vs. Throughput
These two metrics are often confused but are fundamentally different:
- Latency: The time taken to process a single request (measured in ms or seconds).
- Throughput: The number of requests processed in a given time period (e.g., Requests Per Second - RPS).
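The distinction is easy to see when you instrument both at once. Below is a minimal sketch (the `measure` helper and its parameters are illustrative, not from any particular library) that times each request individually for latency and divides total work by total wall-clock time for throughput:

```python
import time

def measure(handler, requests):
    """Return (avg_latency_s, throughput_rps) for a batch of requests."""
    latencies = []
    start = time.perf_counter()
    for req in requests:
        t0 = time.perf_counter()      # clock one request -> latency
        handler(req)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start  # clock the whole batch -> throughput
    return sum(latencies) / len(latencies), len(requests) / elapsed
```

Note that the two numbers come from different measurements: latency is per request, throughput is per unit of wall-clock time, which is why improving one does not automatically improve the other.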
The "Pipe and Bucket" Analogy
Imagine water flowing through a pipe into a bucket:
- Bandwidth: The thickness of the pipe.
- Latency: The time it takes for a single drop of water to travel from one end of the pipe to the other.
- Throughput: The total volume of water that enters the bucket every minute.
Scalability vs. Responsiveness
- Scalability: The ability to handle increased load without performance degradation. (Horizontal vs. Vertical scaling).
- Responsiveness: The system's ability to respond quickly to user input. This is tightly linked to latency.
A good design ensures responsiveness at scale.
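One standard way to reason about "responsiveness at scale" is Little's Law, which ties the three quantities together: the number of requests concurrently in the system equals arrival rate times time spent in the system. The helper name below is my own; the formula itself is standard queueing theory.

```python
def required_concurrency(throughput_rps, latency_s):
    """Little's Law: L = lambda * W.

    Concurrent in-flight requests = arrival rate (req/s) * time each
    request spends in the system (s).
    """
    return throughput_rps * latency_s

# 1,000 RPS at 200 ms per request means ~200 requests in flight at once,
# so the system needs capacity (threads, connections, workers) for ~200.
in_flight = required_concurrency(1000, 0.2)
```

This is why cutting latency directly helps scalability: at the same throughput, a faster system holds fewer requests in flight and therefore needs fewer resources.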
Interview Questions - Performance Fundamentals
1. What is the difference between latency and throughput?
Answer:
- Latency is the time to serve a single request (e.g., "fast vs. slow").
- Throughput is the rate of work the system can sustain (e.g., "high vs. low volume").
- Analogy: Latency is how fast one car travels; throughput is how many cars pass per hour. Optimizing one can often hurt the other (e.g., batching for throughput increases latency).
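The batching trade-off mentioned above can be made concrete with a toy cost model (all numbers and names here are illustrative assumptions, not benchmarks): each batch pays a fixed overhead plus a per-item cost, so larger batches amortize the overhead and raise throughput, but the last item queued must wait for the whole batch, which raises latency.

```python
def batched_latency(batch_size, per_item_cost_s, overhead_s):
    """Toy model: fixed per-batch overhead amortized over batch_size items."""
    batch_time = overhead_s + batch_size * per_item_cost_s
    throughput = batch_size / batch_time   # items per second
    worst_latency = batch_time             # an item may wait out the whole batch
    return throughput, worst_latency

# Assumed costs: 50 ms overhead per batch, 1 ms per item.
tput_single, lat_single = batched_latency(1, 0.001, 0.05)
tput_batched, lat_batched = batched_latency(100, 0.001, 0.05)
# Batching 100 items raises throughput dramatically, but also raises
# worst-case latency: the classic throughput-vs-latency trade-off.
```
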
2. How would you ensure responsiveness in a highly scalable system?
Answer:
- Use asynchronous processing for non-critical paths (background jobs).
- Implement caching layers (Redis, CDN) to reduce backend load.
- Apply rate limiting and load shedding to maintain system health.
- Ensure horizontal scalability of stateless components.
- Monitor tail latencies and auto-scale proactively.
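Of the techniques above, caching is the most commonly probed in follow-ups. A minimal cache-aside sketch (the `TTLCache` class and `loader` callback are hypothetical, standing in for Redis or a CDN edge) shows the core idea: serve repeat reads from memory and only touch the backend on a miss or after the entry expires.

```python
import time

class TTLCache:
    """Minimal cache-aside sketch with per-entry time-to-live."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                     # hit: no backend call
        value = loader(key)                     # miss/expired: hit the backend
        self._store[key] = (value, time.monotonic() + self.ttl_s)
        return value
```

Every hit removes one request from the backend's load, which is how a cache layer protects both latency (fast in-memory reads) and throughput (backend capacity is spent only on misses).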
What's next? Learn how we measure these metrics in production → Performance Measurement: SLAs & Percentiles