Chapter 15: Capacity
Every system has limits. A server can only handle so many requests per second. A disk can only store so many bytes. A network link can only carry so many bits per second. Capacity is the measure of these limits, and capacity management is the practice of ensuring that a system has enough resources to meet its workload.
Capacity is measured at every level of the stack. At the hardware level: CPU cores, gigabytes of memory, disk IOPS, network bandwidth. At the software level: requests per second, concurrent connections, queue depth, cache hit rate. At the service level: the number of users that can be served, the volume of data that can be stored, the latency that can be sustained.
The relationship between load and performance is rarely linear. A server that handles 1,000 requests per second at 10ms latency might handle 2,000 at 15ms and 3,000 at 50ms, then collapse entirely at 3,500. Latency climbs slowly at first and then sharply, because requests spend more and more time waiting in queues as a resource approaches saturation. Understanding these non-linear relationships, through load testing, modeling, and experience, is essential for capacity planning.
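One common way to build intuition for this curve is a simple queueing model. The sketch below uses the textbook M/M/1 formula, in which mean time in the system is 1 / (service rate - arrival rate). This is an illustrative assumption, not a model of any particular server: the function name and the 3,500 requests-per-second service rate are hypothetical, and the output will not reproduce the latency figures quoted above. Only the shape of the curve carries over.

```python
import math


def mm1_latency_ms(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (waiting plus service) for an M/M/1 queue, in ms.

    Both rates are in requests per second. At or beyond saturation the
    queue grows without bound, so the mean latency is reported as infinite.
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1000.0 / (service_rate - arrival_rate)


# Hypothetical saturation point, chosen only for illustration.
SERVICE_RATE = 3500.0

if __name__ == "__main__":
    for load in (1000, 2000, 3000, 3400, 3500):
        latency = mm1_latency_ms(load, SERVICE_RATE)
        print(f"{load:>5} req/s -> {latency:.2f} ms")
```

Even in this idealized model, going from 1,000 to 2,000 requests per second barely moves the latency, while the last few hundred requests per second before saturation multiply it. Real systems add further effects (timeouts, retries, memory pressure), so a model like this complements load testing and does not replace it.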
Capacity management is a continuous process. As traffic grows, new features are added, and usage patterns change, the capacity requirements of a system evolve. The monitoring service provides the data needed to track capacity utilization over time, and the configuration service allows capacity parameters (such as connection pool sizes and cache limits) to be adjusted without redeployment.
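As a rough illustration of how utilization data can feed a capacity forecast, the sketch below fits a straight line to utilization samples and estimates how long until the trend crosses a planning threshold. Everything here is an assumption made for the example: the function name, the sample values, and the 80% threshold are hypothetical, the samples are not drawn from any real monitoring service, and a linear trend is only the simplest possible model of growth.

```python
from typing import Optional, Sequence, Tuple


def days_until_threshold(
    samples: Sequence[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Estimate days from the last sample until utilization reaches threshold.

    samples is a sequence of (day, utilization) pairs, ordered by day.
    Returns None when the fitted trend is flat or falling, or when there
    is too little data to fit a line.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0:
        return None  # all samples taken on the same day
    slope = sxy / sxx  # least-squares slope, utilization per day
    if slope <= 0:
        return None  # no growth, so the threshold is never reached
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Clamp to zero if the trend has already crossed the threshold.
    return max(0.0, crossing_day - samples[-1][0])


if __name__ == "__main__":
    # Hypothetical disk utilization, sampled every 30 days.
    history = [(0, 0.50), (30, 0.55), (60, 0.60), (90, 0.65)]
    print(days_until_threshold(history, threshold=0.80))
```

A forecast like this is most useful as an early warning: it indicates roughly how much lead time remains to add resources or adjust limits, and it should be recomputed as new data arrives because growth is seldom perfectly linear.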