Chapter 16: Utilization

Capacity tells you how much a system can do. Utilization tells you how much it is doing. Utilization is expressed as a percentage of capacity: a server using 70% of its CPU, a disk that is 85% full, a network link carrying 40% of its maximum bandwidth.
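The definition above is a simple ratio. A minimal sketch, assuming each resource is described by an amount in use and a total capacity in the same unit (the `Resource` class and the sample figures are hypothetical, chosen to match the three examples in the text):

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One resource dimension: how much is in use versus how much exists (hypothetical model)."""
    name: str
    used: float
    capacity: float
    unit: str

    def utilization_pct(self) -> float:
        # Utilization = used / capacity, expressed as a percentage of capacity.
        if self.capacity <= 0:
            raise ValueError(f"{self.name}: capacity must be positive")
        return 100 * self.used / self.capacity

    def headroom(self) -> float:
        # Capacity still available, in the resource's own unit.
        return self.capacity - self.used


# Hypothetical readings mirroring the examples in the text.
examples = [
    Resource("cpu", used=5.6, capacity=8, unit="cores"),
    Resource("disk", used=850, capacity=1000, unit="GB"),
    Resource("network", used=400, capacity=1000, unit="Mbps"),
]

for r in examples:
    print(f"{r.name}: {r.utilization_pct():.1f}% used, {r.headroom():g} {r.unit} free")
```

Keeping `used` and `capacity` separate, rather than storing only the percentage, preserves the absolute headroom, which is what you actually need when sizing for a spike.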

High utilization means you are getting value from the resources you pay for, but it also leaves little headroom for traffic spikes or unexpected failures. Low utilization means the system is over-provisioned: resources are being paid for but sit idle. The right target depends on the service's requirements. Queueing delay grows sharply as a resource approaches full utilization, so a latency-sensitive service might target 50% utilization to leave room for bursts, while a batch processing system, which cares about total throughput rather than per-request latency, might target 90%.
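Why does a latency-sensitive service give up so much capacity? One way to see it is the textbook M/M/1 queue. This is a simplifying model we are swapping in for illustration, not a claim about any particular system: it assumes random (Poisson) arrivals, exponentially distributed service times, and a single server. Under those assumptions the mean response time is S / (1 − ρ), where S is the mean service time and ρ is utilization. The 10 ms service time below is hypothetical.

```python
def mm1_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue: S / (1 - rho)."""
    if not 0 <= utilization < 1:
        # At rho >= 1 arrivals outpace service and the queue grows without bound.
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1 - utilization)


SERVICE_TIME_MS = 10.0  # hypothetical mean service time per request

for rho in (0.5, 0.7, 0.9, 0.95):
    r = mm1_response_time(SERVICE_TIME_MS, rho)
    print(f"utilization {rho:.0%}: mean response {r:6.1f} ms "
          f"({r / SERVICE_TIME_MS:.1f}x service time)")
```

In this model, response time is twice the service time at 50% utilization and ten times the service time at 90%. A batch job can usually absorb that queueing delay; an interactive request usually cannot. Real systems deviate from the model's assumptions, so treat the numbers as a qualitative guide.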

Utilization must be monitored across all resource dimensions simultaneously. A server with low CPU utilization but high memory utilization is still at risk of failure. The bottleneck resource — the one closest to capacity — determines the system's effective capacity. Identifying and relieving bottlenecks is a core operational skill.
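Finding the bottleneck amounts to taking the maximum utilization across dimensions. A minimal sketch, assuming utilization is reported as a fraction of capacity per resource and that every resource's usage scales linearly with load (a simplification; memory in particular often does not). The resource names and the snapshot values are hypothetical:

```python
def find_bottleneck(utilization: dict[str, float]) -> tuple[str, float]:
    """Return (resource, utilization) for the dimension closest to capacity."""
    if not utilization:
        raise ValueError("need at least one resource")
    name = max(utilization, key=utilization.get)
    return name, utilization[name]


def load_multiplier_before_saturation(utilization: dict[str, float]) -> float:
    """How many times the current load fits before the bottleneck reaches 100%.

    Assumes usage of every resource grows linearly with load.
    """
    _, peak = find_bottleneck(utilization)
    if peak <= 0:
        return float("inf")
    return 1.0 / peak


# Hypothetical snapshot of one server, as fractions of capacity.
server = {"cpu": 0.25, "memory": 0.80, "disk_io": 0.55, "network": 0.40}

name, peak = find_bottleneck(server)
print(f"bottleneck: {name} at {peak:.0%}")
print(f"load can grow {load_multiplier_before_saturation(server):.2f}x before {name} saturates")
print(f"CPU alone would suggest {1 / server['cpu']:.2f}x")
```

The last two lines make the paragraph's point concrete: judged by CPU alone, this server looks like it could take four times its current load, but memory caps it at 1.25 times.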

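Our monitoring service tracks application-level metrics such as cache hit rates, queue depths, and request rates. Strictly speaking, these are not utilization figures. Request rate measures load, queue depth signals saturation, and cache hit rate measures how effectively a resource is used. Combined with system-level utilization metrics like CPU and memory usage, they show both how busy each resource is and how well that capacity is being spent. A steadily growing queue, for example, is often the first sign that some resource has become the bottleneck.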