Chapter 43: Communication

During an incident, communication is as important as the technical response. Users need to know that a problem exists, that it is being worked on, and when it is expected to be resolved. Internal teams need to coordinate their efforts, share findings, and avoid duplicating work. Effective communication can be the difference between a well-managed incident and a chaotic one.

Internal communication during an incident typically uses a dedicated channel (a chat room, a bridge call, or both) where all responders can share observations and coordinate actions. An incident commander leads the response, delegating tasks, tracking progress, and making decisions. A scribe documents the timeline of events, actions taken, and their outcomes, creating the raw material for the post-incident review.
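
The scribe's timeline can be as simple as an append-only list of timestamped entries. The following is a minimal sketch of one way to structure it, not a description of any particular tool: the names IncidentLog and TimelineEntry, the three entry kinds, and the rendering format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical entry kinds; a real team would choose its own vocabulary.
ENTRY_KINDS = ("observation", "action", "decision")


@dataclass
class TimelineEntry:
    """One line in the scribe's log."""
    timestamp: datetime
    author: str
    kind: str
    text: str
    outcome: Optional[str] = None  # filled in later, once the result is known


@dataclass
class IncidentLog:
    """Append-only timeline kept by the scribe during an incident."""
    commander: str
    scribe: str
    entries: List[TimelineEntry] = field(default_factory=list)

    def record(self, author: str, kind: str, text: str,
               when: Optional[datetime] = None) -> TimelineEntry:
        """Add an entry, stamping it with the current UTC time by default."""
        if kind not in ENTRY_KINDS:
            raise ValueError(f"unknown entry kind: {kind!r}")
        entry = TimelineEntry(
            timestamp=when or datetime.now(timezone.utc),
            author=author,
            kind=kind,
            text=text,
        )
        self.entries.append(entry)
        return entry

    def render(self) -> str:
        """Return the timeline in chronological order, one entry per line."""
        lines = []
        for e in sorted(self.entries, key=lambda entry: entry.timestamp):
            line = f"{e.timestamp:%H:%M} UTC [{e.kind}] {e.author}: {e.text}"
            if e.outcome:
                line += f" -> {e.outcome}"
            lines.append(line)
        return "\n".join(lines)


# Example usage with made-up responders and events.
log = IncidentLog(commander="alice", scribe="bob")
restart = log.record("carol", "action", "restarted cache tier")
restart.outcome = "error rate dropped"
print(log.render())
```

Recording the outcome on the same entry as the action keeps cause and effect together, which makes the timeline easier to reuse in the post-incident review.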

External communication requires balancing transparency with accuracy. Premature statements about root causes can turn out to be wrong, and a retracted explanation erodes trust. Status page updates should state what is known (the scope and impact of the incident), what is being done (the mitigation actions underway), and when the next update will be posted (so that readers know what to expect). It is better to say "we are investigating" than to speculate about causes.
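
The three-part structure of a status update lends itself to a small template. The sketch below is illustrative only: the StatusUpdate class, its field names, and the phrase list in find_speculation are hypothetical, and the phrase check is a crude heuristic meant to prompt a second look before posting, not a reliable filter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical list of wording that often signals guessing about causes.
SPECULATIVE_PHRASES = ("root cause", "probably", "caused by", "likely")


def find_speculation(text: str) -> List[str]:
    """Return the speculative phrases found in text (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in SPECULATIVE_PHRASES if phrase in lowered]


@dataclass
class StatusUpdate:
    known: str                 # scope and impact, confirmed facts only
    doing: str                 # mitigation actions underway
    posted_at: datetime        # when this update is published (UTC)
    next_update_in: timedelta  # promised interval until the next update

    def render(self) -> str:
        """Format the update as plain text for a status page."""
        next_at = self.posted_at + self.next_update_in
        return (
            f"[{self.posted_at:%Y-%m-%d %H:%M} UTC]\n"
            f"What we know: {self.known}\n"
            f"What we are doing: {self.doing}\n"
            f"Next update by: {next_at:%H:%M} UTC"
        )


# Example usage with a made-up incident.
update = StatusUpdate(
    known="Some users in the EU region cannot log in.",
    doing="We are investigating.",
    posted_at=datetime.now(timezone.utc),
    next_update_in=timedelta(minutes=30),
)
warnings = find_speculation(update.known + " " + update.doing)
if warnings:
    print("Review before posting, possible speculation:", warnings)
print(update.render())
```

Making the next update time a required field means an update cannot be drafted without committing to when readers will hear more.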

The post-incident review is often the most valuable communication artifact. Written as a blameless document, it describes the timeline, root causes, contributing factors, and corrective actions. These reviews, shared across the organization, build institutional memory and help prevent the same class of incident from recurring. The best engineering organizations treat post-incident reviews not as bureaucratic overhead but as one of their most important learning mechanisms.