Chapter 9: Implementation

A design document describes what a system should do. Implementation is where that description becomes working software. The gap between design and code is where subtle bugs are born, where performance is won or lost, and where a system's true character emerges. In this chapter, we examine the patterns that appear across all of the systems we have built and the principles that guide their implementation.

The Implementation Pattern

Every service in our planetary scale computer follows the same structural pattern: a shared library that defines the interface, a server binary that implements the logic, and shared state that is managed safely across concurrent requests.

The shared library (lib.rs) defines the procedure identifiers, the request and response structures, and any client-side helper functions. This file is the contract: the server imports it to implement the procedures, and clients import it to call them. Because both sides depend on it, changes must be made carefully. Changing a request structure without also changing the procedure identifier breaks compatibility between old clients and new servers, since both sides would use the same identifier for payloads with different layouts.
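
As a rough illustration, a contract file for a hypothetical cache service might look like the sketch below. The names (ProcedureId, GetRequest, GetResponse) and the hand-rolled byte encoding are assumptions made for this example, not taken from our services; a real library would more likely rely on a serialization crate.

```rust
// lib.rs (sketch): the contract shared by the server and its clients.
// Every name and the wire layout below are illustrative assumptions.

/// Procedure identifiers. The numeric values are part of the contract:
/// a changed request layout should come with a new identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcedureId {
    Get = 1,
    Put = 2,
}

impl ProcedureId {
    /// Maps a raw identifier from the wire to a known procedure.
    pub fn from_u32(raw: u32) -> Option<Self> {
        match raw {
            1 => Some(ProcedureId::Get),
            2 => Some(ProcedureId::Put),
            _ => None,
        }
    }
}

/// Request payload for the Get procedure.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GetRequest {
    pub key: String,
}

/// Response payload for the Get procedure; `None` means a cache miss.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GetResponse {
    pub value: Option<Vec<u8>>,
}

impl GetRequest {
    /// Encodes the request as the UTF-8 bytes of the key.
    pub fn encode(&self) -> Vec<u8> {
        self.key.as_bytes().to_vec()
    }

    /// Decodes a request, rejecting payloads that are not valid UTF-8.
    pub fn decode(bytes: &[u8]) -> Result<Self, std::str::Utf8Error> {
        Ok(GetRequest { key: std::str::from_utf8(bytes)?.to_owned() })
    }
}
```

Because both sides compile against the same definitions, a mismatch in the typed structures is caught by the compiler; only a mismatch between deployed versions can reach the wire.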

The server binary (main.rs) implements the request handler: a function that dispatches each incoming request to the appropriate per-procedure handler based on the procedure identifier. Each per-procedure handler deserializes the request payload, performs the operation, and serializes the response. The server also initializes shared state, registers with discovery, and starts background tasks.
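
The dispatch step can be sketched as follows. The procedure numbers, the text-based payload layout, and the Store type are hypothetical, chosen only to keep the example small; registration with discovery and background tasks are left out.

```rust
// main.rs (sketch): dispatching on a procedure identifier.
// Identifiers, payload layout, and the Store type are assumptions.
use std::collections::HashMap;

const PROC_GET: u32 = 1;
const PROC_PUT: u32 = 2;

/// State the handlers operate on.
#[derive(Default)]
pub struct Store {
    entries: HashMap<String, String>,
}

/// Routes a request to the handler registered for its procedure.
/// Payloads are UTF-8 text: "key" for GET, "key=value" for PUT.
pub fn handle_request(store: &mut Store, procedure: u32, payload: &[u8]) -> Result<Vec<u8>, String> {
    match procedure {
        PROC_GET => handle_get(store, payload),
        PROC_PUT => handle_put(store, payload),
        other => Err(format!("unknown procedure {}", other)),
    }
}

fn handle_get(store: &Store, payload: &[u8]) -> Result<Vec<u8>, String> {
    // Deserialize the request.
    let key = std::str::from_utf8(payload).map_err(|e| e.to_string())?;
    // Perform the lookup and serialize the response:
    // the stored value, or an empty payload on a miss.
    Ok(store.entries.get(key).map(|v| v.as_bytes().to_vec()).unwrap_or_default())
}

fn handle_put(store: &mut Store, payload: &[u8]) -> Result<Vec<u8>, String> {
    let text = std::str::from_utf8(payload).map_err(|e| e.to_string())?;
    let (key, value) = text
        .split_once('=')
        .ok_or_else(|| "expected key=value".to_string())?;
    store.entries.insert(key.to_owned(), value.to_owned());
    Ok(b"ok".to_vec())
}
```

Keeping the match in one place means an unknown identifier has exactly one failure path, which is useful when old clients call a server that has retired a procedure.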

Shared state is wrapped in Arc<Mutex<T>> (or Arc<RwLock<T>> for read-heavy workloads) so that concurrent request handlers can access it safely. This is the standard Rust pattern for sharing mutable state across threads and async tasks: Arc provides shared ownership, and the lock ensures that only one handler mutates the state at a time.
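
A minimal sketch of the pattern, using only the standard library. Plain threads stand in for request handlers here, which is an assumption made to keep the example free of an async runtime; the SharedCache alias and the function name are likewise illustrative.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Shared state: one map, many owners (hypothetical alias).
pub type SharedCache = Arc<RwLock<HashMap<String, u64>>>;

/// Spawns `writers` threads that each insert one entry,
/// then returns the number of entries in the map.
pub fn concurrent_inserts(writers: u64) -> usize {
    let cache: SharedCache = Arc::new(RwLock::new(HashMap::new()));

    let handles: Vec<_> = (0..writers)
        .map(|id| {
            // Each thread gets its own handle to the same map.
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                // The write guard is a temporary, so the lock is
                // released at the end of this statement.
                cache
                    .write()
                    .expect("lock poisoned")
                    .insert(format!("key-{}", id), id);
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("writer thread panicked");
    }

    // Any number of readers may hold the read lock at once.
    let len = cache.read().expect("lock poisoned").len();
    len
}
```

With a Mutex in place of the RwLock the code would look the same apart from calling lock() instead of read() and write(); the RwLock only pays off when reads clearly outnumber writes.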

Background Tasks

Most services need work done outside the request-response cycle. The caching service runs a background task to clean up expired entries. The monitoring service checks for stale heartbeats. The discovery service cleans up stale registrations. The storage service triggers compaction.

These background tasks follow a common pattern: spawn an async task that loops on a sleep interval and, on each iteration, acquires the shared state lock, performs maintenance, and releases the lock. The key constraint is that a background task must hold the lock only for as long as the maintenance step needs, and never while sleeping, because every request handler that needs the same lock is blocked while it is held.
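
The loop can be sketched as below. A standard-library thread stands in for the async task, and the Expiries layout, the stop flag, and the function name are assumptions for illustration rather than the shape of our actual services.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Cache keys mapped to their expiry deadline (hypothetical layout).
pub type Expiries = Arc<Mutex<HashMap<String, Instant>>>;

/// Spawns a cleanup loop that removes expired entries every `interval`.
/// Setting `stop` to true ends the loop after its current iteration.
pub fn spawn_cleanup(entries: Expiries, interval: Duration, stop: Arc<AtomicBool>) -> JoinHandle<()> {
    thread::spawn(move || {
        while !stop.load(Ordering::Relaxed) {
            // Sleep without holding the lock.
            thread::sleep(interval);
            let now = Instant::now();
            // The guard is a temporary: the lock is taken, the expired
            // entries are dropped, and the lock is released, all within
            // this one statement.
            entries
                .lock()
                .expect("lock poisoned")
                .retain(|_, deadline| *deadline > now);
        }
    })
}
```

If a maintenance pass grows expensive, one option is to collect the keys to remove under the lock, release it, and then remove them in small batches, so that request handlers are never blocked for a full scan.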

Error Handling

Our implementations take a pragmatic approach to error handling. Internal errors (like a deserialization failure on internal traffic, which should always be well-formed) use expect, because they indicate bugs rather than runtime conditions. External errors (like network failures when contacting other services) are handled gracefully, typically by returning an error response to the client or by retrying with backoff.
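
The two styles can be sketched side by side. The CallError type, the retry helper, and the decimal payload format are hypothetical; the backoff schedule (doubling the delay after each failure) is one common choice, not necessarily the one our services use.

```rust
use std::thread;
use std::time::Duration;

/// Error from a call to another service (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum CallError {
    Transient(String),
}

/// External error: retry with exponential backoff, then give up and
/// hand the last error back to the caller. Always makes at least one attempt.
pub fn retry_with_backoff<T, F>(max_attempts: u32, base_delay: Duration, mut call: F) -> Result<T, CallError>
where
    F: FnMut() -> Result<T, CallError>,
{
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the error instead of panicking.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay *= 2; // double the wait before the next try
                attempt += 1;
            }
        }
    }
}

/// Internal error: the payload is produced by our own code, so a
/// malformed one is a bug, and `expect` turns it into a panic.
pub fn decode_internal_counter(payload: &[u8]) -> u64 {
    let text = std::str::from_utf8(payload).expect("internal payload must be UTF-8");
    text.parse().expect("internal payload must be a decimal counter")
}
```

The messages passed to expect end up in the panic output, so it helps to phrase them as the invariant that was violated.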

This distinction matters for operations. A panic from an expect means something is fundamentally wrong and the service should restart. A graceful error means the service is functioning correctly but encountered a transient problem in its environment.

Testing

The interface-first design pattern naturally supports testing. Because each service's interface is defined as typed structures, unit tests can construct request payloads, call handlers directly, and verify response payloads without starting a server or making network calls. Integration tests start the full server and make RPC calls to verify end-to-end behavior.
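
A unit test of this kind might look like the following sketch. The lookup handler, its request and response types, and the addresses are invented for the example; the point is only that the test builds a typed request and calls the handler as an ordinary function.

```rust
// Sketch: a handler exercised directly, with no server and no network.
// The types and the handler are hypothetical.
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub struct LookupRequest {
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub struct LookupResponse {
    pub address: Option<String>,
}

/// Service name to address, as a discovery-style service might keep it.
pub type Registry = HashMap<String, String>;

/// Handler under test: a pure function of state and request.
pub fn handle_lookup(registry: &Registry, request: &LookupRequest) -> LookupResponse {
    LookupResponse { address: registry.get(&request.name).cloned() }
}

#[cfg(test)]
mod handler_tests {
    use super::*;

    #[test]
    fn lookup_returns_registered_address() {
        let mut registry = Registry::new();
        registry.insert("cache".to_string(), "10.0.0.7:9000".to_string());

        let request = LookupRequest { name: "cache".to_string() };
        let response = handle_lookup(&registry, &request);

        assert_eq!(response.address.as_deref(), Some("10.0.0.7:9000"));
    }

    #[test]
    fn lookup_of_unknown_service_is_empty() {
        let registry = Registry::new();
        let request = LookupRequest { name: "storage".to_string() };

        assert_eq!(handle_lookup(&registry, &request), LookupResponse { address: None });
    }
}
```

Running cargo test compiles the handler_tests module and executes both cases in-process.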

The most valuable tests for distributed systems are not unit tests or integration tests but fault injection tests. What happens when the discovery service is unavailable? What happens when a storage write fails? What happens when a consensus member crashes mid-replication? These tests verify the system's resilience, which is ultimately what matters at planetary scale.
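
One way to inject the first of these faults is to put the dependency behind a trait and substitute an implementation that always fails. The sketch below assumes a hypothetical Discovery trait and a client that falls back to the last address it resolved; that fallback policy is an assumption for the example, not a description of our services.

```rust
// Sketch: injecting a discovery outage through a trait boundary.
// The Discovery trait, the client, and its fallback policy are hypothetical.

pub trait Discovery {
    fn resolve(&self, service: &str) -> Result<String, String>;
}

/// Fault-injecting double: behaves like a discovery service that is down.
pub struct UnavailableDiscovery;

impl Discovery for UnavailableDiscovery {
    fn resolve(&self, _service: &str) -> Result<String, String> {
        Err("discovery unavailable".to_string())
    }
}

/// Client that remembers the last storage address it resolved.
pub struct StorageClient<D: Discovery> {
    discovery: D,
    last_known: Option<String>,
}

impl<D: Discovery> StorageClient<D> {
    pub fn new(discovery: D, last_known: Option<String>) -> Self {
        StorageClient { discovery, last_known }
    }

    /// Resolves the storage address, falling back to the last known
    /// address when discovery fails.
    pub fn storage_address(&mut self) -> Result<String, String> {
        match self.discovery.resolve("storage") {
            Ok(address) => {
                self.last_known = Some(address.clone());
                Ok(address)
            }
            // Outage: use the remembered address, or report the error.
            Err(err) => self.last_known.clone().ok_or(err),
        }
    }
}
```

A fault injection test then constructs the client with UnavailableDiscovery and asserts on both outcomes: a client with a remembered address keeps working, and a client without one returns an error rather than panicking.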