Chapter 6: Routing
Knowing where a server is located is only half the problem of communicating with it. The other half is getting requests to that server efficiently, reliably, and without overwhelming it. The routing service bridges the gap between discovery (finding servers) and communication (exchanging data with them), adding connection pooling, load distribution, and a clean abstraction for clients.
We introduced the routing service in Chapter 1. Here we examine its implementation in detail: the connection pool that reuses TCP sockets, the semaphore that controls concurrency, and the dual-mode architecture that allows routing to run as either a standalone proxy or an embedded library.
Interface
routing/src/lib.rs
The routing interface consists of a single procedure: route. A client
specifies the name of the target system, the procedure to invoke, and
the payload. The routing service handles all the complexity of finding a
server, managing connections, and forwarding the request:
pub const ROUTE_PROCEDURE: ProcedureId = 1;

#[derive(Debug, Serializable, Deserializable)]
pub struct RouteArgs {
    pub name: String,
    pub procedure_id: i32,
    pub payload: String,
}

#[derive(Debug, Serializable, Deserializable)]
pub struct RouteResult {
    pub payload: String,
}
Connection Pool
Establishing a TCP connection requires a three-way handshake, which adds a network round trip to every request that opens a new connection. Connection pooling amortizes this cost by reusing established connections across multiple requests. Each system that a client communicates with has its own pool of connections:
pub struct ConnectionPool {
    system_name: String,
    pool: Arc<Mutex<VecDeque<TcpStream>>>,
    max_size: usize,
    semaphore: Arc<Semaphore>,
}

impl ConnectionPool {
    pub fn new(system_name: String, max_size: usize) -> Self {
        Self {
            system_name,
            pool: Arc::new(Mutex::new(VecDeque::new())),
            max_size,
            semaphore: Arc::new(Semaphore::new(max_size)),
        }
    }
}
The pool uses a VecDeque (double-ended queue) to store idle connections
in FIFO order: the oldest idle connection is used first, ensuring that
connections are exercised regularly and stale ones are detected quickly.
The max_size parameter bounds the total number of concurrent connections
to a given system.
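The FIFO behaviour described above can be sketched without any networking. This is a minimal illustration, not the chapter's implementation: IdleQueue and its put and take methods are hypothetical names, and the type parameter T stands in for TcpStream.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the pool's idle list; `T` would be TcpStream.
pub struct IdleQueue<T> {
    idle: VecDeque<T>,
    max_size: usize,
}

impl<T> IdleQueue<T> {
    pub fn new(max_size: usize) -> Self {
        Self { idle: VecDeque::new(), max_size }
    }

    /// Return a connection to the back of the queue.
    /// Hands the connection back to the caller if the queue is full.
    pub fn put(&mut self, conn: T) -> Result<(), T> {
        if self.idle.len() < self.max_size {
            self.idle.push_back(conn);
            Ok(())
        } else {
            Err(conn)
        }
    }

    /// Take the connection that has been idle the longest (the front).
    pub fn take(&mut self) -> Option<T> {
        self.idle.pop_front()
    }
}
```

Pushing at the back and popping at the front is what makes the queue FIFO. Popping from the back instead would give LIFO reuse, which keeps a few connections hot and lets the rest sit idle.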
Concurrency Control
The Semaphore is the key to controlling concurrency. Before a request
can use a connection, it must acquire a permit from the semaphore. If all
permits are taken (meaning the maximum number of concurrent requests is
in flight), additional requests wait until a permit becomes available:
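pub async fn get(&self) -> Option<TcpStream> {
    // Take a permit and detach it from its guard; release() (or the
    // failure path below) is responsible for giving it back.
    self.semaphore.acquire().await
        .expect("Unable to acquire permit").forget();
    // Hold the pool lock only long enough to pop an idle connection,
    // not across discovery or the TCP handshake.
    let idle = self.pool.lock().await.pop_front();
    if idle.is_some() {
        return idle;
    }
    let address = discovery::query(
        self.system_name.clone()
    ).await;
    let conn = TcpStream::connect(&address.address).await.ok();
    if conn.is_none() {
        // No connection is handed out, so return the permit now;
        // otherwise each failed connect would shrink the pool for good.
        self.semaphore.add_permits(1);
    }
    conn
}

pub async fn release(&self, conn: TcpStream) {
    let mut pool = self.pool.lock().await;
    if pool.len() < self.max_size {
        pool.push_back(conn);
    }
    self.semaphore.add_permits(1);
}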
The get method first acquires a semaphore permit, then attempts to
reuse an existing connection from the pool. If no idle connection is
available, it discovers a server address and establishes a new connection.
The release method returns the connection to the pool (if there is
room) and releases the semaphore permit so another request can proceed.
This design provides natural back-pressure: when a downstream system is slow, requests hold their permits longer, the semaphore runs out of permits, and upstream requests queue. This prevents the client from opening an unbounded number of connections and overwhelming the downstream server.
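The permit mechanism can be illustrated with a small blocking semaphore built from the standard library. This is a sketch under stated assumptions, not the async Semaphore the pool uses: CountingSemaphore, Permit, and free_permits are hypothetical names, and waiting threads block instead of yielding to an async runtime.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Blocking counting semaphore built from std primitives (illustrative only).
pub struct CountingSemaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// Holding a Permit means one slot is in use; dropping it frees the slot.
pub struct Permit {
    sem: Arc<CountingSemaphore>,
}

impl CountingSemaphore {
    pub fn new(permits: usize) -> Arc<Self> {
        Arc::new(Self {
            permits: Mutex::new(permits),
            available: Condvar::new(),
        })
    }

    /// Block the calling thread until a permit is free, then take it.
    pub fn acquire(self: &Arc<Self>) -> Permit {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            // All permits are in use: wait here. This is the back-pressure.
            free = self.available.wait(free).unwrap();
        }
        *free -= 1;
        Permit { sem: Arc::clone(self) }
    }

    /// Number of permits currently free.
    pub fn free_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns on errors.
        *self.sem.permits.lock().unwrap() += 1;
        self.sem.available.notify_one();
    }
}
```

Tying the permit to a guard value that returns it on drop is one way to make sure error paths cannot leak permits. The pool's forget/add_permits pairing achieves the same effect only if every exit path remembers to call add_permits.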
Request Forwarding
With a connection in hand, the pool sends the request in the same format as a direct RPC call and reads the response:
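pub async fn send_request(
    &self,
    procedure_id: ProcedureId,
    payload: &str,
) -> Result<String, String> {
    if let Some(mut socket) = self.get().await {
        let serialized = format!("{}:{}\n", procedure_id, payload);
        if let Err(e) = socket.write_all(serialized.as_bytes()).await {
            // The socket is dropped here, but its permit must still be
            // returned or the pool slowly loses capacity.
            self.semaphore.add_permits(1);
            return Err(format!("Failed to send request: {}", e));
        }
        // One read of at most 1024 bytes: longer responses are truncated.
        let mut buffer = vec![0u8; 1024];
        let n = match socket.read(&mut buffer).await {
            Ok(n) if n > 0 => n,
            other => {
                // EOF (Ok(0)) or an I/O error: discard the socket
                // instead of pooling it, and return the permit.
                self.semaphore.add_permits(1);
                return Err(format!("Failed to read response: {:?}", other));
            }
        };
        let response_data =
            String::from_utf8_lossy(&buffer[..n]).to_string();
        self.release(socket).await;
        Ok(response_data)
    } else {
        Err("Service not available".to_string())
    }
}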
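The request frame built by send_request has the shape "procedure_id:payload" followed by a newline. The sketch below shows one way to encode and parse such a frame. It makes two assumptions: ProcedureId is taken to be an alias for i32 (the type RouteArgs uses for procedure_id), and encode_request and decode_request are hypothetical helpers, not functions from the routing crate.

```rust
/// Assumed alias; the chapter's RouteArgs stores procedure ids as i32.
pub type ProcedureId = i32;

/// Build one request frame: "<procedure_id>:<payload>\n".
/// Rejects payloads containing '\n', which would end the frame early.
pub fn encode_request(
    procedure_id: ProcedureId,
    payload: &str,
) -> Result<String, String> {
    if payload.contains('\n') {
        return Err("payload must not contain a newline".to_string());
    }
    Ok(format!("{}:{}\n", procedure_id, payload))
}

/// Parse one frame back into its parts (hypothetical server-side helper).
pub fn decode_request(frame: &str) -> Result<(ProcedureId, &str), String> {
    let line = frame
        .strip_suffix('\n')
        .ok_or("frame is missing its trailing newline")?;
    // Split at the first ':' only, so the payload may itself contain colons.
    let (id, payload) = line.split_once(':').ok_or("frame is missing ':'")?;
    let id = id
        .parse::<ProcedureId>()
        .map_err(|e| format!("bad procedure id: {}", e))?;
    Ok((id, payload))
}
```

Because the newline is the only frame delimiter, a payload that contains one would be split into two frames by a line-oriented reader. The sketch rejects such payloads; escaping or a length prefix are the usual alternatives.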
Dual-Mode Architecture
The routing system can operate in two modes, offering different trade-offs for different use cases.
Proxy mode runs routing as a standalone server. Clients send their requests to the routing server, which forwards them to the appropriate backend. This centralizes connection management and makes it easy to add cross-cutting concerns like logging, rate limiting, and authentication. The proxy server listens on a well-known address and dispatches requests through its own set of connection pools:
async fn request_handler(
    request: Request,
    shared_state: Arc<Mutex<HashMap<String, ConnectionPool>>>,
) -> Response {
    match request.procedure_id {
        ROUTE_PROCEDURE =>
            handlers::route(&request.payload, shared_state).await,
        _ => Response {
            payload: "Unknown procedure".to_string(),
        },
    }
}
Library mode embeds routing directly into the client process. The
Router struct maintains its own pools and manages connections without
an intermediary. This eliminates the extra network hop through a proxy
server, reducing latency at the cost of decentralized connection management:
pub struct Router {
    pools: Arc<Mutex<HashMap<String, ConnectionPool>>>,
}

impl Router {
    pub fn new() -> Self {
        Self {
            pools: Arc::new(Mutex::new(HashMap::new())),
        }
    }

    pub async fn send_request(
        &self,
        name: String,
        procedure_id: ProcedureId,
        payload: &str,
    ) -> Response {
        // Note: this guard stays locked until route_request returns,
        // so requests made through one Router run one at a time.
        let mut pools = self.pools.lock().await;
        // Create the pool for this system on first use (10 connections).
        let pool = pools.entry(name.clone())
            .or_insert_with(||
                ConnectionPool::new(name.clone(), 10));
        route_request(
            RouteArgs { name, procedure_id,
                payload: payload.to_string() },
            pool,
        ).await
    }
}
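In the Router above, the lock guard on the pool map is still alive while route_request runs, so a slow request to one system delays requests to every other system. One possible alternative is to store each pool behind an Arc, clone the handle out of the map, and release the map lock before doing any I/O. The sketch below illustrates only that lookup step: Pool and PoolRegistry are hypothetical stand-ins, and a std Mutex replaces the async one to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for ConnectionPool; only its identity matters here.
pub struct Pool {
    pub system_name: String,
    pub max_size: usize,
}

pub struct PoolRegistry {
    pools: Mutex<HashMap<String, Arc<Pool>>>,
}

impl PoolRegistry {
    pub fn new() -> Self {
        Self { pools: Mutex::new(HashMap::new()) }
    }

    /// Look up (or lazily create) the pool for `name`.
    /// The map lock is released when this function returns, so the caller
    /// can use the pool without blocking lookups for other systems.
    pub fn pool_for(&self, name: &str) -> Arc<Pool> {
        let mut pools = self.pools.lock().unwrap();
        let pool = pools.entry(name.to_string()).or_insert_with(|| {
            Arc::new(Pool {
                system_name: name.to_string(),
                max_size: 10,
            })
        });
        Arc::clone(pool)
    }
}
```

This works for the chapter's ConnectionPool because its get, release, and send_request methods all take &self, so a shared Arc handle is enough to call them. Each pool's semaphore then becomes the only limit on concurrency.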
Design Discussion
The choice between proxy and library mode depends on the deployment environment. A proxy is simpler for clients (they only need to know the proxy's address) and provides a single point for observability and policy enforcement. A library is faster (no extra hop) and more resilient (no single point of failure). Many production systems use both: a library for latency-sensitive internal communication and a proxy for external-facing traffic.
The pool size of 10 connections per system is a tuning parameter. Too few connections and requests queue unnecessarily; too many and the downstream server may be overwhelmed. The optimal pool size depends on the request rate, the request latency, and the downstream server's capacity. Little's Law provides a useful guideline: the number of in-flight requests equals the arrival rate multiplied by the average latency.
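As a worked example of Little's Law, a client that sends 200 requests per second with an average latency of 50 ms has 200 x 0.05 = 10 requests in flight on average. The helper below is a hypothetical sketch of that calculation (estimated_in_flight is not part of the routing crate) and uses integer arithmetic to avoid floating-point rounding.

```rust
use std::time::Duration;

/// Little's Law: average in-flight requests = arrival rate x average latency.
/// Returns the estimate rounded up to whole connections, and at least 1.
pub fn estimated_in_flight(requests_per_second: u64, avg_latency: Duration) -> u64 {
    let micros = avg_latency.as_micros() as u64;
    // Ceiling division: a fractional request still occupies a connection.
    let in_flight = (requests_per_second * micros + 999_999) / 1_000_000;
    in_flight.max(1)
}
```

The result is an average, so it is a lower bound for the pool size. Bursty traffic or a long latency tail calls for some headroom above it.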
The connection pool does not currently perform health checking on idle connections. A connection that has been idle for a long time may have been closed by the server or an intermediary. Production connection pools typically include periodic health checks on idle connections and automatic replacement of broken ones.
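A lightweight substitute for active health checks is to discard connections that have been idle for too long. The sketch below shows that idea only: it is age-based eviction, not a probe of the socket, and TimedIdleQueue, max_idle, and the explicit now parameter are hypothetical choices made so the example stays deterministic.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Idle list that remembers when each connection was returned.
/// `T` stands in for TcpStream; all names here are hypothetical.
pub struct TimedIdleQueue<T> {
    idle: VecDeque<(Instant, T)>,
    max_idle: Duration,
}

impl<T> TimedIdleQueue<T> {
    pub fn new(max_idle: Duration) -> Self {
        Self { idle: VecDeque::new(), max_idle }
    }

    /// Record the return time alongside the connection.
    pub fn put(&mut self, conn: T, now: Instant) {
        self.idle.push_back((now, conn));
    }

    /// Pop connections from the front, discarding any that have been idle
    /// longer than `max_idle`, and return the first one that is fresh enough.
    pub fn take(&mut self, now: Instant) -> Option<T> {
        while let Some((returned_at, conn)) = self.idle.pop_front() {
            if now.duration_since(returned_at) <= self.max_idle {
                return Some(conn);
            }
            // Too old: `conn` is dropped here, which closes a real socket.
        }
        None
    }
}
```

When take returns None, the caller would fall back to opening a new connection, as the pool's get method already does for an empty queue.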
Together with discovery, the routing service provides the communication backbone for the planetary scale computer. Discovery knows where servers are; routing knows how to talk to them efficiently. This separation of concerns allows each service to evolve independently while providing a reliable foundation for all inter-service communication.