Last week I shipped real-time updates for an incident management dashboard. The kind where you're staring at a screen during an outage and need to know immediately when something changes—not after a 30-second polling interval that feels like an eternity when production is on fire.

This is what I learned.

The Problem: Polling Sucks During Incidents

The initial implementation was simple: the dashboard fetched /api/v1/incidents every 10 seconds. It worked. But during an actual incident, 10 seconds of stale data is unacceptable. People were refreshing manually. Multiple browser tabs were hammering the API. We were DDoSing ourselves.

The obvious solution: WebSockets. Push updates the moment they happen. No polling, no wasted requests, no stale data.

Sounds simple. It mostly is. But there are a few patterns that took me some iteration to get right.

The Architecture

The system is built on Fastify with the @fastify/websocket plugin. The core idea is straightforward:

  1. Clients connect to /api/v1/stream
  2. Server maintains a set of active connections
  3. When state changes, broadcast to all relevant clients
  4. Clients can subscribe to specific incidents or get everything

Here's the actual route handler:

// src/api/stream.ts
import { FastifyInstance } from 'fastify';
import { WebSocket } from 'ws';
import { ScenarioEngine } from '../adapters/stubs/scenario-engine.js';
import { AdapterEvent, Subscription } from '../adapters/interface.js';
 
interface ClientState {
  socket: WebSocket;
  filter: string; // incident ID or 'all'
  subscription: Subscription;
}
 
export async function streamRoutes(
  fastify: FastifyInstance,
  options: { scenarioEngine: ScenarioEngine }
) {
  const { scenarioEngine } = options;
  const clients = new Set<ClientState>();
 
  fastify.get('/stream', { websocket: true }, (socket, request) => {
    const client: ClientState = {
      socket,
      filter: 'all',
      subscription: scenarioEngine.subscribe('*', (event: AdapterEvent) => {
        if (socket.readyState !== WebSocket.OPEN) return;
        if (client.filter !== 'all' && event.incidentId !== client.filter) return;
 
        socket.send(JSON.stringify({
          type: event.type,
          timestamp: event.timestamp.toISOString(),
          incidentId: event.incidentId ?? null,
          data: event.data,
        }));
      }),
    };
 
    clients.add(client);
 
    socket.on('close', () => {
      client.subscription.unsubscribe();
      clients.delete(client);
    });
 
    socket.send(JSON.stringify({ type: 'connected', filter: client.filter }));
  });
}

A few things to notice:

Each client gets its own subscription. When a client connects, we immediately subscribe them to the event system. This is cleaner than maintaining a global broadcast function that iterates over all clients—the subscription handles its own lifecycle.

We track client state, not just sockets. The ClientState interface bundles the socket, the filter preference, and the subscription handle. When a client disconnects, we have everything we need to clean up.

Guard against closed sockets. The readyState check before sending is critical. WebSocket connections can close at any moment—network issues, user navigation, whatever. Sending to a closed socket throws, and you don't want one dead connection to crash your event loop.

The Event System

The WebSocket layer is just a transport. The real work happens in the event system underneath. Here's the subscription interface:

// src/adapters/interface.ts
export type AdapterEventType = 
  | 'incident:created'
  | 'incident:updated'
  | 'incident:resolved'
  | 'timeline:event'
  | 'impact:changed'
  | 'resource:assigned'
  | 'resource:released';
 
export interface AdapterEvent {
  type: AdapterEventType;
  timestamp: Date;
  incidentId?: string;
  data: unknown;
}
 
export type EventHandler = (event: AdapterEvent) => void;
 
export interface Subscription {
  unsubscribe(): void;
}

And the implementation in the engine:

// src/adapters/stubs/scenario-engine.ts
private eventHandlers: Map<AdapterEventType | '*', Set<EventHandler>> = new Map();
 
subscribe(eventType: AdapterEventType | '*', handler: EventHandler): Subscription {
  if (!this.eventHandlers.has(eventType)) {
    this.eventHandlers.set(eventType, new Set());
  }
  this.eventHandlers.get(eventType)!.add(handler);
 
  return {
    unsubscribe: () => {
      this.eventHandlers.get(eventType)?.delete(handler);
    }
  };
}
 
emit(event: AdapterEvent): void {
  // Call specific handlers
  const handlers = this.eventHandlers.get(event.type);
  if (handlers) {
    handlers.forEach(handler => handler(event));
  }
  
  // Call wildcard handlers
  const wildcardHandlers = this.eventHandlers.get('*');
  if (wildcardHandlers) {
    wildcardHandlers.forEach(handler => handler(event));
  }
}

Opinion: I've seen codebases use EventEmitter for this. Don't. The Subscription pattern (returning an object with unsubscribe()) is dramatically better for lifecycle management. It makes cleanup explicit and testable. You can pass subscriptions around, store them, manage them however you want. With EventEmitter you're stuck passing the exact same function reference to removeListener, which gets awkward fast.
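To make the contrast concrete, here's a sketch (names are illustrative, not from the repo) of wrapping a plain Node EventEmitter in the Subscription pattern. The closure captures the exact handler reference, so callers never need to hold onto it themselves for removeListener:

```typescript
import { EventEmitter } from 'node:events';

interface Subscription {
  unsubscribe(): void;
}

// Illustrative wrapper: the closure holds the exact handler reference,
// so cleanup never depends on the caller keeping it around.
function subscribe(
  emitter: EventEmitter,
  eventName: string,
  handler: (...args: unknown[]) => void
): Subscription {
  emitter.on(eventName, handler);
  return {
    unsubscribe: () => {
      emitter.removeListener(eventName, handler);
    },
  };
}
```

A ClientState can now store the subscription and call unsubscribe() on close without ever knowing which function was registered.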

Client-Side Filtering

An important pattern: let clients filter their subscriptions after connecting. In the initial version, I made clients specify their filter in the connection URL (/stream?incident=inc-123). That works, but it means creating a new connection every time you want to watch a different incident.

Better approach—handle it via messages:

socket.on('message', (raw: Buffer | string) => {
  try {
    const msg = JSON.parse(raw.toString());
    if (msg.action === 'subscribe' && typeof msg.incidentId === 'string') {
      client.filter = msg.incidentId;
      socket.send(JSON.stringify({ type: 'subscribed', incidentId: msg.incidentId }));
    }
  } catch {
    // Ignore malformed messages
  }
});

Now a single connection can dynamically switch between incidents. The dashboard can show an incident list, user clicks one, we send { action: 'subscribe', incidentId: 'inc-123' }, and now that connection only receives events for that incident. Click back to the list, send { action: 'subscribe', incidentId: 'all' }.

One connection, multiple views. Efficient.
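On the client side, switching views is just a send. A minimal sketch (the helper name is made up; the wire format matches the message handler above):

```typescript
// Hypothetical helper: build the subscribe message the server's
// message handler expects.
function subscribeMessage(incidentId: string): string {
  return JSON.stringify({ action: 'subscribe', incidentId });
}

// In the browser (sketch):
// const ws = new WebSocket('ws://localhost:3000/api/v1/stream');
// ws.send(subscribeMessage('inc-123')); // watch one incident
// ws.send(subscribeMessage('all'));     // back to the full feed
```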

Server Setup

Getting WebSockets working with Fastify is trivial:

// src/index.ts
import Fastify from 'fastify';
import websocket from '@fastify/websocket';
 
const server = Fastify({ logger: true });
 
await server.register(websocket);
 
// Routes that use websocket: true will now work
server.register(streamRoutes, { prefix: '/api/v1', scenarioEngine });

That's it. The plugin handles the upgrade handshake, the ws instance management, all of it. The route handler just gets a WebSocket object to work with.

Gotcha: Make sure your load balancer or reverse proxy supports WebSocket upgrades. If you're behind nginx, you need explicit configuration. If you're on something like Render or Railway, check their docs—some platforms require specific ports or paths for WebSocket traffic.
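For nginx, the usual incantation looks something like this (the upstream name app is illustrative); without the Upgrade and Connection headers the handshake will fail:

```nginx
location /api/v1/stream {
    proxy_pass http://app;            # upstream name is illustrative
    proxy_http_version 1.1;           # the Upgrade mechanism requires HTTP/1.1
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;         # don't kill idle streams after the 60s default
}
```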

Testing Strategy

Testing WebSockets is slightly awkward because you're dealing with persistent connections and async events. Here's what worked:

Unit test the event system independently. The subscribe/emit logic is pure—no network involved. Test it directly:

describe('Event subscription', () => {
  it('calls handlers on emit', () => {
    const engine = new ScenarioEngine();
    const received: AdapterEvent[] = [];
    
    engine.subscribe('incident:created', (event) => {
      received.push(event);
    });
    
    engine.emit({
      type: 'incident:created',
      timestamp: new Date(),
      incidentId: 'inc-1',
      data: { title: 'Test' }
    });
    
    expect(received).toHaveLength(1);
    expect(received[0].incidentId).toBe('inc-1');
  });
  
  it('respects unsubscribe', () => {
    const engine = new ScenarioEngine();
    let callCount = 0;
    
    const sub = engine.subscribe('*', () => { callCount++ });
    engine.emit({ type: 'incident:created', timestamp: new Date(), data: {} });
    
    sub.unsubscribe();
    engine.emit({ type: 'incident:created', timestamp: new Date(), data: {} });
    
    expect(callCount).toBe(1);
  });
});

Integration test the routes with Fastify's inject. The existing test suite tests HTTP routes this way:

describe('GET /api/v1/incidents', () => {
  it('returns an array', async () => {
    const response = await server.inject({
      method: 'GET',
      url: '/api/v1/incidents',
    });
 
    expect(response.statusCode).toBe(200);
    const body = JSON.parse(response.body);
    expect(Array.isArray(body)).toBe(true);
  });
});

For WebSocket-specific testing, you'd spin up the server on a test port and use a WebSocket client. But honestly, I found more bugs in the event system logic than in the WebSocket transport layer. Test where the complexity is.

Manual testing matters. For real-time features, nothing beats opening multiple browser tabs and watching updates flow. Tools like wscat are great for poking at the connection from the command line:

wscat -c ws://localhost:3000/api/v1/stream

Lessons Learned

Start with the event system, not the transport. I initially jumped straight into WebSocket code. Wrong order. Design your events first—what data needs to flow, what the payloads look like, who subscribes to what. The WebSocket layer should be a thin adapter over a well-designed event system.

Serialize everything immediately. Events should be JSON-serializable from the start. Convert Dates to ISO strings at the boundary, and don't pass around Maps, Sets, or class instances expecting to serialize later. You'll forget, and empty objects or [object Object] will show up in your WebSocket messages.
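A quick demonstration of why "serialize later" bites. JSON.stringify handles Dates (via their toJSON method) but silently flattens richer structures:

```typescript
// Dates survive JSON.stringify, but Maps and Sets do not.
const payload = {
  timestamp: new Date(0),
  tags: new Map([['severity', 'high']]),
  viewers: new Set(['alice']),
};

const wire = JSON.stringify(payload);
// timestamp becomes an ISO string; tags and viewers both become {}
```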

Connection lifecycle is the hard part. The actual message sending is easy. Managing connections—tracking who's connected, cleaning up on disconnect, handling reconnection on the client, dealing with network hiccups—that's where bugs live.

Clients should handle reconnection. Don't assume WebSocket connections last forever. The client should reconnect on close, with exponential backoff. Most WebSocket client libraries have this built in.
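A sketch of the backoff math (the function and constants are illustrative, not from the repo); the reconnect loop itself is shown as comments since it depends on a browser WebSocket:

```typescript
// Exponential backoff: delay doubles per failed attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Reconnect loop sketch (browser side):
// function connectWithRetry(url: string, attempt = 0): void {
//   const ws = new WebSocket(url);
//   ws.onopen = () => { attempt = 0; };  // reset backoff on success
//   ws.onclose = () =>
//     setTimeout(() => connectWithRetry(url, attempt + 1), backoffDelay(attempt));
// }
```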

Send a heartbeat. For long-lived connections, send a ping periodically. This keeps the connection alive through proxies that might otherwise timeout idle connections, and helps detect dead connections faster.

Log connection events. When something goes wrong in production, you want to know who was connected and when. Log connects, disconnects, and subscription changes. Don't log every message—that's too much.

What I'd Do Differently

If I were starting over, I'd probably use Server-Sent Events (SSE) instead of WebSockets for this use case. We're only sending data from the server to the client. The bidirectional capability of WebSockets is overkill. SSE is simpler, works over standard HTTP, handles reconnection automatically, and doesn't require special proxy configuration.
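For comparison, the same event stream over SSE is just an HTTP response that never ends. A minimal sketch using Node's built-in http module (names are illustrative; a Fastify version would write to reply.raw):

```typescript
import { IncomingMessage, ServerResponse } from 'node:http';

// SSE wire format: "event:" and "data:" lines, terminated by a blank line.
function formatSseEvent(type: string, data: unknown): string {
  return `event: ${type}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Illustrative handler: the same subscription from the event system,
// feeding an HTTP response instead of a WebSocket.
function sseHandler(req: IncomingMessage, res: ServerResponse): void {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  res.write(formatSseEvent('connected', { filter: 'all' }));
  // subscription cleanup goes in req.on('close', ...), exactly as before
}
```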

But WebSockets work fine, and the patterns I described apply to SSE too. The important thing is the event system underneath, not the transport on top.


The full implementation is in the incident-control-api repo. The WebSocket streaming is about 60 lines of code. The event system is another 40. Real-time doesn't have to be complicated.
