IO Guardian - Database Availability System

The IO Guardian system keeps services across the infrastructure aware of whether the centralized databases (PostgreSQL and Redis) are available. The databases are hosted on the IO Host, set by config.server.ioPrimaryHost. The system coordinates graceful startup and shutdown between the IO Host and the dependent services running on other servers.

Overview

The system consists of two components:

  1. Guardian Server (runs on client servers)

    • WebSocket server that listens for commands from the coordinator
    • Executes drain/undrain commands by controlling io-databases.target
  2. Guardian Client, also called the coordinator (runs on the IO Host)

    • WebSocket client that connects to all guardian servers
    • Sends undrain command after databases are online (start dependent services)
    • Sends drain command before database shutdown (stop dependent services)

How It Works

System Startup

  1. Client servers boot and run wait-for-io-databases.service
  2. This service waits (with retries) until PostgreSQL and Redis on the IO Host are reachable
  3. Once databases are confirmed available, the service completes
  4. The io-databases.target is now ready to be activated
  5. When the IO Host's io-database-coordinator.service starts, it sends undrain to all clients
  6. Clients start io-databases.target, which starts all dependent services
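
The wait in steps 2 and 3 can be pictured as a TCP reachability poll. The following is a minimal sketch, not the actual wait-for-io-databases.service implementation: the function names, retry count, and delay are assumptions, and only the ports (5432 for PostgreSQL, 6379 for Redis) come from this page.

```python
import socket
import time


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_databases(host: str, ports=(5432, 6379), retries: int = 30,
                       delay: float = 2.0) -> bool:
    """Poll until every port is reachable, or give up after `retries` attempts.

    retries and delay are hypothetical defaults, not values from the real unit.
    """
    for _ in range(retries):
        if all(port_reachable(host, p) for p in ports):
            return True
        time.sleep(delay)
    return False
```

A successful TCP connect only shows that the port accepts connections. It does not prove that the database is ready to serve queries, so a real check may need to go further.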

Database Shutdown (Graceful Drain)

  1. io-database-coordinator.service is stopped, which happens before the databases stop
  2. The coordinator connects to all guardian servers via WebSocket
  3. It sends the drain command to each server
  4. Guardian servers stop io-databases.target
  5. Dependent services stop gracefully before databases go down

Database Startup (Undrain)

  1. The databases come online on the IO Host
  2. io-database-coordinator.service starts
  3. The coordinator sends the undrain command to all guardian servers
  4. Guardian servers start io-databases.target
  5. All dependent services start
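
Both the drain and undrain flows come down to the coordinator sending one command to every guardian server. The sketch below illustrates that fan-out under stated assumptions: the `Transport` callable is a hypothetical stand-in for the WebSocket connection, and the real coordinator would presumably wait for the auth reply before sending the command rather than batching both messages.

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical transport: given a server address and a list of JSON strings,
# send them in order over one connection and return the JSON replies.
Transport = Callable[[str, List[str]], List[str]]


def build_session(psk: str, action: str) -> List[str]:
    """Messages the coordinator sends on one connection: auth, then the command."""
    if action not in ("drain", "undrain", "ping"):
        raise ValueError(f"unknown action: {action}")
    return [
        json.dumps({"type": "auth", "key": psk}),
        json.dumps({"type": "command", "action": action}),
    ]


def broadcast(servers: Iterable[str], psk: str, action: str,
              transport: Transport) -> Dict[str, bool]:
    """Send `action` to every guardian server and map each server to success."""
    results: Dict[str, bool] = {}
    for server in servers:
        try:
            raw = transport(server, build_session(psk, action))
            replies = [json.loads(r) for r in raw]
            results[server] = len(replies) == 2 and all(
                r.get("status") == "ok" for r in replies)
        except (OSError, ValueError):
            # An unreachable server or a malformed reply must not block the rest.
            results[server] = False
    return results
```

Collecting a per-server result instead of stopping at the first failure means one unreachable client does not prevent the others from being drained or undrained.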

Security

Communication is secured using a Pre-Shared Key (PSK) that must be at least 32 characters. All WebSocket connections must authenticate with this key before commands are accepted.
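
As an illustration of the 32-character minimum, a service could validate the key when it loads it. This is a sketch only: `load_psk` is a hypothetical helper, and the default path is the sops secret location mentioned under Troubleshooting, assumed here to hold the raw key.

```python
from pathlib import Path

MIN_PSK_LENGTH = 32  # minimum length stated above


def load_psk(path: str = "/run/secrets/IO_GUARDIAN_PSK") -> str:
    """Read the PSK from a file and reject keys shorter than the minimum."""
    psk = Path(path).read_text(encoding="utf-8").strip()
    if len(psk) < MIN_PSK_LENGTH:
        raise ValueError(
            f"PSK must be at least {MIN_PSK_LENGTH} characters, got {len(psk)}")
    return psk
```

Failing at load time gives a clear startup error instead of authentication failures that only appear later in the logs.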

Generating the PSK

Generate a new PSK using OpenSSL:

openssl rand -base64 32
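
For hosts without OpenSSL, the same kind of key can be produced with the Python standard library. This is an equivalent sketch, not part of the IO Guardian tooling, and `generate_psk` is a hypothetical name. Base64-encoding 32 random bytes yields a 44-character string, which satisfies the 32-character minimum.

```python
import base64
import secrets


def generate_psk(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically secure randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_psk())
```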

Adding the Secret

Add the generated PSK to hosts/server/secrets.yaml:

IO_GUARDIAN_PSK: <your-generated-key>

Then encrypt the file:

sops --encrypt --in-place hosts/server/secrets.yaml

Configuration

Port

The guardian WebSocket server listens on port 9876 by default. This port is automatically opened to local subnets on servers with database dependencies.

Dependent Services

The list of dependent services is populated automatically: for each name in server.database.postgres or server.database.redis, the service is added if a matching systemd.service.<name> is defined.

To bind a service to the database availability target manually, add it to the server.database.dependentServices option:

{
  server.database.dependentServices = [
    "my-service"
    "another-service"
  ];
}

Services listed here will:

  • Start only when io-databases.target is active
  • Stop when io-databases.target stops
  • Restart when the target restarts

Systemd Units

On Client Servers

| Unit | Type | Description |
|------|------|-------------|
| io-guardian.service | simple | WebSocket server for receiving commands |
| io-databases.target | target | Represents “databases are online” |
| wait-for-io-databases.service | oneshot | Waits for databases at boot (runs once) |

On the IO Host (nixio)

| Unit | Type | Description |
|------|------|-------------|
| io-database-coordinator.service | oneshot | Sends undrain on start, drain on stop |

Troubleshooting

Checking Guardian Status

On client servers:

systemctl status io-guardian.service
systemctl status io-databases.target
systemctl status wait-for-io-databases.service
journalctl -u io-guardian.service -f

On the IO Host:

systemctl status io-database-coordinator.service
journalctl -u io-database-coordinator.service

Manual Commands

To manually start dependent services on a client:

systemctl start io-databases.target

To manually stop dependent services:

systemctl stop io-databases.target

Common Issues

Guardian server won’t start:

  • Check that IO_GUARDIAN_PSK secret is properly configured
  • Verify the sops decryption is working: cat /run/secrets/IO_GUARDIAN_PSK

Services not starting after boot:

  • Check wait service logs: journalctl -u wait-for-io-databases.service
  • Verify network connectivity to the IO Host on ports 5432 (Postgres) and 6379 (Redis)
  • Ensure the IO Host's coordinator has sent the undrain command

Authentication failures in logs:

  • Ensure the same PSK is deployed to all servers
  • Re-encrypt secrets if the key was changed

Protocol Reference

The guardian uses a simple JSON-based WebSocket protocol:

Authentication

// Client sends:
{"type": "auth", "key": "<psk>"}

// Server responds:
{"type": "auth", "status": "ok", "message": "Authentication successful"}
// or
{"type": "auth", "status": "error", "message": "Invalid key"}
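
The exchange above could be handled on the server side roughly as follows. This is a sketch under assumptions, not the guardian's actual code: `handle_auth` is a hypothetical name, the constant-time comparison is a common precaution rather than something this page confirms, and answering malformed messages with "Invalid key" is a guess at the behavior.

```python
import hmac
import json


def handle_auth(raw: str, expected_psk: str) -> dict:
    """Return the auth response for one incoming WebSocket text message."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        msg = None
    key = None
    if isinstance(msg, dict) and msg.get("type") == "auth":
        key = msg.get("key")
    # compare_digest avoids leaking how many leading characters matched.
    if isinstance(key, str) and hmac.compare_digest(
            key.encode("utf-8"), expected_psk.encode("utf-8")):
        return {"type": "auth", "status": "ok",
                "message": "Authentication successful"}
    return {"type": "auth", "status": "error", "message": "Invalid key"}
```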

Commands

// Coordinator sends:
{"type": "command", "action": "drain"}
// or
{"type": "command", "action": "undrain"}
// or
{"type": "command", "action": "ping"}

// Server responds:
{"type": "response", "action": "<action>", "status": "ok", "message": "..."}
// or
{"type": "response", "action": "<action>", "status": "error", "message": "..."}
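
Since drain stops io-databases.target and undrain starts it, a guardian server's command handling might look like the sketch below. The function names, the injectable `run` parameter, and all message strings are assumptions made for illustration; the real message texts are not specified on this page.

```python
import subprocess
from typing import Callable, Sequence

TARGET = "io-databases.target"
# drain stops the target, undrain starts it; ping touches nothing.
SYSTEMCTL_VERB = {"drain": "stop", "undrain": "start"}


def run_systemctl(args: Sequence[str]) -> int:
    """Run systemctl with the given arguments and return its exit code."""
    return subprocess.run(["systemctl", *args], check=False).returncode


def handle_command(msg: dict,
                   run: Callable[[Sequence[str]], int] = run_systemctl) -> dict:
    """Execute one command message and build the response message."""
    action = msg.get("action")

    def reply(status: str, message: str) -> dict:
        return {"type": "response", "action": action,
                "status": status, "message": message}

    if msg.get("type") != "command":
        return reply("error", "expected a command message")
    if action == "ping":
        return reply("ok", "pong")  # hypothetical message text
    verb = SYSTEMCTL_VERB.get(action)
    if verb is None:
        return reply("error", f"unknown action: {action}")
    code = run([verb, TARGET])
    if code == 0:
        return reply("ok", f"{verb} {TARGET} succeeded")
    return reply("error", f"systemctl {verb} {TARGET} exited with {code}")
```

Passing `run` as a parameter lets the handler be exercised without touching systemd, for example `handle_command({"type": "command", "action": "drain"}, run=lambda args: 0)`.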