IO Guardian - Database Availability System
The IO Guardian system ensures that services across the infrastructure are aware
of the availability of centralized databases (PostgreSQL and Redis) hosted on config.server.ioPrimaryHost.
It provides graceful startup and shutdown coordination between the database host and dependent services on other servers.
Overview
The system consists of two components:
- Guardian Server (runs on client servers)
  - WebSocket server that listens for commands from the coordinator
  - Executes drain/undrain commands by controlling io-databases.target
- Guardian Client, also called the coordinator (runs on the IO Host)
  - WebSocket client that connects to all guardian servers
  - Sends the undrain command after the databases are online (starts dependent services)
  - Sends the drain command before database shutdown (stops dependent services)
How It Works
System Startup
- Client servers boot and run wait-for-io-databases.service
- This service waits (with retries) until PostgreSQL and Redis on the IO Host are reachable
- Once the databases are confirmed available, the service completes
- io-databases.target is now ready to be activated
- When the IO Host's io-database-coordinator.service starts, it sends undrain to all clients
- Clients start io-databases.target, which starts all dependent services
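The retry loop in wait-for-io-databases.service can be pictured with the hypothetical Python sketch below. It is not the unit's actual implementation: it assumes a plain TCP connect to ports 5432 and 6379 is an adequate reachability test, and the attempts and delay values are made-up defaults.

```python
import socket
import time

# Ports listed under Troubleshooting: PostgreSQL 5432, Redis 6379.
DATABASE_PORTS = {"postgres": 5432, "redis": 6379}


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name/host unreachable
        return False


def wait_for_databases(host: str, ports: dict[str, int],
                       attempts: int = 30, delay: float = 2.0) -> bool:
    """Retry until every port answers; attempts/delay are hypothetical defaults."""
    for attempt in range(1, attempts + 1):
        down = [name for name, port in ports.items()
                if not tcp_reachable(host, port)]
        if not down:
            return True  # all reachable: a oneshot unit would exit successfully
        print(f"attempt {attempt}/{attempts}: waiting for {', '.join(down)}")
        if attempt < attempts:
            time.sleep(delay)
    return False  # caller should exit non-zero so systemd marks the unit failed
```

A caller would pass the host configured in config.server.ioPrimaryHost, for example wait_for_databases("io-host.example", DATABASE_PORTS), where "io-host.example" is a placeholder. Returning False after the last attempt lets the unit fail visibly instead of waiting forever.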
Database Shutdown (Graceful Drain)
- When io-database-coordinator.service stops (before the databases stop), it connects to all guardian servers via WebSocket
- It sends the drain command to each server
- Guardian servers stop io-databases.target
- Dependent services stop gracefully before the databases go down
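The coordinator's fan-out can be sketched in Python as below. This is a hypothetical illustration: the WebSocket round trip is abstracted as an injected send(server, message) callable, assumed to reuse one authenticated connection per server, and continuing past unreachable servers is an assumed design choice, not documented behavior.

```python
from typing import Callable

# Hypothetical transport: delivers one JSON-style message to a guardian server
# and returns its decoded reply. Assumed to keep one connection per server so
# the command travels on the connection that was just authenticated.
Transport = Callable[[str, dict], dict]


def broadcast(action: str, servers: list[str], psk: str,
              send: Transport) -> dict[str, str]:
    """Authenticate to each server, send one command, and collect failures.

    Returns server -> error message for every server that did not acknowledge
    the command; an empty dict means all servers succeeded.
    """
    failures: dict[str, str] = {}
    for server in servers:
        try:
            auth = send(server, {"type": "auth", "key": psk})
            if auth.get("status") != "ok":
                failures[server] = f"auth failed: {auth.get('message')}"
                continue
            reply = send(server, {"type": "command", "action": action})
            if reply.get("status") != "ok":
                failures[server] = reply.get("message", "unknown error")
        except OSError as exc:  # unreachable server: record it and keep going
            failures[server] = str(exc)
    return failures
```

Because the loop does not stop at the first failure, one unreachable client cannot prevent the others from draining before the databases stop; the returned mapping is what a coordinator would log.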
Database Startup (Undrain)
- When the databases come online on the IO Host, io-database-coordinator.service starts
- It sends the undrain command to all guardian servers
- Guardian servers start io-databases.target
- All dependent services start
Security
Communication is secured using a Pre-Shared Key (PSK) that must be at least 32 characters. All WebSocket connections must authenticate with this key before commands are accepted.
Generating the PSK
Generate a new PSK using OpenSSL:
openssl rand -base64 32
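If OpenSSL is not at hand, an equivalent key can be produced with Python's standard library. This is an alternative, not part of the documented workflow: like the command above, it base64-encodes 32 random bytes, which yields 44 characters and therefore meets the 32-character minimum.

```python
import base64
import secrets


def generate_psk(num_bytes: int = 32) -> str:
    """Base64-encode num_bytes from the OS CSPRNG, like `openssl rand -base64 32`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


# 32 bytes -> ceil(32 / 3) * 4 = 44 base64 characters, above the 32-character minimum.
print(generate_psk())
```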
Adding the Secret
Add the generated PSK to hosts/server/secrets.yaml:
IO_GUARDIAN_PSK: <your-generated-key>
Then encrypt the file:
sops --encrypt --in-place hosts/server/secrets.yaml
Configuration
Port
The guardian WebSocket server listens on port 9876 by default. This port is automatically opened to local subnets on servers with database dependencies.
Dependent Services
Dependent services are populated automatically: every name in server.database.postgres
or server.database.redis that also has a systemd.services.<name> definition is added.
To manually bind a service to the database availability target, add it to the
server.database.dependentServices option:
{
server.database.dependentServices = [
"my-service"
"another-service"
];
}
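The resolution rule can be illustrated with the hypothetical Python function below. The real logic lives in the Nix module and is not shown here; the sketch treats each database option as a plain list of names and assumes entries from server.database.dependentServices are appended as-is, with duplicates dropped.

```python
def resolve_dependent_services(postgres: list[str], redis: list[str],
                               defined_services: set[str],
                               manual: list[str]) -> list[str]:
    """Hypothetical model of how dependent services could be collected.

    Automatic entries: names from the postgres/redis options that also have a
    systemd service definition. Manual entries are appended unchanged.
    Order is preserved and duplicates are removed.
    """
    auto = [name for name in postgres + redis if name in defined_services]
    result: list[str] = []
    for name in auto + manual:
        if name not in result:
            result.append(name)
    return result
```

For example, a Redis database named "cache-only" with no matching systemd service would be skipped, while "my-service" from the manual list is always included.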
Services listed here will:
- Start only when io-databases.target is active
- Stop when io-databases.target stops
- Restart when the target restarts
Systemd Units
On Client Servers
| Unit | Type | Description |
|---|---|---|
| io-guardian.service | simple | WebSocket server for receiving commands |
| io-databases.target | target | Represents “databases are online” |
| wait-for-io-databases.service | oneshot | Waits for databases at boot (runs once) |
On the IO Host (nixio)
| Unit | Type | Description |
|---|---|---|
| io-database-coordinator.service | oneshot | Sends undrain on start, drain on stop |
Troubleshooting
Checking Guardian Status
On client servers:
systemctl status io-guardian.service
systemctl status io-databases.target
systemctl status wait-for-io-databases.service
journalctl -u io-guardian.service -f
On IO Hosts:
systemctl status io-database-coordinator.service
journalctl -u io-database-coordinator.service
Manual Commands
To manually start dependent services on a client:
systemctl start io-databases.target
To manually stop dependent services:
systemctl stop io-databases.target
Common Issues
Guardian server won’t start:
- Check that the IO_GUARDIAN_PSK secret is properly configured
- Verify that sops decryption is working:
cat /run/secrets/IO_GUARDIAN_PSK
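Beyond eyeballing the cat output, a small hypothetical Python check can confirm the decrypted secret exists and meets the documented 32-character minimum. Stripping surrounding whitespace before measuring is an assumption about how the guardian reads the file.

```python
from pathlib import Path
from typing import Optional

MIN_PSK_LENGTH = 32  # minimum stated under Security


def check_psk_file(path: str = "/run/secrets/IO_GUARDIAN_PSK") -> Optional[str]:
    """Return a description of the problem, or None if the key looks usable."""
    try:
        key = Path(path).read_text().strip()  # assumes surrounding whitespace is ignored
    except FileNotFoundError:
        return f"{path} does not exist; sops decryption may have failed"
    except OSError as exc:  # e.g. permission denied when not run as root
        return f"cannot read {path}: {exc}"
    if len(key) < MIN_PSK_LENGTH:
        return f"key is {len(key)} characters; at least {MIN_PSK_LENGTH} required"
    return None
```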
Services not starting after boot:
- Check the wait service logs:
journalctl -u wait-for-io-databases.service
- Verify network connectivity to the IO Host on ports 5432 (Postgres) and 6379 (Redis)
- Ensure the IO Host's coordinator has sent the undrain command
Authentication failures in logs:
- Ensure the same PSK is deployed to all servers
- Re-encrypt secrets if the key was changed
Protocol Reference
The guardian uses a simple JSON-based WebSocket protocol:
Authentication
// Client sends:
{"type": "auth", "key": "<psk>"}
// Server responds:
{"type": "auth", "status": "ok", "message": "Authentication successful"}
// or
{"type": "auth", "status": "error", "message": "Invalid key"}
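A guardian server's handling of the first message on a connection could look like the hypothetical Python sketch below. It emits only the two documented replies; comparing keys with hmac.compare_digest and answering any malformed or non-auth first message with the "Invalid key" error are assumptions, not confirmed behavior.

```python
import hmac
import json


def handle_auth(raw: str, psk: str) -> tuple[bool, str]:
    """Process the first message on a connection.

    Returns (authenticated, reply_json). Only a well-formed auth message that
    carries the correct key authenticates the connection.
    """
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        msg = None
    is_auth = isinstance(msg, dict) and msg.get("type") == "auth"
    key = msg.get("key") if is_auth else None
    # Constant-time comparison so response timing does not reveal matching prefixes.
    if isinstance(key, str) and hmac.compare_digest(key.encode(), psk.encode()):
        return True, json.dumps({"type": "auth", "status": "ok",
                                 "message": "Authentication successful"})
    return False, json.dumps({"type": "auth", "status": "error",
                              "message": "Invalid key"})
```

A server would close the connection, or at least refuse commands, whenever the first element of the tuple is False.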
Commands
// Coordinator sends:
{"type": "command", "action": "drain"}
// or
{"type": "command", "action": "undrain"}
// or
{"type": "command", "action": "ping"}
// Server responds:
{"type": "response", "action": "<action>", "status": "ok", "message": "..."}
// or
{"type": "response", "action": "<action>", "status": "error", "message": "..."}
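Command handling on an already-authenticated connection might be sketched as follows. This hypothetical Python version maps drain to systemctl stop io-databases.target and undrain to systemctl start io-databases.target, as described under How It Works; treating ping as a no-op and the message strings are illustrative assumptions. The command runner is injectable so the logic can be exercised without systemd.

```python
import json
import subprocess
from typing import Callable

TARGET = "io-databases.target"

# drain stops the target, undrain starts it (per the How It Works section).
SYSTEMCTL_ARGS = {
    "drain": ["systemctl", "stop", TARGET],
    "undrain": ["systemctl", "start", TARGET],
}

Runner = Callable[[list[str]], subprocess.CompletedProcess]


def run_systemctl(argv: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(argv, capture_output=True, text=True)


def handle_command(raw: str, run: Runner = run_systemctl) -> str:
    """Handle one command message and return the JSON response string."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        msg = None
    if not isinstance(msg, dict):
        msg = {}
    action = str(msg.get("action", ""))

    def reply(status: str, message: str) -> str:
        return json.dumps({"type": "response", "action": action,
                           "status": status, "message": message})

    if msg.get("type") != "command":
        return reply("error", "expected a command message")
    if action == "ping":
        return reply("ok", "pong")  # liveness check: touches no units
    argv = SYSTEMCTL_ARGS.get(action)
    if argv is None:
        return reply("error", f"unknown action: {action}")
    result = run(argv)
    if result.returncode != 0:
        return reply("error", result.stderr.strip() or f"exit code {result.returncode}")
    return reply("ok", f"{' '.join(argv[1:])} succeeded")
```

Reporting systemctl's stderr in the error message gives the coordinator something concrete to log when a drain or undrain fails on a client.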