Cluster Operations

Health Endpoints

GET /ping for lightweight liveness checks
GET /health/cluster for per-client connectivity and latency data
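
A probe against these two endpoints can be sketched as follows. This is a minimal sketch, not Athena's own client: the helper names (fetch, is_alive, cluster_health) are hypothetical, and it assumes /health/cluster answers with a JSON body whose schema is left uninterpreted.

```python
import json
import urllib.request
from urllib.error import HTTPError, URLError


def fetch(url, timeout=2.0):
    """Return (status, body). Status 0 means no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except HTTPError as exc:  # non-2xx responses still carry a status code
        return exc.code, exc.read()
    except (URLError, OSError):  # DNS failure, refused connection, timeout
        return 0, b""


def is_alive(base_url, fetcher=fetch):
    """Liveness: any 2xx answer from GET /ping counts as alive."""
    status, _ = fetcher(base_url.rstrip("/") + "/ping")
    return 200 <= status < 300


def cluster_health(base_url, fetcher=fetch):
    """Return the decoded /health/cluster body, or None if unavailable.

    Assumption: the endpoint returns JSON. The payload is handed back as-is.
    """
    status, body = fetcher(base_url.rstrip("/") + "/health/cluster")
    if not 200 <= status < 300:
        return None
    try:
        return json.loads(body)
    except ValueError:  # body was not valid JSON
        return None
```

The fetcher parameter exists so the probe logic can be exercised without a running gateway; in normal use it is left at its default.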

Observability Endpoints

GET /metrics for Prometheus scrape output
GET /router/registry for the live route registry
Client stats and drill-down routes under /admin/clients/*
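
For ad hoc inspection outside a Prometheus server, the /metrics scrape output can be read with a few lines of script. The sketch below assumes the standard Prometheus text exposition format (comment lines starting with #, then one sample per line with an optional trailing timestamp). The function name parse_samples is hypothetical, and no Athena metric names are assumed.

```python
def parse_samples(text):
    """Map each series (name plus label set, verbatim) to its float value."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        end = line.rfind("}")
        if end != -1:
            # Labels present: everything up to the closing brace is the series.
            series, rest = line[: end + 1], line[end + 1 :].split()
        else:
            series, *rest = line.split()
        if not rest:  # malformed line without a value
            continue
        # rest[0] is the value; an optional timestamp in rest[1] is ignored.
        samples[series] = float(rest[0])
    return samples
```

Keeping the label set as part of the key avoids parsing label syntax, which is enough for spot checks such as confirming that a counter is increasing between two scrapes.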

Trace and Log Sinks

When tracing file sinks are enabled, Athena splits log output by severity:

non-ERROR events into success.log
ERROR events into error.log
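
The routing rule above can be mirrored in a small triage helper. This is an illustrative sketch only and not Athena's sink implementation: the function names are hypothetical, and matching the level name case-insensitively is an added convenience.

```python
from collections import defaultdict

ERROR_SINK = "error.log"
DEFAULT_SINK = "success.log"


def sink_for(level):
    """Return the file an event of the given severity would be written to."""
    return ERROR_SINK if level.strip().upper() == "ERROR" else DEFAULT_SINK


def split_by_sink(events):
    """Group (level, message) pairs by destination file, preserving order."""
    grouped = defaultdict(list)
    for level, message in events:
        grouped[sink_for(level)].append(message)
    return dict(grouped)
```

One consequence of the rule worth keeping in mind during triage: warnings are non-ERROR events, so they land in success.log rather than error.log.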

Gateway requests also emit structured athena_rs::gateway_trace events. These events are designed for fast incident triage: each one carries request identity, operation/table, status/outcome, duration, backend/cache hints, trace IDs, and row counters on a single line.

Recommended Operational Checks

Verify that the logging and auth clients are configured.
Confirm route contract availability (/openapi.yaml, /openapi-wss.yaml).
Track queue depth and job statuses for deferred/backup workflows.
Inspect query optimization and vacuum health routes periodically.
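
The route contract check in this list is easy to automate. The sketch below requests both OpenAPI documents and reports any that do not answer with a 2xx status. The helper names (http_status, missing_contracts) are hypothetical, and the base URL is whatever address the gateway is reachable at.

```python
import urllib.request
from urllib.error import HTTPError, URLError

CONTRACT_PATHS = ("/openapi.yaml", "/openapi-wss.yaml")


def http_status(url, timeout=2.0):
    """Return the HTTP status for a GET, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:  # 4xx/5xx still carry a status code
        return exc.code
    except (URLError, OSError):  # unreachable host, timeout
        return 0


def missing_contracts(base_url, status_of=http_status):
    """Return the contract paths that did not answer with a 2xx status."""
    base = base_url.rstrip("/")
    return [p for p in CONTRACT_PATHS if not 200 <= status_of(base + p) < 300]
```

An empty list means both contracts are being served; anything else names the missing document, which is a convenient failure message for a scheduled check.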

Environment Baselines

Set ATHENA_CONFIG_PATH explicitly in production deployments.
Confirm the Postgres client tools (pg_dump, pg_restore) are available.
Configure Redis and S3-compatible settings where required.
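
Part of this baseline can be verified before the service starts. The following preflight sketch checks that ATHENA_CONFIG_PATH is set to a non-empty value and that pg_dump and pg_restore resolve on PATH. The function name preflight and the message wording are hypothetical, and looking the tools up on PATH is an assumption about how they are located. Redis and S3 settings are left out because their configuration keys are not named here.

```python
import os
import shutil

REQUIRED_TOOLS = ("pg_dump", "pg_restore")


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    # "Explicit" is interpreted here as: present and not blank.
    if not env.get("ATHENA_CONFIG_PATH", "").strip():
        problems.append("ATHENA_CONFIG_PATH is not set")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:  # shutil.which returns None when not found
            problems.append(f"{tool} not found on PATH")
    return problems
```

Called with no arguments, preflight() inspects the real process environment, so it can run as a container entrypoint step that fails fast when the list is non-empty.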

Failure Strategy

Use the deferred queue to absorb transient gateway pressure.
Use the backup restore APIs for controlled recovery.
Keep the audit/logging client healthy to preserve governance visibility.