Cluster Operations
Health, metrics, and runbook-level Athena operations guidance.
Health Endpoints
- GET /ping for lightweight liveness checks
- GET /health/cluster for per-client connectivity and latency data
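As a rough illustration, both health endpoints can be polled from a small script. The sketch below assumes Athena is reachable over plain HTTP at a base URL you supply; the `probe` and `http_get` helpers are hypothetical names introduced here, and the response bodies are returned unparsed because their format is not described in this section.

```python
import urllib.request
from typing import Callable, Dict, Tuple


def http_get(url: str, timeout: float = 2.0) -> Tuple[int, str]:
    """Fetch a URL and return (status_code, body_text)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def probe(base_url: str, get: Callable[[str], Tuple[int, str]] = http_get) -> Dict[str, dict]:
    """Call /ping and /health/cluster and summarize each result."""
    results = {}
    for path in ("/ping", "/health/cluster"):
        try:
            status, body = get(base_url.rstrip("/") + path)
            results[path] = {"ok": 200 <= status < 300, "status": status, "body": body}
        except OSError as exc:
            # urlopen raises HTTPError for 4xx/5xx responses; HTTPError, URLError
            # and socket timeouts are all OSError subclasses, so they land here.
            results[path] = {"ok": False, "status": None, "body": str(exc)}
    return results
```

Calling `probe("http://localhost:8080")` (the host and port are placeholders) returns one entry per endpoint. The `get` parameter is injectable so the summary logic can be exercised without a running server.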
Observability Endpoints
- GET /metrics for Prometheus scrape output
- GET /router/registry for live route registry
- client stats and drilldown routes under /admin/clients/*
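For ad hoc inspection outside a Prometheus server, the scrape output from /metrics can be read with a few lines of Python. This is a minimal sketch of parsing the Prometheus text exposition format, not a full parser; the metric names in the sample are invented for illustration and are not real Athena metrics.

```python
from typing import Iterator, Tuple


def iter_samples(text: str) -> Iterator[Tuple[str, float]]:
    """Yield (series, value) pairs from Prometheus text exposition output.

    The series string keeps its label set, e.g. 'name{label="x"}'.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # A sample is "<series> <value> [timestamp]". Label values may contain
        # spaces, so split after the last closing brace when labels are present.
        if "}" in line:
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            yield series, float(fields[0])


# Hypothetical sample; these metric names are placeholders.
SAMPLE = """\
# HELP demo_requests_total Placeholder counter.
# TYPE demo_requests_total counter
demo_requests_total{route="/ping"} 42
demo_queue_depth 3
"""

samples = dict(iter_samples(SAMPLE))
```

The same function can be pointed at the body returned by a GET /metrics request to pull out a single series during an incident.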
Trace and log sinks
When tracing file sinks are enabled, Athena splits log output by severity:
- non-ERROR events into success.log
- ERROR events into error.log
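The routing rule above can be pictured with Python's standard logging module. This is only an illustration of the severity split, not Athena's implementation (Athena's own sinks are not shown in this section); the logger name and line format are assumptions.

```python
import logging
import os


class BelowError(logging.Filter):
    """Pass only records below ERROR severity."""

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < logging.ERROR


def build_split_logger(log_dir: str, name: str = "split_demo") -> logging.Logger:
    """Create a logger that writes non-ERROR records to success.log
    and ERROR-or-higher records to error.log inside log_dir."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records out of the root logger's handlers

    success = logging.FileHandler(os.path.join(log_dir, "success.log"))
    success.addFilter(BelowError())

    error = logging.FileHandler(os.path.join(log_dir, "error.log"))
    error.setLevel(logging.ERROR)

    for handler in (success, error):
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

With this layout, tailing error.log alone shows failures, while success.log carries the routine traffic.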
Gateway requests also emit structured athena_rs::gateway_trace events. These
events are designed for fast incident triage and include request identity,
operation/table, status/outcome, duration, backend/cache hints, trace IDs, and
row counters in one line.
Recommended Operational Checks
- Verify logging and auth clients are configured.
- Confirm route contract availability (/openapi.yaml, /openapi-wss.yaml).
- Track queue depth and job statuses for deferred/backup workflows.
- Inspect query optimization and vacuum health routes periodically.
Environment Baselines
- Ensure ATHENA_CONFIG_PATH is explicit in production deployments.
- Confirm Postgres tools are available (pg_dump, pg_restore).
- Configure Redis and S3-compatible settings where required.
Failure Strategy
- Use the deferred queue to absorb transient gateway pressure.
- Use the backup restore APIs for controlled recovery.
- Keep the audit/logging client healthy for governance visibility.