Athena

Backups + Restore

End-to-end backup lifecycle with job and schedule control.

Athena supports S3-compatible backup and restore operations for registered PostgreSQL clients.

POST /admin/backups and POST /admin/backups/{key}/restore are queue-driven: each returns 202 Accepted with a job_id, and the work runs in background workers. Clients should poll GET /admin/backups/jobs/{id} for job state rather than wait on the initial request.
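A client-side polling loop might look like the sketch below. It is not part of Athena: wait_for_job and fetch_job are hypothetical names, fetch_job stands in for whatever code calls GET /admin/backups/jobs/{id} and returns the job as a dict with a "status" field, and the terminal status names are assumed from the statuses mentioned on this page.

```python
import time
from typing import Callable

# Assumption: these are the terminal statuses. They are taken from the status
# names mentioned on this page, not from a confirmed schema.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[int], dict],
    job_id: int,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job(job_id) repeatedly until the job reports a terminal status.

    fetch_job is a hypothetical callable that wraps
    GET /admin/backups/jobs/{id} and returns the job as a dict.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} "
                f"after {timeout_seconds}s"
            )
        sleep(poll_seconds)
```

Passing the fetch and sleep functions in keeps the loop independent of any HTTP library and makes it easy to exercise without a running server.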

The runtime is split into two scheduler loops backed by the shared athena-scheduler and athena-worker crates:

  • the schedule worker claims due backup_schedules rows and creates queued backup_jobs
  • the execution worker claims pending backup/restore jobs with a lease token, renews ownership, and finalizes or retries the job

Both loops use the Athena logging database as durable state, so jobs can resume or retry after process restarts without relying on in-memory timers.
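The lease behaviour described above can be modelled with a small in-memory sketch. This is an illustration only, not the athena-worker implementation: the Job fields other than the lease idea itself, the function names claim, renew, and finish, and the status strings are all assumptions, and the real workers keep this state in the logging database rather than in Python objects.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    # Hypothetical in-memory stand-in for a backup_jobs row.
    id: int
    status: str = "pending"
    attempts: int = 0
    max_attempts: int = 3
    lease_token: Optional[str] = None
    lease_expires_at: float = 0.0


def claim(job: Job, now: float, ttl: float) -> Optional[str]:
    """Claim a job if it is unowned or its previous lease has expired."""
    lease_free = job.lease_token is None or job.lease_expires_at <= now
    if job.status in ("completed", "failed") or not lease_free:
        return None
    if job.attempts >= job.max_attempts:
        job.status = "failed"  # retry budget exhausted
        job.lease_token = None
        return None
    job.attempts += 1
    job.status = "running"
    job.lease_token = uuid.uuid4().hex
    job.lease_expires_at = now + ttl
    return job.lease_token


def renew(job: Job, token: str, now: float, ttl: float) -> bool:
    """Extend the lease, but only for the current, unexpired owner."""
    if job.lease_token != token or job.lease_expires_at <= now:
        return False
    job.lease_expires_at = now + ttl
    return True


def finish(job: Job, token: str, ok: bool) -> bool:
    """Finalize on success; on failure, requeue while attempts remain."""
    if job.lease_token != token:
        return False  # a stale owner must not overwrite the job
    if ok:
        job.status = "completed"
    elif job.attempts < job.max_attempts:
        job.status = "pending"
    else:
        job.status = "failed"
    job.lease_token = None
    return True
```

The point of the token is that a worker which lost its lease (for example after a long stall) cannot renew or finalize a job that another worker has since claimed.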

Core Endpoints

  • POST /admin/backups
  • GET /admin/backups
  • POST /admin/backups/{key}/restore
  • GET /admin/backups/{key}/download
  • DELETE /admin/backups/{key}

Job + Schedule Endpoints

  • GET /admin/backups/jobs
  • GET /admin/backups/jobs/{id}
  • POST /admin/backups/jobs/{id}/cancel
  • GET /admin/backups/schedules
  • POST /admin/backups/schedules
  • PATCH /admin/backups/schedules/{id}
  • DELETE /admin/backups/schedules/{id}

Example: Create Backup

curl -X POST "http://localhost:4052/admin/backups" \
  -H "content-type: application/json" \
  -H "x-athena-client: athena_logging" \
  -H "x-athena-admin-key: $ATHENA_ADMIN_KEY" \
  -d '{
    "client_name":"athena_logging",
    "label":"daily"
  }'

Expected response (queued):

{
  "status": "success",
  "message": "Backup job queued",
  "data": {
    "job_id": 1234,
    "client_name": "athena_logging",
    "status": "pending"
  }
}

Example: Restore Backup

curl -X POST "http://localhost:4052/admin/backups/backups%2Fathena_logging%2Fabc.tar.gz/restore" \
  -H "content-type: application/json" \
  -H "x-athena-admin-key: $ATHENA_ADMIN_KEY" \
  -d '{
    "client_name":"athena_logging"
  }'

Expected response (queued):

{
  "status": "success",
  "message": "Restore job queued",
  "data": {
    "job_id": 1235,
    "key": "backups/athena_logging/abc.tar.gz",
    "client_name": "athena_logging",
    "status": "pending"
  }
}
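Because the backup key contains slashes, the restore example percent-encodes it so that it occupies a single path segment. A minimal way to build that URL in Python is sketched below; restore_url is a hypothetical helper name, and the base URL is the one used in the examples on this page.

```python
from urllib.parse import quote


def restore_url(base_url: str, key: str) -> str:
    """Build the restore URL, encoding '/' in the key as %2F."""
    # safe="" makes quote() encode slashes too; by default it leaves them alone.
    encoded_key = quote(key, safe="")
    return f"{base_url.rstrip('/')}/admin/backups/{encoded_key}/restore"


url = restore_url("http://localhost:4052", "backups/athena_logging/abc.tar.gz")
# url == "http://localhost:4052/admin/backups/backups%2Fathena_logging%2Fabc.tar.gz/restore"
```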

Required Runtime Dependencies

  • pg_dump and pg_restore binaries, located through ATHENA_PG_DUMP_PATH / ATHENA_PG_RESTORE_PATH or found on PATH
  • S3-compatible endpoint and credentials
  • Target PostgreSQL clients that are registered and reachable from the Athena process

Worker Configuration

backup:
  worker_enabled: true
  execution_worker_poll_ms: 1500
  schedule_worker_poll_ms: 30000
  worker_max_attempts: 3
  worker_lease_ttl_minutes: 15

worker_enabled controls both backup scheduler loops. The execution worker polls queued backup_jobs; the schedule worker polls due backup_schedules. worker_max_attempts is persisted onto newly created jobs, and worker_lease_ttl_minutes bounds how long a crashed process can hold ownership before another worker can claim the job. Poll values are clamped to safe minima (execution_worker_poll_ms >= 250, schedule_worker_poll_ms >= 1000).
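The clamping rule amounts to taking the larger of the configured value and the minimum. The sketch below shows that arithmetic only; the function and constant names are hypothetical, and the minima are the ones stated above.

```python
from typing import Tuple

# Minima as stated in the worker configuration notes.
MIN_EXECUTION_POLL_MS = 250
MIN_SCHEDULE_POLL_MS = 1000


def effective_poll_ms(
    execution_worker_poll_ms: int, schedule_worker_poll_ms: int
) -> Tuple[int, int]:
    """Return the poll intervals after raising too-small values to the minima."""
    return (
        max(execution_worker_poll_ms, MIN_EXECUTION_POLL_MS),
        max(schedule_worker_poll_ms, MIN_SCHEDULE_POLL_MS),
    )


# Values below the minima are raised; the sample config passes through unchanged.
assert effective_poll_ms(100, 500) == (250, 1000)
assert effective_poll_ms(1500, 30000) == (1500, 30000)
```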

Operational Notes

  • Treat backup and restore as asynchronous workflows: a 202 response means the job was queued, not that it finished.
  • Poll GET /admin/backups/jobs/{id} for queued/running/completed/failed status.
  • Queue workers persist lease and checkpoint state in backup_jobs for restart-safe recovery.
  • Scheduled jobs persist schedule_id on the queued backup_jobs row for traceability.
  • Deleting a schedule preserves existing backup_jobs history and clears schedule_id on linked jobs.
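One common way to get the "history survives, link is cleared" behaviour from the last note is a foreign key with ON DELETE SET NULL. The sketch below demonstrates that pattern with an in-memory SQLite database so it runs anywhere. It is an assumption about mechanism, not Athena's actual schema: only the names backup_jobs, backup_schedules, and schedule_id come from this page, the other columns are invented for the example, and Athena itself stores this state in its logging database.

```python
import sqlite3


def demo_schedule_delete() -> list:
    """Delete a schedule and return the job rows that remain."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE backup_schedules ("
        " id INTEGER PRIMARY KEY,"
        " client_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE backup_jobs ("
        " id INTEGER PRIMARY KEY,"
        " client_name TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " schedule_id INTEGER"
        "   REFERENCES backup_schedules(id) ON DELETE SET NULL)"
    )
    conn.execute(
        "INSERT INTO backup_schedules (id, client_name)"
        " VALUES (1, 'athena_logging')"
    )
    conn.execute(
        "INSERT INTO backup_jobs (id, client_name, status, schedule_id)"
        " VALUES (1234, 'athena_logging', 'completed', 1)"
    )
    # Deleting the schedule keeps the job row and nulls out its schedule_id.
    conn.execute("DELETE FROM backup_schedules WHERE id = 1")
    rows = conn.execute(
        "SELECT id, status, schedule_id FROM backup_jobs"
    ).fetchall()
    conn.close()
    return rows
```

After the delete, the job row is still present and its schedule_id is NULL, which matches the behaviour the note describes.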