Typesense Search + Sync
Optional Typesense backends, Athena table sync jobs, freshness monitoring, and Athena-routed search endpoints.
Athena can optionally project Postgres-backed table data into Typesense for low-latency search. The integration is fully opt-in: nothing changes until you apply the Typesense schema and configure at least one backend plus one sync job.
What gets provisioned
Apply sql/typesense.sql or the equivalent section in sql/provision.sql to
the Athena logging database. That creates:
- public.typesense_profiles for saved Typesense backends
- public.typesense_sync_jobs for Athena table-to-collection bindings
- public.typesense_sync_runs for run history and troubleshooting
- public.typesense_sync_job_status for stale-state and health-oriented reads
These tables live in the logging database so the integration stays optional and operator-managed rather than becoming a hard runtime dependency for every client.
API routes
Athena exposes four Typesense routes:
- POST /typesense/backends/test
- POST /typesense/collections/list
- POST /typesense/sync-jobs/{job_id}/run
- POST /typesense/search
Use them to validate a Typesense node, inspect collections, trigger sync jobs manually, and execute search through a configured Athena binding.
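As a rough illustration, a small Python client could build a request for the manual run route like this. The sketch only constructs the request object and does not send it. The base URL, the helper names, and the empty JSON body are assumptions for illustration; the exact payloads are in the API reference.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your own Athena deployment.
ATHENA_BASE_URL = "https://athena.example.com"


def athena_request(path: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an Athena route."""
    return urllib.request.Request(
        url=ATHENA_BASE_URL.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Athena-Key": api_key},
        method="POST",
    )


def run_sync_job_request(job_id: str, api_key: str) -> urllib.request.Request:
    # POST /typesense/sync-jobs/{job_id}/run triggers a manual sync.
    # Sending an empty JSON object as the body is an assumption.
    return athena_request(f"/typesense/sync-jobs/{job_id}/run", api_key, {})
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch stays free of side effects.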
Web app workflow
The web app exposes a dedicated Search page for the full operator flow:
- Add a backend in Backends with Typesense base URL, API key, and timeout.
- List collections to inspect the remote search cluster before binding data.
- Create a Sync Job that maps an Athena client + schema + table into a Typesense collection.
- Choose the document id column, included columns, queryable fields, sync interval, and stale threshold.
- Review recent runs and stale status directly from the sync-job status view.
- Use Live Search to query the bound collection through Athena.
During a job's first sync, Athena automatically creates the target Typesense collection if it does not already exist.
Freshness and staleness
Each sync job stores operational fields such as:
- last_synced_at
- next_sync_at
- last_sync_status
- last_sync_error
- last_source_row_count
- last_imported_count
- last_failed_count
The typesense_sync_job_status view turns those into operator-friendly
freshness signals such as is_stale and sync_health, which the web app uses
to highlight jobs that need attention.
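To make the freshness signals concrete, here is a minimal sketch of how such values could be derived from the stored fields. The actual view definition is not shown in this page, so the staleness rule, the `stale_after` parameter, and the health labels below are assumptions, not the view's real logic.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_stale(last_synced_at: Optional[datetime],
             stale_after: timedelta,
             now: datetime) -> bool:
    """Assumed rule: stale if never synced, or the last sync is older than the threshold."""
    if last_synced_at is None:
        return True
    return now - last_synced_at > stale_after


def sync_health(last_sync_status: Optional[str], stale: bool) -> str:
    # Hypothetical labels; the real view may expose different values.
    if last_sync_status == "failed":
        return "failing"
    return "stale" if stale else "healthy"
```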
Search binding behavior
POST /typesense/search resolves a sync job either by:
- explicit job_id, or
- source_client_name + source_table_name (+ optional source_schema_name)
That means callers can keep using Athena concepts such as logical clients and table bindings while Athena routes the actual query to Typesense under the hood. If no matching enabled job exists, Athena returns a not-found style response.
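The resolution rules above can be sketched as an in-memory lookup. This is an illustration of the described behavior, not Athena's implementation: the `SyncJob` class, the `enabled` field name, and the precedence of `job_id` over the name-based lookup are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SyncJob:
    job_id: str
    source_client_name: str
    source_schema_name: str
    source_table_name: str
    enabled: bool = True  # hypothetical field name


def resolve_job(jobs: Iterable[SyncJob],
                job_id: Optional[str] = None,
                source_client_name: Optional[str] = None,
                source_table_name: Optional[str] = None,
                source_schema_name: Optional[str] = None) -> Optional[SyncJob]:
    """Return the first enabled job matching the binding, or None (not found)."""
    for job in jobs:
        if not job.enabled:
            continue  # disabled jobs never match
        if job_id is not None:
            if job.job_id == job_id:
                return job
            continue
        if (job.source_client_name != source_client_name
                or job.source_table_name != source_table_name):
            continue
        # The schema only narrows the match when the caller supplies it.
        if source_schema_name is not None and job.source_schema_name != source_schema_name:
            continue
        return job
    return None
```

A `None` result corresponds to the not-found style response described above.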
Auth requirements
Typesense routes require either:
- ATHENA_KEY_12 via X-Athena-Key or X-Athena-Admin-Key, or
- a gateway API key with the gateway.typesense_proxy right
Authentication runs before JSON parsing, so invalid or missing credentials still return 401 Unauthorized even if the request body is malformed.
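The ordering matters more than the details, so a toy handler can show it. This sketch is not Athena's code: it covers only the static-key path (gateway API keys are omitted), it treats header names as case-sensitive for brevity, and the 400 status for a malformed body is an assumption.

```python
import json
from typing import Mapping, Set, Tuple


def handle_typesense_request(headers: Mapping[str, str],
                             raw_body: bytes,
                             valid_keys: Set[str]) -> Tuple[int, str]:
    # Step 1: authenticate from headers only; the body is not touched yet.
    key = headers.get("X-Athena-Key") or headers.get("X-Athena-Admin-Key")
    if key not in valid_keys:
        return 401, "Unauthorized"

    # Step 2: parse JSON only after authentication has succeeded.
    try:
        json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return 400, "Bad Request"  # assumed status for malformed bodies

    return 200, "OK"
```

With this ordering, an unauthenticated caller cannot tell a malformed body apart from a well-formed one.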
Worker behavior
Athena also starts an optional background sync worker that checks for due jobs and runs them automatically when enabled.
This worker now runs on the same shared runtime pattern used by backup workers:
athena-scheduler drives the poll loop and athena-worker provides shared
tick semantics. That keeps due-job polling and restart behavior consistent
across Typesense sync and backup scheduling/execution.
Configure this in config.yaml under the typesense: section:
typesense:
  allow_http: false
  sync_worker_enabled: true
  sync_worker_poll_ms: 30000
  import_max_attempts: 3
  import_retry_base_ms: 400
  sync_saga_backup_enabled: true
If you already have env-var based deployment tooling, you can still set these
values via ${ENV_VAR} placeholders inside config.yaml.
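For readers unfamiliar with placeholder expansion, the idea can be sketched in a few lines. This is a generic illustration, not Athena's config loader: the accepted name pattern, the choice to raise on an unset variable, and the `POLL_MS` variable name are all assumptions.

```python
import re

# Matches ${NAME} where NAME looks like a typical environment variable.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text: str, env: dict) -> str:
    """Replace each ${NAME} in text with env[NAME]."""
    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Assumed behavior: fail loudly rather than substitute an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, text)
```

For example, `expand_placeholders("sync_worker_poll_ms: ${POLL_MS}", os.environ)` would substitute the value of a hypothetical `POLL_MS` variable before the YAML is parsed.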
The worker is intended for scheduled freshness maintenance, while the manual run endpoint is useful for backfills, initial collection creation, or operator-driven recovery after a failed import.
Manifest + resume semantics
Every run now builds a run-scoped manifest snapshot in
public.typesense_sync_run_manifest_rows and imports from manifest sequence
windows. Progress checkpoints are persisted in public.typesense_sync_runs
(next_manifest_seq, manifest_processed_rows, resume_attempt_count, and
stage/message metadata), so a restarted worker can resume the same run instead
of restarting at batch 1.
When a mirror reacquires job lock ownership and finds an in-flight resumable run, Athena resumes from the stored checkpoint before creating any new run row. This avoids duplicate concurrent runs for the same job and keeps counters tied to one logical run.
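The checkpoint-and-resume idea can be sketched with an in-memory manifest. This is a simplified model under stated assumptions: the manifest is a list ordered by sequence starting at 1, `window_size` and the function names are hypothetical, and the checkpoint fields borrow the column names mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunCheckpoint:
    next_manifest_seq: int = 1
    manifest_processed_rows: int = 0
    resume_attempt_count: int = 0


def import_from_manifest(manifest: List[dict],
                         checkpoint: RunCheckpoint,
                         window_size: int,
                         import_batch: Callable[[List[dict]], None]) -> None:
    """Import manifest rows in sequence windows, checkpointing after each window."""
    while checkpoint.next_manifest_seq <= len(manifest):
        start = checkpoint.next_manifest_seq
        window = manifest[start - 1:start - 1 + window_size]
        import_batch(window)  # if this raises, the checkpoint is left untouched
        checkpoint.next_manifest_seq = start + len(window)
        checkpoint.manifest_processed_rows += len(window)


def resume_run(manifest: List[dict],
               checkpoint: RunCheckpoint,
               window_size: int,
               import_batch: Callable[[List[dict]], None]) -> None:
    # Continue the same logical run from the stored checkpoint.
    checkpoint.resume_attempt_count += 1
    import_from_manifest(manifest, checkpoint, window_size, import_batch)
```

Because the checkpoint only advances after a window succeeds, a resumed run picks up at the first unprocessed window rather than at the beginning.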
Progress total semantics
Run and status payloads expose progress_total_kind with one of:
- exact: hard manifest totals are known and progress is bounded by snapshot totals.
- estimated: totals come from catalog estimates and can drift as source data changes.
- unknown: no meaningful total is available.
Operators should treat only exact totals as strict denominators. Estimated
totals are surfaced as “about” values and may be lower than processed rows by
the end of long runs if catalog estimates were stale.
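A client that renders progress might branch on progress_total_kind along these lines. The function name and the output strings are illustrative assumptions; only the three kind values come from the payloads described above.

```python
from typing import Optional


def describe_progress(processed: int,
                      total: Optional[int],
                      progress_total_kind: str) -> str:
    """Render progress text without treating soft totals as strict denominators."""
    if progress_total_kind == "exact" and total:
        # Only exact totals are safe to turn into a percentage.
        pct = min(100.0, 100.0 * processed / total)
        return f"{processed} / {total} rows ({pct:.0f}%)"
    if progress_total_kind == "estimated" and total:
        # Estimates can be exceeded, so no percentage is shown.
        return f"{processed} of about {total} rows"
    return f"{processed} rows processed"
```

Note that the estimated branch stays sensible even when the processed count has passed the estimate.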
Retry and saga rollback behavior
Typesense imports now use a bounded retry + compensation flow per batch:
- Athena snapshots each document id in the current batch before writing.
- Athena tries the import request with exponential backoff retries.
- If retries are exhausted, Athena runs rollback compensation:
  - restore documents that existed before this run, and
  - delete documents that were created by this run.
- The sync run is then marked failed with rollback details for troubleshooting.
This prevents a mid-run request failure from silently leaving a partially applied index mutation in Typesense.
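The flow above can be modeled with a plain dictionary standing in for the Typesense collection. This is a sketch of the pattern, not Athena's implementation: the doubling backoff schedule, the function names, and the `write` callback are assumptions, while the result keys reuse the recovery column names from the sync run table.

```python
import time
from typing import Callable, Dict, List

Index = Dict[str, dict]  # stand-in for a Typesense collection, keyed by document id


def backoff_delays_ms(max_attempts: int, base_ms: int) -> List[int]:
    """Delays between attempts, doubling each time (assumed schedule)."""
    return [base_ms * 2 ** i for i in range(max_attempts - 1)]


def import_batch_with_rollback(index: Index,
                               batch: List[dict],
                               write: Callable[[Index, List[dict]], None],
                               max_attempts: int = 3,
                               base_ms: int = 400,
                               sleep: Callable[[float], None] = time.sleep) -> dict:
    # 1. Snapshot each document id in the batch (None means it did not exist).
    snapshot = {d["id"]: (dict(index[d["id"]]) if d["id"] in index else None)
                for d in batch}

    # 2. Try the import, sleeping with exponential backoff between attempts.
    delays = backoff_delays_ms(max_attempts, base_ms)
    for attempt in range(max_attempts):
        try:
            write(index, batch)
            return {"status": "succeeded", "retry_attempt_count": attempt,
                    "rollback_status": "not_needed"}
        except Exception:
            if attempt < len(delays):
                sleep(delays[attempt] / 1000)

    # 3. Retries exhausted: restore pre-existing documents, delete new ones.
    restored = deleted = 0
    for doc_id, before in snapshot.items():
        if before is not None:
            index[doc_id] = before
            restored += 1
        elif doc_id in index:
            del index[doc_id]
            deleted += 1

    # 4. Report the failed run together with rollback details.
    return {"status": "failed", "retry_attempt_count": max_attempts - 1,
            "rollback_status": "succeeded",
            "rollback_restored_count": restored,
            "rollback_deleted_count": deleted}
```

With the configuration values shown earlier (3 attempts, 400 ms base), this schedule would wait 400 ms and then 800 ms before giving up.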
What operators can inspect
public.typesense_sync_runs now stores recovery metadata:
- retry_attempt_count
- progress_total_kind (estimated, exact, unknown)
- manifest_total_rows
- manifest_processed_rows
- next_manifest_seq
- resume_attempt_count
- rollback_attempted
- rollback_status (not_needed, running, succeeded, failed)
- rollback_restored_count
- rollback_deleted_count
- rollback_error_message
The web app reads these fields in the Typesense page to show retry pressure and whether compensation succeeded when a run fails mid-import.
Reference
See Reference -> Athena API Reference -> typesense for exact request and response payloads.