Skip to content

Monitoring

The hub container exposes two endpoints for observing federation health:

Endpoint Format Purpose
/federation-status.json JSON Human-readable per-backend rollup
/metrics Prometheus text Scrape target for Prometheus/Grafana stacks

Both are written atomically every 60 seconds by a cron job inside the hub container. Neither endpoint requires authentication. Cache-Control: no-store is set on both.

/federation-status.json

{
  "generated_at": "2026-04-26T12:00:00Z",
  "poll_interval_seconds": 60,
  "backends": {
    "fulda": {
      "url": "https://fulda.example.com/api",
      "up": true,
      "latency_seconds": 0.043,
      "last_success": "2026-04-26T12:00:00Z",
      "last_import_at": "2026-04-25T03:12:00Z",
      "data_age_seconds": 118080
    }
  }
}

Fields:

Field Notes
generated_at ISO 8601 timestamp when this file was written. Use this to detect a frozen hub (generated_at older than 2 × poll_interval_seconds).
poll_interval_seconds Always 60 in the current release.
backends.<slug>.up true if the backend responded to GET /rpc/get_meta within 3 s.
backends.<slug>.latency_seconds Round-trip time of the last successful poll. 0 on failure.
backends.<slug>.last_success ISO 8601 timestamp of the last successful probe, preserved across failures. null if the backend was never reachable.
backends.<slug>.last_import_at Value of api.import_status.last_import_at from the backend DB. null if no import has run or backend is down.
backends.<slug>.data_age_seconds Seconds since last_import_at. null if last_import_at is null.

The hub UI reads this endpoint every 60 seconds and surfaces freshness labels in the instance drawer. It also shows a "stale observation" banner when generated_at is older than 2 × poll_interval_seconds (i.e. the hub cron has stopped writing).

/metrics

Prometheus text exposition format (version 0.0.4):

# HELP spielplatz_backend_up 1 if the backend responded to get_meta, 0 otherwise.
# TYPE spielplatz_backend_up gauge
spielplatz_backend_up{backend="fulda",url="https://fulda.example.com/api"} 1

# HELP spielplatz_backend_latency_seconds Round-trip time for the last get_meta call.
# TYPE spielplatz_backend_latency_seconds gauge
spielplatz_backend_latency_seconds{backend="fulda",url="https://fulda.example.com/api"} 0.043

# HELP spielplatz_backend_data_age_seconds Seconds since the backend last imported data.
# TYPE spielplatz_backend_data_age_seconds gauge
spielplatz_backend_data_age_seconds{backend="fulda",url="https://fulda.example.com/api"} 118080

# HELP spielplatz_poll_generated_timestamp Unix timestamp when this scrape was generated.
# TYPE spielplatz_poll_generated_timestamp gauge
spielplatz_poll_generated_timestamp 1745668800

spielplatz_backend_data_age_seconds is omitted for a backend when last_import_at is null (backend down or no import yet).

Recipes

Recipe 1 — External uptime monitor

Point any HTTP uptime tool (UptimeRobot, Better Stack, etc.) at:

GET https://<hub-host>/federation-status.json

Check that:

  1. The response is 200 OK.
  2. generated_at is within the last 5 minutes (guards against cron death).
  3. All expected backends.<slug>.up values are true.

Recipe 2 — BYO Prometheus + Grafana

Add a scrape job to your prometheus.yml:

scrape_configs:
  - job_name: spieli_hub
    static_configs:
      - targets: ['<hub-host>:80']
    metrics_path: /metrics
    scrape_interval: 60s

Useful PromQL expressions:

# Is any backend currently down?
spielplatz_backend_up == 0

# Alert if data is more than 48 hours old
spielplatz_backend_data_age_seconds > 172800

# Alert if the hub poll has stopped writing (cron died)
time() - spielplatz_poll_generated_timestamp > 300

Recipe 3 — Frontend error monitoring (Sentry)

The hub frontend does not ship in-repo error reporting. For production deployments, sign up for Sentry's free tier and add the Sentry browser SDK to your nginx serving config or config.js generation step. The DSN stays out of this repository.

See also