Monitoring¶

The hub container exposes two endpoints for observing federation health:

Endpoint	Format	Purpose
`/federation-status.json`	JSON	Human-readable per-backend rollup
`/metrics`	Prometheus text	Scrape target for Prometheus/Grafana stacks

Both are written atomically every 60 seconds by a cron job inside the hub container. Neither endpoint requires authentication. Cache-Control: no-store is set on both.

`/federation-status.json`¶

{
  "generated_at": "2026-04-26T12:00:00Z",
  "poll_interval_seconds": 60,
  "backends": {
    "fulda": {
      "url": "https://fulda.example.com/api",
      "up": true,
      "latency_seconds": 0.043,
      "last_success": "2026-04-26T12:00:00Z",
      "last_import_at": "2026-04-25T03:12:00Z",
      "data_age_seconds": 118080,
      "playground_count": 42,
      "complete": 28,
      "partial": 10,
      "missing": 4
    }
  }
}

Fields:

Field	Notes
`generated_at`	ISO 8601 timestamp when this file was written. Use this to detect a frozen hub (`generated_at` older than `2 × poll_interval_seconds`).
`poll_interval_seconds`	Always `60` in the current release.
`backends.<slug>.up`	`true` if the backend responded to `GET /rpc/get_meta` within 3 s.
`backends.<slug>.latency_seconds`	Round-trip time of the last successful poll. `0` on failure.
`backends.<slug>.last_success`	ISO 8601 timestamp of the last successful probe, preserved across failures. `null` if the backend was never reachable.
`backends.<slug>.last_import_at`	Value of `api.import_status.last_import_at` from the backend DB. `null` if no import has run or backend is down.
`backends.<slug>.data_age_seconds`	Seconds since `last_import_at`. `null` if `last_import_at` is `null`.
`backends.<slug>.playground_count`	Total playgrounds in the region. `null` on pre-P1 backends or when backend is down.
`backends.<slug>.complete`	Playgrounds with complete equipment data. `null` on pre-P1 backends.
`backends.<slug>.partial`	Playgrounds with partial equipment data. `null` on pre-P1 backends.
`backends.<slug>.missing`	Playgrounds with no equipment data. `null` on pre-P1 backends.

The hub UI reads this endpoint every 60 seconds and surfaces freshness labels in the instance drawer. It also shows a "stale observation" banner when generated_at is older than 2 × poll_interval_seconds (i.e. the hub cron has stopped writing).

`/metrics`¶

Prometheus text exposition format (version 0.0.4):

# HELP spielplatz_backend_up 1 if the backend responded to get_meta, 0 otherwise.
# TYPE spielplatz_backend_up gauge
spielplatz_backend_up{backend="fulda",url="https://fulda.example.com/api"} 1

# HELP spielplatz_backend_importing 1 while osm2pgsql is actively running on this backend, 0 otherwise.
# TYPE spielplatz_backend_importing gauge
spielplatz_backend_importing{backend="fulda",url="https://fulda.example.com/api"} 0

# HELP spielplatz_backend_latency_seconds Round-trip time for the last get_meta call.
# TYPE spielplatz_backend_latency_seconds gauge
spielplatz_backend_latency_seconds{backend="fulda",url="https://fulda.example.com/api"} 0.043

# HELP spielplatz_backend_data_age_seconds Seconds since the backend last imported data.
# TYPE spielplatz_backend_data_age_seconds gauge
spielplatz_backend_data_age_seconds{backend="fulda",url="https://fulda.example.com/api"} 118080

# HELP spielplatz_backend_playgrounds_total Total number of playgrounds in the region.
# TYPE spielplatz_backend_playgrounds_total gauge
spielplatz_backend_playgrounds_total{backend="fulda",url="https://fulda.example.com/api"} 42

# HELP spielplatz_backend_playgrounds_complete Playgrounds with complete equipment data.
# TYPE spielplatz_backend_playgrounds_complete gauge
spielplatz_backend_playgrounds_complete{backend="fulda",url="https://fulda.example.com/api"} 28

# HELP spielplatz_backend_playgrounds_partial Playgrounds with partial equipment data.
# TYPE spielplatz_backend_playgrounds_partial gauge
spielplatz_backend_playgrounds_partial{backend="fulda",url="https://fulda.example.com/api"} 10

# HELP spielplatz_backend_playgrounds_missing Playgrounds with no equipment data.
# TYPE spielplatz_backend_playgrounds_missing gauge
spielplatz_backend_playgrounds_missing{backend="fulda",url="https://fulda.example.com/api"} 4

# HELP spielplatz_poll_generated_timestamp Unix timestamp when this scrape was generated.
# TYPE spielplatz_poll_generated_timestamp gauge
spielplatz_poll_generated_timestamp 1745668800

spielplatz_backend_data_age_seconds and the four completeness gauges are omitted for a backend when the field is null (backend down, pre-P1 backend, or no import yet). spielplatz_backend_importing is always emitted when the backend is reachable; it defaults to 0 for older backends that predate the importing field in get_meta.

Recipes¶

Recipe 1 — External uptime monitor¶

Point any HTTP uptime tool (UptimeRobot, Better Stack, etc.) at:

GET https://<hub-host>/federation-status.json

Check that:

The response is 200 OK.
generated_at is within the last 5 minutes (guards against cron death).
All expected backends.<slug>.up values are true.

Recipe 2 — BYO Prometheus + Grafana¶

Add a scrape job to your prometheus.yml:

scrape_configs:
  - job_name: spieli_hub
    static_configs:
      - targets: ['<hub-host>:80']
    metrics_path: /metrics
    scrape_interval: 60s

Useful PromQL expressions:

# Is any backend currently down?
spielplatz_backend_up == 0

# Alert if data is more than 48 hours old
spielplatz_backend_data_age_seconds > 172800

# Alert if the hub poll has stopped writing (cron died)
time() - spielplatz_poll_generated_timestamp > 300

# Completeness ratio per backend (fraction of playgrounds with full data)
spielplatz_backend_playgrounds_complete / spielplatz_backend_playgrounds_total

# Total playgrounds with missing data across all backends
sum(spielplatz_backend_playgrounds_missing)

# Alert if completeness ratio drops below 50 % for any backend
spielplatz_backend_playgrounds_complete / spielplatz_backend_playgrounds_total < 0.5

# Which backends are currently importing (data in flux)?
spielplatz_backend_importing == 1

# Backend online but no data and not importing (degraded ring state)
spielplatz_backend_up == 1
  and spielplatz_backend_importing == 0
  and spielplatz_backend_playgrounds_total == 0

Recipe 3 — Frontend error monitoring (Sentry)¶

The hub frontend does not ship in-repo error reporting. For production deployments, sign up for Sentry's free tier and add the Sentry browser SDK to your nginx serving config or config.js generation step. The DSN stays out of this repository.