Prometheus Metrics
Requires the Enterprise Edition.
What it does
Exposes a /metrics endpoint in the standard Prometheus text format.
It is the third observability pillar alongside JSON Logging (logs) and the Audit Log (events), covering numeric dashboards, SLO timers, and freshness alerts.
Designed to drop straight into any modern monitoring stack:
- Prometheus / VictoriaMetrics / Grafana Cloud — add a scrape job
- OpenMetrics-compatible agents (Datadog, New Relic, Dynatrace) — configure an OpenMetrics endpoint (see the sketch after this list)
- Alertmanager — build alerts on the freshness gauges below
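For the OpenMetrics-agent route, a rough sketch of a Datadog Agent check instance could look like the following. Field names differ between agent versions, so treat the keys and the password handling here as assumptions and verify them against the agent's OpenMetrics check documentation; New Relic and Dynatrace have equivalent scraper configs.

```yaml
# conf.d/openmetrics.d/conf.yaml on the Datadog Agent host (sketch only)
instances:
  - openmetrics_endpoint: https://syncer.example.com/metrics
    namespace: cmdbsyncer
    metrics:
      - "cmdbsyncer_.*"
    username: prometheus            # the Syncer user, see Authentication below
    password: "<scraper password>"  # better: pull from your secrets backend
```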
Authentication
/metrics uses the same Basic-Auth layer as the rest of the Syncer
REST API:
- Create a Syncer User for the scraper (Settings → Users → New).
- Assign the API role Prometheus metrics scrape (metrics), or Full rights (all) if the user should be able to reach every API endpoint.
- Set a password and use it in the scraper's Basic-Auth config.
HTTPS is required (the same gate that protects the rest of the API);
set ALLOW_INSECURE_API_AUTH = True in local_config.py only for
non-production debugging.
Scrape config
```yaml
scrape_configs:
  - job_name: cmdbsyncer
    metrics_path: /metrics
    scheme: https
    static_configs:
      - targets: ['syncer.example.com']
    basic_auth:
      username: prometheus                            # the Syncer user name
      password_file: /etc/prometheus/cmdbsyncer.pass
```
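If the Syncer presents a certificate from an internal CA, the job above can point Prometheus at that CA via the standard tls_config block; the CA path is only an example.

```yaml
    # Added under the same job as above.
    tls_config:
      ca_file: /etc/prometheus/internal-ca.pem   # example path to your CA bundle
      # insecure_skip_verify: true               # lab / debugging only
```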
Metrics
Info
| Name | Labels | Meaning |
|---|---|---|
| cmdbsyncer_info | customer, license_id, exp | Constant 1 carrying license metadata |
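Because cmdbsyncer_info is a constant 1, it can be multiplied onto other series to pull the license labels into dashboards. A minimal sketch as a recording rule, assuming a single scraped Syncer target; the rule and group names are examples, and cmdbsyncer_hosts_total is described under Hosts below.

```yaml
groups:
  - name: cmdbsyncer-info
    rules:
      # Copy the customer / license_id labels onto the host count.
      - record: cmdbsyncer:hosts_total:labeled
        expr: cmdbsyncer_hosts_total * on(instance, job) group_left(customer, license_id) cmdbsyncer_info
```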
Cron groups
One time-series per CronGroup:
| Name | Meaning |
|---|---|
| cmdbsyncer_cron_group_enabled | 1 if enabled in UI, else 0 |
| cmdbsyncer_cron_group_running | 1 while a run is in flight |
| cmdbsyncer_cron_group_failure | 1 if the last completed run failed |
| cmdbsyncer_cron_group_last_start_timestamp_seconds | Unix timestamp of the last start |
| cmdbsyncer_cron_group_last_end_timestamp_seconds | Unix timestamp of the last end |
| cmdbsyncer_cron_group_last_success_timestamp_seconds | Unix timestamp of the last successful end |
| cmdbsyncer_cron_group_last_duration_seconds | Duration of the last completed run |
| cmdbsyncer_cron_group_next_run_timestamp_seconds | Unix timestamp when the group is next eligible to run |
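The timestamp gauges combine naturally with time() on dashboards. A small sketch as a recording rule (rule and group names are examples); the first example alert below builds on the same expression.

```yaml
groups:
  - name: cmdbsyncer-cron
    rules:
      # Seconds since each cron group last finished successfully.
      - record: cmdbsyncer:cron_group_success_age_seconds
        expr: time() - cmdbsyncer_cron_group_last_success_timestamp_seconds
```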
Hosts
| Name | Labels | Meaning |
|---|---|---|
| cmdbsyncer_hosts_total | — | Total host documents (excluding object-mode) |
| cmdbsyncer_hosts_stale_24h_total | — | Hosts not seen by any importer in the last 24 h |
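For alerting that is independent of installation size, the stale share is often more useful than the absolute count. A sketch (rule and group names are examples):

```yaml
groups:
  - name: cmdbsyncer-hosts
    rules:
      # Fraction of all hosts no importer has seen in the last 24 h.
      - record: cmdbsyncer:hosts_stale_ratio
        expr: cmdbsyncer_hosts_stale_24h_total / cmdbsyncer_hosts_total
```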
Self
| Name | Meaning |
|---|---|
| cmdbsyncer_metrics_scrape_duration_seconds | Time this scrape spent building the body |
| cmdbsyncer_scrape_error | 1 if the scrape failed (only emitted then) |
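To keep an eye on the exporter's own cost (see the design notes at the end), the build time can be tracked over a window. A sketch (rule and group names are examples):

```yaml
groups:
  - name: cmdbsyncer-self
    rules:
      # Slowest /metrics build observed in the last hour.
      - record: cmdbsyncer:metrics_build_seconds:max_over_time_1h
        expr: max_over_time(cmdbsyncer_metrics_scrape_duration_seconds[1h])
```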
Example alerts
Sync hasn't succeeded in 90 minutes — combines last_success_timestamp_seconds
with wall clock:
```yaml
- alert: CmdbSyncerCronStale
  expr: time() - cmdbsyncer_cron_group_last_success_timestamp_seconds > 5400
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: 'CMDBsyncer cron group {{ $labels.group }} has not succeeded for 90 min'
```
Last run failed:
```yaml
- alert: CmdbSyncerCronFailure
  expr: cmdbsyncer_cron_group_failure == 1
  for: 1m
  labels:
    severity: error
  annotations:
    summary: 'CMDBsyncer cron group {{ $labels.group }} failed its last run'
```
Hosts rotting (importer is silent):
```yaml
- alert: CmdbSyncerStaleHosts
  expr: cmdbsyncer_hosts_stale_24h_total > 10
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: '{{ $value }} hosts have not been seen by any importer in the last 24h'
```
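Scrape problems themselves (the cmdbsyncer_scrape_error gauge from the Self section, plus Prometheus's built-in up series) can be caught the same way. A sketch, assuming the job_name from the scrape config above:

```yaml
- alert: CmdbSyncerScrapeBroken
  expr: cmdbsyncer_scrape_error == 1 or up{job="cmdbsyncer"} == 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: 'CMDBsyncer metrics endpoint is failing or unreachable'
```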
Design notes
- Metrics are built on scrape from the Mongo state. No in-process counters to persist; HA-safe across replicas.
- Scrape cost scales linearly with the number of CronGroups (one query + one loop). At typical sizes (< 1000 groups) a single scrape is well under 100 ms.
- Counters for "events since start" (audit, notifications, webhook triggers) are deliberately not exposed here — they depend on in-process state that doesn't survive a restart and would diverge between replicas. Use the Audit Log CSV export or your JSON-log aggregator for that axis.