Prometheus metrics

When METRICS_DIR points at a writable directory inside the container, every worker writes a restic_<job>.prom text file in addition to its last-<job>.json. Bind-mount that directory from the host and point node-exporter's --collector.textfile.directory flag at it; no push gateway is required.

Enabling

environment:
  METRICS_DIR: /var/log/textfile_collector
volumes:
  - ./metrics:/var/log/textfile_collector

Then on the host:

node_exporter --collector.textfile.directory=./metrics

That's the whole setup. Each file is first written under a *.tmp name and then moved into place with mv, so a node-exporter scrape never observes a partially written file.
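The write-then-rename pattern can be sketched in Python as follows. This is an illustration of the technique, not the project's own code: the function name `write_atomically` and the 0644 file mode are assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(target: Path, text: str) -> None:
    """Write text to target so readers see either the old or the new file."""
    # The temporary file lives in the same directory as the target because
    # a rename is only atomic within a single filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # mkstemp creates the file with mode 0600; widen it (assumed 0644)
        # so an exporter running as a different user can still read it.
        os.chmod(tmp_name, 0o644)
        # Atomic replacement, the Python counterpart of `mv tmp target`.
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the leftover temporary file if the rename never happened.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The textfile collector only reads files ending in .prom, so a leftover .tmp file is ignored rather than scraped.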

Files produced

One file per worker that has run at least once. Files are overwritten each run.

${METRICS_DIR}/
├── restic_backup.prom
├── restic_check.prom
├── restic_prune.prom
├── restic_replicate.prom
├── restic_restore.prom
├── restic_snapshot_export.prom
└── restic_forget_preview.prom

Always-emitted gauges

Per worker, where <job> is one of backup, check, prune, replicate, restore, snapshot_export, forget_preview:

| Metric | Meaning |
| --- | --- |
| restic_<job>_last_exit_code{hostname="…"} | Exit code of the most recent run. |
| restic_<job>_last_success{hostname="…"} | 1 when exit code was 0, else 0. |
| restic_<job>_last_duration_seconds{hostname="…"} | Wall-clock duration of the run. |
| restic_<job>_last_finished_timestamp{hostname="…"} | Unix epoch seconds at which the run ended. |
| restic_<job>_last_started_timestamp{hostname="…"} | Unix epoch seconds at which the run started. |

The hostname label comes from the container hostname (set explicitly in Compose / Kubernetes with hostname:). Run one container per host so that the label stays unique.
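To make the shape of these gauges concrete, here is a minimal Python sketch that renders them in the Prometheus text format. The function names, the `# TYPE` comment lines, and computing the duration as finished minus started are assumptions for illustration; the actual files may be laid out differently.

```python
def escape_label_value(value: str) -> str:
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_base_gauges(job: str, hostname: str, exit_code: int,
                       started: int, finished: int) -> str:
    """Render the five always-emitted gauges for one worker run."""
    label = f'{{hostname="{escape_label_value(hostname)}"}}'
    values = {
        "last_exit_code": exit_code,
        "last_success": 1 if exit_code == 0 else 0,
        # Assumption: duration is derived from the two timestamps.
        "last_duration_seconds": finished - started,
        "last_finished_timestamp": finished,
        "last_started_timestamp": started,
    }
    lines = []
    for suffix, value in values.items():
        name = f"restic_{job}_{suffix}"
        lines.append(f"# TYPE {name} gauge")  # assumed, not confirmed by the source
        lines.append(f"{name}{label} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical run: started at epoch 100, finished at epoch 160, exit code 0.
print(render_base_gauges("backup", "backup-node", 0, 100, 160))
```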

Worker-specific extras

Extra numeric fields in last-<job>.json are emitted as restic_<job>_last_<key>. Non-numeric extras (human-formatted byte strings like "1.234 MiB", the masked repository) are intentionally skipped to keep the textfile strictly typed for Prometheus.

| Worker | Extra metrics |
| --- | --- |
| backup | restic_backup_last_files_new, _files_changed, _files_unmodified, _dirs_new, _dirs_changed, _dirs_unmodified, _bytes_added, _bytes_stored (when restic produced bytes as a number). |
| replicate | restic_replicate_last_replicate_jobs_processed, _replicate_jobs_failed. |
| restore, snapshot_export | restic_restore_last_files_restored, _bytes_restored (when restic produced them as numbers). |
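The numeric-only rule for extras can be sketched as a small filter. This is a hedged illustration: `render_extras` is a hypothetical helper, and the sample record below uses made-up values (only the key names files_new, bytes_added and bytes_stored come from the table above).

```python
def render_extras(job: str, hostname: str, extras: dict) -> list[str]:
    """Return one sample line per numeric extra; skip everything else."""
    lines = []
    for key, value in extras.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue  # e.g. "1.234 MiB" or a masked repository string
        lines.append(f'restic_{job}_last_{key}{{hostname="{hostname}"}} {value}')
    return lines


# Hypothetical extras taken from a last-backup.json record.
sample = {
    "files_new": 12,
    "bytes_added": "1.234 MiB",   # human-formatted string: skipped
    "repository": "s3:***",       # masked repository: skipped
    "bytes_stored": 1293942,
}
for line in render_extras("backup", "backup-node", sample):
    print(line)
```

Filtering on the Python type rather than trying to parse strings such as "1.234 MiB" mirrors the stated intent of keeping the textfile strictly typed.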

Useful PromQL

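Time since last finished backup

time() - restic_backup_last_finished_timestamp{hostname="backup-node"} > 26 * 3600

Fires when no backup run has finished in the last 26 hours (a typical threshold for a daily 02:00 cron with some slack). The timestamp records when a run ended regardless of its exit code, so this expression detects runs that did not happen; failed runs are covered by restic_backup_last_success.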
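The same threshold arithmetic can be checked locally against the contents of a .prom file, which helps when debugging before Prometheus is wired up. The sketch below is an assumption-laden illustration: `read_gauge` and `backup_overdue` are hypothetical helpers, and the regular expression handles only simple sample lines (no `}` inside label values).

```python
import re

# Matches "name{labels} value" or "name value" sample lines.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)'
)


def read_gauge(text: str, name: str):
    """Return the first value of the named metric as float, or None."""
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == name:
            return float(match.group("value"))
    return None


def backup_overdue(text: str, now: float, max_age_seconds: int = 26 * 3600):
    """Mirror `time() - finished > 26 * 3600`; None if the metric is absent."""
    finished = read_gauge(text, "restic_backup_last_finished_timestamp")
    if finished is None:
        # The PromQL expression returns no series here and would not fire.
        return None
    return now - finished > max_age_seconds
```

Returning None for a missing metric makes the gap explicit: an expression over an absent series never fires, so a worker that has never run is not caught by the age check alone.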

Most recent backup failed

restic_backup_last_success{hostname="backup-node"} == 0

Restic check skipped or stale

time() - restic_check_last_finished_timestamp{hostname="backup-node"} > 8 * 24 * 3600

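If you schedule CHECK_CRON weekly, this alert means that no check run has finished for over a week: the check was skipped or never ran to completion. A check that ran and failed still updates the timestamp, so watch restic_check_last_success for that case.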

Replicate jobs failing

restic_replicate_last_replicate_jobs_failed > 0

Backup running long

restic_backup_last_duration_seconds > 6 * 3600

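The gauge keeps the duration of the most recent run until the next run overwrites it, so this expression stays true long after the slow run ended. Combine it with a time() filter on restic_backup_last_finished_timestamp to alert only when a long-running backup finished recently.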

Example alert rules

groups:
  - name: restic-backup-helper
    rules:
      - alert: BackupOverdue
        expr: time() - restic_backup_last_finished_timestamp > 26*3600
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Backup overdue on {{ $labels.hostname }}"
          description: "No backup run has finished in over 26h."

      - alert: BackupFailed
        expr: restic_backup_last_success == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Backup failed on {{ $labels.hostname }}"
          description: "Most recent backup run exited non-zero; see /var/log/last-backup.json."

      - alert: ReplicateJobsFailed
        expr: restic_replicate_last_replicate_jobs_failed > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }} replicate jobs failed on {{ $labels.hostname }}"
          description: "See /var/log/last-replicate.json for details."

Compose metrics profile

The reference scripts/docker-compose.yml ships a metrics Compose profile that adds a node-exporter sidecar, bound to 127.0.0.1:9100, which scrapes the textfile_collector/ subdirectory of the backup-logs volume:

docker compose --profile metrics up -d
curl -fsS http://127.0.0.1:9100/metrics | grep restic_backup_last

No host-level node-exporter required.

See also