Prometheus metrics
When METRICS_DIR points at a writable directory inside the container,
every worker also writes a restic_<job>.prom text file in addition to
the last-<job>.json summary. Bind-mount that directory from the host and
point node-exporter's --collector.textfile.directory at it. No
Pushgateway is required.
Enabling
```yaml
environment:
  METRICS_DIR: /var/log/textfile_collector
volumes:
  - ./metrics:/var/log/textfile_collector
```
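Then on the host, start node-exporter with --collector.textfile.directory
set to the host side of the bind mount (./metrics in the example above).
That's the whole setup. Files are written via *.tmp + mv, so a
node-exporter scrape never observes a partially written file.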
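The tmp + mv pattern can be sketched as follows. This is a minimal illustration, not the helper's actual code: the function name write_prom_atomically and the exact temporary file name are assumptions. The temporary file lives in the same directory as the target so that mv is a rename within one filesystem.

```shell
# write_prom_atomically DIR JOB
# Reads metric text on stdin and publishes it as DIR/restic_JOB.prom.
# Hypothetical helper for illustration only.
write_prom_atomically() {
    dir=$1
    job=$2
    target="$dir/restic_${job}.prom"
    tmp="$target.tmp"   # same directory, so mv is a rename, not a copy

    # Write the full content first, then swap it into place in one step.
    cat > "$tmp" && mv -f "$tmp" "$target"
}
```

Because the textfile collector only parses files whose names end in .prom, the temporary file is ignored until the rename completes.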
Files produced
One file per worker that has run at least once. Files are overwritten each run.
${METRICS_DIR}/
├── restic_backup.prom
├── restic_check.prom
├── restic_forget.prom # only when FORGET_CRON is set
├── restic_prune.prom
├── restic_replicate.prom
├── restic_restore.prom
├── restic_snapshot_export.prom
├── restic_forget_preview.prom
├── restic_mount_snapshot.prom
├── restic_unlock.prom # only when /bin/unlock has been run
├── restic_sources_report.prom # only when /bin/sources-report has been run
├── restic_init_repo.prom # only when /bin/init-repo has been run
├── restic_notify_test.prom # only when /bin/notify-test has been run
└── restic_restore_test.prom # only when /bin/restore-test has been run
Always-emitted gauges
Per worker <job> ∈ backup, check, forget, prune, replicate,
restore, snapshot_export, forget_preview, mount_snapshot,
unlock, sources_report, init_repo, notify_test, restore_test:
| Metric | Meaning |
|---|---|
| restic_<job>_last_exit_code{hostname="…"} | Exit code of the most recent run. |
| restic_<job>_last_success{hostname="…"} | 1 when exit code was 0, else 0. |
| restic_<job>_last_duration_seconds{hostname="…"} | Wall-clock duration of the run. |
| restic_<job>_last_finished_timestamp{hostname="…"} | Unix epoch seconds at which the run ended. |
| restic_<job>_last_started_timestamp{hostname="…"} | Unix epoch seconds at which the run started. |
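For orientation, the sketch below prints the five gauges in the Prometheus text exposition format. The helper name emit_core_metrics, the # TYPE lines and the whole-second duration are assumptions for illustration; the files written by the workers may differ in layout.

```shell
# emit_core_metrics JOB HOSTNAME EXIT_CODE STARTED FINISHED
# Prints the five always-emitted gauges for one run.
# Hypothetical helper; HOSTNAME is assumed to be escaped already.
emit_core_metrics() {
    job=$1
    host=$2
    rc=$3
    started=$4
    finished=$5

    # last_success is derived from the exit code.
    if [ "$rc" -eq 0 ]; then ok=1; else ok=0; fi

    for pair in \
        "exit_code:$rc" \
        "success:$ok" \
        "duration_seconds:$((finished - started))" \
        "started_timestamp:$started" \
        "finished_timestamp:$finished"
    do
        name="restic_${job}_last_${pair%%:*}"
        printf '# TYPE %s gauge\n' "$name"
        printf '%s{hostname="%s"} %s\n' "$name" "$host" "${pair#*:}"
    done
}
```

For example, `emit_core_metrics backup web-01 0 1700000000 1700000125` yields a restic_backup_last_duration_seconds sample of 125 and a restic_backup_last_success sample of 1.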
The hostname label is the container hostname, which you can set
explicitly with hostname: in Compose or Kubernetes. Run one container
per host so the label stays unique. The label value is escaped before
writing, so unusual hostnames containing quotes, backslashes or newlines
do not break the Prometheus textfile format.
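The text exposition format requires backslash, double quote and newline to be escaped inside label values. A minimal sketch of such escaping, assuming a hypothetical escape_label helper (the workers' own implementation is not shown here):

```shell
# escape_label VALUE
# Prints VALUE escaped for use inside a Prometheus label value:
#   backslash -> \\    double quote -> \"    newline -> \n
# Hypothetical helper. A trailing newline in VALUE is treated as a line
# terminator and dropped rather than escaped.
escape_label() {
    printf '%s' "$1" |
        sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
        awk 'BEGIN { ORS = "" } NR > 1 { print "\\n" } { print }'
}
```

Backslashes are doubled before quotes are escaped, so the backslash that the quote rule introduces is not doubled again.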
Worker-specific extras
Extra numeric fields in last-<job>.json are emitted as
restic_<job>_last_<key>. Non-numeric extras (human-formatted byte
strings like "1.234 MiB", the masked repository) are intentionally
skipped to keep the textfile strictly typed for Prometheus.
| Worker | Extra metrics |
|---|---|
| backup | restic_backup_last_files_new, _files_changed, _files_unmodified, _dirs_new, _dirs_changed, _dirs_unmodified, _bytes_added, _bytes_stored (when restic produced bytes as a number). |
| replicate | restic_replicate_last_replicate_jobs_processed, _replicate_jobs_failed. |
| restore, snapshot_export | restic_restore_last_files_restored, _bytes_restored (when restic produced them as numbers). |
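The numeric-only rule can be illustrated with a small filter. The helpers is_number and emit_extra are hypothetical, and the accepted number syntax (optional sign, decimal fraction, exponent) is an assumption rather than the workers' exact check:

```shell
# is_number VALUE
# Succeeds when VALUE looks like a plain decimal number.
# Assumes VALUE is a single line.
is_number() {
    printf '%s\n' "$1" |
        grep -Eq '^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$'
}

# emit_extra JOB HOSTNAME KEY VALUE
# Prints one gauge line, or nothing when VALUE is not numeric
# (for example a human-formatted string such as "1.234 MiB").
emit_extra() {
    if is_number "$4"; then
        printf 'restic_%s_last_%s{hostname="%s"} %s\n' "$1" "$3" "$2" "$4"
    fi
}
```

With these helpers, `emit_extra backup web-01 files_new 42` prints a sample line, while `emit_extra backup web-01 bytes_added '1.234 MiB'` prints nothing.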
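Useful PromQL
Time since last successful backup
The expression time() - restic_backup_last_finished_timestamp > 26*3600 fires when no backup run has finished in the last 26 hours (a typical threshold for a daily 02:00 cron with some slack). The timestamp is written for every run, including failed ones, so pair it with restic_backup_last_success to confirm that the run succeeded.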
Backup failed yesterday
Restic check skipped or stale
If you schedule CHECK_CRON weekly, this alert means the check has been
skipped or has been failing repeatedly for over a week.
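Replicate jobs failing
The expression restic_replicate_last_replicate_jobs_failed > 0 matches when the most recent replicate run reported at least one failed job.
Backup running long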
Combine with the time() filter to catch a runaway backup that finished
recently rather than alerting on every historical long run.
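To try expressions like these ad hoc, one option is the instant-query endpoint of the Prometheus HTTP API. The sketch below assumes a Prometheus server at http://127.0.0.1:9090 (a hypothetical address) that scrapes the node-exporter. The first two expressions match the alert rules on this page; the thresholds in the third (4 hours, 24 hours) are assumed values, not recommendations from the project.

```shell
# Hypothetical Prometheus address; override via the environment.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# Expressions taken from the alert rules on this page.
BACKUP_OVERDUE='time() - restic_backup_last_finished_timestamp > 26*3600'
REPLICATE_FAILED='restic_replicate_last_replicate_jobs_failed > 0'

# Assumed thresholds: a run longer than 4h that finished within the last 24h.
BACKUP_RAN_LONG='restic_backup_last_duration_seconds > 4*3600 and time() - restic_backup_last_finished_timestamp < 24*3600'

# prom_query EXPR
# Evaluates EXPR as an instant query and prints the JSON response.
prom_query() {
    curl -fsS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

Calling `prom_query "$BACKUP_RAN_LONG"` prints the JSON response; an empty result array means no host currently matches the expression.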
Example alert rules
```yaml
groups:
  - name: restic-backup-helper
    rules:
      - alert: BackupOverdue
        expr: time() - restic_backup_last_finished_timestamp > 26*3600
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Backup overdue on {{ $labels.hostname }}"
          description: "No backup run has finished in over 26h."
      - alert: BackupFailed
        # Equivalent to restic_backup_last_success == 0, but makes
        # $value the exit code used in the description below.
        expr: restic_backup_last_exit_code != 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Backup failed on {{ $labels.hostname }}"
          description: "Exit code {{ $value }}; see /var/log/last-backup.json."
      - alert: ReplicateJobsFailed
        expr: restic_replicate_last_replicate_jobs_failed > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }} replicate jobs failed on {{ $labels.hostname }}"
          description: "See /var/log/last-replicate.json for details."
```
Compose metrics profile
The reference scripts/docker-compose.yml ships a metrics Compose
profile. It adds a node-exporter sidecar bound to 127.0.0.1:9100 that
reads the textfile_collector/ subdirectory of the backup-logs volume:
docker compose --profile metrics up
curl -fsS http://127.0.0.1:9100/metrics | grep restic_backup_last
No host-level node-exporter required.
See also
- JSON summaries — the source of every numeric metric.
- Webhooks — push-based alternative.
- Filesystem layout — where the files live inside the container.