Prometheus metrics reference¶
When METRICS_DIR is set, every worker writes a restic_<job>.prom
text file alongside its last-<job>.json. This page is the canonical
reference for the metric names and labels. See
Configuration → Prometheus metrics for
setup and example PromQL.
Files produced¶
One file per worker that has run at least once. Files are overwritten
each run and written via *.tmp + mv so a scrape never sees a
partial document.
${METRICS_DIR}/
├── restic_backup.prom
├── restic_check.prom
├── restic_forget.prom # only when FORGET_CRON is set
├── restic_prune.prom
├── restic_replicate.prom
├── restic_restore.prom
├── restic_snapshot_export.prom
├── restic_forget_preview.prom
├── restic_mount_snapshot.prom
├── restic_unlock.prom # only when /bin/unlock has been run
├── restic_sources_report.prom # only when /bin/sources-report has been run
├── restic_init_repo.prom # only when /bin/init-repo has been run
├── restic_notify_test.prom # only when /bin/notify-test has been run
└── restic_restore_test.prom # only when /bin/restore-test has been run
Always-emitted gauges¶
For each <job> ∈ backup, check, forget, prune, replicate,
restore, snapshot_export, forget_preview, mount_snapshot,
unlock, sources_report, init_repo, notify_test, restore_test:
| Metric | Type | Description |
|---|---|---|
restic_<job>_last_exit_code{hostname="…"} |
gauge | Exit code of the most recent run. |
restic_<job>_last_success{hostname="…"} |
gauge | 1 when exit code was 0, else 0. |
restic_<job>_last_duration_seconds{hostname="…"} |
gauge | Wall-clock duration of the run. |
restic_<job>_last_started_timestamp{hostname="…"} |
gauge | Unix epoch seconds at start. |
restic_<job>_last_finished_timestamp{hostname="…"} |
gauge | Unix epoch seconds at finish. |
The hostname label comes from the container's hostname (set
explicitly in Compose / Kubernetes with hostname:). Run one container
per logical job so the label is unique. The label value is escaped
before writing, so unusual hostnames containing quotes, backslashes or
newlines do not break the textfile format.
Worker-specific gauges¶
Extra numeric fields in last-<job>.json are emitted as
restic_<job>_last_<key>. Non-numeric extras (human-formatted byte
strings like "1.234 MiB", the masked repository) are intentionally
skipped to keep the textfile strictly typed for Prometheus.
| Worker | Metric | Source |
|---|---|---|
backup |
restic_backup_last_files_new |
files_new |
backup |
restic_backup_last_files_changed |
files_changed |
backup |
restic_backup_last_files_unmodified |
files_unmodified |
backup |
restic_backup_last_dirs_new |
dirs_new |
backup |
restic_backup_last_dirs_changed |
dirs_changed |
backup |
restic_backup_last_dirs_unmodified |
dirs_unmodified |
backup |
restic_backup_last_bytes_added |
bytes_added (when numeric) |
backup |
restic_backup_last_bytes_stored |
bytes_stored (when numeric) |
backup |
restic_backup_last_forget_exit_code |
forget_exit_code — inline forget result (0 ok, 11 skipped because another host held the exclusive lock, other = restic failure). Only emitted when an inline forget actually ran (i.e. RESTIC_FORGET_ARGS set and FORGET_CRON empty). |
forget |
restic_forget_last_exit_code |
Top-level exit_code of the dedicated worker (0, 2 for empty policy, 11 for multi-host lock race, other = restic failure). Same alerting target as restic_backup_last_forget_exit_code for deployments that opted into FORGET_CRON. |
replicate |
restic_replicate_last_replicate_jobs_processed |
replicate_jobs_processed |
replicate |
restic_replicate_last_replicate_jobs_failed |
replicate_jobs_failed |
restore, snapshot_export, restore_test |
restic_<job>_last_files_restored |
files_restored |
restore, snapshot_export, restore_test |
restic_<job>_last_bytes_restored |
bytes_restored (when numeric) |
snapshot_export |
restic_snapshot_export_last_archive_size_bytes |
archive_size_bytes |
restore_test |
restic_restore_test_last_canary_total |
canary_total |
restore_test |
restic_restore_test_last_canary_passed |
canary_passed |
restore_test |
restic_restore_test_last_canary_failed |
canary_failed |
The bytes_* metrics are only emitted when the underlying JSON field
is a number. Restic's textual bytes_added="1.234 MiB" is not
mapped to a metric — preserve the typed view for Prometheus and consult
the JSON for the human-formatted variant.
Sample .prom file¶
# HELP restic_backup_last_exit_code Exit code of the most recent backup run.
# TYPE restic_backup_last_exit_code gauge
restic_backup_last_exit_code{hostname="backup-node"} 0
# HELP restic_backup_last_success 1 if the most recent backup succeeded, 0 otherwise.
# TYPE restic_backup_last_success gauge
restic_backup_last_success{hostname="backup-node"} 1
# HELP restic_backup_last_duration_seconds Wall-clock duration of the most recent backup run.
# TYPE restic_backup_last_duration_seconds gauge
restic_backup_last_duration_seconds{hostname="backup-node"} 312
# HELP restic_backup_last_started_timestamp Unix epoch at which the most recent backup started.
# TYPE restic_backup_last_started_timestamp gauge
restic_backup_last_started_timestamp{hostname="backup-node"} 1762828800
# HELP restic_backup_last_finished_timestamp Unix epoch at which the most recent backup finished.
# TYPE restic_backup_last_finished_timestamp gauge
restic_backup_last_finished_timestamp{hostname="backup-node"} 1762829112
# HELP restic_backup_last_files_new Files added by the most recent backup run.
# TYPE restic_backup_last_files_new gauge
restic_backup_last_files_new{hostname="backup-node"} 12
# HELP restic_backup_last_files_changed Files changed by the most recent backup run.
# TYPE restic_backup_last_files_changed gauge
restic_backup_last_files_changed{hostname="backup-node"} 4
# HELP restic_backup_last_files_unmodified Files unmodified in the most recent backup run.
# TYPE restic_backup_last_files_unmodified gauge
restic_backup_last_files_unmodified{hostname="backup-node"} 21034
node-exporter scrape¶
For Compose: docker compose --profile metrics up spins a sidecar
that scrapes the volume directly — see Docker Compose.
Useful PromQL¶
| Question | Query |
|---|---|
| Time since last successful backup | time() - restic_backup_last_finished_timestamp{hostname="backup-node"} |
| Did the latest backup fail? | restic_backup_last_success{hostname="backup-node"} == 0 |
| Backup running long | restic_backup_last_duration_seconds > 6*3600 |
| Replicate partial failure | restic_replicate_last_replicate_jobs_failed > 0 |
| Restore zero-match warning | restic_restore_last_files_restored == 0 and restic_restore_last_success == 1 (look at last-restore.json include_zero_match to confirm) |
| Restore rehearsal failed | restic_restore_test_last_success == 0 (drill into /var/log/last-restore-test.json to see whether it was a canary mismatch vs file-count floor vs restic error) |
| Restore rehearsal stale | time() - restic_restore_test_last_finished_timestamp > 7*86400 (no successful rehearsal in the last 7 days) |
Stability promise¶
Metric names and label sets are part of the public API surface. Adding new metrics is a MINOR bump; renaming or removing a metric is a MAJOR bump.