# Troubleshooting
Common symptoms and what to check first. When in doubt, run
`/bin/doctor` — it rolls up most of the checks below
into a single command.
## Backup behaviour
### Backup exits immediately with `Missing RESTIC_TAG`
RESTIC_TAG="" (explicitly empty) is a hard failure since 1.14.0.
Pick something meaningful (daily, ${HOSTNAME}-data, …) so
snapshots can be filtered later.
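A minimal sketch; the tag values here are illustrative, any non-empty string works:

```sh
# Tag snapshots so they stay filterable later.
RESTIC_TAG="${HOSTNAME}-data"

# Later, list only the snapshots carrying that tag:
restic snapshots --tag "${HOSTNAME}-data"
```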
### Backup logs success but `restic snapshots` shows zero / tiny snapshots
Walk this list:

- Confirm the host volume backing `BACKUP_ROOT_DIR` is actually mounted into the container.
- When you use `--files-from` or `--exclude-file` in `RESTIC_JOB_ARGS`, verify those files exist inside the container and contain real, in-container paths.
- Inspect `last-backup.json` for `files_new` and `bytes_added`.
- Run `restic snapshots latest --json` for the canonical file/byte counts.
A misspelled bind mount, an `--files-from` referring to host paths
the container cannot see, or an over-broad `--exclude-file` all
produce a successful but empty backup.
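A quick way to run the first and last checks from the host; the container name `restic-backup-helper` is the one used elsewhere in these docs:

```sh
# Is the source tree visible inside the container? Uses the container's own env.
docker exec restic-backup-helper sh -c 'ls -la "$BACKUP_ROOT_DIR"'

# Canonical file/byte counts straight from restic:
docker exec restic-backup-helper restic snapshots latest --json
```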
### Empty or wrong backup content
Set `BACKUP_ROOT_DIR` and/or `RESTIC_JOB_ARGS` paths intentionally;
leaving both empty yields a degenerate `restic backup` invocation that
snapshots… nothing.
## Container startup
### Container exits with `Repository probe failed for '…' with exit code N`
Restic could reach the repository service but the repository itself is unhealthy. Restic exit codes you may see:

| Code | Cause |
|---|---|
| 12 | Wrong password. |
| other | Network, DNS, TLS, auth or upstream service error. |
Read the restic stderr in the container log; the entrypoint
deliberately refuses to run `restic init` for any non-10 exit so
that a transient failure cannot silently re-init a healthy remote.
As a last resort, set `RESTIC_CHECK_REPOSITORY_STATUS=OFF` to
bypass the probe — you lose the auto-init safety net but unblock
troubleshooting.
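To read restic's stderr directly, you can reproduce a probe by hand. `restic cat config` is a common minimal repository probe; whether the entrypoint uses exactly this command is an assumption:

```sh
# Touch the repository and capture the exit code restic returns.
docker exec restic-backup-helper restic cat config
echo "probe exit code: $?"
```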
### TLS / certificate errors against the repository or a corporate proxy
Mount the PEM bundle into the container and set `RESTIC_CACERT` to
its path. The flag is appended to every restic invocation
automatically; `config-check` will fail when the path is
unreadable.
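A minimal sketch; the host path, mount point, and image name are illustrative assumptions:

```sh
# Mount the corporate CA bundle read-only and point restic at it.
docker run -d \
  -v /etc/ssl/certs/corp-ca.pem:/certs/corp-ca.pem:ro \
  -e RESTIC_CACERT=/certs/corp-ca.pem \
  marc0janssen/restic-backup-helper
```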
### NFS mount fails at startup
The container aborts with exit 1 when `NFS_TARGET` is set but the
mount fails. Check:

- The NFS server hostname resolves from inside the container.
- The container has `cap_add: [SYS_ADMIN]`.
- The export allows the container's outbound IP.
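Two quick checks from inside the container; the server name is illustrative, and `showmount` only works if the image ships NFS client utilities:

```sh
# Does the NFS server resolve from the container's network namespace?
docker exec restic-backup-helper nslookup nfs.example.com

# Which exports does the server offer, and to whom?
docker exec restic-backup-helper showmount -e nfs.example.com
```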
## Notifications
### Webhook never reaches the endpoint and the cron log is silent
Confirm `WEBHOOK_URL` is set inside the container:
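A minimal check using the container's own environment (container name assumed):

```sh
docker exec restic-backup-helper sh -c 'echo "${WEBHOOK_URL:-<unset>}"'
```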
Container logs only ever show the masked form `scheme://host/...`. Test connectivity from
inside the container:
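For example with busybox `wget`, which the image is assumed to ship:

```sh
# Expect an HTTP response (or a clear network error) within 10 seconds.
docker exec restic-backup-helper sh -c 'wget -qO- -T 10 "$WEBHOOK_URL"; echo "exit: $?"'
```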
### Hook never returns / blocks the next cron run
Set `HOOK_TIMEOUT` to a positive integer (seconds). The hook is
wrapped in `timeout`; exit `124` is logged as a timeout but does
not fail the underlying restic job.
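For example (the value is illustrative):

```sh
# Kill any hook that runs longer than 60 seconds; the restic job itself continues.
HOOK_TIMEOUT=60
```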
### Mail goes nowhere and `cron.log` does not mention msmtp
Check that `MAILX_RCPT` is set, then verify the msmtp config:
```sh
docker exec restic-backup-helper ls -la /etc/msmtprc

# -i keeps stdin open so the heredoc actually reaches sendmail in the container.
docker exec -i restic-backup-helper sendmail -t <<EOF
From: test@example.com
To: ${MAILX_RCPT}
Subject: msmtp test

Test body.
EOF
```
msmtp refuses to read a config that is group- or world-readable;
run `chmod 600` on the host file.
## Locking and overlapping ticks
### Restic reports `unable to create lock in backend: repository is already locked`
List the locks and confirm whose they are:
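A sketch using restic's own lock commands:

```sh
# List the lock IDs currently held on the repository.
docker exec restic-backup-helper restic list locks

# Inspect one lock to see which host/PID created it (the ID is illustrative).
docker exec restic-backup-helper restic cat lock <lock-id>
```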
Since 1.12.0 the helper no longer auto-unlocks after a failure
(safer for multi-host repos). Set `RESTIC_AUTO_UNLOCK=ON` to
restore the previous behaviour, but only if you back up from a
single host.
### Cron tick logs `⏭ <job> skipped: previous run still active`
The previous backup/check/replicate/rotate is still holding its
local `flock`. Confirm the long-running PID inside the container:
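For example (busybox `ps`; which `-o` fields are supported varies by build):

```sh
# Look for a long-running restic/rclone process still holding the flock.
docker exec restic-backup-helper ps -o pid,etime,args
```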
Either wait, kill it, or widen the cron interval. If the lock process is gone but the flock is somehow still held, restart the container.
## Time and timezones
### Cron fires at the wrong local time
Set `TZ` and restart the container. busybox `crond` reads `TZ` from
its process environment at startup, so changing `TZ` after the
container has started does not affect the running cron daemon.
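For example (zone and image name are illustrative):

```sh
# TZ must be present when crond starts, so set it at container creation.
docker run -d -e TZ=Europe/Amsterdam marc0janssen/restic-backup-helper
```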
### Mail subject timestamps are off
The subject uses the container's `TZ`. Set `TZ=UTC` if you prefer
everything in UTC and have multi-region operators.
## Rclone and replicate
### Rclone auth keeps breaking after a token refresh
Ensure `rclone.conf` is on a writable mount. Some providers
(Google Drive, Jottacloud, OneDrive, …) write back to
`rclone.conf` when the access token is refreshed. A read-only
bind-mount means rclone cannot persist the refreshed token and
must re-authenticate on every run.
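A minimal sketch; the host path, in-container config path, and image name are assumptions:

```sh
# No :ro suffix: rclone must be able to rewrite the file on token refresh.
docker run -d \
  -v /opt/backup/rclone.conf:/root/.config/rclone/rclone.conf \
  marc0janssen/restic-backup-helper
```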
### Bisync recovery deleted data on the destination
The default bisync recovery (copy both → `bisync --resync`) can
propagate one-sided deletes. Two safety knobs:

- Set `REPLICATE_BISYNC_CHECK_ACCESS=ON` and seed an `RCLONE_TEST` marker file on both endpoints (see the sketch after this list). Rclone aborts loudly when the marker is missing.
- Switch to `MODE=sync` or `MODE=copy` in your job file when you don't actually need bidirectional behaviour. One-way modes skip the destructive copy-both recovery.
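Seeding the markers can be done with `rclone touch`; the remote names and paths are illustrative:

```sh
# bisync's --check-access refuses to run unless both sides carry this marker.
rclone touch source-remote:backups/RCLONE_TEST
rclone touch dest-remote:backups/RCLONE_TEST
```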
See Replicate worker.
## Permissions
### Permission denied reading source paths
Either:

- Match UID/GID of the host filesystem on the mounted volume (the container runs as root by default, so this typically only happens with rootless Docker or restrictive SELinux/AppArmor).
- Add `cap_add: [DAC_READ_SEARCH]` so the container can bypass DAC restrictions for reading (does not allow writes).
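The `docker run` equivalent of the second option (image name assumed):

```sh
# DAC_READ_SEARCH bypasses read/search permission checks only; writes stay denied.
docker run -d --cap-add DAC_READ_SEARCH marc0janssen/restic-backup-helper
```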
### Permission denied writing `/restore` after a restore
The restore wrapper does not `chown` by default. Add
`--owner UID:GID` to set ownership of the restored tree, or write a
`/hooks/post-restore.sh` that does whatever you need.
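A minimal post-restore hook sketch; the UID/GID values are illustrative:

```sh
#!/bin/sh
# /hooks/post-restore.sh: hand the restored tree to the app user.
chown -R 1000:1000 /restore
```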
### `/bin/mount-snapshot` exits with `fusermount: mount failed: Permission denied`
restic `mount` (FUSE) needs all of the following on the
container that runs the helper, and `/bin/mount-snapshot`
pre-flights every one of them — the abort message names the
specific knob that is wrong:
- `--cap-add SYS_ADMIN` (compose: `cap_add: [SYS_ADMIN]`; Kubernetes: `securityContext.capabilities.add: [SYS_ADMIN]`). The helper checks `CapEff` in `/proc/self/status` for bit 21 (`0x200000`); when the bit is missing it aborts with the observed `CapEff` value so you can spot which capability set you ended up with.
- `--device /dev/fuse` (compose: `devices: [/dev/fuse:/dev/fuse]`; Kubernetes: a `hostPath` `/dev/fuse` volume plus `volumeDevices`).
- `/usr/bin/fusermount` must keep its setuid bit at runtime. Starting the container with `--security-opt no-new-privileges:true` (compose: `security_opt: [no-new-privileges:true]`) leaves the on-disk bit alone but tells the kernel to ignore it at exec; the helper reads `NoNewPrivs` from `/proc/self/status` and aborts when it is `1`.
- The AppArmor profile must allow `mount(2)`. On Ubuntu/Debian hosts (and any host shipping Docker's default AppArmor template) the active profile is `docker-default (enforce)`, which denies `mount(2)` even with `CAP_SYS_ADMIN`, so FUSE fails with the same `Permission denied`. The helper reads `/proc/self/attr/current` and aborts when the profile is enforcing; add `security_opt: [apparmor:unconfined]` (compose), `--security-opt apparmor=unconfined` (docker run), the `container.apparmor.security.beta.kubernetes.io/<container>: unconfined` annotation (Kubernetes ≤1.29) or `securityContext.appArmorProfile.type: Unconfined` (Kubernetes ≥1.30) for this container.
- The image must ship the `fuse` apk package so that `/usr/bin/fusermount` exists at all. The current helper image installs it in the Dockerfile; if you are seeing `❌ /usr/bin/fusermount is missing from PATH` your image is older than the package addition — rebuild from the current sources or `apk add --no-cache fuse` for a one-shot smoke test.
If you keep the cron-driven container hardened with
`no-new-privileges:true` or `apparmor=docker-default`, run
`mount-snapshot` from a separate short-lived container without
those flags, with `--cap-add SYS_ADMIN --device /dev/fuse
--security-opt apparmor=unconfined` and your normal repository
env. See Mount snapshot →
Troubleshooting.
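A one-shot sketch of that un-hardened side container; the image name and env file are assumptions:

```sh
# Short-lived container used only for FUSE mounting; keep the cron container hardened.
docker run --rm -it \
  --cap-add SYS_ADMIN \
  --device /dev/fuse \
  --security-opt apparmor=unconfined \
  --env-file restic.env \
  marc0janssen/restic-backup-helper /bin/mount-snapshot
```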
## Networking
### Pull / push fails via corporate proxy to a private registry or LAN host
Add the registry hostname or LAN ranges to `NO_PROXY` /
`no_proxy`:
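For example (hostnames and ranges are illustrative):

```sh
# Set both spellings, since tools disagree on which one they read.
export NO_PROXY="registry.internal,10.0.0.0/8,192.168.0.0/16,localhost"
export no_proxy="$NO_PROXY"
```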
Verify TLS to internal registries; corporate CAs need to be on the
host (for `docker pull`) and inside the container (via
`RESTIC_CACERT` for repository TLS).
## When you've tried everything
- Run `/bin/doctor` and read every section. It is designed to surface the 90% of problems above without you having to remember which env var / path / hook to check first.
- Run `docker exec restic-backup-helper tail -n 200 /var/log/cron.log` for the cron-side narrative.
- Open an issue at github.com/marc0janssen/restic-backup-helper/issues with the doctor output, the relevant `last-<job>.json`, and the tail of `cron.log`. Sensitive values are already masked by the helper.