Requirements
A readable summary of what backupctl does, what it requires, and how its features are organized. For the complete, unabridged requirements document, see docs/initial/prd.md.
Overview
backupctl is a standalone, generic, database-agnostic backup orchestration service. It runs as an independent Docker container alongside a dedicated PostgreSQL audit database, manages scheduled backups for multiple projects via a single YAML config file, syncs encrypted and deduplicated snapshots to a Hetzner Storage Box via restic over SFTP, and provides a full CLI for every operational task.
backupctl is not tied to any specific project, database engine, or notification provider. Everything is configurable through config/projects.yml and .env.
Goals
- Generic — supports any database (PostgreSQL, MySQL, MongoDB) and notification channel (Slack, Email, Webhook) via the adapter pattern. Adding a new database or notifier means implementing a single port interface.
- Multi-project — a single service instance backs up multiple projects, each with its own schedule, database connection, asset paths, retention policy, encryption settings, and notification configuration.
- Hexagonal architecture — domain logic has zero infrastructure dependencies. Ports define contracts; adapters fulfill them. The domain layer contains no NestJS imports, no TypeORM, no decorators.
- TDD — full test coverage with Jest. Domain logic, orchestration flows, adapters, persistence, and CLI commands are all independently testable.
- CLI-first management — all operations are available through 14 nest-commander CLI commands, including restic passthrough per project.
- Production-ready — GPG encryption, pre/post hooks, dump verification, configurable retries with exponential backoff, a persistent audit trail with JSONL fallback, crash recovery on startup, and per-project file-based locking.
Non-Goals (v1)
These are explicitly out of scope for the initial version:
- Web-based dashboard or UI
- Multi-node / distributed backup coordination
- Streaming / WAL-based continuous backup (point-in-time recovery)
- Backup of non-filesystem assets (e.g., S3 buckets)
- Multiple notification channels per project (single channel only)
- Automatic database import on restore (files are extracted; the user runs
pg_restore/mysql/mongorestoremanually, with optional--guideinstructions)
Functional Requirements
Backup Orchestration
The core backup flow is an 11-step pipeline, executed per project:
0. Acquire per-project file-based lock
0b. AuditLogPort.startRun() → returns runId
1. Notify started
2. Execute pre-backup hook (if configured)
3. Dump database ┐
4. Verify dump │ retryable stages
5. Encrypt dump (if enabled) │ (exponential backoff)
6. Sync to remote storage (restic) │
7. Prune old snapshots │
8. Clean up local dumps ┘
9. Execute post-backup hook (if configured)
10. Finalize audit record (fallback to JSONL if DB down)
11. Notify success or failure (fallback to JSONL if fails)
12. Release lock (always, even on failure)Key behaviors:
- Per-project file-based lock —
.lockfiles at{BACKUP_BASE_DIR}/{project}/.lockprevent concurrent backups. Cron overlap queues behind the running backup. CLI collision rejects with exit code2. - Retry with exponential backoff — stages 3–8 retry up to
BACKUP_RETRY_COUNTtimes (default 3) withBACKUP_RETRY_DELAY_MS(default 5000ms) exponential backoff. Hooks, audit, and notification stages are non-retryable. - Timeout alerting — if
timeout_minutesis configured and exceeded, a warning notification fires. The backup is NOT killed — it continues running. - Dry run mode —
--dry-runvalidates config, checks DB connectivity, verifies restic repo access, tests SSH, checks disk space, and validates GPG key availability — without dumping, syncing, or modifying anything. - Stage progress tracking —
AuditLogPort.trackProgress(runId, stage)is called at each step, updatingcurrent_stagein real time. - Sequential multi-project runs —
run --allexecutes projects in YAML order. If one project fails, the rest still run. Exit code5indicates partial success.
Database Support
| Database | Dump tool | Compression | Verification |
|---|---|---|---|
| PostgreSQL | pg_dump | --format=custom | pg_restore --list |
| MySQL | mysqldump | piped through gzip | gunzip -t |
| MongoDB | mongodump | --gzip | mongorestore --dryRun |
Compression is always enabled — there is no toggle. Each dumper adapter uses the best compression method for its database type.
Remote Storage
- Restic over SFTP to a Hetzner Storage Box.
- Snapshot tagging — snapshots are tagged with
project:{name}andbackupctl:combined,backupctl:db, orbackupctl:assets:{path}depending on snapshot mode. - Snapshot modes —
combinedcreates one snapshot containing both database dump and asset paths.separatecreates individual snapshots for the dump and each asset path. - Retention-based pruning — configurable
keep_daily,keep_weekly,keep_monthlyper project. - Cache management — CLI commands to inspect and clear restic's local cache per project.
- Restic passthrough —
backupctl restic <project> <cmd>runs any restic command with the project's credentials pre-configured.
Notifications
| Channel | Transport | Config |
|---|---|---|
| Slack | Webhook URL (HTTP POST) | webhook_url |
SMTP with explicit TLS control (smtp_secure) | smtp_host, smtp_port, smtp_secure, to, from, password | |
| Webhook | HTTP POST with JSON + markdown body | url |
Events: backup_started, backup_success, backup_failed, backup_warning, daily_summary.
If notification delivery fails, the payload is written to the JSONL fallback file and retried on next startup. Notification failure is never a backup failure.
Encryption
- GPG encryption of dump files before remote sync.
- Auto-import of GPG public keys from the mounted
gpg-keys/directory on every startup. - Per-project or global — encryption can be enabled globally via
.envor per-project in YAML. Project settings override globals. - Manual import —
backupctl config import-gpg-key <file>for ad-hoc key additions.
Audit Trail
A PostgreSQL audit database (separate container) tracks every backup run with real-time stage progress.
- Insert + update pattern —
startRun()inserts a row withstatus: started, each stage updatescurrent_stage, andfinishRun()writes the final status with all result fields. - Orphan detection — records with
status = 'started'andcompleted_at IS NULLare detected during crash recovery and marked asfailed. - JSONL fallback — when the audit DB is unavailable, results are appended to
{BACKUP_BASE_DIR}/.fallback-audit/fallback.jsonland replayed on next startup. - Schema management — explicit TypeORM migrations, never
synchronize: true.
Crash Recovery
RecoverStartupUseCase runs on every container start (onModuleInit):
- Orphaned records — query for
status = 'started'withcompleted_at IS NULL, mark asfailedwitherror_stage = 'crash_recovery' - Orphaned dump files — scan project directories for dump files not associated with a successful audit record
- Stale lock files — remove
.lockfiles left behind by crashed processes - Restic repo unlock — auto-unlock all enabled project repos (safe because no backups run during startup)
- JSONL fallback replay — replay pending audit and notification entries into their respective systems
- GPG key import — auto-import all
.gpgpublic key files from the mountedgpg-keys/directory
CLI
14 commands via backupctl <command>:
| Command | Description |
|---|---|
run <project> [--all] [--dry-run] | Trigger backup or simulate |
status [project] [--last n] | Backup status with current_stage visibility |
health | Check audit DB, restic repos, disk space, SSH |
restore <project> <snap> <path> [--only db|assets] [--decompress] [--guide] | Restore snapshots with optional decompress and import guidance |
snapshots <project> [--last n] | List restic snapshots with tags |
prune <project> / --all | Manual restic prune |
logs <project> [--last n] [--failed] | Query audit log |
config validate / show / reload / import-gpg-key <file> | Config management |
cache <project> [--clear] / --clear-all | Restic cache management |
restic <project> <cmd> [args...] | Restic passthrough with auto-configured credentials |
Exit codes:
| Code | Meaning |
|---|---|
0 | Success |
1 | General failure |
2 | Backup already in progress (lock held) |
3 | Configuration validation error |
4 | Connectivity error (DB, SSH, restic unreachable) |
5 | Partial success (run --all — some succeeded, some failed) |
Configuration
config/projects.yml— per-project settings: database, schedule, assets, retention, encryption, hooks, notifications..env— global defaults and secrets: audit DB credentials, Hetzner SSH, restic password, notification defaults, retry settings.${VAR_NAME}resolution — variable references in YAML are resolved from.envat load time. Secrets always live in.envand are referenced via${}in YAML.- Resolution order — project YAML >
.envglobal > hardcoded defaults. - No hot-reload — config changes require an explicit
backupctl config reloadcommand.
Audit Schema
The BackupLog entity tracks every backup run:
| Column | Type | Description |
|---|---|---|
id | UUID | Primary key, generated at backup start (used as run identifier) |
project_name | VARCHAR | Project identifier |
status | ENUM | started, success, failed |
current_stage | VARCHAR | Current or last stage being executed (updated per step) |
started_at | TIMESTAMP | Backup start time |
completed_at | TIMESTAMP | Backup end time (null while in progress) |
dump_size_bytes | BIGINT | Size of the database dump |
encrypted | BOOLEAN | Whether the dump was encrypted |
verified | BOOLEAN | Whether the dump was verified |
snapshot_id | VARCHAR | Restic snapshot ID (null if failed before sync) |
snapshot_mode | VARCHAR | combined or separate |
files_new | INT | New files in snapshot |
files_changed | INT | Changed files in snapshot |
bytes_added | BIGINT | Bytes added to the restic repo |
prune_snapshots_removed | INT | Snapshots removed during prune |
local_files_cleaned | INT | Local dump files removed |
error_stage | VARCHAR | Stage where failure occurred (null if success) |
error_message | TEXT | Error details (null if success) |
retry_count | INT | Number of retries attempted |
duration_ms | BIGINT | Total backup duration in milliseconds |
created_at | TIMESTAMP | Record creation time |
Notification Formats
Backup Started
🔄 Backup started — locaboo
Time: 2026-03-18 00:00:00 ISTBackup Success
✅ Backup completed — locaboo
DB: locaboo_db | Dump: 245 MB | Encrypted: Yes | Verified: Yes
Snapshot: a1b2c3d4 | Mode: combined
New files: 12 | Changed: 3 | Added: 52 MB
Pruned: 2 snapshots | Local cleaned: 1 file
Duration: 3m 12sBackup Failed
❌ Backup failed — locaboo
Stage: restic sync | Retry: 3/3
Error: connection timeout to storage box
Dump file: /data/backups/locaboo/backup_2026-03-18_000000.sql.gz
Duration: 5m 42sBackup Timeout Warning
⚠️ Backup timeout warning — locaboo
Elapsed: 35m | Timeout threshold: 30m
Current stage: restic sync
Backup is still running — this is a warning, not a failure.Daily Summary
📊 Daily Backup Summary — 2026-03-18
✅ locaboo — 245 MB — 3m 12s — a1b2c3d4
✅ project-x — 128 MB — 1m 45s — e5f6g7h8
❌ project-y — FAILED — restic sync timeout
Total: 2/3 successful | Next run: per project scheduleWebhook Payload
The webhook notifier POSTs application/json with a text field containing the same markdown-formatted report, plus structured data:
{
"event": "backup_success",
"project": "locaboo",
"text": "✅ Backup completed — locaboo\nDB: locaboo_db | Dump: 245 MB ...",
"data": {
"run_id": "uuid",
"project_name": "locaboo",
"status": "success",
"snapshot_id": "a1b2c3d4",
"dump_size_bytes": 257949696,
"encrypted": true,
"verified": true,
"duration_ms": 192000,
"timestamp": "2026-03-18T00:03:12+01:00"
}
}Events: backup_started, backup_success, backup_failed, backup_warning, daily_summary.
Docker Setup
backupctl runs as two containers via Docker Compose:
| Container | Image | Purpose |
|---|---|---|
backupctl | Node.js 20 Alpine + DB clients + restic + GPG | Backup orchestration, CLI, scheduler, HTTP endpoints |
backupctl-audit-db | PostgreSQL 16 Alpine | Audit trail database |
Volumes:
| Mount | Container path | Mode | Purpose |
|---|---|---|---|
| Host backup dir | /data/backups | rw | Dump storage, lock files, fallback audit, logs |
./config | /app/config | ro | Project YAML configuration |
./ssh-keys | /root/.ssh | ro | SSH key pair and known_hosts for Hetzner |
./gpg-keys | /app/gpg-keys | ro | GPG public keys for dump encryption |
| Asset paths | Varies | ro | Project asset directories referenced in YAML |
The two containers communicate over a shared Docker network (backupctl-network). The audit database is not exposed to the host — only the backupctl container connects to it. The HTTP endpoints (health, status) are internal-only; there is no authentication.
Full PRD
This page is a summary. For the complete, unabridged requirements document with full domain port interfaces, TypeScript code examples, testing strategy, Docker configuration, and deployment scripts, see docs/initial/prd.md.
What's Next
- Get running — Installation walks through deploying backupctl on a fresh server.
- Understand the design — Architecture explains the hexagonal layer structure and key design decisions.
- Quick reference — Cheatsheet has copy-paste commands for daily operations.