Introduction
Duplicity is the best backup system for Linux, and probably for anything else it runs on. It is highly functional, mostly automatic, and free.
The advantage of highly functional software is that it will work successfully in thousands of different situations; the disadvantage is that it will work successfully in thousands of different situations! That means it needs custom scripts to “tune” it to each situation. This is one such script.
Architecture
The script backs up my Desktop user data to a separate NAS drive. There are a number of ways this could be done, so here are the architectural decisions:
- It runs as a multi-function shell script triggered daily by other scripts in cron tables.
- Once a week, on a pre-set Backup day, the script runs its backup code; on other days it runs its tidy-up code that checks there is sufficient space in the archive and that the backup store is clean and tidy.
- Part of the script allows for the fact that I might not have turned on the Desktop on backup day. If the scheduled day is missed by up to 3 days, the script runs the backup code rather than the tidy-up code on the next day it gets the chance (see the sketch after this list).
- It accesses the NAS drive using NFS, so that the NAS file system is mounted under the Desktop. Other protocols were tried, such as rsync and SAMBA, but the way Duplicity works makes them very slow.
- The NAS drive is mounted by the script just before backup and unmounted just after. This is for security reasons and ensures the NAS is not normally accessible in case the desktop is compromised.
- There is extensive checking of the mounted drive to ensure the script is seeing the actual NAS drive and NOT the bare, unmounted mount point, as it would be if the NAS drive were offline when the script triggered.
- Duplicity keeps backup sets in “chains”. Each chain starts with a full backup followed by a number of incrementals. The longer the chain, the longer the recovery time, but I rarely need to recover data so long chains are OK.
- Full backups, which start new chains, are taken a few times a year; the rest are incremental. On my system incremental backups are less than 1% of the size of a full backup, because most of my data does not change on a regular basis.
- On Backup day the script checks when the last full backup was taken; if it was more than a set limit ago, a new backup chain is started. On my system that limit is 13 weeks, so I get 4 full backups a year.
- When a new backup chain is started, the existing chain is left intact on the archive; this gives me a number of older chains as historical backup data.
- How far back the historical data goes depends largely on how often a full backup is taken. Taking full backups monthly gave me less than 12 months of historical archive; taking them every 3 months should give me around two years.
- The size of the archive area on the NAS drive would expand forever if left to itself. So the script includes daily checks on both the space available and the “tidiness” of the Duplicity archive.
- The space calculation ensures the space available is at least the size of the current backup chain. When this check fails, as it usually does the day after a new chain is started, the script deletes the oldest chain it can find. It does NOT repeat this within the same run; the next day the script will run again and make the same space calculation.
- Duplicity has a built-in tidy-up feature that allows broken chains and other “lost” data to be removed. The tidiness check always runs before the space calculation, as there is no point in allowing room for broken chains and other detritus (a sketch of this daily pass follows the first log example below).
- The script sends me an email whenever it does anything significant.
- There are copious log files that tell you what the script is doing. See examples below.
- The script has a “debug” setting that allows you to run the script without actually triggering a Duplicity backup. However, the tidy-up code still runs and uses Duplicity's clean-up features.
- If the tidy-up function does run, it sends its output to a dedicated log file so that you can see which oldest backup it decided to delete; this output is very verbose, hence the separate file.
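To make the scheduling concrete, here is a minimal sketch of the backup-day decision mentioned above. It is an illustration only, not the actual dupbackup.sh code: it assumes the script stores the epoch time of the next scheduled backup in a state file, and the file name nextbackup is hypothetical.

#!/bin/bash
# Sketch of the backup-day / missed-day decision, NOT the real dupbackup.sh.
STATE_FILE=/opt/Backup/nextbackup   # hypothetical file holding the next backup time (epoch seconds)
GRACE=$(( 3 * 24 * 3600 ))          # the backup day may be missed by up to 3 days

now=$(date +%s)
next=$(cat "$STATE_FILE")

if [ "$now" -ge "$next" ] && [ "$now" -le $(( next + GRACE )) ]; then
    echo "Backup day (or missed by 3 days or less): running the backup code"
    # ... mount the NAS, run duplicity, record the next backup date ...
else
    echo "Not a backup day: running the tidy-up and space checks"
    # ... mount the NAS, run the tidy-up and space checks ...
fi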
This is an example of the logfile as it appears on a non-backup day:
=====================================================
#@(#) dupbackup.sh Ver 0.6.1 16/11/22 Chris Ray
/opt/Backup/dupbackup.sh Thu 9 Feb 09:46:08 GMT 2023
Running as username:root
Running as UID: 0
No lock file, no duplicity running, is it Backup Day?
Today is Thu 9 Feb 09:46:08 GMT 2023 1675935968
Not the day for a backup, did we miss one by 3 days or less?
Next backup date is/was Wed 15 Feb 03:00:00 GMT 2023 1676430000
Have not yet reached the scheduled backup day, checking space....
Starting mount point /mnt/BackupDesktop with systemctl mnt-BackupDesktop.mount...
Testing presence of NFS mount
NFS mount started by systemctl
Check Mount Data: hardlinks=1 should be 1, UID= 98 should be 98, GID= 401 should be 401, size= 12 should be less than 100
creating Access-test file
Sent: /opt/Backup/Access-test : 1675935969 Access-test created Thu 9 Feb 09:46:09 GMT 2023
Returned: /opt/Backup/tmp/Access-test : 1675935969 Access-test created Thu 9 Feb 09:46:09 GMT 2023
Successfully Wrote and Read back from /mnt/BackupDesktop/Ubuntu NAS share
Getting Collection-status from NAS....
duplicity collection-status ran cleanly
backup sets are clean, so no incomplete or chainless sets to remove.
Checking space...
Space used by current Primary Backup: 537600000 KB
Now checking space on the NAS drive...
Space available on Archive : 672919680 KB
plenty of space for the next full backup
Stopping NFS mount for security
NFS mount now offline
Duplicity Backup Process Finished at Thu 9 Feb 09:46:25 GMT 2023
-------------------------------
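The tidy-up pass shown in that log can be sketched roughly as follows. The archive URL and mount point are taken from the log; how dupbackup.sh measures the size of the current chain is not shown, so the sketch reads it from a hypothetical state file, and remove-all-but-n-full (with an illustrative keep-count of 3) stands in for the script's more surgical "delete just the oldest chain" step.

#!/bin/bash
# Rough sketch of the daily tidy-up pass, NOT the real dupbackup.sh.
ARCHIVE_URL="file:///mnt/BackupDesktop/Ubuntu"
MOUNT_POINT="/mnt/BackupDesktop"

# Tidiness first: remove incomplete sets and "lost" files from broken chains.
duplicity cleanup --force --no-encryption "$ARCHIVE_URL"

# Space check: free space on the NAS should be at least the size of the
# current backup chain.
chain_kb=$(cat /opt/Backup/chain_size_kb)                 # hypothetical state file
free_kb=$(df -Pk "$MOUNT_POINT" | awk 'NR==2 {print $4}')

if [ "$free_kb" -lt "$chain_kb" ]; then
    # Not enough room: drop the oldest chain(s), keeping only the newest full backups.
    duplicity remove-all-but-n-full 3 --force --no-encryption "$ARCHIVE_URL"
fi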
And here is an example of the log on a backup day:
=====================================================
#@(#) dupbackup.sh Ver 0.6.1 16/11/22 Chris Ray
/opt/Backup/dupbackup.sh Wed 8 Feb 10:45:21 GMT 2023
Running as username:root
Running as UID: 0
No lock file, no duplicity running, is it Backup Day?
Today is Wed 8 Feb 10:45:21 GMT 2023 1675853121
Taking Backup.....
creating Lock file
Starting mount point /mnt/BackupDesktop with systemctl mnt-BackupDesktop.mount...
Testing presence of NFS mount
NFS mount started by systemctl
Check Mount Data: hardlinks=1 should be 1, UID= 98 should be 98, GID= 401 should be 401, size= 12 should be less than 100
creating Access-test file
Sent: /opt/Backup/Access-test : 1675853122 Access-test created Wed 8 Feb 10:45:22 GMT 2023
Returned: /opt/Backup/tmp/Access-test : 1675853122 Access-test created Wed 8 Feb 10:45:22 GMT 2023
Successfully Wrote and Read back from /mnt/BackupDesktop/Ubuntu NAS share
Running Duplicity to the NFS mount
Wed 8 Feb 10:45:31 GMT 2023
--full-if-older-than 12W --no-encryption --asynchronous-upload --max-blocksize 65536 --verbosity 4 --volsize 800 --exclude-filelist /opt/Backup/exclusions / file:///mnt/BackupDesktop/Ubuntu
Reading globbing filelist /opt/Backup/exclusions
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Wed Jan 4 11:26:20 2023
--------------[ Backup Statistics ]--------------
StartTime 1675853147.71 (Wed Feb 8 10:45:47 2023)
EndTime 1675853491.74 (Wed Feb 8 10:51:31 2023)
ElapsedTime 344.03 (5 minutes 44.03 seconds)
SourceFiles 732936
SourceFileSize 676936280456 (630 GB)
NewFiles 2637
NewFileSize 3708446737 (3.45 GB)
DeletedFiles 323
ChangedFiles 910
ChangedFileSize 950470608 (906 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 3870
RawDeltaSize 4101919610 (3.82 GB)
TotalDestinationSizeChange 3616869239 (3.37 GB)
Errors 0
-------------------------------------------------
duplicity ran cleanly, calculating date of next backup....
Wed 15 Feb 03:00:00 GMT 2023
Stopping NFS mount for security
NFS mount now offline
removing Lock...
Duplicity Backup Process Finished at Wed 8 Feb 10:51:53 GMT 2023
-------------------------------
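Stripped of the mounting, locking and logging around it, the backup itself boils down to a single Duplicity call using the options visible in the log above (the values are copied straight from that log):

#!/bin/bash
# The core backup command, as shown in the backup-day log.
duplicity \
    --full-if-older-than 12W \
    --no-encryption \
    --asynchronous-upload \
    --max-blocksize 65536 \
    --verbosity 4 \
    --volsize 800 \
    --exclude-filelist /opt/Backup/exclusions \
    / file:///mnt/BackupDesktop/Ubuntu

The --full-if-older-than option is what starts a new chain automatically once the current full backup is older than the set limit; everything else in the script is there to decide when to run this command and to look after the archive it writes.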
Environment
Before you install this script you must have a NAS drive somewhere on your local network. I use a Netgear ReadyNAS 102 with two 6TB disks running as a mirror pair. On that I have set up a share with NFS network access and configured a username and password.
The directory share on the NAS drive must be visible from your desktop. This means you will have an entry in /etc/fstab like this:
# Attach the NAS shares as NFS mounts
#
bz-nas:/data/BackupDesktop /mnt/BackupDesktop nfs noauto,defaults 0 0
The noauto option ensures that the drive does not go “live” when the desktop boots. You can use systemctl to check its presence:
fred@Desktop:/opt/Backup$ sudo systemctl status mnt-BackupDesktop.mount
○ mnt-BackupDesktop.mount - /mnt/BackupDesktop
Loaded: loaded (/etc/fstab; generated)
Active: inactive (dead)
Where: /mnt/BackupDesktop
What: bz-nas:/data/BackupDesktop
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
fred@Desktop:/opt/Backup$
This is important as the script uses a systemctl command to start the mount point just before the backup is taken.
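The mount-and-verify sequence behind the “Starting mount point” and “Check Mount Data” lines in the logs could be sketched like this. It is an illustration rather than the actual dupbackup.sh code; the UID and GID values are the ones my NAS exports, so yours will differ.

#!/bin/bash
# Sketch: bring the NFS mount up via systemd, check we are really looking at
# the NAS share (and not an empty local mount point), then take it down again.
MOUNT_UNIT="mnt-BackupDesktop.mount"
MOUNT_POINT="/mnt/BackupDesktop"

systemctl start "$MOUNT_UNIT"

# mountpoint(1) returns success only if something is actually mounted there.
if ! mountpoint -q "$MOUNT_POINT"; then
    echo "NAS share is not mounted, aborting" >&2
    exit 1
fi

# Extra sanity check modelled on the "Check Mount Data" log line: the mounted
# directory should carry the UID/GID the NAS exports (98/401 on my system).
uid=$(stat -c %u "$MOUNT_POINT")
gid=$(stat -c %g "$MOUNT_POINT")
if [ "$uid" != 98 ] || [ "$gid" != 401 ]; then
    echo "Mount point does not look like the NAS share, aborting" >&2
    systemctl stop "$MOUNT_UNIT"
    exit 1
fi

# ... run the backup or tidy-up here ...

systemctl stop "$MOUNT_UNIT"   # take the NAS offline again for security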
Installation
First, pick the place where you wish to keep the scripts, config settings and log files. On my system this is /opt/Backup.
Create a tmp directory in the chosen area.
Install the scripts and config files in the chosen area. These should consist of:
dupbackup.sh | the actual script
runbackup | the script triggered by cron that runs dupbackup.sh
exclusions | a config file listing all the files and directories you DON'T want backed up
passwd | a file containing the NFS password to the NAS drive
README | info about the system
Edit the exclusions file to fit your needs. This file lists all the files and directories that you do NOT want backed up.
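As the backup-day log shows, the backup source is the whole of /, so at a minimum you will want to exclude volatile system directories and, importantly, the mount point of the backup target itself. The entries below are purely illustrative:

/proc
/sys
/dev
/tmp
/mnt
/home/*/.cache

Excluding /mnt matters because the NAS archive is mounted under /mnt/BackupDesktop, and you do not want Duplicity trying to back up its own destination.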
Ensure the passwd file is owned by root:root with permissions of 400, and edit it appropriately (you may need to temporarily relax the permissions to do so). It contains the password Duplicity will use to access the NAS drive.
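Assuming /opt/Backup as the chosen area, the installation steps above amount to something like this:

sudo mkdir -p /opt/Backup/tmp
sudo cp dupbackup.sh runbackup exclusions passwd README /opt/Backup/
sudo chown root:root /opt/Backup/passwd
sudo chmod 400 /opt/Backup/passwd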
Issues
My Desktop gets turned on and off during the day and often goes days without being turned on at all. So I cannot use cron (designed for always-on servers) to trigger the script as there is no guarantee the Desktop will be on at the scheduled time. Anacron was designed for this situation, but has the problem that any script placed in the “daily” config triggers just past midnight each day.
I am a night-owl and often work past midnight, which means the backup script will trigger as midnight passes. But a full backup can take many hours to complete, so if I turn off the Desktop the backup is interrupted. The runbackup script uses the systemd-inhibit command built into Ubuntu, which should prevent system shutdown while the backup is running, or at least warn that shutting down will corrupt it (sketched below).
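A minimal sketch of such a wrapper, not the actual runbackup script (the inhibit options shown are assumptions):

#!/bin/bash
# Hold a shutdown/sleep inhibitor for the duration of the backup run.
exec systemd-inhibit \
    --what=shutdown:sleep \
    --who="dupbackup" \
    --why="Duplicity backup in progress" \
    /opt/Backup/dupbackup.sh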
However, if a backup IS interrupted it will automatically resolve itself the next day. On the next run dupbackup.sh will detect the presence of a lock file and, knowing the backup was interrupted, will run duplicity in backup mode (rather than cleanup mode); duplicity then detects the partial backup and completes it automatically.