And Replication
Introduction
Duplicity is the best backup system for Linux, and probably for anything else it runs on. It is highly functional, mostly automatic, and free.
The advantage of highly functional software is that it will work successfully in thousands of different situations; the disadvantage is that it will work successfully in thousands of different situations! This means it needs custom scripts to “tune” it to each situation. This is one such script.
You therefore need to understand the basics of duplicity backups, or at least what chain sets of Full and Incremental backups are.
History
This script was first written to back up a number of WordPress websites onto a NAS drive. As the sites are WordPress I have no control over when they are updated, so I decided to back them all up daily. Luckily, as I use duplicity as the archive software and can set the Full backup frequency, most backups are only incremental and use hardly any space in the actual archive on the NAS drive.
In mid-2020, right in the middle of the pandemic, we decided to move house. As all the websites run on a server in my garage, this would mean they would all be down during the month or so it would take me to move the servers and get them running again. Luckily I have a rented server somewhere in the cloud. It normally operates as the MX failover server for my mail system and runs just Postfix, but with the addition of Apache, PHP and MySQL it would handle my websites as well.
So I initially wrote a separate script to take a copy of the backup files and replay them onto the secondary server to make replica websites. It does this every day, which means the replica websites are only ever 48 hours behind the real websites. I tried moving one of my own websites to check the operation.
At this point I realised there was a problem. Every night a Replica website is restored from the copy of the Primary held on the Archive. The script does this automatically, but if anyone were to try updating a Replica site, any updates would be lost overnight as the site restores itself to a copy of the Primary.
So I have now combined the original backup script with the newer replicate script to make a single backup-replicate script that works both ways, and uses markers to decide which is the Primary and which is the Replica site.
So now, a week before move day, I just need to tell my users to follow these steps.
- Stop updating their websites, contact me, and wait 48 hours.
- After 48 hours repoint their DNS setting to the secondary server and wait for these DNS changes to replicate. They should see no difference in the website.
- When I tell them, carry on updating the website as normal. Updating their website now occurs on the secondary server.
- Why 48 hrs and not 24? Because the same script runs every night on all servers. Let’s say a client does his last update on a Monday and then calls me. Those updates will appear on the Backup Archive on Monday night, but that might occur slightly AFTER the replicate script has already run. So the latest updates will not copy across to the Replica until Tuesday night’s run. Hence 48 hours.
When they contact me I wait for the last copy of their website to back up to the archive, which it does overnight. I then wait for the replicate script to copy this to the secondary server, which might be the next overnight. Once I am sure the website is a good copy of their last updated site, I change a setting on the secondary server so that it now thinks it is the Primary copy. Finally I tell them to continue updating as normal. (And in the background I check they are NOT still using the website on the OLD Primary server!)
So now, as each site owner successfully moves onto the secondary server, I change over the Primary/Replica markers on each server and the script now takes backups from the site on the secondary server onto the NAS. The script on the original server then replicates any sites on the NAS marked as Primary onto itself – if it’s online. When I am ready to bring the garage server back online, I simply boot it up, wait for the script to run, and it will soon contain an exact replica of every site on the Secondary server. I then just follow the three-step DNS change to move all the sites back to the Garage server.
Of course, if I am happy to keep two servers running I could just move half the sites. This would mean that a server failure would just take down half the sites and even those would be back again on the other server with only the loss of any updates in the previous 24 hours.
February 2023
With the recent Energy price increases I have calculated that running a 300 Watt server 24 hrs/day is slightly more expensive than renting one from Fasthosts! So the garage server is still offline.
Just before the websites were transferred onto the cloud server in 2021 I installed a VPN from the cloud server to the home network. I didn’t want backups and other data going over the public Internet. This used OpenVPN technology installed on the cloud server and a virtual server on the home network that acted as the VPN gateway. This all went offline with the house move.
Although I still don’t have the garage servers working again I have migrated the VPN onto the Draytek 2862 Router I use on the home network. So all backups from the Cloud server to the home NAS now go over that.
But a few weeks later I discovered that the permanently active VPN was causing the router to reboot with monotonous regularity. A drastic impact on Netflix!
So I now have a small script on the cloud server that turns the VPN on and off as required. This has solved the Netflix issue, but it requires every script that calls home, like the Website Backup script, to turn the VPN on and off as it starts and stops.
June 2024
Well, the backup/replicate system has been running for a year or so and I haven’t had to touch it.
However, back in May I updated a WordPress site and the update failed, causing the whole site to go offline. No access to anything. “That’s fine”, I think, “I have a backup system”. But when I took a closer look I realised there is no automatic RESTORE function. I can do it manually, but since the inclusion of databases and Certificates, it can get very messy. So I have modified the script.
You can now give it a -X switch and name a specific site. This will cause the script to RECOVER that site FROM the NAS drive back onto the server, including the database and certificates. It does this using the latest backup on the NAS. However, you can add a -d (Time) switch to the command line along with the -X switch to tell the script how far to go back in time for a recovery.
In my case the website had failed a few weeks earlier and I had a week’s holiday in between, during which the script was still taking nightly backups of the faulty site. Going back 35 days still restored a non-functioning website, but 40 days was fine.
So “sudo ./websitebackupreplicate.sh -X sitename -d 40D” did the trick.
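Under the hood the recovery is essentially a duplicity restore at a point in time. As a rough manual equivalent (the archive URL below is only illustrative, and the real script also restores the database and certificates, which plain duplicity does not):

# Restore the site as it was 40 days ago from the NAS archive
duplicity restore --time 40D \
    file:///mnt/BackupDesktop/Ubuntu/sitename \
    /srv/www/vhosts/sitename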
Architecture
The script backs up a number of WordPress websites on various servers to a large NAS archive. Each website has a “primary” copy on a server, which is backed up every night to the NAS archive. Other servers have a “replica” copy that is extracted every night from the backup on the NAS archive, so that each Replica is an exact copy of the current site.
There is now a RECOVER switch intended to be run manually to recover a specific website back to a specific date.
The same shell script does all three of these functions.
Here are the architectural decisions:
- It runs as a multi-function shell script triggered daily by other scripts in cron tables.
- The same script runs on every server whether primary or replica.
- The script uses Duplicity to take the actual Backups and replicas.
- Duplicity keeps backup sets in “chains”. Each chain starts with a full backup followed by a number of incrementals. The longer the chain – the longer the recovery time.
- Full Backups, which start new chains, are taken once a month by script default, or more or less frequently if the site has a file called “full_backup_freq” containing period data acceptable to Duplicity, e.g. 1M, 1W, 28D, 3M, etc. (see the sketch after this list).
- Every day, the same script runs on every server. It looks in the /srv/www/vhosts area for virtual websites under Apache.
- All websites exist in the file structure under /srv/www/vhosts/exampleSiteName, with the WordPress binaries, or just plain HTML pages, in the subdirectory SITE.
- All of the site directory is backed up, i.e. everything under /srv/www/vhosts/exampleSiteName, including the WordPress “root” directory SITE.
- It checks to see which of these sites are using WordPress. Once it has a list of WordPress sites, it starts up a VPN, if one exists, and mounts the remote NAS archive as an NFS share using a password. Other protocols were tried, such as rsync and SAMBA, but the way Duplicity works makes them very slow.
- At any time you can manually run the script using the -X switch to recover a site. If you add the -d switch you can specify how far back the recovery should go.
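To give a feel for how those decisions combine, here is a sketch of the backup side for one pass over the sites. This is not the actual script: the WordPress test, paths and option choices are illustrative, and the VPN, mount checks, encryption and logging are all left out.

# Sketch only: back up each WordPress site found under the vhosts area.
for SITE_DIR in /srv/www/vhosts/*; do
    SITE=$(basename "$SITE_DIR")
    # Assumption: a wp-config.php under SITE marks a WordPress site.
    [ -f "$SITE_DIR/SITE/wp-config.php" ] || continue

    # Full backup frequency: script default 1M, per-site override allowed.
    FULL_FREQ="1M"
    [ -f "$SITE_DIR/full_backup_freq" ] && FULL_FREQ=$(cat "$SITE_DIR/full_backup_freq")

    # Incremental by default; duplicity starts a new chain when the last
    # full backup is older than FULL_FREQ.
    duplicity --full-if-older-than "$FULL_FREQ" \
        --exclude-filelist /opt/Backup/exclusions \
        --no-encryption \
        "$SITE_DIR" "file:///mnt/BackupDesktop/Ubuntu/$SITE"
done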
Operation
- For each WordPress website it then:
- Checks if the site is Primary on this server. A file called “PRIMARY_IS_servername” will exist saying which server is Primary, and the script checks whether servername in the filename matches its own hostname (see the sketch after this list).
- Checks the size of the Duplicity archive on the NAS store for this site, against a value in MB stored on the site’s file system in the file “max_NAS_space”. This defaults to 1000 MB set in the script.
- If the space used on the NAS archive exceeds the space allocated for the site, the script will start its cleanup/delete section.
- The Cleanup/Delete section will do ONLY ONE of the following actions per run, i.e. per day.
- It first checks to see if the Duplicity chain set on the archive is clean and tidy. If it is not it will delete orphaned or corrupted chains.
- If the chain set is clean and tidy the script will delete the incremental backups from the oldest chain that still contains them. The script will NOT delete incremental backups from the Primary chain, or the latest few Secondary chains – the exact number depending on a script setting.
- If the chain set is clean and tidy, and contains no incremental backups on chains older than a default set in the script, the script will delete the oldest full backup chain it can find.
- This Cleanup or Deletion of backups will continue until the space taken on the archive is less than the allocated space. BUT ONLY ONE ITEM PER DAY.
- Finally, if the site is a Primary, Duplicity is run to save a backup on the NAS archive. Normally this is an Incremental Backup appended to the current primary chain. Occasionally a Full backup is taken starting a new primary chain, and the old primary chain becomes the youngest of the secondary chains.
- Depending on space available this sequence gives a number of older chains as historical backup data.
- How far back the historical data goes depends largely on the relative size of full and incremental backups, and the allocated space this site has on the NAS archive.
- If the size of a full backup is significantly greater than the incremental backups of the site, the deletion of incremental backups is bypassed and the script starts deleting the oldest complete chain immediately.
- To compute this ratio the youngest secondary chain is examined. If the number of duplicity volumes used by the Full backup is still bigger than the total number of duplicity volumes used by all the Incremental backups, incremental deletion is bypassed.
- To RECOVER a site from Backup, the script first checks that the named site is a known Primary site, and then adds an artificial REPLICA flag to the site, causing the script to work backwards and RESTORE the site. There are various checks to prevent this happening accidentally.
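A minimal sketch of the first two checks for a single site (variable names and paths are made up for illustration; the real script then picks at most one cleanup/delete action per day as described above):

SITE_DIR=/srv/www/vhosts/exampleSiteName
ARCHIVE=/mnt/BackupDesktop/Ubuntu/exampleSiteName

# Primary check: the marker file names the server that holds the Primary copy.
if [ -e "$SITE_DIR/PRIMARY_IS_$(hostname)" ]; then
    ROLE=primary
else
    ROLE=replica
fi

# Space check: archive usage in MB against the per-site allowance.
MAX_MB=1000                                   # script default
[ -f "$SITE_DIR/max_NAS_space" ] && MAX_MB=$(cat "$SITE_DIR/max_NAS_space")
USED_MB=$(( $(du -sk "$ARCHIVE" | cut -f1) / 1024 ))

if [ "$USED_MB" -gt "$MAX_MB" ]; then
    echo "$SITE_DIR: over allowance ($USED_MB MB > $MAX_MB MB), run one cleanup action"
fi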
External (non-script) arrangements
- The NAS drive is mounted by the script just before backup and unmounted just after. This is for security reasons and ensures the NAS is not normally accessible in case a server is compromised.
- There is extensive checking of the mounted drive to ensure the script is seeing the actual NAS drive and NOT the unmounted mount point, as it would if the NAS drive were offline when the script triggered (see the sketch after this list).
- The space calculation also checks that the total free space on the NAS archive is at least ten times the space allocation of any individual site.
- The script sends me an email whenever it does anything significant.
- There are copious log files that tell you what the script is doing. See examples below.
- The script has a “debug” setting that allows you to run the script without actually triggering a duplicity backup or deletion.
- If the tidy-up function does run it sends a copy of the Duplicity chain status to a dedicated log file so that you can see which oldest backup it decided to delete; this is very verbose output, hence the separate file.
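The mount check is worth illustrating. This is only a sketch of the idea: the expected owner and group values are examples, and the example logfile below also shows the write/read-back test file the script uses on top of this.

MOUNT=/mnt/BackupDesktop
mountpoint -q "$MOUNT" || { echo "NAS share not mounted"; exit 1; }

# Compare what we see at the mount root with what the NAS is known to report.
read -r LINKS OWNER GROUP <<< "$(stat -c '%h %u %g' "$MOUNT")"
if [ "$OWNER" != "98" ] || [ "$GROUP" != "401" ]; then
    echo "Directory at $MOUNT does not look like the NAS share"; exit 1
fi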
This is an example of the logfile:
=====================================================
#@(#) dupbackup.sh Ver 0.6.1 16/11/22 Chris Ray
/opt/Backup/dupbackup.sh Thu 9 Feb 09:46:08 GMT 2023
Running as username:root
Running as UID: 0
No lock file, no duplicity running, is it Backup Day?
Today is Thu 9 Feb 09:46:08 GMT 2023 1675935968
Not the day for a backup, did we miss one by 3 days or less?
Next backup date is/was Wed 15 Feb 03:00:00 GMT 2023 1676430000
Have not yet reached the scheduled backup day, checking space....
Starting mount point /mnt/BackupDesktop with systemctl mnt-BackupDesktop.mount...
Testing presence of NFS mount
NFS mount started by systemctl
Check Mount Data: hardlinks=1 should be 1, UID= 98 should be 98, GID= 401 should be 401, size= 12 should be less than 100
creating Access-test file
Sent: /opt/Backup/Access-test : 1675935969 Access-test created Thu 9 Feb 09:46:09 GMT 2023
Returned: /opt/Backup/tmp/Access-test : 1675935969 Access-test created Thu 9 Feb 09:46:09 GMT 2023
Successfully Wrote and Read back from /mnt/BackupDesktop/Ubuntu NAS share
Getting Collection-status from NAS....
duplicity collection-status ran cleanly
backup sets are clean, so no incomplete or chainless sets to remove.
Checking space...
Space used by current Primary Backup: 537600000 KB
Now checking space on the NAS drive...
Space available on Archive : 672919680 KB
plenty of space for the next full backup
Stopping NFS mount for security
NFS mount now offline
Duplicity Backup Process Finished at Thu 9 Feb 09:46:25 GMT 2023
-------------------------------
Environment
Before you install this script you must have a NAS drive somewhere on your local network. I use a Netgear ReadyNAS 102 with two 6TB disks running as a mirror pair. On that I have set up a Share with NFS network access and configured a username and password.
The directory share on the NAS drive must be visible from your desktop. This means you will have an entry in /etc/fstab like this:
# Attach the NAS shares as NFS mounts
#
bz-nas:/data/BackupDesktop /mnt/BackupDesktop nfs noauto,defaults 0 0
The noauto option ensures that the drive does not go “live” when the desktop boots. You can use systemctl to check its presence:
fred@Desktop:/opt/Backup$ sudo systemctl status mnt-BackupDesktop.mount
○ mnt-BackupDesktop.mount – /mnt/BackupDesktop
Loaded: loaded (/etc/fstab; generated)
Active: inactive (dead)
Where: /mnt/BackupDesktop
What: bz-nas:/data/BackupDesktop
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
fred@Desktop:/opt/Backup$
This is important as the script uses a systemctl command to start the mount point just before the backup is taken.
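In other words, the script brackets the backup with something like:

systemctl start mnt-BackupDesktop.mount     # bring the NFS share online
# ... duplicity backup runs here ...
systemctl stop mnt-BackupDesktop.mount      # take it offline again for security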
Installation
First pick the place where you wish to keep the scripts, config settings and log files. In my system this is /opt/Backup
Create a tmp directory in the chosen area.
Install the scripts and config files in the chosen area. These should consist of:
- dupbackup.sh: the actual script
- runbackup: the script triggered by Cron that runs dupbackup.sh
- exclusions: a config file listing all the files and directories you DON’T want backed up
- passwd: a file containing the NFS password to the NAS drive
- README: info about the system
Edit the exclusions file to fit your needs. This file lists all the files and directories that you do NOT want backed up.
Ensure the passwd file is owned by root:root with permissions 400, and edit it appropriately; you may need to temporarily change the permissions to do so. It contains the password duplicity will use to access the NAS drive.
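Assuming the install area is /opt/Backup, that means something like:

sudo chown root:root /opt/Backup/passwd
sudo chmod 400 /opt/Backup/passwd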
Issues
My Desktop gets turned on and off during the day and often goes days without being turned on at all. So I cannot use cron (designed for always-on servers) to trigger the script as there is no guarantee the Desktop will be on at the scheduled time. Anacron was designed for this situation, but has the problem that any script placed in the “daily” config triggers just past midnight each day.
I am a night-owl and often work past midnight, which means the Backup script will trigger as midnight passes. But a full backup can take many hours to complete, which means if I turn off the desktop the backup is interrupted. The runbackup script uses the built-in Ubuntu “systemd-inhibit” command, which should prevent system shutdown while the backup is running, or at least give a warning that shutdown will corrupt it.
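A sketch of what that wrapper might look like (the --why text and path are illustrative):

# Block (or at least warn about) shutdown and sleep while the backup runs.
systemd-inhibit --what=shutdown:sleep \
    --why="Duplicity backup in progress" \
    /opt/Backup/dupbackup.sh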
However, if a backup IS interrupted it will automatically resolve itself the next day. On the next run dupbackup.sh will detect the presence of a lock file and, knowing the backup was interrupted, will run duplicity in backup mode (rather than cleanup mode) to recover the situation. After an interrupted run duplicity will detect the partial backup and complete it automatically.