Ransomware-Resistant Backups with SQLite and GitHub Actions
The current version of my website is mostly static HTML, but it does include some custom server-side functionality for the newsletter, and I also use the server to dogfood various side projects that I'm working on. All of this runs on a single server, so SQLite was a natural choice for the database. Here's how I back up that database, verify its integrity, and limit the damage an attacker could do if my credentials were ever exposed.
I chose to make daily off-site copies of the database as my backup strategy. I also considered using Litestream, but determined that periodic full copies made more sense based on the size of the database and my desire to keep the server as lean as realistically possible. The entire backup and verification process is handled within a scheduled GitHub Action, which starts off like so:
.github/workflows/backup.yml
name: Backup
on:
  schedule:
    - cron: '0 2 * * *' # 2:00 AM
env:
  AWS_REGION: ${{ vars.AWS_REGION }}
  AWS_S3_BUCKET: ${{ vars.AWS_S3_BUCKET }}
jobs:
  backup:
    name: Backup SQLite
    runs-on: ubuntu-latest
    environment: prod
    steps:
      #...
So far this is a pretty basic GitHub Actions YAML file. The most important thing to note is that the top-level `on` key schedules the workflow to run daily at 2:00 AM. The GitHub Actions documentation notes that start times are not guaranteed, and indeed, in my testing, workflows often start as much as an hour late.
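My workflow only uses the schedule trigger, but if you want to kick off runs manually while testing, GitHub Actions also supports a `workflow_dispatch` trigger, which could be added alongside the schedule like so:
on:
  schedule:
    - cron: '0 2 * * *' # 2:00 AM
  workflow_dispatch: # allows manual runs from the Actions tab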
Let's continue reading the workflow and take a look at the first step of the job.
.github/workflows/backup.yml
steps:
  - name: Configure AWS credentials
    uses: aws-actions/configure-aws-credentials@v4.2.1
    with:
      aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
      aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      aws-region: ${{ vars.AWS_REGION }}
Here we authenticate with AWS, since the backups will be stored in S3. I created a new IAM user that is used only for managing backups and assigned it a custom permission policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-backups-s3-bucket",
        "arn:aws:s3:::my-backups-s3-bucket/*"
      ]
    }
  ]
}
Then I created a private S3 bucket with Object Lock enabled. Object Lock prevents objects from being deleted or overwritten for a configured retention period. These minimal IAM permissions combined with Object Lock make the database backups immutable for a rolling window. So even if an attacker were to compromise my GitHub Actions secrets, they would not be able to delete or modify the backups, which is exactly what they would need to do to pull off a ransomware attack.
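For reference, a bucket like this can be set up with the AWS CLI along the following lines. This sketch isn't part of the workflow, and the bucket name, retention mode, and 30-day window are just placeholders:
# Object Lock must be enabled when the bucket is created; this also turns on
# versioning. Outside us-east-1, add --create-bucket-configuration
# LocationConstraint=<region>.
aws s3api create-bucket \
  --bucket my-backups-s3-bucket \
  --object-lock-enabled-for-bucket

# Default retention rule: objects cannot be deleted or overwritten for 30 days.
aws s3api put-object-lock-configuration \
  --bucket my-backups-s3-bucket \
  --object-lock-configuration '{"ObjectLockEnabled": "Enabled", "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}}}'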
Let's continue reading the backup workflow.
.github/workflows/backup.yml
- name: Download previous backup
  run: |
    export PREVIOUS_BACKUP_KEY=$(aws s3api list-objects-v2 --bucket "$AWS_S3_BUCKET" --query 'reverse(sort_by(Contents, &LastModified))[:1].Key' --output=text)
    aws s3 cp "s3://$AWS_S3_BUCKET/$PREVIOUS_BACKUP_KEY" previous.sqlite.gz
    gzip -d previous.sqlite.gz
This is where the real work starts. This step downloads the most recent backup from S3 and decompresses it; we'll use it later for verification.
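One edge case to be aware of: on the very first run there is no previous backup, and with `--output=text` an empty query result comes back as the literal string "None". A guard along these lines (hypothetical, not in my workflow) makes that failure explicit:
# Fail fast with a clear message if the bucket has no previous backup yet.
if [ -z "$PREVIOUS_BACKUP_KEY" ] || [ "$PREVIOUS_BACKUP_KEY" = "None" ]; then
  echo "No previous backup found in s3://$AWS_S3_BUCKET" >&2
  exit 1
fi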
.github/workflows/backup.yml
- name: Backup
  env:
    SSH_CONNECTION: ${{ vars.SSH_CONNECTION }}
    SSH_PRIVATE_KEY: ${{ secrets.SSH_PRIVATE_KEY }}
  run: |
    echo "$SSH_PRIVATE_KEY" > id_ed25519_evanbyrne_ci
    chmod 600 id_ed25519_evanbyrne_ci
    export BACKUP_DATE=$(date +"%Y-%m-%d_%H%M%S")
    export BACKUP_FILENAME="evanbyrne-backend_$BACKUP_DATE.sqlite.gz"
    ssh -i id_ed25519_evanbyrne_ci -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null "$SSH_CONNECTION" "sqlite3 /var/lib/docker/volumes/evanbyrne-backend-db/_data/evanbyrne-backend.sqlite '.backup /root/backups/latest.sqlite'"
    scp -i id_ed25519_evanbyrne_ci -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null "$SSH_CONNECTION:/root/backups/latest.sqlite" latest.sqlite
    gzip -k latest.sqlite
    aws s3 cp latest.sqlite.gz "s3://$AWS_S3_BUCKET/$BACKUP_FILENAME"
In the above step, the script backs up the database over SSH using SQLite's built-in backup command (`sqlite3 <database> '.backup <path/to/backup>'`), compresses the copy, and uploads it to S3. Because `.backup` uses SQLite's online backup API, it produces a consistent snapshot even if the application is writing to the database at the time. The SSH key is stored as a GitHub Actions secret.
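If you want an extra sanity check on the copied file itself before uploading it, SQLite's `PRAGMA integrity_check` works well here. This isn't part of the workflow above, but it would look something like:
# PRAGMA integrity_check prints a single "ok" when the database file is sound,
# so fail the step if it prints anything else.
test "$(sqlite3 latest.sqlite 'PRAGMA integrity_check;')" = "ok"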
And that brings us to the last step in the workflow: verification. For this, plain SQL is used. SQLite has a neat feature where multiple databases can be attached to a single connection, so the script opens the previous backup, attaches the latest one, and compares the two to ensure there has been no data loss or manipulation. If the query does not return a value of `"true"`, the workflow errors out.
.github/workflows/backup.yml
- name: Verify
  env:
    VERIFICATION_SQL: ${{ vars.VERIFICATION_SQL }}
  run: |
    ls -Al
    sqlite3 --version
    sqlite3 previous.sqlite "$VERIFICATION_SQL" | grep -q "true"
The SQL query that verifies the integrity of my newsletter subscribers table looks something like this:
attach 'latest.sqlite' as latest;
select case
  when (
    exists (
      select * from main.subscriber as previous_subscriber
      where not exists (
        select * from latest.subscriber as latest_subscriber
        where previous_subscriber.email = latest_subscriber.email
        -- more verification conditions
      )
    )
  ) then 'false' else 'true' end as passing;
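When writing a query like this, it's convenient to test it locally against two copies of the database before wiring it into the workflow. Assuming the query is saved as verify.sql alongside previous.sqlite and latest.sqlite (filenames here are just for illustration):
# Prints "true" or "false", same as in CI.
sqlite3 previous.sqlite < verify.sql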