Backing up Jenkins seems like it should be a solved problem, and yet.

The Jenkins User Handbook -> System Administration -> Backing-up/Restoring Jenkins has a nice overview of everything you need to back up Jenkins.

I assumed I could just find a script or install a plugin, but none of the ones I found did the critical thing, which is testing the backup. A lot of them also included master.key, which the Jenkins handbook specifically says not to keep in your backups.

So I had to write my own.

Current issues:

  • Testing the backup consists of starting Jenkins from it and then checking for a 200 from the login page, so it's not very thorough
  • Running the Jenkins under test on the controller host causes issues
    • The instance under test reaches out to some of the build nodes, which I don't like it doing during a test
    • I can change the HTTP port, but not the build agent connection port or the SSH CLI port, so those just throw errors
    • Running in Docker could solve a lot of this, but the controller doesn't have Docker and I don't want to copy master.key off the controller
  • It would be cool to watch the boot logs to figure out when it's ready instead of hardcoding a delay, but that's hard
  • Lots of hardcoding everywhere
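
On the first issue, one slightly stronger check than a bare 200 is to confirm that the response really came from Jenkins. Jenkins normally sends an X-Jenkins response header carrying its version, so a rough sketch like this could pull the version out and compare it with the one you expect. The function name is made up for illustration, and port 9999 is just the one test.sh uses.

```shell
# Read raw HTTP response headers on stdin and print the X-Jenkins value, if any.
# (jenkins_version_from_headers is a hypothetical helper name.)
jenkins_version_from_headers() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk -F': *' 'tolower($1) == "x-jenkins" { print $2; exit }'
}

# Possible usage against the instance under test (not run here):
#   curl -s -D - -o /dev/null http://localhost:9999/login | jenkins_version_from_headers
```

An empty result would mean that whatever answered on that port did not identify itself as Jenkins.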

It boils down to backup.sh:

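# --exclude options go before the paths they apply to: newer GNU tar versions
# treat them as position-sensitive, so trailing excludes may not take effect
tar --force-local -czpf "$tarball" \
    --exclude="jobs" \
    --exclude=".*" \
    --exclude="*cache" \
    --exclude="workspace" \
    --exclude="secrets/master.key" \
    -C "$(dirname "$jenkinsHome")" "$(basename "$jenkinsHome")" \
    -C "$(dirname "$jenkinsWar")" "$(basename "$jenkinsWar")" \
    -C "$DIR" README.md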
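
Since the whole point is keeping master.key out of the backup, it may be worth checking the finished tarball rather than trusting the exclude list. This is only a sketch: the function name is hypothetical, and it assumes a gzip-compressed archive and GNU tar, as in backup.sh.

```shell
# Return 0 if the archive contains no secrets/master.key member,
# 1 if it does, and 2 if the archive could not be listed at all.
# (tarball_excludes_master_key is a hypothetical helper name.)
tarball_excludes_master_key() {
    local tarball="$1" listing
    # List first, so a broken archive is not mistaken for a clean one.
    listing="$(tar --force-local -tzf "$tarball")" || return 2
    if printf '%s\n' "$listing" | grep -q 'secrets/master\.key$'; then
        echo "master.key found in $tarball" >&2
        return 1
    fi
    return 0
}

# Possible usage at the end of backup.sh (not run here):
#   tarball_excludes_master_key "$tarball" || exit 1
```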

and then test.sh:

tar xf "$jenkinsTarball"
cp "$jenkinsMasterKey" jenkins/secrets/master.key

# run jenkins in background
JENKINS_HOME="$(pwd)/jenkins" timeout "$((jenkinsStartupDelay+5))" java -jar "$(pwd)/jenkins.war" --httpPort=9999 &

# TODO: instead of sleep watch output and look for "Jenkins is fully up and running"
sleep "$jenkinsStartupDelay"

if [[ "$(curl -s -w "%{http_code}" http://localhost:9999/login -o /dev/null)" != "200" ]]; then
    echo "jenkins didn't come online"
    exit 1
fi

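jenkinsStartupDelay is just a number of seconds that Jenkins will probably finish starting up in. We run Jenkins in the background under timeout with a limit of jenkinsStartupDelay + 5 seconds, sleep for jenkinsStartupDelay seconds, then check whether it's online. The extra 5 seconds leave room for the curl check before timeout kills Jenkins.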
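
Watching the output instead of sleeping might not need to be very elaborate. A minimal sketch, assuming the Jenkins output is redirected to a file (jenkins.log is a made-up name) and that the "Jenkins is fully up and running" line shows up there: poll the file once a second until the line appears or a deadline passes. The function name is hypothetical too.

```shell
# wait_for_line FILE PATTERN TIMEOUT_SECONDS
# Return 0 as soon as FILE contains PATTERN (fixed string), 1 if the deadline passes first.
wait_for_line() {
    local file="$1" pattern="$2" timeout="$3" deadline
    deadline=$(( $(date +%s) + timeout ))
    while :; do
        # A file that does not exist yet simply counts as "not ready".
        if grep -qF -- "$pattern" "$file" 2>/dev/null; then
            return 0
        fi
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep 1
    done
}

# Possible usage in test.sh (not run here):
#   JENKINS_HOME="$(pwd)/jenkins" java -jar "$(pwd)/jenkins.war" --httpPort=9999 > jenkins.log 2>&1 &
#   wait_for_line jenkins.log "Jenkins is fully up and running" "$jenkinsStartupDelay" || exit 1
```

With this, jenkinsStartupDelay becomes an upper bound rather than a fixed wait, so a fast startup finishes the test early.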

This is all called from a Jenkins pipeline that runs the backup and the test on the controller, stashes the tarball, unstashes it on a build node that has a backup hard drive, copies it to the backup directory, and removes old backups.
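
The "remove old backups" step could be as simple as keeping the newest few tarballs. This is a sketch under assumptions, not the actual pipeline step: it assumes the backups end in .tar.gz, that their names contain no newlines, and that modification time is a good enough measure of age. The function name and the keep count are invented for illustration.

```shell
# prune_backups DIR KEEP
# Delete all but the KEEP most recently modified *.tar.gz files in DIR.
prune_backups() {
    local dir="$1" keep="$2"
    # ls -t lists newest first; tail skips the first KEEP entries.
    ls -1t -- "$dir"/*.tar.gz 2>/dev/null \
        | tail -n +"$((keep + 1))" \
        | while IFS= read -r old; do
            rm -f -- "$old"
        done
}

# Possible usage on the build node (not run here):
#   prune_backups /path/to/backup/dir 14
```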


Unrelated fun fact: if the archive filename contains a colon, tar will think it refers to a file on a remote machine unless you pass --force-local. From the man page:

       --force-local
              Archive file is local even if it has a colon.
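
A small demonstration, assuming GNU tar and a timestamp-style archive name (the name here is invented): with a relative name like this, everything before the first colon would otherwise be read as a host name.

```shell
# Work in a scratch directory so the archive name is relative.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
echo "hello" > file.txt

# Hypothetical timestamped name containing colons.
tarball="jenkins-2024-01-01T12:00:00.tar.gz"

# --force-local makes tar treat the name as a plain local file, for writing and for reading.
tar --force-local -czf "$tarball" file.txt
tar --force-local -tzf "$tarball"
```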