Mmmm.... geeky

Here's a sure way to alienate my loyal readership - all three of you. But I'm doing my part to give back to the geek collective. Here's how I figured out snapshot-style backups on OS X. Tune in next week for my usual postings covering topics like bitching about the elderly, documenting my general incompetence, etc.



Basically, I've written a script that creates snapshots of volumes on Mac OS X. The idea is adapted from Mike Rubel's excellent writeup. Being a lazy ass, I'm not going to be nearly as thorough as he was; if you're really interested in how all this stuff works, I suggest reading through his page. In a nutshell, it gives you what looks like multiple complete copies of your data at different points in time, without actually storing multiple copies on disk. The trick is hard links: a file that hasn't changed between snapshots is just another directory entry pointing at the same data on disk, so it costs almost no extra space. Only files that are added or modified increase the amount of data you're storing. (You'll need at least as much space as your live data occupies; how much extra depends on the amount of churn on your volumes.)
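If hard links are new to you, here's a minimal sh sketch of the property the snapshots depend on: two names, one copy of the data, and deleting one name leaves the other intact. It isn't part of the backup script, and the file and directory names are made up for illustration.

```shell
#!/bin/sh
# Hard-link demo in a throwaway directory (removed on exit).
# File and directory names are made up for illustration.
set -e
demo=$(mktemp -d "${TMPDIR:-/tmp}/linkdemo.XXXXXX")
trap 'rm -rf "$demo"' EXIT
cd "$demo"

echo "version 1" > report.txt     # stands in for a live file
mkdir snap
ln report.txt snap/report.txt     # second name, same data; nothing is copied

# Both names share one inode, so the link count reported by ls is 2.
ls -li report.txt snap/report.txt
links=$(ls -l snap/report.txt | awk '{print $2}')
echo "link count: $links"         # prints "link count: 2"

# Removing one name leaves the data reachable through the other.
rm report.txt
cat snap/report.txt               # prints "version 1"
```

The same thing happens at scale in the snapshot directories: deleting the oldest snapshot only frees the data that no newer snapshot still links to.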



There are two problems with running Mr. Rubel's setup on OS X. First, Unix tools like cp and rsync don't handle Mac resource forks very well. Second, the version of cp supplied with OS X can't make hard-link copies (the -l option in the cp -al command Rubel uses), and hard links are what make this whole thing work.



Rather than cp, I've used cpio, whose pass-through mode (-p) can create hard links instead of copies (-l). In place of the standard rsync I'm using psync, which does handle resource forks. Panther users, take note: you'll need to patch the psync source before compiling it to get it to run. Psync is limited in that it can't copy across networks. This wasn't a problem for me, since we're syncing data to another drive on the same host, but if you need to sync to another host, you might want to try rsyncx, an HFS-compatible version of rsync. I've not tried this, but it should be an easy substitution.
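Since everything hinges on cpio actually linking rather than copying, it can be worth spot-checking a snapshot. Below is a sketch of a hypothetical same_file helper in plain sh (not part of snapshot.sh) that compares inode numbers using ls -di, whose output looks the same on OS X and Linux. Matching inode numbers only mean something for two paths on the same volume, which has to be true of the snapshots anyway, since hard links can't cross volumes.

```shell
#!/bin/sh
# same_file A B: succeeds if A and B are hard links to the same data,
# i.e. they have the same inode number. Hypothetical helper, not part
# of snapshot.sh. Only meaningful for two paths on the same volume.
same_file() {
    [ -e "$1" ] && [ -e "$2" ] || return 1
    # ls -di prints "<inode> <name>"; -d keeps directories from being listed
    ino1=$(ls -di "$1" | awk '{print $1}')
    ino2=$(ls -di "$2" | awk '{print $1}')
    [ "$ino1" = "$ino2" ]
}

# Example (the file path is a placeholder):
# same_file "/backup/HH Newsroom/hourly.0/some/file" \
#           "/backup/HH Newsroom/hourly.1/some/file" && echo linked
```

For a file that hasn't changed since the previous run, the two snapshot copies should report as linked; a file psync has since updated in hourly.0 shouldn't.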



Here's what the script does:



1) Rotate the hourly, daily, or weekly snapshots: delete the oldest and shift each remaining set back one position. For example:
mv "/path/to/backup/hourly.1" "/path/to/backup/hourly.2"


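2) Make a hard-link copy of the most recent snapshot, hourly.0 (a mirror of the live data as of the last run), into hourly.1. The -p flag runs cpio in pass-through mode, -d creates directories as needed, -l links files instead of copying them, and -m preserves modification times: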
cd "/path/to/backup/hourly.0" && find . -print | cpio -dplm "/path/to/backup/hourly.1"


3) Use psync to sync the live data into hourly.0. The -d option tells psync to delete files from the snapshot that are no longer on the live volume. Because of the hard links made in step 2, that data isn't actually deleted until every link to it (i.e., every snapshot that contains it) has been removed.
psync -d "/path/to/live/data" "/path/to/backup/hourly.0"
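One subtlety worth spelling out: this only works if the sync step replaces a changed file (writes a new file and renames it over the old name) rather than rewriting the existing file in place. rsync does the former by default; a file rewritten in place would change in every snapshot that links to it. It's worth confirming that psync behaves the same way before relying on the older snapshots. The sh sketch below has nothing psync-specific in it; it just shows the difference, using two directories named after the snapshots.

```shell
#!/bin/sh
# Replace-vs-rewrite demo in a throwaway directory (removed on exit).
# hourly.0 plays the live mirror that the sync updates; hourly.1 is the
# hard-link copy made from it in step 2. No psync involved.
set -e
demo=$(mktemp -d "${TMPDIR:-/tmp}/replacedemo.XXXXXX")
trap 'rm -rf "$demo"' EXIT
cd "$demo"

mkdir hourly.0 hourly.1
echo "old" > hourly.0/a.txt
echo "old" > hourly.0/b.txt
ln hourly.0/a.txt hourly.1/a.txt
ln hourly.0/b.txt hourly.1/b.txt

# Safe update: write a new file, then rename it over the old name.
# hourly.0/a.txt now names a new inode; hourly.1/a.txt keeps the old data.
echo "new" > hourly.0/a.txt.tmp
mv hourly.0/a.txt.tmp hourly.0/a.txt

# Unsafe update: rewrite the existing file in place. Both names still
# share one inode, so the older snapshot changes as well.
echo "new" > hourly.0/b.txt

cat hourly.1/a.txt    # prints "old"
cat hourly.1/b.txt    # prints "new" (the snapshot was overwritten)
```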


The same concept applies to daily and weekly snapshots, except that instead of running psync, we just make a hard-link copy of an existing snapshot: hourly.1 is copied into daily.1, and daily.1 into weekly.1.
cd "/path/to/backup/hourly.1" && find . -print | cpio -dplm "/path/to/backup/daily.1"
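The rotation in step 1 is the same for all three types; only the name prefix changes. Here's one way to write it once, as a hypothetical rotate function in POSIX sh (the actual script spells each case out in csh). It assumes the script's numbering, where .1 through .3 rotate and hourly.0 is left alone, and it quietly skips snapshots that don't exist yet on early runs.

```shell
#!/bin/sh
# rotate DIR PREFIX MAX (hypothetical helper, not part of snapshot.sh):
# delete DIR/PREFIX.MAX, then shift PREFIX.(MAX-1) -> PREFIX.MAX, ...,
# PREFIX.1 -> PREFIX.2, leaving PREFIX.1 free for the next hard-link copy.
# PREFIX.0 is never touched. Missing snapshots are skipped.
rotate() {
    dir=$1 prefix=$2 max=$3
    # Refuse an empty directory so rm -rf isn't pointed at /PREFIX.MAX
    [ -n "$dir" ] || return 1

    if [ -e "$dir/$prefix.$max" ]; then
        rm -rf "$dir/$prefix.$max"
    fi

    i=$max
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -e "$dir/$prefix.$prev" ]; then
            mv "$dir/$prefix.$prev" "$dir/$prefix.$i"
        fi
        i=$prev
    done
}

# Example, mirroring the script's hourly case:
# rotate "/backup/HH Newsroom" hourly 3
```

After a call like rotate "/backup/HH Newsroom" daily 3, daily.1 is free, and the hard-link copy from hourly.1 can be made into it just as in the command above.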


Here's the finished script. You'll need to change the LIVEBASE and BACKBASE paths to match the paths to your live data and backup directory, respectively.



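Run the script like so, where [directory] is the name of a directory under LIVEBASE (with the default LIVEBASE of /Volumes, that's simply a volume name). Its snapshots end up under BACKBASE/[directory]: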
snapshot.sh [directory] [hourly|daily|weekly]

And the script itself. I'm not responsible if this deletes all your data, kicks your dog, burns your hair off, and/or gets your ass fired! Feel free to use it, modify it, tattoo it on your butt, but don't come crying to me when it's busted. :-)

#!/bin/csh
 
#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#
#
# snapshot.sh - a script to create and rotate snapshots
# 		on an OS X system, using cpio and psync
#
# Chris Yates
# Lowcountry Newspapers
# cyates[pleasedontspamme]lowcountrynewspapers.com
#
# 2/12/2004
#
# CONFIGURATION
# Change the variables below to match the base path for
# your live data and for your backup data
#
setenv LIVEBASE /Volumes
#
setenv BACKBASE /backup
#
# Uncomment the variable below to debug
#set echo
#
#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#+#
 
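# Exactly two arguments are required; a missing or unquoted
# directory name would otherwise point the rotation at the wrong place
if ($#argv != 2) then
	echo "usage: snapshot.sh [directory] [hourly|daily|weekly]"
	exit 1
endif
 
set VOL="$1"
 
setenv TYPE "$2"
 
if ("$TYPE" == "hourly") then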
# Perform hourly snapshot
 
	# Create a base directory on first run
	if (! (-e "$BACKBASE/$VOL") ) then
		mkdir "$BACKBASE/$VOL"
	endif
 
	# Delete oldest hourly snapshot
	if ( -e "$BACKBASE/$VOL/hourly.3" ) then
		rm -rf "$BACKBASE/$VOL/hourly.3"
	endif
 
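	# Shift two middle snapshots (skipped until they exist)
	if ( -e "$BACKBASE/$VOL/hourly.2" ) then
		mv "$BACKBASE/$VOL/hourly.2" "$BACKBASE/$VOL/hourly.3"
	endif
 
	if ( -e "$BACKBASE/$VOL/hourly.1" ) then
		mv "$BACKBASE/$VOL/hourly.1" "$BACKBASE/$VOL/hourly.2"
	endif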
 
	# Make a hard-link only copy of the latest snapshot
	if ( -e "$BACKBASE/$VOL/hourly.0" ) then
		cd "$BACKBASE/$VOL/hourly.0" && find . -print | cpio -dplm "$BACKBASE/$VOL/hourly.1"
	else
		mkdir "$BACKBASE/$VOL/hourly.0"
	endif
 
	# Sync the live data into the latest snapshot folder
	psync -d "$LIVEBASE/$VOL" "$BACKBASE/$VOL/hourly.0"
 
else if ("$TYPE" == "daily") then
# Perform daily snapshot
 
	# Delete oldest daily snapshot
	if ( -e "$BACKBASE/$VOL/daily.3" ) then
		rm -rf "$BACKBASE/$VOL/daily.3"
	endif
 
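	# Shift middle snapshots (skipped until they exist)
	if ( -e "$BACKBASE/$VOL/daily.2" ) then
		mv "$BACKBASE/$VOL/daily.2" "$BACKBASE/$VOL/daily.3"
	endif
 
	if ( -e "$BACKBASE/$VOL/daily.1" ) then
		mv "$BACKBASE/$VOL/daily.1" "$BACKBASE/$VOL/daily.2"
	endif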
 
	# Create a hard-link copy of last hourly snapshot into daily
	if ( -e "$BACKBASE/$VOL/hourly.1" ) then
		cd "$BACKBASE/$VOL/hourly.1" && find . -print | cpio -dplm "$BACKBASE/$VOL/daily.1"
	else
		mkdir "$BACKBASE/$VOL/daily.1"
	endif
 
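else if ("$TYPE" == "weekly") then
# Perform weekly snapshot
 
	# Delete oldest weekly snapshot
	if ( -e "$BACKBASE/$VOL/weekly.3" ) then
		rm -rf "$BACKBASE/$VOL/weekly.3"
	endif
 
	# Shift middle snapshots (skipped until they exist)
	if ( -e "$BACKBASE/$VOL/weekly.2" ) then
		mv "$BACKBASE/$VOL/weekly.2" "$BACKBASE/$VOL/weekly.3"
	endif
 
	if ( -e "$BACKBASE/$VOL/weekly.1" ) then
		mv "$BACKBASE/$VOL/weekly.1" "$BACKBASE/$VOL/weekly.2"
	endif
 
	# Create a hard-link copy of last daily snapshot into weekly
	if ( -e "$BACKBASE/$VOL/daily.1" ) then
		cd "$BACKBASE/$VOL/daily.1" && find . -print | cpio -dplm "$BACKBASE/$VOL/weekly.1"
	else
		mkdir "$BACKBASE/$VOL/weekly.1"
	endif
 
else
# Anything other than hourly, daily or weekly is a mistake;
# don't guess which rotation was meant
	echo "usage: snapshot.sh [directory] [hourly|daily|weekly]"
	exit 1
endif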



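The automatic part comes from running this script with cron. Here's a copy of the crontab entries for hourly and daily snapshots: hourly at 45 minutes past midnight, 1 a.m., and every hour from 10 a.m. through 11 p.m., and daily at 2:30 a.m., so the daily run never lands on top of an hourly one.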



### Run hourly snapshots
45 0,1,10-23 * * * /usr/local/sbin/snapshot.sh 'HH Newsroom' hourly
### Run daily snapshots
30 2 * * * /usr/local/sbin/snapshot.sh 'HH Newsroom' daily
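The crontab above only covers hourly and daily runs. If you want weekly snapshots too, a third entry along these lines should do it; Sunday at 3:30 a.m. is just an example schedule, chosen so it falls after that night's daily run and outside the hourly window.

```
### Run weekly snapshots (example schedule: Sundays at 3:30 a.m.)
30 3 * * 0 /usr/local/sbin/snapshot.sh 'HH Newsroom' weekly
```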




Note that the volume name is in single quotes - you'll need to quote directory names with spaces in them.
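Cron hands each command line to /bin/sh, so ordinary shell word splitting applies. This small sketch, with a hypothetical count_args function standing in for the script, shows what the quotes change: without them, the script would see three arguments, with HH as the directory and Newsroom as the snapshot type.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for snapshot.sh: it just reports
# how many arguments it was given.
count_args() {
    echo "$#"
}

count_args HH Newsroom hourly      # prints 3: the name was split in two
count_args 'HH Newsroom' hourly    # prints 2: the quotes keep it whole
```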