Backup theory for startups

Mar 4, 2011

You run a startup. Your company has been given data by users, data that it would be embarrassing to lose. You make backups. You're aware of the best practices:

Make backups.

Make backups automatic. Or they won't happen.

Make backups yourself. Outsourcing them is for suckers.

Regularly restore from your backups. A backup doesn't exist if it's never read.
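The last rule is the easiest one to skip, so it helps to automate it too. Here is a minimal sketch in Python, assuming your backup is a tar archive and that you recorded a checksum for every file when the backup was made. The names verify_restore, file_digests and expected_digests are made up for illustration; adapt them to whatever your backup tool actually produces.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(archive_path, expected_digests):
    """Restore the archive into a scratch directory and compare checksums.

    expected_digests is a manifest recorded at backup time (an assumption
    of this sketch). Only run this on archives you created yourself.
    """
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path) as tar:
            tar.extractall(scratch)
        return file_digests(scratch) == expected_digests
```

Run something like this from cron against last night's archive and alert when it returns False. A passing check only says the files came back byte for byte; it says nothing about whether the data was already bad when it was backed up.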

Incomplete

This paranoia is useful, but it's intended for personal data. For business data it's incomplete. Businesses have automation. Lots of automation. Dumb automation that can do stupid things, like corrupting data. Businesses also develop nooks and crannies where important data may go unread for periods of time. The combination of automation and dusty corners can cause insidious data loss: some obscure-yet-critical corner of your data gets deleted or corrupted, and you don't notice until the corruption has infected the backup copy.

Stale is good

You can guard against catastrophic data loss with just a regular backup at a remote location. Insidious loss is harder to guard against. It is the reason companies keep more than one backup, the reason snapshotting file systems expose multiple .snapshot directories, the reason Slicehost and Linode provide a daily and a weekly backup. Daily/weekly is perhaps the simplest backup cascade. It gives you up to a week to detect corrupted data on your server. As operations get complex you'll want longer cascades with more levels. The lower levels back up frequently to capture recent changes, and the higher levels back up less frequently as a cushion for detecting data corruption.
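To make the cascade concrete, here is a rough sketch of a two-level retention rule in Python. It assumes one snapshot per day, a daily level that keeps the last seven days, and a weekly level that keeps Sunday snapshots for four weeks. The function name and both slot counts are illustrative choices, not what any particular hosting provider does.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, daily_slots=7, weekly_slots=4):
    """Return the snapshot dates a daily/weekly cascade would retain."""
    keep = set()
    for d in snapshot_dates:
        age = (today - d).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        if age < daily_slots:
            # Lower level: every snapshot from the most recent days.
            keep.add(d)
        elif d.weekday() == 6 and age < 7 * weekly_slots:
            # Higher level: Sunday snapshots younger than weekly_slots weeks.
            keep.add(d)
    return sorted(keep)


# Example: sixty consecutive daily snapshots ending on 2011-03-04.
today = date(2011, 3, 4)
history = [today - timedelta(days=n) for n in range(60)]
kept = snapshots_to_keep(history, today)
```

Everything not in kept can be deleted. Adding a monthly level is one more elif branch, which is all "longer cascades with more levels" amounts to in code.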

More best practices

Make cascading backups.

Make sure you can detect corruption to anything important before it reaches the highest level of your cascade. The 'height' of your cascade defines how far back in time you can undo damage.
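A small sketch of what 'height' means in practice, assuming you can list the age in days of every snapshot you currently hold. Both function names are invented for this example.

```python
def cascade_height(snapshot_ages_days):
    """Age in days of the oldest snapshot: the furthest back you can go."""
    return max(snapshot_ages_days, default=0)


def newest_clean_snapshot(snapshot_ages_days, corruption_age_days):
    """Age of the most recent snapshot taken before the corruption.

    Returns None when every snapshot is already infected, i.e. the
    corruption is older than the height of the cascade.
    """
    clean = [age for age in snapshot_ages_days if age > corruption_age_days]
    return min(clean) if clean else None


# Seven dailies plus three older weeklies, ages in days.
ages = [0, 1, 2, 3, 4, 5, 6, 12, 19, 26]
```

With these ages, corruption that happened nine days ago can still be undone from the twelve-day-old snapshot, at the cost of the changes made between the snapshot and the corruption. Corruption that happened thirty days ago cannot be undone at all.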

Failure modes

Whatever your backup strategy, ask yourself what it can't handle. I can think of two scenarios cascades can't handle: extremely insidious corruption that doesn't get detected in time, and short-lived data. If many records in your database get deleted every day, most of them may never make it into a weekly backup.
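The short-lived data problem is easy to see with a toy model. The sketch below assumes a record lands in a backup only if it exists at the moment the backup runs, and measures time in whole days. The record lifetimes and the backup schedule are made-up numbers.

```python
def survives_in_backup(created_day, deleted_day, backup_days):
    """True if some backup ran while the record existed."""
    return any(created_day <= b < deleted_day for b in backup_days)


def captured_fraction(records, backup_days):
    """Fraction of (created_day, deleted_day) records seen by any backup."""
    if not records:
        return 0.0
    hits = sum(survives_in_backup(c, d, backup_days) for c, d in records)
    return hits / len(records)


# One record created per day for four weeks, each deleted two days later.
records = [(day, day + 2) for day in range(28)]
weekly = [0, 7, 14, 21]
daily = list(range(28))
```

Here the weekly level only ever sees a quarter of the records, while the daily level sees all of them. Once the dailies rotate out, the other three quarters are gone from every backup you hold.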

What else?

comments

Kartik Agaram, 2013-03-24: Case study: http://jefferai.org/2013/03/24/too-perfect-a-mirror

Comments gratefully appreciated. Please send them to me by any method of your choice and I'll include them here.