Mar 4, 2011
Backup theory for startups

You run a startup. Your users have given your company data, data that it would be embarrassing to lose. You make backups. You're aware of the best practices:

  • Make backups.
  • Make backups automatic. Or they won't happen.
  • Make backups yourself. Outsourcing them is for suckers.
  • Regularly restore from your backups. A backup doesn't exist if it's never read.


    This paranoia is useful, but it's intended for personal data. For business data it's incomplete. Businesses have automation. Lots of automation. Dumb automation that can do stupid things, like corrupting data. Businesses also develop nooks and crannies where important data may go unread for long periods. The combination of automation and dusty corners can cause insidious data loss: some obscure-yet-critical corner of your data gets deleted or corrupted, and you don't notice until the corruption has been copied into your backup.

    Stale is good

    You can guard against catastrophic data loss with just a regular backup at a remote location. Insidious loss is harder to guard against. Insidious data loss is the reason companies have more than one backup, the reason journalled file systems have multiple .snapshot directories, the reason Slicehost and Linode provide a daily and a weekly backup. Daily/weekly is perhaps the simplest backup cascade. It gives you up to a week to detect corrupted data on your server. As operations get complex you'll want longer cascades with more levels. The lower levels back up frequently to capture recent changes, and the higher levels back up less frequently, as a cushion that gives you time to detect data corruption.
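
    A cascade mostly comes down to a retention rule: which old backups do you keep, and which do you let go? Here is a minimal sketch of a two-level daily/weekly rule. The function name, the default counts, and the choice of Sunday as the weekly slot are all assumptions made for illustration, not how any particular provider does it.

```python
from datetime import date, timedelta

def backups_to_keep(backup_dates, today, daily_copies=7, weekly_copies=4):
    """Return the backup dates a two-level daily/weekly cascade would retain.

    Hypothetical policy: keep the newest `daily_copies` backups, plus the
    newest `weekly_copies` backups that were taken on a Sunday.
    """
    # Newest first, ignoring duplicates and anything dated in the future.
    dates = sorted({d for d in backup_dates if d <= today}, reverse=True)
    daily = dates[:daily_copies]                                   # lower level
    weekly = [d for d in dates if d.weekday() == 6][:weekly_copies]  # higher level
    return sorted(set(daily) | set(weekly))

# Thirty consecutive nightly backups ending today.
today = date(2011, 3, 4)
history = [today - timedelta(days=i) for i in range(30)]
kept = backups_to_keep(history, today)
print(len(kept), "backups kept; oldest is", kept[0])
```

    Anything not in the returned list is a candidate for deletion. Adding a third level (monthly, say) is one more filtered slice in the same style.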

    More best practices

  • Make cascading backups.
  • Make sure you can detect corruption to anything important before it reaches the highest level of your cascade. The 'height' of your cascade defines how far back in time you can undo damage.
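
    One way to act on the second point is to periodically compare the records you care about against the oldest backup in your cascade, and have a human look at whatever changed or vanished. The sketch below assumes your data can be viewed as a dictionary of JSON-serializable records. The names fingerprint and suspicious_keys are made up for this example, and a flagged key is only a prompt for review, since legitimate edits will show up too.

```python
import hashlib
import json

def fingerprint(record):
    """Stable SHA-256 digest of a JSON-serializable record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def suspicious_keys(live, oldest_backup, important_keys):
    """Important keys that are missing from live data or differ from the oldest backup."""
    flagged = []
    for key in important_keys:
        if key not in oldest_backup:
            continue  # no baseline to compare against
        missing = key not in live
        if missing or fingerprint(live[key]) != fingerprint(oldest_backup[key]):
            flagged.append(key)
    return flagged

live = {"plans": {"price": 10}, "admins": {"count": 2}}
oldest = {"plans": {"price": 10}, "admins": {"count": 3}, "tax_rates": {"ca": 0.0725}}
print(suspicious_keys(live, oldest, ["plans", "admins", "tax_rates"]))
```

    Run something like this more often than your top level rotates, or the baseline it compares against may already be corrupt.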

    Failure modes

    Whatever your backup strategy, ask yourself what it can't handle. I can think of two scenarios cascades can't handle: extremely insidious corruption that doesn't get detected in time, and short-lived data. If many records in your database get deleted every day, most of them may never make it into a weekly backup.
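
    The short-lived data problem is easy to see with a toy simulation. This is a sketch under simple assumptions: time is counted in whole days, a snapshot on day s sees every record created on or before s and not yet deleted, and the function name is invented for the example.

```python
def captured_by_snapshots(records, snapshot_days):
    """For each (created_day, deleted_day) record, report whether any snapshot saw it.

    A record exists on days created_day <= d < deleted_day.
    """
    return [
        any(created <= s < deleted for s in snapshot_days)
        for created, deleted in records
    ]

# One record created per day for two weeks, each deleted two days later.
records = [(day, day + 2) for day in range(14)]

weekly = captured_by_snapshots(records, snapshot_days=[0, 7, 14])
daily = captured_by_snapshots(records, snapshot_days=range(15))
print(sum(weekly), "of", len(records), "records reach a weekly backup")
print(sum(daily), "of", len(records), "records reach a daily backup")
```

    In this setup the weekly level only ever sees the records that happen to be alive on a snapshot day, so anything that was created and deleted between two weekly snapshots cannot be recovered once the daily copies have rotated out.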

    What else?
