Mar 4, 2011
Backup theory for startups
You run a startup. Your company has been given data by users, data that it
would be embarrassing to lose. You make backups. You're aware of the
Make backups automatic.
Or they won't happen.
Make backups yourself.
Outsourcing them is for suckers.
from your backups. A backup doesn't exist if it's never read.
This paranoia is useful, but it's intended for personal data. For business
data it's incomplete. Businesses have automation. Lots of automation. Dumb
automation that can do stupid things, like corrupting data. Businesses also
develop nooks and crannies where important data may go unread for periods of
time. The combination of automation and dusty corners can cause
insidious data loss: some obscure-yet-critical corner of your data
gets deleted or corrupted, and you don't notice until the corruption has
infected the backup copy.
Stale is good
You can guard against catastrophic data loss with just a regular backup at a
remote location. Insidious loss is harder to guard against. Insidious data
loss is the reason companies have more than one backup, the reason journalled
file systems have multiple .snapshot directories, the reason
slicehost and linode provide a daily and a weekly backup. Daily/weekly is
perhaps the simplest backup cascade. It gives you a week to detect
corrupted data in your server. As operations get complex you'll want longer
cascades with more levels. The lower levels backup frequently to capture
recent changes, and higher levels backup less frequently as a cushion to
detect data corruption.
Make cascading backups.
Make sure you can detect corruption to anything important before it
reaches the highest level of your cascade. The 'height' of your cascade
defines how far back in time you can undo damage.
Whatever your backup strategy, ask yourself what it can't handle. I can think
of two scenarios cascades can't handle: extremely insidious corruption that
doesn't get detected in time, and short-lived data. If many records in your
database get deleted everyday, most of them may never make it into a weekly
Dec 19, 2010
What do Moody's and the patent office have in common?
They're in the business of fielding complex applications from just about
anybody. They can't just pick the best ones—they
must accept or deny every last application.
This is a really bad situation to be in. To understand why, put yourself in
the shoes of an employee at the patent office
who must judge these applications.
Each application you see has its own deep complexities and needs time and
expertise to understand, perhaps even expertise you don't have and experts
you must track down. But when you're done understanding one application you
haven't learned anything that helps you process the next one faster. Your
supervisor is shielded from these complexities and inevitably judges you on
the basis of one metric: number of applications processed.
Inevitably, employees in this situation employ a heuristic: they try to race
through the simple applications, and half-ass the complex ones. There's now
an incentive for poor applicants to craft complex applications; if your idea
stinks a simple application is certain to be denied, while a complex,
weighty-looking application has some chance of randomly being granted.
Meanwhile, genuinely complex applications now face more randomness and may be
Over time applicants complain, usually the deserving but complex ones.
Everytime you the employee catch heat for such a complaint you loosen your
constraints. Complex applications get handled more and more
perfunctorily. Instead you spend more
and more time with the simple applications, probing them intensely for
them in more and more adversarial fashion, looking for the vaguest of
undeniable reasons to deny applications. After all, the acceptance rate is
going up in that complex pile. The denials have to come from somewhere, lest
you stand out in somebody's metrics.
Predator vs Prey
Over time applicants start to notice that even genuine applications have more
chance of being granted if they are large and complex and seem
weighty. As more and more applications grow complex, standards change.
What was complex 10 years ago is now considered simple, and more likely to be
denied. Nothing's being read in any sort of detail anymore. Every applicant
wants to end up in the complex half of the pile. They aren't antagonistic to
the application process anymore, but to each other. You the employee are now
in a position of power, like a lion in the savannah, culling the herds of
their weakest, least weighty-seeming applications. Your prey isn't trying to
convince you of anything anymore, just to outgun some other application.
It starts being taken for granted that ‘you have to spend a month on
the application,’ no matter how clear and deserving it is at the
outset. “Standards” go up. You have to pay a lawyer to craft it.
The number of lawyers, the expense of the lawyers, the armies of paralegals,
everything's spiralling up, until not everybody can afford a patent anymore.
Like the Red
Queen, everybody is running as hard and as fast as they can just to stay
in the same place.
Well, not everybody has managed to stay on the treadmill. The one thing
that's been lost is the line in the sand showing a good patent application.
This dynamic has played out throughout history. It is how bureaucracies are
created. To some extent it is inevitable; humans don't yet know a better way
to deal with complexity. If you're planning an application process, be aware
of these pitfalls. Consider ranking rather than judging, so that you don't
have to grow your employees with the number of incoming applications.
Consider some sort of limit on application length and complexity so that you
don't have to grow your employees faster than the number of incoming
applications. Think hard about how your employees will judge applications.
Consider bounties for finding misjudged applications. Any of these
‘outs’ will help you avoid the worst-case scenario: high costs,
which grow faster than incoming applications.
0. Thanks to Walter Chang for the conversation
that got me to write this.
1. Two other examples: financial
regulators and the US
EB1 green card process.
2. Sure, processes will grow up around them, but
less and less of the process will be devoted to actually reading
Dec 17, 2010
As a teenager it drove me crazy that my father would never apologize to me.
Ever. Even when something was obviously his fault. I swore to myself that I
would be more intellectually honesti,
that I would admit when I was wrong. That emphasis on intellectual honesty
gave me a scientific bent and took me to engineering college, and to grad
school. For 12 years I unquestioningly assumed the virtue — and
importance — of intellectual honesty. Coincidentally, I also spent most
of those years working alone.
Now that I've worked in teams for a while I'm starting to change my mind. In
many social situations being apologetic sucks. It makes others around you feel
awkward. If you're leading a team it makes you seem weak. If you're the rookie
you sound like you're making excuses. If others aren't intimately familiar
with the details it can magnify your screwups and make things seem worse than
they are. And always it's a distraction, diluting your focus and that of your
team. I'm learning to not apologize until it's clearly expected. Better to err
on that side.
All this may seem crazy obvious to you. Apologizing isn't really part of
western culture. I can remember others telling me dozens of times, "don't
apologize." Somehow it never sunk in. Perhaps it's not even an eastern thing,
just a personal fetish.
Looking back, I have a different perspective on my father. I realize he was
an army officer who spent much of his day telling subordinates what to do.
You can't be apologizing in that situation. You just don't think about whose
fault it was, because the entire focus is on adjusting to a
constantly-changing situation, and on what needs to be done next. I want that
Oct 24, 2010
The anti-roles of Romanticism
The Age of
Enlightenment in the 18th century weakened the grasp of the church over
society, and tried to replace authorities like God, church and king with
critical thought. Enlightenment led to the Age of Revolution,
primarily the American and French. But the French revolution seemed to expose
the failure of the Enlightenment's worldview, one that could cause both
utopian liberation and tyrannous oppression.
It felt like a new Fall of
Man. The world lost its value; life lost its meaning; the individual no
longer had grounds to reason about right and wrong. Those who articulated
this dissatisfaction were the early Romantics,
and they ushered in a new artistic, literary, and intellectual movement. In
the process they created several iconic anti-roles that we still recognize in
The Byronic hero
The Byronic hero appears as the wanderer, the outcast, the Wandering Jew, the
whose crime is never explained. The tremendous appeal of Byron's poems
throughout Europe and America shows how widespread was the feeling of
The Visionary was the first stage of recovery and the first positive Romantic
anti-role. The word often used at the time was “mystic.” The
Visionary tries to observe the world so intensely as to get to the essence
outside of all mental categories. It was felt to be the special task and
privilege of the artist
and poet to communicate
The Bohemian is perhaps the most
modern of the anti-roles, characterized by a fascination with alcohol and
drugs and sexual experimentation as ways to shift and change consciousness,
put the mind through permutations of perceptions which are impossible for the
square who is boxed in by his social role. Similar is the interest in
non-bourgeouis modes of living, in indifference to middle-class standards of
dress, furnishing, and cleanliness.
The Visionary avoided role-playing; the Bohemian defied it; the Virtuoso and
Dandy transcended it, the one by fantastic mastery, the other by irony.
Paganini was the first
great Virtuoso and for decades the anti-role model and ideal. Other examples
Burton the Virtuoso traveler and translator of the Arabian Nights, and
the Virtuoso mountain climber who performs sublime feats of superhuman effort
“because it was
The Dandy transforms the
role not by excess but by irony. The role of the highest status in European
society is that of the aristocratic gentleman of leisure. By willfully playing
this role better than those born and trained to it, the Dandy reveals the
pointlessness of the socially adapted. The social type with the highest status
spends his life in play and pettiness. The Dandy instead offers perfection and
elegance without content, without social function. By stealing the clothes of
society, he reveals its nakedness. He demands a greater exquisiteness and
perfection than society can achieve. This explains the irritation of society
with the Dandy, its efforts to deprive him of his ironic authority, the moral
nastiness with which England relished the downfall of Oscar Wilde.
Sep 24, 2010
How I query Apache logs from the commandline
When I build services or write online I want to get feedback. Did anybody use
them? Read my latest post? When my site has just a little traffic Google
Reader summarizes too much. I want to be able to get my hands dirty with the
data, to be able to drill down to individual user sessions to see how people
interact with my site. How many real users did I have yesterday? Did somebody
link to my latest blog post? How many people clicked on that link on Hacker News? Did any of them stick
around and browse to other pages?
After several attempts at hard-coded scripts to answer such questions, I came
up with a little collection of scripts that can be composed using pipes.
Here's an example session on my commandline:
How many uniques did I get yesterday?
$ cat_logs access.log | dump_field IP | sort | uniq | wc -l
Focus on real human beings.
$ cat_logs access.log | skip_crawlers | dump_field IP | sort | uniq
Wow, 15 IPs? Did they stay long?
$ cat_logs access.log | skip_crawlers | dump_field IP | sort | freq
Hmm, so 4 users browsed several pages. Where are they coming from?
$ cat_logs access.log | skip_crawlers | dump_field REFERER
Ah, they're all coming from http://news.ycombinator.com/item?id=1702108.
So people actually clicked on that comment of mine, even though there were no
votes or responses. Interesting..
This one guy viewed 10 pages. What did he see?
$ cat_logs access.log | filter_field IP xxx.xx.xxx.xxx
So he visited twice yesterday, once in the morning and once late at night. And
clicked through to different sites each time.
You get the idea. It's just a bunch of shell scripts that read and write YAML.
And I can string them together using shell pipes rather than writing one-off
scripts like I used to.
Once I put these scripts together I found most queries had a similar format:
parsing apache logfiles using cat_logs, filtering bot user-agents
through skip_crawlers, some stages of filtering by or grouping by
certain fields, leaving YAML using dump_field, followed by non-YAML
summarization — de-duping (uniq), counting (wc -l), or
Try it out and tell me what you think: Yam.
Aug 13, 2010
Advertising vs Spam
You just built something and are trying to get the word out. What are the
ethics of telling a bunch of strangers about it? Is all unsolicited
communication spam? If I send a message to three people, is that bulk? What
if I send a million mails, each email by itself? What if the wording of the
messages is different? How different does it need to be?
Calling @addressed tweets and facebook events “spam” is
increasingly meaningless; let's reserve the word for truly egregious
messages. Instead, if you're considering telling acquaintances or strangers
about something new, this formula may be useful:
Likelihood the receiver will find it undesirable * Volume of messages
The first component measures harm to the receiver and the second measures
harm to the service provider. Let's try out a sanity check: A random email
from your spam folder. It is undesirable to you, and the sender clearly knows
it. They've sent out another batch a few hours ago with negligible
click-through rate, they've been sending these messages for months, maybe
even decades. They're forging headers and adding nonsense words to try to
evade spam filters. And it's going out to a few million people and
significantly adding to internet traffic. Verdict: definitely spam.
If you're considering telling acquaintances or strangers something, this
formula translates to advice:
- Is there almost no chance they'll like it? Stop.
- Is it very uncertain they'll like it? Tell just a few people. If it does
well you can increase volume in a subsequent campaign.
- Are you seeing signs that they didn't like it? 1 in 100 people complained
about it? Only 1 in 300 responded? Stop transmitting.
If you genuinely think some of your recipients may find it useful, if you're
not trying to tell too many people all at once, and if you're prepared to
stop and take stock of how the first batch did — go for it.
There are no numbers in this reasoning, but I don't consider that wiggle
room. It's hard to translate the sender's opinion into a number. If you are
wrong in your opinion you'll find out from the recipients of your initial
small batch. If they tell you and you don't heed them, you're spam. If you
try to sidestep their comments in superficial ways, you're spam.
The volume measure is even more relative. What would be considered spam 10
years ago wouldn't today. Especially if you stop transmitting after one
One corollary of this analysis: if it's unsolicited and in bulk, and you're
sending it anonymously, it is spam. Period. Advertising requires
(Triggered by this discussion
on HN. Thanks to Jonathan Tang,
Ranganathan Sankaralingam, Ke Chen, Srikanth Agaram, and Jonathan Nelson for reading drafts of this.)
Apr 4, 2010
Perspectives on happiness
Happiness is the absence of the striving for happiness.
Ferriss: The opposite of happiness is boredom.
Yudkowsky: When people complain about the empty meaningless void, it is
because they have at least one problem that they aren't thinking about
solving — perhaps because they never identified it.
Alex Krupp: Given
perfect freedom people have a tendency to do just enough to make themselves
minimally happy, even if greater happiness is ultimately attainable.
me: There is no 'minimally happy'. Different things either make you
happy or they don't. However, happiness from a source can last a long or
short duration, ebb faster or slower.
Paul Graham: Unproductive
pleasures pall eventually.
Credit: HN thread on existential
Apr 2, 2010
Uptake happens fast — or not at all
Most of today’s massive consumer web properties experienced exponential
growth fairly shortly after launch. A few thousand users over a few months is
probably sufficient to find out it you have hit a nerve.
Whatever the reasoning, more and more developers, projects and firms were
transitioning away from centralized to decentralized. And happier for it. The
trendline was clear, which is why we weren’t exactly going out on a limb
predicting the ascension of Git, Mercurial and their brethren.
Feb 20, 2010
Thinking critically about the ideal of a techno-utopia
Technology can compromise resolve. East Germans who watched West
German television were paradoxically more satisfied with life in their
country. The fact that Dresden—where the 1989 protests started—lies too
far and too low to have received Western broadcasts may partly explain the
rebellious spirit of the city's inhabitants. While we fret about the
Internet's contribution to degrading the civic engagement of American kids,
all teenagers in China or Iran are presumed to be committed citizens who use
the Web to acquaint themselves with human rights violations committed by
their governments. For the vast majority of Internet users, increased access
to information is not always liberating. With their endless supply of
entertainment, Twitter and Facebook might make political mobilization harder,
Technology empowers all sides equally. We cling to the view that all
non-state power in authoritarian countries is good, while state power is evil
and always leads to suppression. Not all opponents of the Russian or Chinese
or even Egyptian state fit the neoliberal pattern. Nationalism, extremism and
religious fanaticism abound. Facebook and Twitter empower all groups—not
just the pro-Western groups that we like.
Technology drives decentralization;
demonstration requires centralization. Thanks to the decentralization
afforded by the Internet, Iran's Green Movement couldn't collect itself on
the eve of the 31st anniversary of the Islamic revolution. It simply drowned
in its own tweets.
Technology increases noise and misinformation. We assume the Internet
makes it easy for citizens to see who else is opposing a regime and then act
collectively based on that shared knowledge. In the age of the Spinternet,
cheap online propaganda can easily be bought with the help of pro-government
bloggers. Add to that the growing surveillance capacity of modern
authoritarian states—greatly boosted by information collected through
Technology shines a harsh light. Diplomacy is, perhaps, one element of
the U.S. government that should not be subject to the demands of "open
government"; whenever it works, it is usually because it is done behind
—Paraphrasing Evgeny Morozov
Jan 23, 2010
Tyranny of the majority, or regulatory capture? Just be more agile.
Tocqueville, et al.: In a democracy, the greatest concern is that the
majority will tyrannize and exploit diverse smaller interests.
Narrow and well-organized
minorities are more likely to assert their interests over those of the
Neil Freeman: Just redistrict
the states after each census.
me: Can this idea be generalized? Minorities can be oppressed or
powerful; strive to so intertwine motivations that minorities are eliminated.
Track minority power and standard deviation of group size as a
quality metric for democracy.
But maintain diversity.
And don't allow collusion
to foster bubbles.
 Credit: James Fallows.
Related comments: Hacker News