Mar 14, 2012
The opposite of exponential backoff

Every few days I'm on IM with some coworker from a different building. The conversation goes something like this:

Them: Lunch?
(6 minutes)
Me: Sounds good! Meet downstairs?
(1 minute)
Me: Now?
(2 minutes)
Them: Yes!
(1 minute)
Them: You there?
(3 minutes)
Me: Oops, sorry. Heading down now.

Now I never know whether to wait for a response, or to run down because I'm already late. But today I finally figured out the answer:

Me: New rule. Head down when both of us say 'ready' within 1 minute of each
other.

That's it. No more ambiguity. It's kind of a silly example, but the general idea feels deep, complementary somehow to exponential backoff. Exponential backoff is the ideal game-theoretic strategy for competitive situations where two people need to contend for a common resource, like trying to call someone back after a dropped call and getting a busy signal. Both try to diverge away from a network-defined window of conflict by waiting longer and longer. Here both parties are trying to converge into an agreed window of agreement. Defining the size of the window by diktat should be handy anytime two parties (people, computers, ..) need to cooperate synchronously atop an asynchronous channel.

I'm sure there's a paper on this idea, perhaps something like that one by Lamport pdf on Buridan's ass. If you have a pointer I'd love to hear about it.

comments

* *
Jul 14, 2011
Evolution of a rails programmer

Idiomatic rails action for registering a user if he doesn't exist:

After a year of programming in lisp, I find it most natural to write:

Is this overly concise/obfuscated? I like it because it concisely expresses the error case as an early exit; most of the space is devoted to the successful save, which is straight-line code without distracting branches. It's clearer that we either pick an existing user or create a new one. Form follows function.

comments

* *
May 12, 2011
My latest project

Two of us have been building hackerstream, a real-time UI for Hacker News that a dozen addicts have been using since March. This won't matter to you if you don't frequent HN, or even if you swing by just once a day. But try it out if you go to HN every couple of hours like I do. You'll see comments on all the stories on the entire HN frontpage as they stream in. You can slice and dice the stream by story or by author. You can even set it up to highlight, say, all comments by security guy Thomas Ptacek and all comments about today's silicon valley brouhaha. If you use Twitter the UI will seem familiar.

Is such a firehose useful? Is it too much of a good thing? I find I see more of HN for the same time investment, and I waste less time scanning stories I've already read. What's more, when I started using it I found my comments getting more votes and more responses. It turns out the biggest factor affecting responses is not how good my comment is but just how early it shows up on a story. By biasing my reading to be more timely I was giving my few comments improved odds of a response.

It's not for everyone, but hackerstream is geared to help you have higher-quality conversation on Hacker News. Try it out and tell me what you think.

comments

* *
Mar 4, 2011
Backup theory for startups

You run a startup. Your company has been given data by users, data that it would be embarrassing to lose. You make backups. You're aware of the best-practices:

  • Make backups.
  • Make backups automatic. Or they won't happen.
  • Make backups yourself. Outsourcing them is for suckers.
  • Regularly restore from your backups. A backup doesn't exist if it's never read.

    Incomplete

    This paranoia is useful, but it's intended for personal data. For business data it's incomplete. Businesses have automation. Lots of automation. Dumb automation that can do stupid things, like corrupting data. Businesses also develop nooks and crannies where important data may go unread for periods of time. The combination of automation and dusty corners can cause insidious data loss: some obscure-yet-critical corner of your data gets deleted or corrupted, and you don't notice until the corruption has infected the backup copy.

    Stale is good

    You can guard against catastrophic data loss with just a regular backup at a remote location. Insidious loss is harder to guard against. Insidious data loss is the reason companies have more than one backup, the reason journalled file systems have multiple .snapshot directories, the reason slicehost and linode provide a daily and a weekly backup. Daily/weekly is perhaps the simplest backup cascade. It gives you a week to detect corrupted data in your server. As operations get complex you'll want longer cascades with more levels. The lower levels backup frequently to capture recent changes, and higher levels backup less frequently as a cushion to detect data corruption.

    More best-practices

  • Make cascading backups.
  • Make sure you can detect corruption to anything important before it reaches the highest level of your cascade. The 'height' of your cascade defines how far back in time you can undo damage.

    Failure modes

    Whatever your backup strategy, ask yourself what it can't handle. I can think of two scenarios cascades can't handle: extremely insidious corruption that doesn't get detected in time, and short-lived data. If many records in your database get deleted everyday, most of them may never make it into a weekly backup.

    What else?

    comments

    * *
  • Dec 19, 2010
    What do Moody's and the patent office have in common? [0]

    They're in the business of fielding complex applications from just about anybody. They can't just pick the best ones—they must accept or deny every last application.

    This is a really bad situation to be in. To understand why, put yourself in the shoes of an employee at the patent office[1] who must judge these applications.

    Half-assing

    Each application you see has its own deep complexities and needs time and expertise to understand, perhaps even expertise you don't have and experts you must track down. But when you're done understanding one application you haven't learned anything that helps you process the next one faster. Your supervisor is shielded from these complexities and inevitably judges you on the basis of one metric: number of applications processed.

    Inevitably, employees in this situation employ a heuristic: they try to race through the simple applications, and half-ass the complex ones. There's now an incentive for poor applicants to craft complex applications; if your idea stinks a simple application is certain to be denied, while a complex, weighty-looking application has some chance of randomly being granted. Meanwhile, genuinely complex applications now face more randomness and may be undeservedly denied.

    Bike-shedding

    Over time applicants complain, usually the deserving but complex ones. Everytime you the employee catch heat for such a complaint you loosen your constraints. Complex applications get handled more and more perfunctorily.[2] Instead you spend more and more time with the simple applications, probing them intensely for weaknesses, ‘bike-shedding’ them in more and more adversarial fashion, looking for the vaguest of undeniable reasons to deny applications. After all, the acceptance rate is going up in that complex pile. The denials have to come from somewhere, lest you stand out in somebody's metrics.

    Predator vs Prey

    Over time applicants start to notice that even genuine applications have more chance of being granted if they are large and complex and seem weighty. As more and more applications grow complex, standards change. What was complex 10 years ago is now considered simple, and more likely to be denied. Nothing's being read in any sort of detail anymore. Every applicant wants to end up in the complex half of the pile. They aren't antagonistic to the application process anymore, but to each other. You the employee are now in a position of power, like a lion in the savannah, culling the herds of their weakest, least weighty-seeming applications. Your prey isn't trying to convince you of anything anymore, just to outgun some other application.

    Red Queen

    It starts being taken for granted that ‘you have to spend a month on the application,’ no matter how clear and deserving it is at the outset. “Standards” go up. You have to pay a lawyer to craft it. The number of lawyers, the expense of the lawyers, the armies of paralegals, everything's spiralling up, until not everybody can afford a patent anymore. Like the Red Queen, everybody is running as hard and as fast as they can just to stay in the same place.

    Well, not everybody has managed to stay on the treadmill. The one thing that's been lost is the line in the sand showing a good patent application.

    Zoom out

    This dynamic has played out throughout history. It is how bureaucracies are created. To some extent it is inevitable; humans don't yet know a better way to deal with complexity. If you're planning an application process, be aware of these pitfalls. Consider ranking rather than judging, so that you don't have to grow your employees with the number of incoming applications. Consider some sort of limit on application length and complexity so that you don't have to grow your employees faster than the number of incoming applications. Think hard about how your employees will judge applications. Consider bounties for finding misjudged applications. Any of these ‘outs’ will help you avoid the worst-case scenario: high costs, which grow faster than incoming applications.

    footnotes

    0. Thanks to Walter Chang for the conversation that got me to write this.

    1. Two other examples: financial regulators and the US EB1 green card process.

    2. Sure, processes will grow up around them, but less and less of the process will be devoted to actually reading them.

    comments

    * *
    Dec 17, 2010
    Don't apologize

    As a teenager it drove me crazy that my father would never apologize to me. Ever. Even when something was obviously his fault. I swore to myself that I would be more intellectually honesti, that I would admit when I was wrong. That emphasis on intellectual honesty gave me a scientific bent and took me to engineering college, and to grad school. For 12 years I unquestioningly assumed the virtue — and importance — of intellectual honesty. Coincidentally, I also spent most of those years working alone.

    Now that I've worked in teams for a while I'm starting to change my mind. In many social situations being apologetic sucks. It makes others around you feel awkward. If you're leading a team it makes you seem weak. If you're the rookie you sound like you're making excuses. If others aren't intimately familiar with the details it can magnify your screwups and make things seem worse than they are. And always it's a distraction, diluting your focus and that of your team. I'm learning to not apologize until it's clearly expected. Better to err on that side.

    All this may seem crazy obvious to you. Apologizing isn't really part of western culture. I can remember others telling me dozens of times, "don't apologize." Somehow it never sunk in. Perhaps it's not even an eastern thing, just a personal fetish.

    Looking back, I have a different perspective on my father. I realize he was an army officer who spent much of his day telling subordinates what to do. You can't be apologizing in that situation. You just don't think about whose fault it was, because the entire focus is on adjusting to a constantly-changing situation, and on what needs to be done next. I want that mindset.

    comments

    * *
    Oct 24, 2010
    The anti-roles of Romanticism paraphrasing Morse Peckham

    The Age of Enlightenment in the 18th century weakened the grasp of the church over society, and tried to replace authorities like God, church and king with critical thought. Enlightenment led to the Age of Revolution, primarily the American and French. But the French revolution seemed to expose the failure of the Enlightenment's worldview, one that could cause both utopian liberation and tyrannous oppression. It felt like a new Fall of Man. The world lost its value; life lost its meaning; the individual no longer had grounds to reason about right and wrong. Those who articulated this dissatisfaction were the early Romantics, and they ushered in a new artistic, literary, and intellectual movement. In the process they created several iconic anti-roles that we still recognize in popular culture.

    The Byronic hero

    The Byronic hero appears as the wanderer, the outcast, the Wandering Jew, the mysterious criminal whose crime is never explained. The tremendous appeal of Byron's poems throughout Europe and America shows how widespread was the feeling of malaise.

    The Visionary


    Caspar Friedrich,
    “Wanderer above the Sea of Fog”
    .

    The Visionary was the first stage of recovery and the first positive Romantic anti-role. The word often used at the time was “mystic.” The Visionary tries to observe the world so intensely as to get to the essence outside of all mental categories. It was felt to be the special task and privilege of the artist and poet to communicate that experience.

    The Bohemian

    The Bohemian is perhaps the most modern of the anti-roles, characterized by a fascination with alcohol and drugs and sexual experimentation as ways to shift and change consciousness, put the mind through permutations of perceptions which are impossible for the square who is boxed in by his social role. Similar is the interest in non-bourgeouis modes of living, in indifference to middle-class standards of dress, furnishing, and cleanliness.

    The Virtuoso

    The Visionary avoided role-playing; the Bohemian defied it; the Virtuoso and Dandy transcended it, the one by fantastic mastery, the other by irony. Paganini was the first great Virtuoso and for decades the anti-role model and ideal. Other examples are Richard Burton the Virtuoso traveler and translator of the Arabian Nights, and the Virtuoso mountain climber who performs sublime feats of superhuman effort “because it was there.”

    The Dandy

    The Dandy transforms the role not by excess but by irony. The role of the highest status in European society is that of the aristocratic gentleman of leisure. By willfully playing this role better than those born and trained to it, the Dandy reveals the pointlessness of the socially adapted. The social type with the highest status spends his life in play and pettiness. The Dandy instead offers perfection and elegance without content, without social function. By stealing the clothes of society, he reveals its nakedness. He demands a greater exquisiteness and perfection than society can achieve. This explains the irritation of society with the Dandy, its efforts to deprive him of his ironic authority, the moral nastiness with which England relished the downfall of Oscar Wilde.

    comments

    * *
    Sep 24, 2010
    How I query Apache logs from the commandline

    When I build services or write online I want to get feedback. Did anybody use them? Read my latest post? When my site has just a little traffic Google Reader summarizes too much. I want to be able to get my hands dirty with the data, to be able to drill down to individual user sessions to see how people interact with my site. How many real users did I have yesterday? Did somebody link to my latest blog post? How many people clicked on that link on Hacker News? Did any of them stick around and browse to other pages?

    After several attempts at hard-coded scripts to answer such questions, I came up with a little collection of scripts that can be composed using pipes. Here's an example session on my commandline:

    How many uniques did I get yesterday?

    $ cat_logs access.log | dump_field IP | sort | uniq | wc -l
    

    Focus on real human beings.

    $ cat_logs access.log | skip_crawlers | dump_field IP | sort | uniq
    

    Wow, 15 IPs? Did they stay long?

    $ cat_logs access.log | skip_crawlers | dump_field IP | sort | freq
    

    Hmm, so 4 users browsed several pages. Where are they coming from?

    $ cat_logs access.log | skip_crawlers | dump_field REFERER
    

    Ah, they're all coming from http://news.ycombinator.com/item?id=1702108</a>. So people actually clicked on that comment of mine, even though there were no votes or responses. Interesting..

    This one guy viewed 10 pages. What did he see?

    $ cat_logs access.log | filter_field IP xxx.xx.xxx.xxx
    

    So he visited twice yesterday, once in the morning and once late at night. And clicked through to different sites each time.

    You get the idea. It's just a bunch of shell scripts that read and write YAML. And I can string them together using shell pipes rather than writing one-off scripts like I used to.

    Once I put these scripts together I found most queries had a similar format: parsing apache logfiles using cat_logs, filtering bot user-agents through skip_crawlers, some stages of filtering by or grouping by certain fields, leaving YAML using dump_field, followed by non-YAML summarization — de-duping (uniq), counting (wc -l), or frequency-distribution (freq).

    Try it out and tell me what you think: Yam.

    comments

    * *
    Aug 13, 2010
    Advertising vs Spam

    You just built something and are trying to get the word out. What are the ethics of telling a bunch of strangers about it? Is all unsolicited communication spam? If I send a message to three people, is that bulk? What if I send a million mails, each email by itself? What if the wording of the messages is different? How different does it need to be?

    Calling @addressed tweets and facebook events “spam” is increasingly meaningless; let's reserve the word for truly egregious messages. Instead, if you're considering telling acquaintances or strangers about something new, this formula may be useful:

    Likelihood the receiver will find it undesirable * Volume of messages

    The first component measures harm to the receiver and the second measures harm to the service provider. Let's try out a sanity check: A random email from your spam folder. It is undesirable to you, and the sender clearly knows it. They've sent out another batch a few hours ago with negligible click-through rate, they've been sending these messages for months, maybe even decades. They're forging headers and adding nonsense words to try to evade spam filters. And it's going out to a few million people and significantly adding to internet traffic. Verdict: definitely spam.

    Advice

    If you're considering telling acquaintances or strangers something, this formula translates to advice:

    • Is there almost no chance they'll like it? Stop.
    • Is it very uncertain they'll like it? Tell just a few people. If it does well you can increase volume in a subsequent campaign.
    • Are you seeing signs that they didn't like it? 1 in 100 people complained about it? Only 1 in 300 responded? Stop transmitting.

    If you genuinely think some of your recipients may find it useful, if you're not trying to tell too many people all at once, and if you're prepared to stop and take stock of how the first batch did — go for it.

    There are no numbers in this reasoning, but I don't consider that wiggle room. It's hard to translate the sender's opinion into a number. If you are wrong in your opinion you'll find out from the recipients of your initial small batch. If they tell you and you don't heed them, you're spam. If you try to sidestep their comments in superficial ways, you're spam.

    The volume measure is even more relative. What would be considered spam 10 years ago wouldn't today. Especially if you stop transmitting after one batch.

    One corollary of this analysis: if it's unsolicited and in bulk, and you're sending it anonymously, it is spam. Period. Advertising requires accountability.

    (Triggered by this discussion on HN. Thanks to Jonathan Tang, Ranganathan Sankaralingam, Ke Chen, Srikanth Agaram, and Jonathan Nelson for reading drafts of this.)

    comments

    * *
    Apr 4, 2010
    Perspectives on happiness

    Chuang-Tzu: Happiness is the absence of the striving for happiness.

    Tim Ferriss: The opposite of happiness is boredom.

    Eliezer Yudkowsky: When people complain about the empty meaningless void, it is because they have at least one problem that they aren't thinking about solving — perhaps because they never identified it.

    Alex Krupp: Given perfect freedom people have a tendency to do just enough to make themselves minimally happy, even if greater happiness is ultimately attainable.

    me: There is no 'minimally happy'. Different things either make you happy or they don't. However, happiness from a source can last a long or short duration, ebb faster or slower.

    Paul Graham: Unproductive pleasures pall eventually.

    Credit: HN thread on existential angst.

    comments

    * *
    mission
    Making the big picture easy to see, in software and in society at large.
    about me
    Code (contributions)
    Prose (shorter; favorites)
    favorite insights
    Life
    Making
    Work
    Cognition
    Economics
    History
    Startups
    Social Software

    © mmxiv ak