Feb 25, 2018
Nobody's just reading your code

A guest post by Stephen Malina, my partner in crime on Mu.

Most programmers agree that we don't read enough code. The interviews in Peter Seibel's book “Coders at Work” highlight a comical contradiction: almost all the programmers interviewed by Seibel recommend that others read code for fun, but none of them routinely do so themselves. Seibel even asked Hal Abelson (of SICP fame) directly about this phenomenon:

I want to dig a little deeper on this. You, like many other people, say programmers should read code. Yet when I ask what code have you read for fun or edification, you—also like many other people—answer that you read students’ code, which is your job, and review code at Google, which is also your job. But it doesn’t sound like you sit down of an evening with a nice printout and read it.

Seibel, James Hague and others have all tried to explain why code reading is so uncommon, and they make good points. But perhaps the conversation is led astray by use of the word ‘read’. I wonder if Abelson and the others would have had more examples if Seibel had asked them what code they had learned about for fun. Perhaps the word ‘read’ put them in a passive frame of mind, causing them to filter out programs they'd hacked on?

We all read code already; it’s just that we usually read when we want to edit. And the comprehension that questions about reading are really concerned with comes from both reading and writing, interleaved in complex ways.

That hacking produces better comprehension than passive, linear reading fits with what we know about learning. Barbara Oakley, Herbert Simon, Cal Newport, and Anders Ericsson all describe how solid understanding emerges from active exploration, critical examination, repetition, and synthesis. Hacking beats passive reading on three out of four of these criteria:

  1. Active exploration: When you hack, you want to eventually produce a change in the codebase. This desire guides your path through the code. When you read passively you let the code’s linear flow guide you.

  2. Critical examination: When you hack, you evaluate existing code in light of the change you want to make. Deciding what to use and remove keeps you from accepting the existing system as canon. When you read linearly, you lack a goal against which you can critically examine the existing code.

  3. Synthesis: To change the program as you desire, you synthesize existing code with new code.

  4. Repetition: Neither hacking nor linear reading involves useful repetition, unless you treat the change you want to make like a kata and mindfully re-implement it multiple times.

Learning through hacking also leverages the natural structure of a codebase. Good books guide their readers through a series of questions and their answers, but codebases are inherently non-linear, like a map. You can ask an infinite number of questions of a map. How far is it from A to B? Which is the nearest town to C? But you can’t expect a map to tell you what questions to ask, and it makes no sense to read a map linearly from top to bottom, left to right.

Reframing reading as ‘navigation’ suggests that our conventional discussions of clean code and interfaces ignore the things that actually make unfamiliar code accessible to outsiders. Clean, solidified abstractions are like well-marked, easy-to-follow paths through a forest — very useful if they lead in the direction we need to go, but less useful when we want to blaze arbitrary new paths through the forest.

Instead, let's focus on guiding exploration, making it easier for readers to answer their own questions about codebases. I’m still figuring out how to do this; so far I have just a couple of preliminary ideas:

  • Suggest features in your code that make good exercises for re-implementation. Provide an initial Git commit without the feature, give readers hints where necessary, and link them to the actual change plus others’ attempts at producing it.

  • Rather than conceiving of documentation as something that explains individual modules, focus on overviews of how the modules fit together (like Fabien Sanglard's for Git).
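
As a rough sketch of what the first idea might look like in practice, a project could describe each exercise as a small record and generate the Git commands a reader would run. Everything here is an assumption for illustration: the `Exercise` class, its field names, the helper functions, and the commit ids are hypothetical placeholders, not part of any existing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Exercise:
    """One re-implementation exercise (hypothetical structure)."""
    feature: str            # short name of the feature to re-implement
    start_commit: str       # commit just before the feature was added
    reference_commit: str   # commit containing the author's own change
    hints: List[str] = field(default_factory=list)


def setup_commands(ex: Exercise, branch: str = "exercise") -> List[str]:
    """Git commands that put the reader on a branch without the feature."""
    return [f"git checkout -b {branch} {ex.start_commit}"]


def compare_command(ex: Exercise) -> str:
    """Git command that diffs the author's change against the reader's attempt."""
    return f"git diff {ex.reference_commit} HEAD"
```

A reader would run the setup command, hack on the feature with the hints at hand, and only then run the compare command to see how their attempt differs from the original change.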


Others have explored similar ideas from different perspectives:
