Software libraries suck

about contact

Nov 12, 2012

Software libraries suck

Here's why, in a sentence: they promise to be abstractions, but they end up becoming services. An abstraction frees you from thinking about its internals every time you use it. A service allows you to never learn its internals. A service is not an abstraction. It isn't 'abstracting' away the details. Somebody else is thinking about the details so you can remain ignorant.

Programmers manage abstraction boundaries, that's our stock in trade. Managing them requires bouncing around on both sides of them. If you restrict yourself to one side of an abstraction, you're limiting your growth as a programmer.¹ You're chopping off your strength and potential, one lock of hair at a time, and sacrificing it on the altar of convenience.

A library you're ignorant of is a risk you're exposed to, a now-quiet frontier that may suddenly face assault from some bug when you're on a deadline and can least afford the distraction. Better to take a day or week now, when things are quiet, to save that hour of life-shortening stress when it really matters.

You don't have to give up the libraries you currently rely on. You just have to take the effort to enumerate them, locate them on your system, install the sources if necessary, and take ownership the next time your program dies within them, or uncovers a bug in them. Are these activities more time-consuming than not doing them? Of course. Consider them a long-term investment.

Just enumerating all the libraries you rely on others to provide can be eye-opening. Tot up all the open bugs in their trackers and you have a sense of your exposure to risks outside your control. In fact, forget the whole system. Just start with your Gemfile or npm_modules. They're probably lowest-maturity and therefore highest risk.

Once you assess the amount of effort that should be going into each library you use, you may well wonder if all those libraries are worth the effort. And that's a useful insight as well. “Achievement unlocked: I've stopped adding dependencies willy-nilly.”

Update: Check out the sequel. Particularly if this post left you scratching your head about what I could possibly be going on about.

(This birth was midwifed by conversations with Ross Angle, Dan Grover, and Manuel Simoni.)

notes

1. If you don't identify as a programmer, if that isn't your core strength, if you just program now and then because it's expedient, then treating libraries as services may make more sense. If a major issue pops up you'll need to find more expert help, but you knew that already.

comments

David Barbour, 2012-11-13: Link to discussion on LtU

John Cowan, 2012-11-13: As has been said elsewhere, all libraries aren't alike. The OS is a library, but most people can and indeed must treat it as a service, not an abstraction; the same for glib and newlib and mscorlib.

Kartik Agaram, 2012-11-13: Those libraries can indeed be treated as a service far more than say ruby gems. But I don't understand why you say they *must* be treated as a service. More programmers knowing how their system works is always better, no? It's better for them because it empowers them, and it's better for us all because it distributes expertise more widely.

(I flinched when you called the OS a library. That's the one place where current language is actually closer to referring to it as a set of services, and I'm loath to give that up. Even if trying to get people to refer to libc as a service is tilting at windmills.)

John Cowan, 2012-11-13: Well, *must* is doubtless too strong. But there is such a thing as knowing too much. I'd rather not, on the whole, know how certain things are done inside the Linux Kernel, much less the Windows kernel; I might be just too revolted. I'd rather work from the API documentation instead, since I know (at least for Linux) that it's sound.

"What the deuce is it to me?" [Holmes] interrupted [Watson] impatiently; "you say that we go round the sun. If we went round the moon it would not make a pennyworth of difference to me or to my work."

Kartik Agaram, 2012-11-14: Oh look, a fellow Holmes fan! :)

Perhaps living in SF is getting to me, but I find myself in the unfamiliar position of invoking a trite moral argument. As the world changes faster and faster and we're all bound together in more and more ways, it becomes our increasingly urgent civic duty to know how the meat is made in all walks of life. For a long time I kept my world at arms length and said, somebody make it Just Work. But I'm starting to think that's unsustainable. We should all think about how the meat is made.

Prioritize the areas around our professions first ('stewards of a profession' isn't just an empty phrase), and prioritize areas that tend to change more rapidly. On both counts, how the code is written is pretty high priority for me. And if the API is good it's usually not too revolting. You might think they're independent, but in practice it's hard to have a nice API around a crap implementation, and the decay in the two tends to track quite nicely.

Otherwise I fear the tragedy of the commons will creep up through the cracks between our individual responsibilities and gobble us up. It's not just software that is vulnerable.

Anonymous, 2012-11-27: I disagree with this. A non-buggy abstraction doesn't require you to have to learn its internals, and a buggy abstraction does require you to learn its internals.

So you're basically saying that libraries suck because they have bugs, and when you encounter them, you are now having to debug other people's code?

Kartik Agaram, 2012-11-27: It's not just bugs. Some misfeatures take a long time to come to light. C programmers shouldn't use strcpy() even though it has no bugs, because it's susceptible to buffer overflows. It took us 20 years to realize this. I fear we will _never_ be rid of it even though alternatives now exist and are portable, and there will always be software that uses it and bites us in the ass at the most inopportune moment.

One of my favorite science fiction novels has this great story about a code archeologist digging into his system and finding, kilo-layers down, a little routine that counts the seconds from a time 30,000 or so years ago. They're talking about time(), but I suspect strcpy()'s there as well, somewhere in that parallel universe. I hope someday I will be able to build a copy of linux without strcpy().

Anonymous, 2012-11-27: Since the susceptibility to buffer overruns is well documented (man strcpy) I wouldn't say it's hidden by using it. I don't need to know anything about the strcpy implementation to know that.

strcpy() we are stuck with since you can't be compliant to the ISO standard without including it (though you can wean people away from it: gets() for example is nearly dead; that is partly due to documentation, partly due to lint tools warning on every use of it).

Now one thing I have observed is that in a library you almost never want to use the function with the best name, since that was probably the first one written, it likely has misfeatures.

For the most part though, I treat libraries like a black box until I run into problems with them. Doing anything else is a recipe for spending so much time worrying that I never get shit done. I once ran into a bug in malloc() and it was a pain to debug and fix (maybe 3 days). On the other hand if every time I had to study the source of malloc() for every system I've ever developed on, I would have wasted a lot more than 3 days of time doing so.

Kartik Agaram, 2012-11-27: Yeah you shouldn't have to learn the source code of malloc() before you can use it. I am (tentatively and respectfully) suggesting that you could have recompiled libc with a reasonable strcpy() the first time you learned that it was a crappy idea. No reason for ANSI to hold sway over your server with its limited dependencies. No reason to give up control over _names_ in your system.

This isn't practical, of course. Not in the world we live in. But I want to question if it is an ideal worth moving towards.

$3618952, 2012-12-19: That's an odd definition of abstraction... the whole point of abstractions is to generalise and hide details, to "remain ignorant" as you put it. So a service is an abstraction. But I'm not really interested in debating definitions.

I agree with your premise on the risks of relying on third-party software. I should know, having to fight against the numerous bugs in wxWidgets for 3 years. It's a trade-off.

Anonymous, 2013-02-14: I agree with the main point here and think this issue deserves far more attention than it gets. Abstractions should be conveniences. You should be able to bypass them if you know what you are doing. This is especially important for immature abstractions. Abstractions instead commonly serve as barriers that decrease productivity when you need to bypass them. Barriers are less of an issue, and may be helpful, when the abstraction is mature.

I disagree with shaurz. The whole point of abstractions is convenience: to increase productivity without sacrificing anything in return. Hiding details, "remaining ignorant", separation of concerns, etc is NOT the whole point. That's the tail wagging the dog. Hiding details is not helpful if it results in loss of productivity in the long run.

I understand the attraction of "information hiding" and its historical evolution. Programming theory originated with mathematics. In that field, from the point of view of using abstractions, there is essentially no practical difference between two equivalent functions "implemented" differently. The will both execute in zero time. With real computers, there may be important practical differences between two equivalent functions. As computer scientists, I think we may have taken our preference for mathematical purity a bit too far. Also, information hiding was perceived to be a necessary mechanism to prevent tight coupling. However, the problems of tight coupling can be mitigated in other ways. The issue that has stymied that realization for so long is that compiled, statically typed languages generally provide no possible way to loosely couple modules. And yet, we still use those for our key abstraction layers (operating systems interfaces and standard OS libraries). Maybe we should be try solving that problem rather than fighting abstractions the same way for another 3 or 4 decades.

Kartik Agaram, 2013-02-15: Thanks! You raise an interesting point about coupling. Codebases without information hiding are paradoxically _more_ likely to be loosely coupled than going through contortions to play to a fixed interface.

Another way to look at it: designing and software organizing software is hard enough to do right. Minimizing impedance mismatch between pieces gives us half a chance of focussing on the actual problem of loose coupling.

Kartik Agaram, 2013-04-22: Gregor Kiczales put this far better in '92 (especially the Summary at the end).

Comments gratefully appreciated. Please send them to me by any method of your choice and I'll include them here.