Nov 10, 2017
The cargo cult of versioning

[Update Nov 27: This post had issues, and I retract some of my more provocative claims. See the errata at the end.]

All software comes with a version, some sequence of digits, periods and characters that seems to march ever upward. Rarely are the optimistically increasing versions accompanied by a commensurate increase in robustness. Instead, upgrading to new versions often causes regressions, and the stream of versions ends up spawning an extensive grapevine to disseminate information about the best version to use. Unsatisfying as this state of affairs is to everyone, I didn't think that the problem lay with these version numbers themselves. They're just names, right? However, over the past year I've finally had my attention focused on them, thanks to two people:

  • Rich Hickey pointed out last year that the convention of bumping the major version of a library to indicate incompatibility conveys no more actionable information than just changing the name of the library.

  • I recently encountered this post from 2012 by Steve Losh, pointing out that if version numbers were any good, we'd almost always be looking to use the latest version number. But that doesn't happen. In the real world we're constantly trying to hold the stream of versions back, to clamp version numbers to some ceiling. Pinning our systems to use a specific version or range of versions.
  • Why is version pinning so prevalent? The proximal reason is that modern package managers uniformly1 fail to provide the sane default of "give me the latest compatible version, excluding breaking changes."

  • RubyGems defaults to newest version available. So if you don't specify a version for a dependency and they create a breaking v3.0 when you've been implicitly using v2.2, boom. Result: best practice of pinning major version using the twiddle-waka or pessimistic operator all over the place, and Steve Losh shaking head sadly.

  • NPM for Node.js defaults to version tagged "latest". Again, if you go with the default version your project is liable to go boom at some future date. Best practice is to pin major versions using either the tilde operator or tags. Cue more sadness for Steve Losh. [Edit: The fact that you're required to provide some sort of version string at least raises the question of what it should be. It's still a gotcha that the empty string "" means "latest", but it's mitigated by the use of npm install --save to record dependencies rather than editing package.json directly.]

  • Leiningen for Clojure once again defaults to the latest version. [Correction: a comment pointed out that Clojure has no default, but always requires a version. This approach suffers from the opposite problem: you never get any bugfixes or security fixes unless you explicitly mess with package.clj.]

  • …and so on.
  • These are all deep, competent projects. Why are their defaults so uniformly useless and misleading? The underlying reason is the traditional format of version numbers: mashing together multiple numbers into a single string, and more importantly separating the version string from the name of a package. A dependency that is just a name provides no hint on what version you want compatibility with, so a package manager has no easy way to pick a good default version.

    Towards a better approach

    To begin with, it's weird that versions are strings. Parsing versions is non-trivial. Let's just make them a tuple. Instead of "3.0.2", we'll say "(3, 0, 2)".

    Next, move the major version to part of the name of a package. "Rails 5.1.4" becomes "Rails-5 (1, 4)". By following Rich Hickey's suggestion above, we also sidestep the question of what the default version should be. There's just no way to refer to a package without its major version.

    Since we always want to provide the latest version by default, the distinction between minor versions and patch levels is moot. Just combine the 2-tuple into a single number. "LeftPad 17.5.0" would now become something like "LeftPad-17 37".

    At this point you could even get rid of the version altogether and just use the commit hash on the rare occasions when we need a version identifier. We're all on the internet, and we're all constantly running npm install or equivalent. Just say "Leftpad-17", it's cleaner.

    And that's it. Package managers should provide no options for version pinning.

    A package manager that followed such a proposal would foster an eco-system with greater care for introducing incompatibility2. Packages that wantonly broke their consumers would "gain a reputation" and get out-competed by packages that didn't, rather than gaining a "temporary" pinning that serves only to perpetuate them. The occasional unintentional breakage would necessitate people downstream cloning repositories and changing dependency URLs, which would create a much more stringent atmosphere of accountability for the breaking package. As a result, breaking changes wouldn't live so long that they gain new users.

    In particular, Semantic Versioning is misguided, an attempt to fix something that is broken beyond repair. The correct way to practice semantic versioning is without any version strings at all, just Rich Hickey's directive: if you change behavior, rename. Or ok, keep a number around if you really need a security blanket. Either way, we programmers should be manually messing with version numbers a whole lot less. They're a holdover from the pre-version-control pre-internet days of shrink-wrapped software, vestigial in a world of pervasive connectivity and constant push updates. All version numbers do is provide cover for incompatibilities to hide under.

    Update (Nov 27)

    This post aroused a lot of great feedback on Hacker News and Lobste.rs. After a day of engaging with comments my conclusion was that I should have been more explicit about my focus: the flow for upgrading (and testing) software in development, not deploying to production. In particular, the post doesn't make any claims about versions in production. Reproducible builds are great! But you just need a hash for them. Right?

    A prolonged exchange with Joel Parker Henderson convinced me that it's just not feasible to separate operational concerns from development concerns. A common question when managing software in production is, "what version is this running?" And that question quickly requires drilling down to the constituent pieces of a release and their versions. A hash makes that too hard. And you can't have separate version strings for development and deployment either, that's just a recipe for confusion. Therefore, if you take operational considerations into account, my claim that we don't need versions at all is invalid.

    What, if anything, remains of value in this post? Package managers should by default never upgrade dependencies past a major version.

    The design goal of a package manager should be that a dependency once added to Gemfile or package.json should never need to be modified until it's deleted. What people specify manually goes there, what the package manager deduces goes somewhere else (like Gemfile.lock). If people are editing version strings en masse in Gemfile or equivalent, that is a smell.

    In the next mainstream platform, the versions people specify for dependencies should consist of just a major version, because that's the part that the package manager can never deduce. SemVer is a siren here because it conflates pieces from multiple jurisdictions. The major version is the user's responsibility, and minor and patch versions are the package manager's. Why coalesce the two? That just necessitates baroque syntax like twiddle-waka to do the safe thing.

    (And oh, if RubyGems and NPM are smelly, the Clojure approach totally stinks. Clojure requires manual intervention to pull in compatible/security fixes for dependencies. It follows the existing Java approach, but Java's eco-system predates the advance of package managers, half of whose reason for existence — after installing dependencies — is updating dependencies. I may still be unaware of some design rationale here, but for now I think Leiningen really missed an opportunity to improve on Java here.)


    footnotes

    1. One exception here is Go, where the standard go get command requires no versions, and always grabs the head of the repo. However, the community seems to be turning from the light to the darkness with a proposal for a tool called go dep. It's unclear to me if this is due to a failure of communication on the part of the original authors of Go, or if there's a deeper justification for go dep that I'm missing. If you know, set me straight in the comments below.

    2. Following Steve Losh, we'll allow packages ending in "-0" to make incompatible changes to their heart's content.

    Making the big picture easy to see, in software and in society at large.
    selected
    Code
    Prose (shorter; favorites)