On the first project in which I used a distributed version control system (DVCS), I wondered what the big deal was. Version control is, it seems to me, one of the less interesting aspects of software development, an unfortunate artifact of the extraordinary plasticity of our work, shared source code, and the reality of dead ends and bugs. Now I’m fully converted. But I’m getting ahead of the story.
We used to say that using version control was one of the tell-tale differences between amateur and professional developers. In those days, version control used a “check-out, check-in” model. It was like a library: If you needed to touch a function, you made a request to the central repository for the file that contained it. If no one else had checked out the file, you would be granted an exclusive lock on it, which you’d retain until you checked the file back in.
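To make the library analogy concrete, here is a minimal Python sketch of the check-out, check-in rule. It is a toy under stated assumptions, not the behavior of any particular product: the class name `LockingRepository`, the `LockedError` exception, and the file and developer names are all hypothetical.

```python
class LockedError(Exception):
    """Raised when a developer cannot obtain or does not hold a lock."""


class LockingRepository:
    """Toy central repository using the check-out, check-in model."""

    def __init__(self, files):
        self.files = dict(files)  # file name -> contents
        self.locks = {}           # file name -> developer holding the lock

    def check_out(self, name, developer):
        # Only one developer at a time may hold a given file.
        holder = self.locks.get(name)
        if holder is not None and holder != developer:
            raise LockedError(f"{name} is checked out by {holder}")
        self.locks[name] = developer
        return self.files[name]

    def check_in(self, name, developer, contents):
        # Checking in stores the new contents and releases the lock.
        if self.locks.get(name) != developer:
            raise LockedError(f"{developer} does not hold the lock on {name}")
        self.files[name] = contents
        del self.locks[name]


repo = LockingRepository({"billing.c": "int total(void);\n"})
repo.check_out("billing.c", "alice")

try:
    repo.check_out("billing.c", "bob")  # Alice still holds the file
    bob_blocked = False
except LockedError:
    bob_blocked = True

repo.check_in("billing.c", "alice", "long total(void);\n")
bob_copy = repo.check_out("billing.c", "bob")  # succeeds once Alice is done
```

In this sketch Bob is simply refused until Alice checks the file back in, however small his change might be.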
It was undeniably intrusive, especially in the pre-object-orientation days when modules were generally structured in functional categories or abstraction levels. While a trivial programming task might only touch a file or two, harder tasks were generally spread broadly throughout the codebase, and it was common to be thwarted by the need to make a single-line correction in some large file that had been checked out for days by someone struggling with a function hundreds of lines away.
The advantage of the library model was that it forced a deliberate consideration of whether a change was appropriate or not. Additionally, it pressured you to make minimal change sets: Because the organization pushed you to hold as few check-out locks as possible, an incidental refactoring or small fix was typically checked in quickly as its own single-file change.
I suppose it was in the late 1990s when the “lock-free” model of version control made rapid gains. This is the model that most of us use today: You are free to change any file in the project, and at check-in, you must reconcile any places where your changes conflict with changes introduced since you began work. This model relies on text-comparison algorithms that attempt to automatically merge files. These algorithms do tolerably well, although I’m sure we’d all be appalled if presented with the number of minutes we’ve accumulated manually fixing merges after our diff tools “got confused.”
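The core idea behind automatic merging can be sketched as a line-by-line three-way comparison against the common starting version. The Python below is a deliberately simplified illustration, not the algorithm any real tool uses: it assumes all three versions have the same number of lines (edits only replace lines), and the function name `three_way_merge` and the conflict placeholder text are hypothetical.

```python
def three_way_merge(base, mine, theirs):
    """Merge two edited copies of a file against their common base.

    Simplifying assumption: all three versions have the same number of
    lines. Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles same-length versions")

    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:      # both sides agree, or neither touched the line
            merged.append(m)
        elif m == b:    # only the other developer changed this line
            merged.append(t)
        elif t == b:    # only I changed this line
            merged.append(m)
        else:           # both changed it differently: a human must decide
            merged.append(f"CONFLICT: mine={m!r} theirs={t!r}")
            conflicts.append(number)
    return merged, conflicts


base = ["a = 1", "b = 2", "c = 3"]
mine = ["a = 10", "b = 2", "c = 3"]

# Changes on different lines merge automatically.
theirs = ["a = 1", "b = 2", "c = 30"]
clean, clean_conflicts = three_way_merge(base, mine, theirs)

# Changes to the same line cannot be reconciled without help.
theirs_clash = ["a = 11", "b = 2", "c = 3"]
clashed, clash_conflicts = three_way_merge(base, mine, theirs_clash)
```

Real merge tools also have to cope with inserted, deleted, and moved lines, which is where they tend to "get confused."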
The advantage of the lock-free model is simple: Another person cannot get in your way, at least not unless they check in between your last pull and your commit. In the early days, there was great fear of lock-free version control because of the potential “chaos of conflicting changes,” and, to be sure, it’s wise to commit changes before going on vacation. In fact, it’s wise to “commit early and commit often” when working lock-free, as the smaller the change set, the smaller the chance of a conflict.
Of course, if everyone on the team commits early and often, the odds increase that the codebase has shifted since your last pull. This is where continuous integration and a test suite become critical: Conflicts are accepted as inevitable and even common, but continuous integration identifies them quickly and a test suite allows one to be confident of his or her resolution.
What does distributed version control bring to the party? The technical answer is that DVCSes do not rely on a central server to be the canonical reference point, but this is not the practical difference. In practice, of course, teams have a carefully backed-up repository from which the integration server draws its code and to which the work flows. The practical difference in DVCSes is that branching and merging become an integral part of work.
The terms “trunk” and “branch,” going back to the file-system-like “check-out, check-in” days, reveal a bias that most teams still adhere to: Progress is made along the trunk, and branches are occasionally necessary, but not really desirable. The opposite is true in DVCSes: Branching is a cheap operation, and branches are used constantly for work of any duration.
A change of metaphor helps; I like to think of code evolving in a DVCS like a railroad. There’s the rarely updated “master” track (formerly the “trunk”) which, like an express train, only has stops at important junctions, such as releases. Then there’s the frequently updated “development” track, which is like the local commuter train, from which the integration server draws its code. Finally, there are limitless feature-branch “sidings” in which the individual developer does his or her work, checking in and rolling back, and hitting dead ends without fear of disaster.
This model allows the developer to work his or her branch check-ins in whatever way works best, and also to clean things up and write clear log messages when merging back to the “development” track.
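Why branching can be cheap, and how a messy siding turns into one clean entry on the “development” track, can be sketched with a toy commit graph. This is a simplified Python model, not how Git or any other DVCS is implemented: the `ToyRepo` class, its method names, and the branch name `feature/invoice-export` are hypothetical, and the sketch assumes every merge is recorded as its own commit.

```python
import itertools


class ToyRepo:
    """Toy commit graph: a branch is only a name pointing at a commit."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.commits = {}   # commit id -> (message, tuple of parent ids)
        self.branches = {}  # branch name -> id of the commit at its tip

    def _new_commit(self, message, parents):
        commit_id = next(self._ids)
        self.commits[commit_id] = (message, tuple(parents))
        return commit_id

    def init(self, branch, message):
        self.branches[branch] = self._new_commit(message, ())

    def branch(self, new, source):
        # Cheap: no files are copied, only a pointer is recorded.
        self.branches[new] = self.branches[source]

    def commit(self, branch, message):
        parent = self.branches[branch]
        self.branches[branch] = self._new_commit(message, (parent,))

    def merge(self, source, target, message):
        # The merge commit carries the cleaned-up log message.
        parents = (self.branches[target], self.branches[source])
        self.branches[target] = self._new_commit(message, parents)


repo = ToyRepo()
repo.init("master", "release 1.0")
repo.branch("development", "master")
repo.branch("feature/invoice-export", "development")

# Scruffy work stays on the siding.
repo.commit("feature/invoice-export", "wip")
repo.commit("feature/invoice-export", "dead end, try again")

# One clear message lands on the development track.
repo.merge("feature/invoice-export", "development", "Add invoice export")

tip_message, tip_parents = repo.commits[repo.branches["development"]]
```

In the sketch, “master” never moves while the feature work happens, and the tip of “development” is a single commit with a readable message whose parents lead back to both tracks.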
Vincent Driessen, a Dutch software engineer, wrote a fine tutorial showing how to use this model with the popular open-source DVCS git. He additionally wrote some extensions for git to support the model, but the extensions are not necessary, and the model works with any VCS in which branching is cheap and merging is easy.
The technology of version control has changed over the years; so too should your version-control process. Try the branch-and-merge style and get on the DVCS train.
Larry O’Brien is a technology consultant, analyst and writer. Read his blog at www.knowing.net.