Git, the open-source distributed version-control system, has attained such a level of popularity in such a short time that the larger, centralized enterprise SCM software providers have had to sit up and take notice.
And they’ve noticed two key things. First, Git users sacrifice quite a bit of functionality around security, workflow and tracking that the centralized, commercial SCM systems offer. And second, many engineers don’t care. They want to work in Git for its low barrier to entry and its branching and merging capability. So, several enterprise SCM providers are coming out with Git-supported versions of their software.
There’s one other thing many in the SCM world acknowledge, as summed up by Alex Malinovich of GitHub: “It’s clear that traditional centralized SCM will be extinct in the not-too-distant future.”
Scott Farquhar, cofounder of tools provider Atlassian, which owns the Bitbucket code hosting service, echoed Malinovich’s remarks: “The way we develop has changed, with shorter timelines, the number and departments of people involved. I can’t see anyone starting a new project today and not using Git or (other DVCS systems) Mercurial or Bazaar.”
Bitbucket product manager Justen Stepka added, “In large organizations, where it’s disruptive to the bottom line to switch (SCM systems), I think you’ll see new projects begin on Git, and over time, the number of projects on Git will surpass the number on Subversion or Perforce.”
These developments are even more remarkable when you consider that source-code management had been unchanged for years. Developers wrote code, checked it in to a repository, checked it out to fix bugs or add features, and reconciled it with the main codebase to make sure nothing was broken by the revisions. Everyone understood. Branching was strictly forbidden, because manual reconciliation, and the repair of broken builds, made the practice too risky.
Then, around 2000, DVCS came along, and more specifically Git, which came out in 2005 and began to get some traction by 2007. But Eric Sink of SourceGear, author of the book “Version Control by Example,” said the credit for popularity goes not to Git but to GitHub, the hosting site that launched in 2008.
“It has become a social network for open-source developers,” he said. “It has become a sort of Facebook. They’ve found a sense of community, found a place they belong. That’s a big factor as to the popularity of Git.”
“When you get down to it, guys who like Git work in loose groups and want a low-overhead way of merging,” said Jim Duggan, analyst at Gartner Research.
Make no mistake, the DVCS and centralized sides are lining up against one another. “Corporate developers are hearing at the pub from their buddies that Git is great,” said Mike Pilato, a software engineer at CollabNet, which contributes to the Apache Subversion SCM project. “It makes for battles inside enterprises. But we don’t want to move to DVCS. If we do, we admit it’s the only way to go.”
Sink cautioned, “I think we have to be careful about getting a black-and-white mentality about it. Millions of happy Subversion users would take exception with the statement that anything not distributed is obsolete.” Distributed may be the way, he said, “but you’ll see a number of different players with tradeoffs.”
Francisco Monteverde, CEO of Plastic SCM company Codice, pointed out what he believes are the problems of yesterday’s SCM systems:
• Broken builds: With everyone ultimately working in the main line, code reconciliation on large projects is extremely difficult.
• Branching limitations: This prevents parallel and agile development from taking place.
• Arcane branching patterns: Users are locked in to their tool of choice.
• Not distributed: Makes remote development difficult in terms of access to code.
• No flexible release cycle: Does not enable continuous integration and agile deployment of code.
Distributed systems have their drawbacks as well, according to Perforce’s Randy DeFauw and others:
• Git has a steep learning curve.
• There is no effective GUI yet, so it appeals more to power users than enterprise developers.
• It requires a full copy of a repository to be distributed outside the firewall to multiple disparate developers, which raises security issues and makes access control more critical (although submodules are either here or on the way in several DVCS systems).
• The system is only designed to work with text-based files, such as source code.
• There is no “master” file or canonical source.
Branching and merging
There are a number of different ideas and priorities to SCM, but one thing almost all agreed upon is that effective branching and merging is critical for organizations looking to adopt agile methodologies.
Agile development is a big driver behind DVCS and the need for better branching and merging, the experts agreed. “Software development is getting more complex. Products are bigger and harder to manage. Plus, the rise of agile development processes puts stress on the development cycle due to iterations being so short,” said DeFauw. “The days of three-week merge and release cycles are long over. [Agile organizations] want a lightweight branch and merge framework with release management built in.”
In its 2011.1 release late last year, Perforce introduced Streams, which DeFauw said allow developers to define how log fixes should flow to one another, ensuring that changes flow in the right direction and in the right order. The company calls it “branches with brains.”
AccuRev’s Cliff Utstein agreed, noting that his company is seeing “a big need for organizations re-engineering their software process to get agile right. Getting tools to adapt to their organizational process, not the other way around, is a tenet of agile practices.”
AccuRev built its product on the streaming concept from the ground up, so a child stream inherits from its parent branch (or stream), and changes in the parent flow down automatically to the child if there are no conflicts. Further, anyone working in a child stream who pushes a change up to the parent automatically makes that change available to all other streams dependent upon that parent.
“This is one of AccuRev’s first and defining features,” Utstein said. “It’s the software development process modeled out in the SCM system.”
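The inheritance Utstein describes can be pictured with a small Python sketch. It is only a toy model under stated assumptions: the `Stream` class and its `view` and `promote` methods are hypothetical names made up for this illustration, not AccuRev’s API, and a local change in a child simply takes precedence over the inherited file rather than going through real conflict detection.

```python
class Stream:
    """Toy stream: sees its parent's files plus its own local changes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}  # path -> content changed only in this stream

    def view(self):
        """Files visible here: inherited content overlaid with local changes."""
        inherited = self.parent.view() if self.parent else {}
        return {**inherited, **self.local}

    def promote(self):
        """Push local changes up to the parent so sibling streams inherit them."""
        if self.parent is None:
            raise ValueError("root stream has no parent to promote to")
        self.parent.local.update(self.local)
        self.local.clear()


main = Stream("main")
main.local["app.c"] = "v1"
team_a = Stream("team_a", parent=main)
team_b = Stream("team_b", parent=main)

team_a.local["app.c"] = "v2"    # visible only in team_a for now
team_a.promote()                # now inherited by every child of main
print(team_b.view()["app.c"])   # v2
```

Because each view is computed from the parent on demand, a promoted change reaches sibling streams without any explicit merge step in this model.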
The way the code is modeled in the system is at the core of why distributed systems do merging better than centralized systems, and thus are more accepting of multiple branches (see sidebar). Aside from a non-linear model, Git also uses unique functions for merging. “Git doesn’t actually have a two-way, or four-way, or eight-way merge,” said GitHub’s Malinovich. “It has an n-way merge. You can give it any number of conflicting changes and it uses the same functions to do conflict resolution. It’s the exact same logic that does the work comparing two changes—which is the norm—or 12 changes.
“If you had to manually compare the changes,” he added, “that just defeats the whole purpose of a version-control system.”
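Malinovich’s point, that one routine handles two changes or 12, can be illustrated with a deliberately simplified Python sketch. It works on whole files rather than lines, and `merge_n` is a hypothetical function written for this example; it is not Git’s merge code, only a picture of the same logic applied to any number of branches.

```python
_MISSING = object()  # marks "file does not exist in this snapshot"


def merge_n(base, *branches):
    """Merge any number of branch snapshots against a shared base.

    Each snapshot is a dict mapping a file path to its content.
    Returns (merged_snapshot, list_of_conflicting_paths).
    """
    paths = set(base)
    for branch in branches:
        paths |= set(branch)

    merged, conflicts = {}, []
    for path in sorted(paths):
        original = base.get(path, _MISSING)
        # Collect the distinct results proposed by branches that changed this path.
        proposals = []
        for branch in branches:
            value = branch.get(path, _MISSING)
            if value != original and value not in proposals:
                proposals.append(value)

        if not proposals:
            result = original          # nobody touched it
        elif len(proposals) == 1:
            result = proposals[0]      # all changers agree
        else:
            conflicts.append(path)     # changers disagree: keep base, report it
            result = original
        if result is not _MISSING:
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1"}
x = {"a.txt": "2", "b.txt": "1"}
y = {"a.txt": "1", "b.txt": "1", "c.txt": "new"}
z = {"a.txt": "2", "b.txt": "3"}
print(merge_n(base, x, y, z))
# ({'a.txt': '2', 'b.txt': '3', 'c.txt': 'new'}, [])
```

The loop never asks how many branches it was given, which is the property the quote describes; only paths where branches propose different results are left for a person to resolve.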
Other open-source DVCS providers approach merging from different angles.
Codice’s Plastic SCM takes the approach of creating a branch for each task, creating a 1-to-1 mapping. “This way, tasks start from a well-known, clearly defined start point, and are not dependent on each other,” said Codice’s president and cofounder, Pablo Santos.
“A developer can check in or check out his task as often as he wants,” he said. “When the task is finished, it goes through unit and GUI testing, and then to the integrator, or release manager, who decides when it’s good to go. You’re creating true parallel development with the value of controlled integration and merge tracking.”
Bitbucket, a code-hosting solution that began as a Mercurial project, takes the approach that code review is required before a change can be committed to a primary branch, Stepka said. “It’s a more natural process for doing code review,” he said. “The old way poisons the root of the tree.” Further, he noted, the benefit of having the entire codebase locally means local commits with more incremental save points. “If I head down a path and decide it’s rubbish, I can roll back to the save point” without having done any damage to the branch or the tree.
Bitbucket emphasizes three key features: It’s a hosting service with a standard set of functionality, it controls access to the repository to facilitate collaboration, and it uses pull requests across branches—what Stepka called “the killer feature”—to enable code review before commitment.
SourceGear, which had a Microsoft-centric version-control tool called Vault, is looking to ride the distributed wave with Veracity, which Sink said offers cross-platform DVCS. The company is working on a hosted service it expects to roll out later in spring. Veracity supports the fast-import format popularized by Git; it is a way of exchanging version-control data currently used by Bazaar, Git, Mercurial, Perforce, Subversion and Veracity. The system is also built for agile developers, with Scrum burndown charts and build-tracking functionality built in.
Even the commercial, centralized VCS providers are getting on Git. CollabNet late last year announced it was adding Git SCM to its Codesion enterprise cloud development platform, providing enterprise-grade security, support and availability, and it also created Connect, a framework that sits on top of its TeamForge ALM platform to integrate with Git and Atlassian’s JIRA issue-tracking software.
And AccuRev last month introduced Kando, a version of its AccuRev SCM system with Git capability. “We are seeing a lot of Git in organizations, especially in embedded [telecommunications] riding the Linux and Android wave,” said AccuRev’s Utstein. With Kando, AccuRev is providing “an enterprise underpinning for Git clients. For AccuRev users, it will look just like another Git repository. But the manager will see all the enterprise functionality within that environment.”
Kando, Utstein acknowledged, “is exactly due to the uptake of Git. Why fight it? Why not embrace it? Engineers can work in the same mode, but instead of pushing and pulling from point A (a Git repository), it’ll be point B—an AccuRev server.”
And Perforce is working on a Git connector that is due out in the first quarter of this year, DeFauw said. “SCM has been focusing on what the enterprise needs: security, scalability, managing collaboration. Developers felt their priorities weren’t being met. DVCS gave them a little autonomy. Architecture is what they cared about. It was code, and managing changes in code.
“Git is an effective complement to Perforce,” he continued. “It gives private branching and the ability to work offline.” Perforce, he said, will remain the central server, but will provide DVCS to individual developers as needed.
But the ad hoc notion of developers deciding when to do a merge is too haphazard and not manageable, according to Timpani Software’s Kevin Dietz, the original founder of Team Share (which created the Team Track defect system) and now the driving force behind Timpani’s Merge Magician for VCS systems.
“Instead of developers deciding when to do a merge, can we define a process for what we’re doing? Something that’s repeatable,” Dietz said. Developers, he said, don’t know all the branches that exist, and so cannot merge their changes into everyone else’s branches. Merge Magician brings streaming to automated merging, to get changes down to where they need to be. The tool, Dietz said, currently supports Subversion and Microsoft Team Foundation Server.
Merge Magician uses a publish/subscribe paradigm to create connections between branches. “If you want a parent branch to push changes to a child branch, you create a publisher on the parent branch and create a subscriber on the child branch. Then you can define how frequently to pull down changes,” Dietz explained. “This is way more flexible than a branch hierarchy. You can turn pub or sub on or off independently of the other. It’s a more versatile approach.”
The tool uses a polling algorithm to poll branches for changes, he added. Merge Magician will look at all the changes and will follow rules it has been told. It can publish the changes continuously, on a schedule, or due to an external trigger, such as a person or a build system. “If the build succeeds, that can trigger a message to Merge Magician that the changes can be published,” he said.
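A rough Python sketch can make the publish/subscribe idea concrete. Everything here is an assumption made for illustration: `Branch`, `Link`, `poll` and `on_build_result` are hypothetical names, not Merge Magician’s interface, and a real merge is stood in for by copying change identifiers from one branch to another.

```python
class Branch:
    def __init__(self, name):
        self.name = name
        self.changes = []  # ordered change identifiers


class Link:
    """A connection from a publishing branch to a subscribing branch."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.publishing = True   # publisher switch on the source branch
        self.subscribed = True   # subscriber switch on the target branch
        self._cursor = 0         # number of source changes already seen

    def poll(self):
        """Deliver source changes not yet seen; return what was delivered."""
        if not (self.publishing and self.subscribed):
            return []            # either side can pause delivery on its own
        pending = self.source.changes[self._cursor:]
        self._cursor = len(self.source.changes)
        delivered = [c for c in pending if c not in self.target.changes]
        self.target.changes.extend(delivered)
        return delivered


def on_build_result(link, succeeded):
    """External trigger: only publish when the build system reports success."""
    return link.poll() if succeeded else []


parent, child = Branch("parent"), Branch("child")
link = Link(parent, child)

parent.changes.append("r1")
print(link.poll())                   # ['r1']

link.subscribed = False              # child pauses; publisher is untouched
parent.changes.append("r2")
print(link.poll())                   # []

link.subscribed = True
print(on_build_result(link, True))   # ['r2']
```

Calling `poll` on a timer would correspond to the scheduled mode Dietz mentions, while `on_build_result` stands in for the external trigger.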
Git’s got flaws
Yet for all the movement toward Git, there are some warts on the repository.
First and foremost, when working in Git, you are an island unto yourself, according to GitHub’s Malinovich. “Git doesn’t have collaboration features; there’s no easy way to talk to others.” GitHub, he said, “introduced forking. You can create a copy of someone else’s project without losing the connection. You can hack into someone else’s code via that fork, then compare the forks and run your copy.”
Another well-known weakness of Git is that it is not suited to storing large binary files, such as videos or 3D images. Git is efficient at packaging data, and compression “is amazing” for typical source files, Malinovich said, but not so much for other file formats.
And this has left an opening for innovation.
Perforce, for example, is moving forward with its message of “Version Everything,” the idea that more than just code files need to be versioned, and their histories understood.
Pixar Animation Studios uses Perforce to manage all the digital content for its movies, providing “state of system” snapshots. “It’s so much more than source code,” said Perforce president Christopher Seiwald. “Pixar is sued all the time over intellectual property infringement. If they can go back in their history and show the original versions of assets, they can protect themselves.”
He sees a big opportunity for Perforce to capitalize on the movement toward versioning everything in the enterprise, by moving into Web content management and other types of document management. “People are thinking about versioning more, but outside of source code, it’s drop-box backup,” he said.
Versioning can become a workflow mechanism, Seiwald said, as adding processes on top of versioning “can be very powerful. And the same is true with other kinds of content. There are conceivably a half-dozen branches of the same document with a complexity similar to source code” in terms of revisions and getting approvals for the changes.
Perforce is working on two sample applications to demonstrate this. One, called Chronicle, is Perforce’s first attempt “toward social (workspaces) with versioning as its underpinning.”
DeFauw described it as a WCM system, similar to Drupal, with an open-source front end on top of Perforce. “You can branch or clone a website or a piece of a site, work on it offline, and merge it back in,” he said. “This brings versioning to sites.”
The other sample application is called Commons, which DeFauw said brings the power of version control to non-technical users working with non-software types of data. Seiwald described it this way: “Instead of e-mailing a document, you e-mail a link, drag the document out of Commons, and when you’re done revising it, you put it back into Commons. The version is tracked; the system tags it with a name when you drag it out, so when it comes back in, we know where it came from and how it changed.”
This will bring Perforce to a different kind of user, Seiwald said. DeFauw pointed out that using Perforce for more than source code—documentation, multimedia assets, chip designs and wire frames, among many examples—makes sense. “You should put all your digital assets that constitute your IP into a single system,” he said.
AccuRev also sees the benefit of workflow. Its next release will include a Workflow Edition, which Utstein said “enables you to create workflows to provide stronger change management around issues and requirements. It enables bringing source code and history with tasks.” He said developers will not be able to move code to branching without updating the status on an issue—for example, changing it from “assigned” to “working.”
CollabNet, meanwhile, is focusing on bringing all this enterprise richness into open-source tools. “You should version everything: code, requirements, configurations, JARs and binaries and documents,” said CollabNet’s senior marketing director Lothar Schubert. “Traceability is important. You can use Subversion for code and documents, do versioning in the file release system, and do configurations in the cloud tool.”
CollabNet’s TeamForge is the central place of orchestration for Git or Subversion code, moving to Jenkins for continuous integration, then deploying into Amazon’s EC2 cloud, where Lab Management does the deployment, he said, adding that TeamForge gives the traceability.
With all these different approaches and offerings, it’s no wonder that SourceGear’s Sink believed that a hybrid approach—Git with “something else,” he said—is the likely path going forward. “The user has Git, and at the end of the day, he has to integrate with the tool his boss wants him to use.”
Sink also believed there is room for the many players in the SCM and version-control space today. “I don’t think version control in five years will be any less fragmented than it’s always been,” he said. “Distributed is the way, but I see a number of different players with tradeoffs.”
The differences under the hood
Subversion remains the most popular SCM tool in the market, with more than 5 million developers using it, according to Lothar Schubert, senior director of product marketing at CollabNet. That said, Git is garnering a ton of attention and recently passed 1 million users, due to what its users say is superior branching and merging.
SourceGear’s Sink explained that the fundamental difference between the systems lies in the way the tools model history. Both record every version of code, he noted, but it’s how that versioning is represented that marks the major difference between the systems.
Subversion, Sink said, relies on a linear model in that every version follows from the one before it. “It requires a lot of rules so that when the line splits, it must be reconciled at some point,” he said. That, he added, is both one of the great limitations of Subversion as well as one of its great strengths. “It’s not flexible, but it’s easy to understand,” he said.
Meanwhile, Git has its history modeled as a directed acyclic graph (DAG), in which there is a pointer from every version to the parent. “This allows for some chaos and gives us more flexibility,” he said. “The DAG model lends itself to really good merging. It’s about figuring what happened originally and what changes actually occurred, but it doesn’t force developers to get their changes into the main line. Instead, it leaves how to deal with the changes for later.”
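As a rough illustration of the DAG model Sink describes, the Python sketch below stores each version with pointers to its parents and finds the nearest common ancestor of two branch tips, which is the “what happened originally” that a merge starts from. The `Commit` class and the `ancestors` and `merge_bases` functions are hypothetical names invented for this example; this is not Git’s actual data structure or algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One version in the history; each parent pointer leads back in time."""
    name: str
    parents: tuple = ()


def ancestors(commit):
    """Map name -> Commit for `commit` and everything reachable from it."""
    seen = {}
    stack = [commit]
    while stack:
        node = stack.pop()
        if node.name not in seen:
            seen[node.name] = node
            stack.extend(node.parents)
    return seen


def merge_bases(a, b):
    """Common ancestors of a and b that are not behind another common ancestor."""
    from_b = ancestors(b)
    common = {name: c for name, c in ancestors(a).items() if name in from_b}
    dominated = set()
    for c in common.values():
        for parent in c.parents:
            dominated.update(ancestors(parent))
    return sorted(set(common) - dominated)


root = Commit("root")
base = Commit("base", (root,))
feature = Commit("feature", (base,))       # two lines of work split from base
main = Commit("main", (base,))
merged = Commit("merge", (main, feature))  # a merge simply has two parents

print(merge_bases(feature, main))          # ['base']
```

In a strictly linear history every version would have exactly one parent, so a split would have to be reconciled before the line could continue; here the split can sit in the graph until someone creates the two-parent merge.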