We hear frequent claims about the quality of software produced with Agile methods. Most of the sparse data on the subject come from case studies, questionnaires, or university-based experiments. We finally have some industrial data comparing the structural quality of software produced with Agile and Waterfall methods, and the results are mixed, or rather, they favor a mix.

Every two years my colleagues at CAST and I produce a CRASH Report, the obligatory acronym for CAST Research on Application Software Health. In it we analyze data collected from static analyses of large IT systems developed or enhanced primarily in North America, Europe, and Asia. These were typically business- or mission-critical systems submitted for analysis to CAST’s Application Intelligence Platform. The applications ranged from 10,000 to over 11 million lines of code, with an average of just over 470,000 lines. Of the 565 Java EE applications we analyzed, 181 included information on whether they were developed with Agile methods (typically Scrum), Waterfall methods, a mix of Agile and Waterfall techniques, or no defined method.

CAST’s technology detects violations of good architectural and coding practice that affect the robustness, performance, security, changeability, and transferability (or understandability) of an application, and it provides summary scores for each quality characteristic. These measures describe the non-functional, or structural, quality of the application, evaluated at both the architectural and component levels. The most serious weaknesses detected in these analyses are often described as technical debt.

Our statistical analyses yielded the same finding for every structural quality characteristic: the Agile/Waterfall mix was significantly better than either Agile or Waterfall methods used alone. The results were strongest for robustness and changeability, where differences among methods accounted for between 14% and 15% of all the variation in quality scores. The effects on security, performance, and transferability scores were slightly smaller, but still significant. In essence, the Agile/Waterfall mix left far fewer of the structural weaknesses that increase the operational risk to the business or the cost of ownership for IT.

Additional statistical analyses showed that most of the differences in each of the quality measures were accounted for by the gap between the Agile/Waterfall mix and the other methods. In fact, for robustness and changeability, three-quarters of the scores for the Agile/Waterfall mix were higher than the median score for either Agile or Waterfall methods alone. We did not find statistically significant differences between Agile and Waterfall methods on any of the five structural quality measures.
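For readers who want to see what this kind of analysis looks like in practice, here is a minimal sketch in Python of how the share of variance explained by development method (the 14–15% figure above) and the pairwise follow-up comparisons could be computed. The data file, the column names (`method`, `robustness_score`), the group labels, and the choice of SciPy and pandas are my own illustrative assumptions, not the actual CRASH analysis pipeline.

```python
# Illustrative sketch only: the CSV name, columns, and group labels are
# hypothetical and do not reflect the actual CRASH data set or analysis code.
import pandas as pd
from scipy import stats

df = pd.read_csv("structural_quality_scores.csv")  # hypothetical file
# Assumed columns: 'method' in {'agile', 'waterfall', 'mixed', 'none'}
# and a numeric 'robustness_score' per application.

groups = [g["robustness_score"].values for _, g in df.groupby("method")]

# One-way ANOVA across the four method groups.
f_stat, p_value = stats.f_oneway(*groups)

# Eta-squared: the share of total variation in scores accounted for by
# differences among methods (a 14-15% share corresponds to 0.14-0.15).
grand_mean = df["robustness_score"].mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((df["robustness_score"] - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

print(f"ANOVA: F={f_stat:.2f}, p={p_value:.4f}, eta^2={eta_squared:.3f}")

# Pairwise follow-up: compare the Agile/Waterfall mix against each single
# method. Mann-Whitney U avoids assuming the scores are normally distributed.
mixed = df.loc[df["method"] == "mixed", "robustness_score"]
for other in ("agile", "waterfall"):
    scores = df.loc[df["method"] == other, "robustness_score"]
    u_stat, p = stats.mannwhitneyu(mixed, scores, alternative="greater")
    print(f"mixed vs {other}: median {mixed.median():.1f} vs "
          f"{scores.median():.1f}, U={u_stat:.0f}, p={p:.4f}")
```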

As Scott Ambler, Alistair Cockburn, Dean Leffingwell, and many others in the Agile community have argued for years, Agile methods should be adjusted to the level of challenge and complexity in a system. Consequently, it is not surprising that for the predominantly large, business-critical applications we analyzed, an Agile/Waterfall mix yielded better structural quality, since it put more emphasis on up-front analysis and architectural design before launching short, time-boxed iterations. The mix combines the advantage of heading off architectural weaknesses and limitations early with the advantage of rapid feedback on code-level quality during iterations. These findings should not be generalized to smaller, less complex applications until enough data are available to see whether mixing methods offers an advantage there as well.

The variation among scores within each of the methods was large, suggesting that other factors were affecting structural quality. One such factor is certainly the discipline with which a method is practiced. I remember Jeff Sutherland commenting at the Agile Alliance conference several years ago that 70% of the companies he visited were doing ‘Scrum-but’: they claimed to be using Scrum, but they weren’t doing daily standups or other practices integral to the Scrum method.

Another result highlighting the importance of discipline was that applications developed in CMMI Level 1 organizations scored significantly lower on all structural quality measures than applications developed in CMMI Level 2 or Level 3 organizations. Some of these differences were likely due to the greater control Level 2 organizations place on both commitments and product baselines, which lets developers work at a sustainable pace, produce higher-quality code, and catch defects earlier. We did not find significant differences in structural quality scores between applications developed in CMMI Level 2 and Level 3 organizations, which is not surprising since moving to Level 3 primarily involves practices that improve a development organization’s economies of scale across applications.

Taken together, these results argue that discipline and method are both important, and that the most structurally sound software is produced when development methods are blended to combine a sufficient amount of up-front architectural analysis with rapid feedback from time-boxed iterations.

A summary of the key findings from the 2014 CRASH Report can be downloaded here.