We hear frequent claims about the quality of software produced with Agile methods. Most of the sparse data on the subject come from case studies, questionnaires, or university-based experiments. We finally have some industrial data comparing the structural quality of applications built with Agile and Waterfall methods, and the results are mixed; or rather, they support mixing the two.

Every two years my colleagues at CAST and I produce a CRASH Report, the obligatory acronym for CAST Research on Application Software Health. In it we analyze data collected from static analyses of large IT systems developed or enhanced primarily in North America, Europe and Asia. These were typically business- or mission-critical systems submitted for analysis to CAST’s Application Intelligence Platform. The applications ranged from 10,000 to over 11 million lines of code, with the average being just over 470,000 lines. Of the 565 Java EE applications we analyzed, 181 reported whether they used Agile methods (typically Scrum), Waterfall methods, a mix of Agile and Waterfall techniques, or no defined method.

CAST’s technology detects violations of good architectural and coding practice that affect the robustness, performance, security, changeability and transferability (or understandability) of the application, and provides a summary score for each quality characteristic. These measures describe the non-functional, or structural, quality of the application when evaluated at both the architectural and component levels. The most serious weaknesses detected in these analyses are often described as technical debt.

Our statistical analyses yielded the same finding for every structural quality characteristic: the Agile/Waterfall mix was significantly better than either Agile or Waterfall methods used alone. The results were strongest for robustness and changeability, where differences among methods accounted for between 14% and 15% of all the variation in quality scores. The effects for security, performance and transferability were slightly smaller, but still significant. In essence, the Agile/Waterfall mix left far fewer of the structural weaknesses that increase operational risk to the business or the cost of ownership for IT.
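For readers who want to see what "accounted for 14% to 15% of the variation" means in practice, here is a minimal sketch of one common way to compute such a share: eta squared from a one-way analysis of variance. It is an assumption that this matches the statistic behind our figures, and the scores in the example are made up purely for illustration; they are not CRASH data.

```python
from statistics import mean

def eta_squared(groups):
    """Share of total variation in scores explained by group membership.

    `groups` maps a method name to a list of quality scores.
    Computed as between-group sum of squares / total sum of squares.
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = mean(all_scores)
    # Total variation of every score around the overall mean
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    # Variation of each group's mean around the overall mean, weighted by group size
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_between / ss_total

# Hypothetical scores, invented for this example only
scores = {
    "agile": [2.9, 3.1, 3.0, 3.2],
    "waterfall": [3.0, 2.8, 3.1, 3.1],
    "mix": [3.3, 3.4, 3.2, 3.5],
}
print(f"eta squared: {eta_squared(scores):.2f}")
```

A value of 0 would mean the method tells you nothing about the score; a value of 1 would mean the method determines the score completely.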

Additional statistical analyses showed that most of the differences on each of the quality measures were accounted for by the difference between the Agile/Waterfall mix and the other methods. In fact, for robustness and changeability, three-quarters of the scores for the Agile/Waterfall mix were higher than the median score for either Agile or Waterfall methods alone. We did not find statistically significant differences between Agile and Waterfall methods on any of the five structural quality measures.
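The "three-quarters above the median" comparison can be sketched in a few lines. This is only an illustration of the arithmetic, with a hypothetical helper name and invented scores rather than anything from the report.

```python
from statistics import median

def share_above_median(mix_scores, other_scores):
    """Fraction of mix_scores that exceed the median of other_scores."""
    threshold = median(other_scores)
    return sum(score > threshold for score in mix_scores) / len(mix_scores)

# Hypothetical scores, invented for this example only
mix = [3.0, 3.2, 3.4, 3.6]
agile_only = [2.9, 3.1, 3.3]
print(share_above_median(mix, agile_only))
```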

As Scott Ambler, Alistair Cockburn, Dean Leffingwell, and many others in the Agile community have argued for years, Agile methods should be adjusted to the level of challenge and complexity in a system. Consequently, it is not surprising that for the predominantly large, business-critical applications we analyzed, an Agile/Waterfall mix yielded better structural quality, as it put more emphasis on up-front analysis and architectural design before launching short, time-boxed iterations. The mix combines two advantages: it avoids many architectural weaknesses and limitations early, and it provides rapid feedback on code-level quality during iterations. These findings should not be generalized to smaller, less complex applications until enough data are available to see whether mixing methods offers an advantage there.