Metrics have become a standard accoutrement of all application life-cycle management tools. Every management package has some form of dashboard that enables users to see at a glance the health of the project. Or so it’s promised.
The problem is that there exists little guidance regarding which metrics to watch, which ones to ignore, which should be checked daily, and which you can peer at monthly. This lack of knowledge is compounded by the presentation of nearly useless metrics.
In this latter category, I include the trend graphs generated by most CI servers. With few exceptions, they favor telling managers the historical percentage of broken builds, accompanied by a graphical history of build times. This is a clear case of data that is presented simply because it’s easy to show, rather than for its intrinsic value. (Most managers care only whether the last build broke, not how the current failure rate compares with the project’s entire history. Likewise, build times are valuable only in the context of the last few builds, not in comparison with every build the project has seen.)
Presuming useful numbers, there are two kinds of metrics: those that can be evaluated in absolute terms, and those that are best interpreted in relation to other metrics. In addition, metrics can be grouped by subject area. In his book “Automated Defect Prevention,” Adam Kolawa suggests the following: requirements metrics, code metrics, test metrics, defect metrics, and a set of numbers that attempt to establish the quality of project estimates.
In many of these categories, standalone metrics provide a convenient, informative, at-a-glance snapshot. For example, LOC—a number that has little value in terms of assessing developer productivity—is an effective tool for understanding other metrics. Almost any code or testing metric that suffers a sharp spike or sudden drop requires a look at total LOC to be understandable. So, LOC should be monitored for any sudden changes.
Requirements metrics, which measure the number of requirements that have been implemented and tested, are also valid standalone metrics. As is the case with LOC, their value is maximized when measured over time; the trend is your friend.
Test metrics are more subtle and often require ratios to be understandable. A favorite test metric that has now been widely discredited as a standalone measure is code coverage. It is now well established that 100% code coverage is rarely a valid goal, so code coverage as an absolute value is useful only for checking that a certain base coverage target has been met.
A trend, in this regard, has little value. However, code coverage paired with complexity measures is a highly useful metric. The combination will determine whether you are testing certain code enough. A tool called Crap4j, published by the now-defunct Agitar (but still hosted at crap4j.org), uses cyclomatic complexity (CCN) and teams it with coverage as follows: CCN of 1–5 needs no coverage; 6–10 needs 42% coverage; 11–15 should get at least 57%; 16–20 should have 71%; 21–25 should be at 80%, and the rest up to a CCN of 30 should be subject to 100% coverage. Any method with a CCN above 30 should be refactored, according to the tool.
CCN is a flawed measure of complexity, but it’s easy to calculate, and when combined with code coverage as Crap4j suggests, it gives managers a guideline they can enforce and monitor regularly.
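The complexity-to-coverage bands described above translate naturally into a lookup that a build script could enforce. What follows is a minimal sketch of that idea, not Crap4j's own code: the band values are taken from the list in this column, while the names `COVERAGE_BANDS`, `required_coverage`, and `check_method` are hypothetical.

```python
from typing import Optional

# (upper CCN bound, minimum coverage percent), copied from the bands
# listed in the text. Anything above a CCN of 30 is flagged for refactoring.
COVERAGE_BANDS = [(5, 0), (10, 42), (15, 57), (20, 71), (25, 80), (30, 100)]


def required_coverage(ccn: int) -> Optional[int]:
    """Return the minimum coverage percent for a method with this CCN,
    or None when the CCN exceeds 30 and the method should be refactored."""
    if ccn < 1:
        raise ValueError("cyclomatic complexity starts at 1")
    for upper_bound, percent in COVERAGE_BANDS:
        if ccn <= upper_bound:
            return percent
    return None


def check_method(ccn: int, coverage_percent: float) -> str:
    """Classify one method as 'ok', 'needs tests', or 'refactor'."""
    needed = required_coverage(ccn)
    if needed is None:
        return "refactor"
    return "ok" if coverage_percent >= needed else "needs tests"
```

Under these assumptions, a method with a CCN of 12 and 50% coverage would be reported as needing tests, because its band calls for at least 57%.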
Another valuable ratio compares changes in LOC to changes in the number of tests. In theory, the number of tests should change in direct proportion to the LOC. Clearly, as the previous metric demonstrates, there are variations within this, but on codebases of any size, there are enough simple and complex classes that the average CCN tends to remain fairly stable. On my projects, I compare these two numbers monthly by graphing both trend lines on the same grid and checking whether their slopes roughly match.
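For readers who would rather compute the comparison than eyeball it, the same check can be sketched numerically. This is only one possible approach and rests on assumptions the column does not state: each series is a list of monthly samples, slopes are fitted by least squares, each slope is divided by the series' first value so the two scales are comparable, and the 25% tolerance is an arbitrary placeholder. The function names are hypothetical.

```python
def slope(values):
    """Least-squares slope of values plotted against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def trends_match(loc, tests, tolerance=0.25):
    """Return True when LOC and test count grow at roughly the same relative rate.

    Each slope is normalized by the first sample of its series, so a
    10,000-line codebase and a 500-test suite can be compared directly.
    """
    if loc[0] == 0 or tests[0] == 0:
        raise ValueError("first sample of each series must be nonzero")
    loc_rate = slope(loc) / loc[0]
    test_rate = slope(tests) / tests[0]
    larger = max(abs(loc_rate), abs(test_rate))
    if larger == 0:
        return True  # both series are flat
    return abs(loc_rate - test_rate) / larger <= tolerance
```

With monthly samples such as `loc = [10000, 11000, 12000, 13000]`, a test count of `[500, 550, 600, 650]` tracks the code, whereas `[500, 505, 510, 515]` shows tests falling behind.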
Team leaders will like another metric that’s difficult to put together, unless you’re using Atlassian Clover: recently added or changed code that lacks tests. This number should remain constant as a project moves forward. This metric is even more useful if the uncovered code can be tracked by CCN levels. (However, no automated process to do this exists currently.) Such a capability would help provide a prioritized worklist to developers whenever their complex code lacks tests.
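If per-method data on recent changes, test coverage, and CCN could be gathered by hand or exported from separate tools, the prioritization itself would be simple. The sketch below is purely illustrative: the `ChangedMethod` record and the `worklist` function are hypothetical, and it assumes the input already contains only recently added or changed methods.

```python
from dataclasses import dataclass


@dataclass
class ChangedMethod:
    """One recently added or changed method (hypothetical record layout)."""
    name: str
    ccn: int
    has_tests: bool


def worklist(methods):
    """Return the untested methods, most complex first."""
    untested = [m for m in methods if not m.has_tests]
    return sorted(untested, key=lambda m: m.ccn, reverse=True)
```

Sorting by CCN in descending order puts the riskiest untested code at the top of the developer's list, which is the ordering the paragraph above argues for.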
There are many more metrics that can be used effectively by managers. But they should be chosen carefully. Good managers need only a handful of metrics, which are consulted regularly to gauge the health of a project. Anything else is probably too much for most IT projects and will lead to confusion, micromanagement and eventually paralysis—or just the opposite: the metrics will be ignored.
Andrew Binstock is the principal analyst at Pacific Data Works. Read his blog at binstock.blogspot.com.