Metrics have become a standard accoutrement of all application life-cycle management tools. Every management package has some form of dashboard that enables users to see at a glance the health of the project. Or so it’s promised.

The problem is that there is little guidance about which metrics to watch, which to ignore, which to check daily, and which need only a monthly glance. This lack of guidance is compounded by the presentation of nearly useless metrics.

In this latter category, I include the trend graphs generated by most CI servers. With few exceptions, they favor telling managers the historical percentage of broken builds, accompanied by a graphical history of build times. This is a clear case of data presented simply because it’s easy to show, rather than for its intrinsic value. (Most managers care only whether the last build broke, not how the current failure rate compares with the project’s all-time record. Likewise, build times are valuable only in the context of the last few builds, not in comparison with every build the project has seen.)

Presuming useful numbers, there are two kinds of metrics: those that can be evaluated in absolute terms, and those that are best interpreted in relation to other metrics. In addition, metrics can be grouped by subject area. In his book “Automated Defect Prevention,” Adam Kolawa suggests the following: requirements metrics, code metrics, test metrics, defect metrics, and a set of numbers that attempt to establish the quality of project estimates.

In many of these categories, standalone metrics provide a convenient, informative, at-a-glance snapshot. For example, LOC—a number that has little value in terms of assessing developer productivity—is an effective tool for understanding other metrics. Almost any code or testing metric that suffers a sharp spike or sudden drop requires a look at total LOC to be understandable. So, LOC should be monitored for any sudden changes.
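As a rough illustration of that point (the column doesn’t prescribe any particular tool or data format), the following Python sketch works on a hypothetical list of per-build snapshots: it normalizes a defect count by KLOC and flags any build whose total LOC swings sharply from the previous one, so the other metrics for that build can be read with the change in mind. All field names and the 10% cutoff are invented for the example.

    # Hypothetical per-build snapshots; field names and values are illustrative only.
    builds = [
        {"build": 101, "loc": 120_000, "defects": 84},
        {"build": 102, "loc": 121_500, "defects": 90},
        {"build": 103, "loc": 168_000, "defects": 95},  # large chunk of code dropped in
    ]

    LOC_CHANGE_THRESHOLD = 0.10  # flag any swing in total LOC greater than 10% (arbitrary cutoff)

    previous = None
    for b in builds:
        density = b["defects"] / (b["loc"] / 1000)  # defects per KLOC
        print(f"build {b['build']}: {density:.2f} defects/KLOC")
        if previous is not None:
            change = abs(b["loc"] - previous["loc"]) / previous["loc"]
            if change > LOC_CHANGE_THRESHOLD:
                print(f"  LOC changed {change:.0%} since build {previous['build']}; "
                      "interpret the other metrics for this build in that light")
        previous = b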

Requirements metrics, which measure the number of requirements that have been implemented and tested, are also valid standalone metrics. As is the case with LOC, their value is maximized when measured over time; the trend is your friend.
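A minimal sketch of that idea, using made-up weekly counts: the absolute numbers matter less than their direction from one period to the next.

    # Hypothetical (implemented, tested) requirement counts per week; values are invented.
    weekly = [(40, 30), (46, 38), (47, 39), (55, 41)]

    for week in range(1, len(weekly)):
        impl, tested = weekly[week]
        prev_impl, prev_tested = weekly[week - 1]
        print(f"week {week + 1}: implemented {impl} ({impl - prev_impl:+d}), "
              f"tested {tested} ({tested - prev_tested:+d})")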

Test metrics are more subtle and often require ratios to be understandable. A favorite test metric that has been widely discredited as a standalone measure is code coverage. It is now well established that 100% code coverage is rarely a valid goal, so coverage as an absolute number is useful only for confirming that a base coverage target has been met.

A trend, in this regard, has little value. However, code coverage paired with complexity measures is a highly useful metric. The combination tells you whether a given piece of code is being tested enough. A tool called Crap4j, published by the now-defunct Agitar (but still hosted at crap4j.org), uses cyclomatic complexity (CCN) and pairs it with coverage as follows: a CCN of 1–5 needs no coverage; 6–10 needs 42% coverage; 11–15 should get at least 57%; 16–20 should have 71%; 21–25 should be at 80%; and the rest, up to a CCN of 30, should be subject to 100% coverage. Any method with a CCN above 30 should be refactored, according to the tool.
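The band-to-coverage mapping quoted above translates directly into a check. The sketch below implements only that mapping as stated here, not Crap4j’s actual implementation (the tool combines the two inputs into a single per-method score); the function names, the list of (CCN ceiling, minimum coverage) pairs, and the sample measurements are mine.

    # (upper CCN bound, minimum coverage %) pairs taken from the bands quoted above.
    COVERAGE_BANDS = [(5, 0), (10, 42), (15, 57), (20, 71), (25, 80), (30, 100)]

    def required_coverage(ccn):
        """Return the minimum coverage % for a method's cyclomatic complexity,
        or None if the method exceeds CCN 30 and should be refactored instead."""
        for ceiling, minimum in COVERAGE_BANDS:
            if ccn <= ceiling:
                return minimum
        return None

    def check_method(name, ccn, coverage):
        minimum = required_coverage(ccn)
        if minimum is None:
            print(f"{name}: CCN {ccn} exceeds 30 -- refactor rather than test harder")
        elif coverage < minimum:
            print(f"{name}: {coverage}% coverage, needs at least {minimum}% at CCN {ccn}")
        else:
            print(f"{name}: OK")

    # Hypothetical measurements.
    check_method("parse_order", ccn=12, coverage=40)   # falls short of the 57% floor
    check_method("render_page", ccn=4, coverage=0)     # simple enough to need no coverage
    check_method("apply_rules", ccn=34, coverage=90)   # too complex regardless of coverage

The point of pairing the numbers is visible in the output: a modestly complex method with mediocre coverage gets flagged, while a trivial method with no coverage at all does not.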
