Finding and fixing defects in software is as old as the practice of development itself. And, like development, defect tracking is done many different ways by different organizations, depending upon the size and distribution of the team, business priorities, and even the development process used.
Many organizations will use an algorithm to help decide what defects get fixed, and when. It can range from “Don’t ever ship software with bugs” to a ranking system, and it will include such calculations as the number of people impacted by the defect, how likely the defect is to come up, how much it costs when it does arise, and how much support the defect is generating.
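A ranking system of this kind can be sketched in a few lines. The example below is a minimal illustration, not any vendor's actual formula: the field names, the way the factors are combined, and the per-ticket support cost are all assumptions chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    title: str
    users_affected: int        # estimated number of people impacted
    likelihood: float          # chance (0.0 to 1.0) that an affected user hits it
    cost_per_incident: float   # estimated cost each time the defect arises
    support_tickets: int       # support requests already attributed to it


# Hypothetical weight: what one support ticket is assumed to cost.
SUPPORT_TICKET_COST = 25.0


def priority_score(defect: Defect) -> float:
    """Combine the four factors into one number; higher means fix sooner."""
    expected_impact = (
        defect.users_affected * defect.likelihood * defect.cost_per_incident
    )
    support_load = defect.support_tickets * SUPPORT_TICKET_COST
    return expected_impact + support_load


def rank(defects):
    """Return the defects ordered from highest to lowest priority score."""
    return sorted(defects, key=priority_score, reverse=True)


# Illustrative backlog; every number here is made up.
backlog = [
    Defect("Typo on settings page", 5000, 0.9, 0.01, 2),
    Defect("Crash on export", 800, 0.25, 40.0, 30),
    Defect("Slow search on large projects", 1200, 0.5, 2.0, 12),
]

for defect in rank(backlog):
    print(f"{priority_score(defect):10.2f}  {defect.title}")
```

In practice the weights would come from the organization's own business priorities, and a score like this is a starting point for triage rather than a substitute for it.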
Methodologies such as agile development and continuous integration and deployment have also changed the game, but in ways that may seem counterintuitive. One might think going faster creates more defects and leaves less time to find and remediate them, but just the opposite is true. Agile development means focusing on smaller pieces of the project, so the amount of change is much smaller, and, according to Jonathan Nolen, a senior development manager at software tool provider Atlassian, “the chances for you to create unintended consequences are less.”
In today’s world, “the acceptable level of quality you’d release has to be higher,” said Alex Perec, senior product manager at TechExcel, which makes development tools. “You want to address issues as quickly as possible.”
This article will crystallize the thoughts and opinions of leading experts into a best-practices guide to defect tracking and remediation.
“It’s really hard to talk about just defect tracking in isolation from the whole rest of your entire project setup,” Nolen said. “It’s a very holistic activity, because anytime you choose to fix a bug, you’re essentially choosing not to do something else. So you always have to balance fixing that bug versus the other thing you would like to be doing. Tying that decision-making process into your entire development team and process is very important.”
Defect tracking must be part of the overall development and planning process, according to AccuRev vice president of product management Brad Hart. “If you say you’re doing agile development and you’re planning your next sprint, there’s going to be stories that you accept into the sprint and you’re going to use your agile planning tool to manage that, but you should also be planning the defects that you’re going to accept. It can’t be as one-offs or on the side,” he said.
Assigning roles is a good way to keep moving forward on a project while also making sure the software’s defects are being corrected, the experts said.
Paula Rome, director of product management at Seapine Software, recommended that organizations identify one person responsible for doing defect “triage”—assessing if the defect needs to be addressed at all, and if so, when, and by whom. “What is your triage process? Who’s responsible? How often should they look at new issues to categorize, prioritize and assign?” she asked.
Atlassian’s Nolen agreed with Rome, and said that at his company, that role is called “the disturbed.” This, he explained, “allows the majority of the team to be as undisturbed as possible. One guy looks at all incoming bugs, does triage, and fixes them himself.” In this way, people who file bugs get a quick response, and the whole team doesn’t have to be manning the bug queue, he added.
Creating a process for dealing with defect fixes offers efficiencies and can be tailored to how an organization likes to work.
The process, though, “is more complicated than being a state traffic cop,” offered Rich Bianchi, CEO of software provider Alexsys. “All this is a workflow problem that might be multi-faceted.”
Alexsys’ tools can be customized to enable users to create different forms, layouts and requirements for different types of defects, each of which might kick off a different workflow. “Depending upon your product, you might have different selections; you can select different people” to be in the workflow, he said, while targeting different types of bugs for different builds.
Meanwhile, Axosoft QA engineer Michael Robinson said terminology can be an issue: If the tool’s names for work items do not match the terms the working groups use, misunderstandings can result. If you’re using Scrum as a development methodology, he said, you want the tool to call work items “user stories” and “product backlog,” for example, instead of “defect” and “feature.” Customizable fields are an important feature in defect-tracking tools, and most allow those changes to be made, according to experts.
Seapine’s Rome cautioned, though, that it’s important “for everyone on the team to know what the custom field means. You need to take the time to agree on step 1: what a ‘bug’ means.” She went on to say that when organizations have been doing something for a while, it’s easy to add custom fields to the tools. But after a process is a couple of years old, it’s important to make sure you’re adjusting the process. “It’s not that the (defect-tracking) system isn’t working anymore, but it has changed. Organizations think at the time the custom changes are great, but they never revisit it, and you end up with a bloated system.”
Atlassian’s Nolen discussed the notion of warranty bug fixes. “If your team built a feature, it is responsible for bugs, even if you’ve moved on to something else. This gives other people in the company dealing with the feature someone to go talk to,” he explained. For non-warranty bugs, which usually involve projects more than 10 years old with large feature surfaces, defects go into a backlog, and a person or team then cranks through them as quickly as they can. “It’s important that somebody is trying to make some dent in that backlog,” he said.
If everything is a Priority-1 defect, then nothing is a Priority-1 defect, said Alexsys’ Bianchi. “You have to effectively prioritize,” he said.
The definition of an acceptable level of defects will vary between organizations, or even from product to product, according to TechExcel senior product manager Vineet Agarwal. “Some organizations might decide that there can be no user-visible bugs in the product. Others might say only a Priority 2 or 3 is allowed in a release.”
These kinds of priorities help organizations get the most out of their engineering resources.
Atlassian’s Nolen said his engineering teams only have two versions of the software in development at any given time: trunk and stable. “The way we do versioning is, let’s say you’ve got project version 3.4.0. From that moment on, your 3.4 branch, that is your stable branch. You’ll do .0, .1, .2, .3… While that’s going on, your trunk, which is 3.5, or 4.0, that’s in development internally and usually customers are not seeing that at this point, and so you split your efforts between bug fixes and maintenance on [the stable branch]. Usually, we spend like 15 or 20% of total engineering effort in that bucket, sometimes a little less. And then trunk, your 3.5 or 4.0, is on a three- to four-month delivery cycle, and that’s where the majority of your effort is going. What that means is, anything prior to your 3.4 version is essentially frozen.
“We don’t do fixes, we don’t do special one-offs, we don’t do branches per customer, we don’t do custom anything,” he added. “The only time we ever go back further than that four months is in the case of critical security fixes. We will issue patches for those. But we don’t do any traditional bug fix or any maintenance. That basically means our engineering efforts are not split or divided across multiple current versions of the software, and we don’t have to do each fix six or 12 or 18 times as I’ve seen other companies sometimes do. It gets especially bad when you end up branching with custom features per customer or per even slice of customer. It’s something I would highly encourage for people to draw a line in the sand and not do that.”
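The policy Nolen describes can be expressed as a simple rule. The sketch below is an illustration only: the function names and the dotted version format are assumptions, and it treats any release line at or above the stable branch as open for fixes, with older lines accepting nothing but critical security patches.

```python
def parse(version: str) -> tuple:
    """Turn a dotted version string such as '3.4.2' into (3, 4, 2)."""
    return tuple(int(part) for part in version.split("."))


def accepts_fix(target: str, stable: str, critical_security: bool = False) -> bool:
    """Decide whether a fix may be applied to the release line of `target`.

    `stable` is the current stable release (for example '3.4.0').
    Release lines are compared by major and minor number only.
    """
    target_line = parse(target)[:2]
    stable_line = parse(stable)[:2]
    if target_line >= stable_line:
        # The stable branch itself, or trunk (the next release in development).
        return True
    # Anything older is frozen, except for critical security patches.
    return critical_security


print(accepts_fix("3.4.2", stable="3.4.0"))                          # stable branch
print(accepts_fix("3.5.0", stable="3.4.0"))                          # trunk
print(accepts_fix("3.3.1", stable="3.4.0"))                          # frozen
print(accepts_fix("3.3.1", stable="3.4.0", critical_security=True))  # security patch
```

A rule this strict keeps each fix from having to be applied to many release lines, which is the duplication Nolen warns against.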
Bianchi said he’s a big fan of doing high-level build plans to begin with: Lay out what you want to produce, and let the engineers work from that.
“Make it as clear as possible,” he said. “People want to know one thing: What am I supposed to work on today?” But, you have to remain flexible as well. “For every 20 planned tasks you have, you’ll have 50-100 other things—mostly bugs—and you have to track them,” he said.
The trend to agile is a great thing, said AccuRev’s Hart. “I haven’t seen anyone doing pure agile by the book, but there is one thing in common: quicker feedback,” he said. “If a bug is a story, it’s found right away, as opposed to finding it later during integration, like under waterfall. This greatly reduces the cost of bugs, and I’ve seen defects being treated more as first-class citizens.”
Hart noted that development is all about going faster and producing more code, while the operations side is about managing risk. “So,” he said, “you want to be able to allow the ops team the ability to say, ‘You know what, we’re real close to pushing a new release to the site, but there’s a critical fix that I need, I want to be able to take just that fix, not last night’s build,’ because last night’s build might include 15 different things, and maybe three or four of those are kind of risky that they want to do some more verification on. If you’ve got everything properly set up and you’re working in a change-based model, and your systems support it, then the ops guy can say, ‘I just want to take that critical fix, just that fix, even though there are 10 other fixes in the queue,’ grab that change and push that into production, mitigating the risk. This will give you a lot more granularity as well as the traceability you’re looking for.”
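The idea of taking one change rather than a whole build can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of AccuRev's product: the Change record, its fields, and the select_for_release function are hypothetical, and the model assumes each queued change is independent of the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    change_id: str   # hypothetical identifier for one tracked change
    summary: str
    verified: bool   # whether the change has finished verification


def select_for_release(queue, wanted_ids):
    """Split the queue into changes to promote now and changes left waiting.

    Assumes the changes are independent, so that one can ship without
    the others. Raises KeyError if a requested change is not in the queue.
    """
    wanted = set(wanted_ids)
    missing = wanted - {change.change_id for change in queue}
    if missing:
        raise KeyError(f"unknown change ids: {sorted(missing)}")
    promoted = [c for c in queue if c.change_id in wanted]
    remaining = [c for c in queue if c.change_id not in wanted]
    return promoted, remaining


# Illustrative queue: one verified critical fix, two riskier changes.
queue = [
    Change("c101", "Fix checkout crash", verified=True),
    Change("c102", "New report layout", verified=False),
    Change("c103", "Refactor cache layer", verified=False),
]

promoted, remaining = select_for_release(queue, ["c101"])
print("promote:", [c.change_id for c in promoted])
print("hold:   ", [c.change_id for c in remaining])
```

Because each promoted change keeps its own identifier, the record of what went to production and why stays traceable, which is the benefit Hart points to.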
Working quickly and reducing the cost of bugs allows engineering teams to work on tasks that advance the software, according to Atlassian’s Nolen. “When I think about our engineering teams, I think about the ratio between work that pushes us forward and work that is about removing risk. It’s really easy to let that ratio get skewed to where you’re spending 80 or 90% of your time removing risk and only 10% of your time actually building something a customer cares about. So, the way I view it is, if you can find a way to reduce the cost of a bug, you can also reduce the amount of time you spend trying to avoid that bug, because you know that you can fix it very quickly.”
The shorter cycles called for by agile development “address the immediate need, instead of compiling a list of high- and low-priority bugs,” said TechExcel’s Perec. This, he said, frees organizations from the multiple calls of “When will it be fixed? When will it be fixed?”
Agile methods also get developers and QA working more closely, said Perec. “The development team is only focusing on one or two features. This takes pressure off QA, which only has to test the two new features and then run regression tests. Defects are discovered and fixed in the same cycle,” he said.
Axosoft’s Robinson cited another benefit of agile development on defect tracking: “When you’re doing a lot of agile, you’ll be able to solve things so quickly that when you look at the (defect) list, those bugs won’t even apply anymore. They’ll be fixed before they can even be taken off the list.”
Another critical element to reducing delivery time is feedback, according to Nolen. “You want the feedback from the customer to get back into your engineering team as quickly as possible,” he said. “And you don’t want completed code sitting on the shelf essentially wasting value or rotting while you try to organize a release. The faster you can get it out to the customer, the faster you can learn, the faster you can deal with problems or learn good things, like if you find a direction that has been profitable, continue to pursue that. That’s why we think that reducing the number of hours between somebody committing a change and a customer actually seeing it and the developer getting the feedback from that interaction is of primary importance in a healthy engineering team.”
Nolen went on to say that when you deploy faster, you reduce the cost of the bug that you shipped. “If you ship a bug and you know it will be out there in the field for a minimum of two years, then the cost in terms of customers hit—dollars affected—the pain cost is very high,” he pointed out. “If you ship that same bug, and you know that as soon as the first customer hits it you will notice it, fix it and that no one else will touch it, your cost is very low. Which is one of the reasons that Software-as-a-Service is such a great sort of paradigm shift for our industry because it means once you’ve seen the bug one time or two times or five times or however many times it takes to notice, you fix it and no one will ever see it again.”
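Nolen's point can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a deliberately simple model in which cost grows linearly with the time a bug stays in the field; the function name and all of the numbers are hypothetical.

```python
def exposure_cost(hits_per_day: int, days_until_fixed: int, cost_per_hit: int) -> int:
    """Total cost of a shipped bug under a simple linear model."""
    return hits_per_day * days_until_fixed * cost_per_hit


# Hypothetical figures: 20 customers hit the bug each day, at $15 per hit.
HITS_PER_DAY = 20
COST_PER_HIT = 15

two_year_release = exposure_cost(HITS_PER_DAY, 730, COST_PER_HIT)  # fix ships in two years
same_day_fix = exposure_cost(HITS_PER_DAY, 1, COST_PER_HIT)        # fix deployed within a day

print(f"Two years in the field: ${two_year_release:,}")
print(f"Fixed within a day:     ${same_day_fix:,}")
print(f"Ratio: {two_year_release // same_day_fix}x")
```

Under this model the cost ratio is simply the ratio of the exposure times, which is why shortening the gap between shipping a bug and fixing it matters so much.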
“This might sound trite,” said Alexsys’ Bianchi, “but it comes down to communication. The dev team is not thinking about what marketing needs. It’s important to make everyone’s voices heard.”