Technical debt is real, and teams need a strategy to manage it. Just about every software project accrues tech debt over time. Tech debt manifests itself during development and in production. Ignore it or don’t service it at your peril.
Before continuing, let’s look at a simple definition of tech debt. The one I like goes something like this: Technical debt is the difference between what was promised and what was actually delivered. This includes technical shortcuts made to meet delivery deadlines, whether inadvertent or not.
Tech debt sneaks up on codebases and is insidious in where it hides and the problems it causes. While some tech debt is planned, most is not. The longer a team waits to address tech debt, the harder it is to refactor. Over time it burrows into a codebase, making it more and more challenging and time-consuming to manage.
Studies have shown that a codebase reaches the tipping point when there’s 60% tech debt. At this point it becomes hard to recover from. This is, of course, an extreme situation, but even lower percentages of tech debt are challenging. Software cannot adapt fast enough in the face of a meaningful amount of tech debt. Above all, it impacts productivity and morale. It also makes codebases harder to maintain and extend.
The first step is to stop adding to the accumulated tech debt. That is, all new functionality will be written according to “professional coding guidelines.” In addition, all code, old or new, will be reviewed.
From the agile perspective, every story developed and any work being promoted from the backlog should not increase tech debt. Teams and individuals generally know how to write clean code, so this needs to become the ongoing practice.
The accrual of tech debt negatively impacts and is in direct conflict with agility. Assessing and reducing tech debt is a key part of any agile program. Along with code-specific issues, architects who don’t know how to design extensible architectures and programmers who don’t know how to write clean code are issues that must be addressed.
Where technical debt comes from
Tech debt is one of those pay-me-now-or-pay-me-later kinds of issues. While there are no good reasons to incur tech debt, there are reasons teams knowingly add to it. Unrealistic project deadlines and stakeholder demands often lead teams to sacrifice quality or take shortcuts they normally wouldn’t. Teams promise to remediate the issues after release, but this doesn’t always happen.
There are lots of opportunities for tech debt to occur. Since technical resources don’t generally plan for tech debt, much of it is created without any awareness that it’s even happening. Martin Fowler categorizes tech debt as “reckless vs. prudent” and “deliberate vs. inadvertent.” This is a good way to think about tech debt, and although the reasons tech debt accumulates are somewhat obvious, they’re worth reviewing. The goal is to identify the specific attributes that are relevant to you and address them.
Consider the following:
– unrealistic schedules
– scope creep
– inexperienced resources
– lack of code reviews
– few code comments
– little use of frameworks or patterns
– unrefined requirements
– poor preparation/not thinking through the problem
– lack of professionalism
– getting the job done quickly, then worrying about fixing it later
– a developer who knows they’re not going to be responsible for maintaining (or owning) the code they write
Let’s review where tech debt comes from and what this means.
Technical debt is a metaphor coined by Ward Cunningham in a 1992 report. In his report he says, “Although immature code may work fine and be completely acceptable to the customer, excess quantities will make a program unmasterable, leading to extreme specialization of programmers and finally an inflexible product. Shipping first-time code is like going into debt.
“A little debt speeds development so long as it is paid back promptly with a rewrite. Objects make the cost of this transaction tolerable. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a standstill under the debt load of an unconsolidated implementation, object-oriented or otherwise.”
The longer tech debt is allowed to accrue, the harder and more expensive (either in resources, time and/or money) it becomes to fix. In other words, and as mentioned above: Pay now or pay a lot more later. It’s easier to identify, debug and clean code the more recently it was written. (Unfortunately, teams aren’t always given the time to refactor.)
This debt needs to be paid down. That is, bad code needs to be remediated, questionable design decisions need to be addressed, untested code needs to be tested, and infrastructure shortcomings need to be corrected. Tech debt has similar attributes to real financial debt. Eventually the loan comes due, and in the case of software, things start to break.
Categories of technical debt
On a broad scale, tech debt falls into a variety of categories. For example:
– Code debt: poorly written, convoluted design, ball of mud/spaghetti, hard-coded elements
– Design/architecture debt: few abstractions, lack of separation of concerns, questionable object/component model
– API debt: not granular, poor error handling, slow or unresponsive, poorly structured
– Quality debt: little automation, poor test coverage, inadequate unit tests, performance testing an afterthought
– Infrastructure debt: old equipment, not easily scalable, sloppy deployment, little replication, unrealistic data recovery plans
Agile teams are often able to articulate the reasons and address the root causes of tech debt. One of the best places to do this is the retrospective. Retros are a place where issues can be freely discussed and remedial actions defined: What’s working well vs. what’s not, what to continue and what to stop, what to change and/or something new to try.
Sometimes it’s valuable to plan for a tech-debt-specific retrospective. Depending on the team, consider bringing in a skilled agilist to facilitate it. The goal is to bring together the people who know where the skeletons lie to articulate a plan to fix them.
There is real derived value in the effort spent remediating tech debt. Benefits extend to customers, the team, system performance, quality and productivity. True as this is, getting management to understand the implications of tech debt, and the realities of what happens if it isn’t paid off, is difficult.
There are many competing priorities for limited development resources. To the uninformed, tech debt is like an iceberg: Most of the danger is hidden from view. Management and even stakeholders may not be willing to let the team spend cycles on projects that aren’t directly related to revenue or competitive concerns. So what if you built it and now have to maintain your own file system? Customers aren’t complaining—at least not today!
It has been said that “unaddressed technical debt increases software entropy.” From an R&D perspective, entropy measures the amount of bandwidth that is not available for work. Unchecked, it leads to the inevitable decline and degeneration of the technical environment: people, software and infrastructure. In other words, velocity and productivity slow.
Management needs to understand this, and it’s the job of technical leaders to make sure this is so. Describing tech debt in terms of financial debt is a good way to begin the discussion. It’s an analogy everyone can understand and relate to.
For example, new code or code modifications often impact other pieces of code—some obvious, some not so. The less well structured the codebase, the more likely this is. Sometimes the impacted code is not fully modified because there’s not enough time, or it is left to be worked on “later.” Just like financial debt, the uncompleted work incurs interest, which manifests itself by reducing team velocity, generating defects, or making the code brittle and less extensible. The interest compounds, increasing the overall debt.
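To make the interest analogy concrete, here is a minimal sketch of how deferred work might compound. The function name, the parameters and every number in it are hypothetical, chosen only to illustrate the shape of the curve; they are not measurements from any real team.

```python
def simulate_velocity(base_velocity, sprints, new_debt_per_sprint,
                      interest_rate, drag_per_debt_point):
    """Return the effective velocity for each sprint as unpaid debt compounds.

    All parameters are illustrative assumptions:
      new_debt_per_sprint  - points of deferred work added every sprint
      interest_rate        - fraction by which existing debt grows per sprint
      drag_per_debt_point  - velocity lost per point of outstanding debt
    """
    debt = 0.0
    velocities = []
    for _ in range(sprints):
        # Existing debt grows (the "interest"), then this sprint's shortcuts are added.
        debt = debt * (1 + interest_rate) + new_debt_per_sprint
        # Velocity cannot drop below zero.
        velocities.append(max(0.0, base_velocity - drag_per_debt_point * debt))
    return velocities


if __name__ == "__main__":
    for sprint, velocity in enumerate(simulate_velocity(40, 8, 3, 0.10, 0.5), start=1):
        print(f"sprint {sprint}: effective velocity {velocity:.1f}")
```

Under these assumptions the velocity falls a little faster every sprint, which is the point of the analogy: the cost of waiting is not linear.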
So, what do we do about technical debt? First, remediating tech debt is not really a choice; it’s the obligation of every professional developer and mature development team.
Tech debt stories should be in the backlog and identified as impediments in retrospectives. Story point estimates should include time for refactoring. Of course, this is sometimes a hard sell to product owners since the time spent refactoring could be used for delivering new features.
Stakeholders need to understand that there are really only two ways developers can build code. One is quickly and probably not cleanly. This approach usually doesn’t involve peer code reviews. The alternative is with design aforethought using well-established patterns. Method 1 increases the outstanding tech debt and makes future changes harder. Method 2 takes longer but makes future changes easier.
The mindset that a feature is a feature regardless of the implementation is shortsighted. The imagined savings of a “hacked” solution quickly disappear in the medium and long term. This is due to increased defects, slower cycle times, performance issues and other problems. It is not much of a leap to believe that vulnerabilities are more likely to be embedded in poorly designed and written code than in well-designed, well-written code.
Not all technical debt is created equal. In a perfect world, there is no technical debt because the codebase has been fully refactored and all new code is well written and reviewed. Huzzah! In the real world, teams must identify and prioritize existing code and design flaws.
Given the finite amount of time available to remediate a codebase, ranking tech debt is a critical exercise. Some code is rarely executed, so there’s not a strong ROI in fixing it. If a new or improved feature or component is under development and will soon replace an existing malformed feature, there’s little value in refactoring that code. This is true even if it’s a highly used component.
Using the concept of derived value is important to understanding what debt to pay off and in what sequence. Bad code that doesn’t deliver key features may not need to be perfect. “Good enough” is a sound reason not to spend time on this code. A general rule of thumb is to allocate 20% of capacity to addressing tech debt.
Derived value can help answer the “Which parts of the code should be high quality?” question. There’s no one right answer, but different constituencies derive different levels of value from the features exposed by the codebase.
Customers derive value from functionality, but they also derive value from good performance. Development teams derive value from well-designed code and architectures because this allows them to deliver code at a consistently high velocity. The sales team derives value from new features or features that provide market differentiation. The Ops team derives value by not having to manage frequent patches. Marketing derives value from predictable, quality releases that deliver new capabilities.
The goal is to associate pieces of the codebase with the different constituencies and assign values to each. There’s as much subjectivity as objectivity to this approach. Regardless, the exercise will prove useful by getting everyone on the same page in terms of what code has the most value and, by extension, what code should be the cleanest.
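One possible way to run this exercise is a simple weighted score. The sketch below is only an illustration: the `Component` class, the 0-5 scoring scale, the weights and the component names are all assumptions introduced here, and a team would substitute its own.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    # Value each constituency derives from this component (hypothetical 0-5 scale).
    value_by_constituency: dict = field(default_factory=dict)


def derived_value(component, weights):
    """Weighted sum of the values the constituencies assign to a component."""
    return sum(weights.get(who, 0) * score
               for who, score in component.value_by_constituency.items())


def rank_by_derived_value(components, weights):
    """Highest derived value first: these are the areas that should be cleanest."""
    return sorted(components, key=lambda c: derived_value(c, weights), reverse=True)


if __name__ == "__main__":
    # Hypothetical weights and components, for illustration only.
    weights = {"customers": 3, "development": 2, "sales": 2, "ops": 1, "marketing": 1}
    components = [
        Component("legacy_importer", {"customers": 1, "ops": 2}),
        Component("checkout", {"customers": 5, "development": 4, "sales": 4, "ops": 3}),
        Component("report_export", {"customers": 3, "sales": 2, "marketing": 2}),
    ]
    for c in rank_by_derived_value(components, weights):
        print(c.name, derived_value(c, weights))
```

The scores themselves matter less than the conversation needed to agree on them; the ranking is simply a record of that agreement.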
Of course, things are never as straightforward as they should be. Early in a product’s life cycle there may be an advantage to releasing it early regardless of the quality of the codebase. This may be due to market share or profit. Of course, these benefits may only pay off if the advantage(s) gained are sustainable compared to delivering a release later but with better quality.
It is hard to estimate the impact of these competing themes since there is always uncertainty. Because no one has a crystal ball, all stakeholders must be aware of the risks and make the best judgment they can after listening to the technical trade-offs and their longer-term consequences.
Some might ask if tech debt is avoidable. On the whiteboard, yes, but in the real world not really. If time and cost were not factors, the tech teams could spend all of their time designing perfect architectures and developing well-abstracted, clean code. Unfortunately, this is not the world we live in.
In our world, the damage due to bad code quality has tangible, negative impacts. These impacts worsen over time as the codebase continues to degenerate. Making matters worse, it’s often the case that the person who created the debt is not the one repaying it.
Identifying technical debt
One key problem is that teams don’t often know where the debt actually lies. The older the codebase and the further back in time a piece of code was written, the more hidden the “bad” code is. In other words, tech debt isn’t highlighted in the codebase in bold, and it’s not obvious where it begins and ends. Over time, code gets maintained and enhanced, and the old and new code get commingled. If developers are not careful, this code creep infects the newer, hopefully better-structured code.
There are warning signs, and people and teams have tells that help identify tech debt. How often have you heard someone say, “If I touch that code, lots of other things will break”? Similarly, “John is the only one allowed to work on this piece of code.” Determine your own canary-in-the-coal-mine indicators.
Well-organized Scrum teams use metrics. If an established team has a certain velocity that suddenly drops, the code being worked on may need refactoring. Tech debt is often found in old libraries, integration points or components written in languages that the team uses less frequently. Pieces of code that aren’t regularly touched or that repeatedly cause problems are good places to locate debt as well.
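A velocity check like this can be automated in a few lines. The following is a rough sketch, not a standard Scrum metric: the function name, the three-sprint window and the 25% threshold are assumptions that a team would tune to its own history.

```python
from statistics import mean


def velocity_dropped(history, window=3, threshold=0.25):
    """Flag a drop when the latest sprint falls more than `threshold`
    (as a fraction) below the mean of the preceding `window` sprints."""
    if len(history) < window + 1:
        return False  # not enough data for an established baseline
    baseline = mean(history[-(window + 1):-1])
    if baseline <= 0:
        return False  # avoid dividing by zero on an empty baseline
    return (baseline - history[-1]) / baseline > threshold
```

A flagged sprint is a prompt to ask what code the team was working in, not proof of tech debt; holidays, staffing changes and unusually large stories can produce the same signal.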
The best way of identifying debt is through the programmers who work with the code daily. They’re in the best position to discuss this because they’re doing all of the heavy lifting. As mentioned earlier, tech-debt-specific retrospectives are very revealing and lead to frank, objective conversations about the state of the codebase and what has to be done in terms of remediation.
Analyzing defect logs or customer support call logs is quite telling. If multiple defects appear in the same section of code or a feature, there’s likely a problem. There must be a reason why customers repeatedly call about certain functions, sections of the solution, or other problems. Where there’s smoke, there’s often fire.
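As a minimal sketch of this kind of analysis, the function below counts defects per component and reports the ones that cross a threshold. It assumes each defect record is a dictionary with a `component` key and uses a cutoff of three; both are hypothetical and would depend on how a team's tracker exports its data.

```python
from collections import Counter


def defect_hotspots(defects, min_count=3):
    """Return (component, count) pairs with at least `min_count` defects,
    ordered from most to fewest defects."""
    counts = Counter(d["component"] for d in defects)
    return [(name, n) for name, n in counts.most_common() if n >= min_count]
```

The same counting approach could be applied to support call logs, grouping by feature or screen instead of by component.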
Risks and dependencies are two attributes of refactoring that need mentioning. We know that not all bad code needs to be fixed, and certainly not concurrently. Sometimes there’s more risk in cleaning up a troubled piece of code than leaving it as is. For code like this, there’s no such thing as too many code comments.
Dependencies are a standard part of building software. Queries have to be revised to take advantage of schema changes, but only after the data model has been updated. A feature can’t use new search capabilities until the enhanced search component is released. Once it’s been decided to refactor a piece of code, it’s critical to understand the upstream and downstream consequences of doing this.
A system becomes hard to maintain if its class structure (for example) has too many dependencies on other classes. A developer making a change must understand any associated class dependencies. Issues like this are as much about how the software is architected as they are about the code itself.
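One way to see the downstream consequences of a change is to walk the dependency graph. The sketch below assumes the dependencies are already available as a plain mapping from each class name to the set of classes it depends on; how that mapping is produced (static analysis, build metadata, or by hand) is left open, and the function name is hypothetical.

```python
from collections import deque


def impacted_by_change(depends_on, changed):
    """Return every class that directly or transitively depends on `changed`."""
    # Invert the map: for each class, record which classes depend on it.
    dependents = {}
    for cls, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(cls)

    # Breadth-first walk over the inverted graph; the `impacted` set
    # also guards against cycles.
    impacted, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user != changed and user not in impacted:
                impacted.add(user)
                queue.append(user)
    return impacted
```

The larger the returned set for a typical change, the more tightly coupled the design, and the stronger the case for treating it as architectural rather than purely code-level debt.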
Complexity is a topic that should be discussed by every development team. It’s easy to design an elegant architecture on the whiteboard. More often than not, implementing that elegant architecture involves all sorts of complexities. This is important because complexity is the enemy of an extensible, maintainable codebase.
Just like code, architectures get brittle and become unable to satisfy the needs of the solution. For example, the original release was designed to support a certain level of concurrent load, but now the architecture must support 3x that load. Synchronous communications were originally OK, but now there’s a requirement for asynchronous. The list goes on and on.
Note: Attributes like complexity, code quality and poor design are in the eye of the beholder. There’s a degree of subjectivity that is not easily avoided. Given this, it’s important for teams to build a common viewpoint around these terms.
To avoid architectural debt, implement the simplest design possible. Use small, granular services that satisfy one requirement only. Take advantage of proven frameworks and patterns.
This is fine when designing a new system or independent module, but most teams are dealing with an architecture designed years ago for a different class of problems. Without change, the more functionality that gets bolted on, the less likely it is that the architecture will be able to support the needs of users.
Sometimes the wrong technology or stack was chosen, but there was never time to go back and correct that. Problems like this, along with the ones mentioned above, manifest themselves in many ways. To name a few: performance, scalability, long enhancement cycle times, convoluted schema, production outages, and so on.
Addressing technical debt
A few rules to begin with. The first is, code should never be refactored without a good reason. A good reason might be implementing a high-value business requirement. If this requirement needs to be implemented in a poorly written piece of code, then that code must be refactored first.
The second rule is that for most cases, code only needs to be refactored so that it’s “good enough.” The great majority of code doesn’t need to be perfect, whatever that might be, so don’t try to make it so. The team can determine what constitutes good enough.
The third rule is that not all tech debt needs to be refactored. Identifying tech debt doesn’t equate to the absolute need to refactor it. Consider using some form of derived value to help make this determination.
With the above in mind, consider using a technical backlog to address outstanding debt that has been determined to need refactoring. This backlog is the same as any other. Stories need to be written about the specific refactoring requirements for the piece of code in question. The time or amount of work required to clean up the code must also be estimated. This estimate should be based on what it will take to clean up the code without interference, where “clean up” means good enough.
Similar to feature stories, tech stories should be transparent and clear to both the Scrum team and stakeholders. Stakeholders understand the value of building out requirements. The same is not so for technical stories, so including some measure of cost to value makes sense.
Technical work and feature work are going to be mixed together based on product owner prioritization. Non-technical owners need to clearly understand the value of the work so it can be correctly prioritized. It’s not enough to just say that refactoring this code will save time and effort in the future. A more complete explanation is required to ensure the right decisions are made.
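To show how a cost-to-value measure might feed prioritization, here is a small sketch that fills a tech-debt budget (the 20% rule of thumb mentioned earlier) with the stories that offer the most value per point. The function name, the tuple layout and the value scores are assumptions for illustration, and the greedy selection is one simple choice among many, not a recommendation from any particular agile framework.

```python
def plan_tech_debt(stories, sprint_capacity, debt_share=0.20):
    """Greedily pick tech-debt stories by value-to-cost ratio within the debt budget.

    stories: list of (name, estimate_points, value) tuples, where `value` is a
    hypothetical score agreed with stakeholders and estimate_points is > 0.
    Returns the chosen story names and the points they consume.
    """
    budget = sprint_capacity * debt_share
    chosen, used = [], 0
    # Best value per point first; ties keep their backlog order.
    ranked = sorted(stories, key=lambda s: s[2] / s[1], reverse=True)
    for name, estimate, _value in ranked:
        if used + estimate <= budget:
            chosen.append(name)
            used += estimate
    return chosen, used
```

In practice the product owner still makes the final call; the ranking only makes the trade-off visible in the same terms used for feature stories.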
Testing is, of course, a great way to ferret out code problems, including tech debt. Unit testing, feature testing, performance testing, UI testing and regression testing all play important roles in code quality. Comprehensive test suites coupled with fully automated regressions help detect poor-quality code with little manual effort.
In addition to code and design debt, data schema, UI and the infrastructure are all candidates for refactoring. Older code is also more likely to expose vulnerabilities. This is a class of tech debt that must be taken seriously and remediated.
Tools play an important part in code quality. A discussion of tech debt is incomplete without at least mentioning the value that tools bring to the table. Test coverage tools, code analysis tools, data model tools, and code vulnerability analyses can help identify challenged code. There are even tech-debt-specific measurement tools like Sonar and Structure101.
Disciplined teams use a variety of tools as part of their Continuous Integration strategy. Knowing where technical debt lives is the first step in removing it. Tools are an important ally but are not a substitute for the methods described above.
Conclusion
Let’s end where we began: Tech debt is real and teams need a strategy to manage it. Reducing tech debt should be part of every team’s culture. It belongs right up there with quality and accountability. Tech debt eventually needs to be addressed, so it becomes a question of pay now or pay more later.
Unfortunately, tech debt doesn’t fix itself. Worse yet, tech debt accrues “interest” over time in the form of lower productivity, performance issues, reduced quality, lower morale and higher expenses. Teams need to take responsibility for managing their codebases and the quality of their deliverables.
Not all tech debt needs to be remediated and, as we’ve discussed, there are situations where deciding to accept some short-term tech debt is OK. Part of the decision to accept technical debt is to also ensure that time will be available to remediate it. The problem is that teams, even with the best of intentions, often don’t find the time to refactor this code. Deadlines, critical bugs, and changes of focus are just a few reasons why this occurs.
While everyone agrees that tech debt is bad, it’s unclear what the exact impact of a quality-challenged piece of code will be. Using derived value to help determine refactoring priorities is a good way to address this. Arguing that the impact of tech debt is impossible to predict, and using that as a reason to ignore it entirely, will likely hurt the team’s ability to deliver software down the road.
The development team and stakeholders must work together to handle the risk of tech debt. It is important for all constituencies to remember that codebases, architectures and infrastructures will always have tech debt. However, depending on where it lives, tech debt isn’t always bad. The concept of “good enough” must be used when refactoring code. That is, it isn’t always necessary to pay down tech debt in full.
As software ages, managing tech debt becomes an increasingly critical aspect of producing cost-effective, timely and high-quality products. A balance must be achieved between sales/marketing’s desire to release new features rapidly and the technical team’s desire to follow sound software engineering practices: practices that deliver quality, maintainability and extensibility while minimizing future rework. It’s a question of short-term value delivery vs. longer-term effects.
For those who aren’t sure if tech debt exists in their environment, complete this lengthy questionnaire to determine the answer:
Do you suspect you have technical debt? If YES, then you have technical debt. If NO, then you still likely have technical debt.
To state the obvious, prevention is the best remedy to tech debt accrual. Identifying, prioritizing and developing a strategy to eliminate debt is great. If there’s no plan oriented to preventing it in the first place, you’ll find yourself in much the same place in the future.
So, what are the key attributes both teams and management should embrace to stop the cycle of tech debt creation?
– Get better at developing software generally by ensuring the team consists of mature, professional resources
– Adopt a truly agile process with meaningful stakeholder involvement
– Keep the design simple
– Refactor whenever necessary; don’t put it off
– Employ as many clean code techniques as possible
– Practice continuous improvement
– Test everything
– Measure everything and publish the metrics
– Build a culture of quality and continuous improvement, including a definition of what constitutes “done”
Now go out there and create great software.