3 areas where traditional APMs leave developers exposed

Published: December 5th, 2017

If you’re responsible for creating or managing a customer-facing application for your organization, you have a long list of things to worry about. A scenario like this may well be at the top of the list: you’ve recently launched a new version of your application to the world, and customers start finding serious issues in production. Excessive latency in the application is destroying its UX. While the APM you’re using is picking up on some of these issues, it is catching them too late. Your customers are already complaining directly to the company and voicing their displeasure on social media, and your management team is asking, “How did this happen?”

This nightmare scenario is the kind of thing that even the best companies in the world can experience. Google, for example, found that traffic dropped by 20 percent with just an extra half-second in search page generation time. Amazon discovered that each additional 100ms of latency resulted in 1 percent fewer sales. If even these giants can fall victim to application issues in production, it can happen to anyone.

Relying solely on traditional APMs may be leaving you open to risk in three key areas:

Finding performance issues early
Diagnosing the root cause of performance issues
Fixing performance issues

Finding performance issues

One of the biggest questions for those managing application performance is whether they are finding issues as early as possible. The answer for most organizations is no. In fact, 75 percent of developers report that their performance issues affect their end users in production. One reason is that APM solutions are traditionally designed to work in production only.

Traditional APMs aren’t built for the testing phase. While traditional APMs are generally built to focus on production environments, some organizations try to use them in the earlier stages of development and test. What they often find is that the metrics and reporting aren’t effective for these stages. A production-focused APM will provide a statistical analysis of your application performance that is essentially an aggregated result of thousands of transactions. This can help point to major issues that may be affecting performance, but because there isn’t any transaction detail, it can be a very vague indicator of the problem. Bottom line: traditional APMs are indicators of trends but those trends aren’t always real problems.
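
To make the point about aggregation concrete, here is a minimal sketch in Python. The transaction names, timings and the 1,000 ms budget are all invented for illustration and do not come from any real APM. It shows how a single average over many transactions can look acceptable while one transaction is badly over budget, and how a per-transaction view surfaces it.

```python
from statistics import mean

# Hypothetical per-transaction timings in milliseconds (illustrative data only).
transactions = [("GET /search", 100)] * 99 + [("POST /checkout", 5000)]

def aggregate_view(records):
    """What a dashboard-style summary shows: one number for all transactions."""
    return mean(latency for _, latency in records)

def transaction_view(records, budget_ms):
    """What a developer needs: the individual transactions that exceeded the budget."""
    return [(name, latency) for name, latency in records if latency > budget_ms]

print(aggregate_view(transactions))                    # average across all 100 transactions
print(transaction_view(transactions, budget_ms=1000))  # the slow transaction(s), by name
```

With this sample data the average is 149 ms, which hints that something is off but says nothing about where. The per-transaction view names the one request that took five seconds.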

Developers are disconnected from how their code changes affect overall performance. In many companies, we still have a situation where developers aren’t tied directly to the performance of the applications they build. They build their applications and throw them over the wall to an operations team in production, and when that team finds issues, they are thrown back to the development team to fix.

The DevOps movement has urged companies to try to get away from this by creating one big virtual team and to “shift left” some of the functions and responsibilities from operations to development.

But even in DevOps environments, we still see much of the testing happening in production, and the majority of APM tools are geared to operations or performance experts. Because of this, developers don’t always feel they are ultimately responsible for delivering performant code, as long as they are meeting functional requirements. This has created a divide between development and operations teams that still makes it difficult to find issues. To bridge these two teams, developers need more insight into, and influence over, the performance of the applications they’re building. Today’s production-focused APMs don’t give them that.

Diagnosing the source of performance issues

Once you’ve found an application issue, you have the difficult task of diagnosing its source. This task becomes more and more difficult as you move away from the development process into production. Teams that test too late are forced to diagnose performance issues that are happening in complex infrastructures and scenarios. In reality, 86 percent of root causes are application-level issues that will manifest in development environments, and scale with the environment. It therefore makes sense to try to catch these application-level issues early, when it’s easier to find the root cause.

Overly complex scenarios. Once an application makes it to production, it is a small part of a large, often complex system. It is no longer just about whether the application works, but about all of the technologies that surround the app, from the network infrastructure to distributed systems. A Dynatrace study found that on average, a single transaction uses 82 different types of technology. This makes trying to diagnose the source of a performance issue in production like finding a needle in a haystack.

Because this complexity makes it difficult to accurately diagnose the source of the issue, most problems aren’t actually solved; they’re simply patched. Worse yet, hastily delivered fixes often break something else, and with every day that passes, the problem gets worse and more convoluted.

No root-cause analysis. As we already covered, traditional APMs are high-level enough to tell you that a problem exists and point to the general area that is affected. They’re built to monitor incredibly complex infrastructures, so a general health report is immensely useful in production scenarios for operations teams. Traditional APMs are not, however, as valuable for development teams looking to diagnose the source of the issue because they don’t offer a detailed root-cause analysis. When an issue is detected and a ticket created and passed on to a development team, actionable data still needs to be mined by performance experts using other toolsets, likely in a staged environment.

The issue may be conditional and hard to reproduce, delaying the diagnosis even further, especially if you don’t have any affected customers volunteering to be guinea pigs. All of this again leads to situations where an issue is patched rather than fixed.

Fixing performance issues

This is the area left most exposed by traditional APMs, as issues are ultimately fixed by developers. Production-focused APMs don’t line up with a developer’s day-to-day workflow, so adoption and usage among development teams are a challenge. Developers are already dealing with tight deadlines and product pressures, so the complexity of traditional APMs simply does not make it worth their time to figure out how to get actionable data.

On top of that, traditional APMs are seen as absolute overkill in a development environment. After all, they’re built for operations, not development, and have many features that developers don’t need. They alert you to an issue and point you in a general direction, but they don’t provide low-level data presentations that cater to the needs of developers fixing the issues. Because of that, companies run into the following problems when trying to fix issues with traditional APMs.

No fix validation available. Setting up and configuring a traditional APM on a development machine is a large task for potentially little return, as these tools don’t provide features that aid in isolating, fixing and testing an issue in a development environment. Traditional APMs are unable to give developers immediate feedback on how their code changes affect the performance of the application they’re working on.
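
As a rough illustration of what immediate feedback in a development environment could look like, the sketch below times a function locally and compares it against a performance budget, so a developer can re-run it after every change. It is not taken from any APM product: the helper names (`measure_ms`, `check_budget`), the `build_report` function and the 500 ms budget are all hypothetical, and wall-clock timing on a developer machine is only a coarse signal.

```python
import time

def measure_ms(func, *args, repeats=5, clock=time.perf_counter):
    """Run func several times and return the fastest wall-clock time in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = clock()
        func(*args)
        elapsed_ms = (clock() - start) * 1000
        best = min(best, elapsed_ms)
    return best

def check_budget(func, *args, budget_ms, **measure_kwargs):
    """Return (within_budget, measured_ms) so a local test can flag a regression."""
    measured = measure_ms(func, *args, **measure_kwargs)
    return measured <= budget_ms, measured

# Hypothetical function under development.
def build_report(n):
    return sum(i * i for i in range(n))

ok, measured = check_budget(build_report, 10_000, budget_ms=500)
print(f"within budget: {ok}, fastest run: {measured:.2f} ms")
```

Taking the fastest of several runs reduces noise from other processes on the machine. The `clock` parameter exists only so the timing logic can be exercised with a fake clock in tests.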

To verify a bug fix, development teams have to wait until it has been deployed to production. The fix-test cycle is incredibly costly in time and business impact if the bug is live. Long feedback loops between the owner of the code and the manifestation of issues in production complicate a fix.

The process for fixing problematic code often involves going to the author of the code with the assumption that they can easily pick up where they left off. However, because it can often take months for code to be released into production after it’s developed, developers don’t see this problematic code until long after it was written. At this point, the code may be unfamiliar, even to the developer who wrote it, and others may have built on top of it, making it part of a big spaghetti codebase. In the time it takes to research, replicate and develop a fix for an issue, hundreds or thousands of customers can be affected.

Takeaways

The way that most companies currently handle performance management is broken. When you wait until production to catch issues with your application, your customers will find them before you do. And when you take issues that are found in production and send them back to development teams to fix, it will take longer and cost more than if you had fixed them in the development or test phases to begin with. Every team, particularly DevOps-focused teams, should take a close look at how they can improve the speed with which they find, diagnose and fix performance issues.

If you’re not testing early, your customers are your testers. If you’re subjecting real users to production code that hasn’t been thoroughly performance tested, this is a great recipe for losing your customers.

If you’re testing early with production APMs, you’re not using the right tools. Traditional APMs are built for operations and are essential to production, but they are not built for developers in testing and development. Instead, look for APM tools built specifically for development and test. Organizations that want to shift left to catch performance issues earlier also need to shift their toolset towards development-focused solutions.

Article Tags

APM, developers, ZeroTurnaround

About Simon Maple

Simon Maple is director of Developer Relations at ZeroTurnaround.
