Application Performance Monitoring: What it means in today’s complex software world

Published: April 3rd, 2020

Software continues to grow as the driver of today’s global economy, and how a company’s applications perform is critical to retaining customer loyalty and business. People now demand instant gratification and will not tolerate latency — not even a little bit.

As a result, application performance monitoring is perhaps more important than ever to companies looking to remain competitive in this digital economy. But today’s APM doesn’t look much like the APM of a decade ago. Performance monitoring then was more about the application itself, and very specific to the data tied to that application. Back then, applications ran in on-premises datacenters and were written as monoliths, largely in Java, tied to a single database. With that simple n-tier architecture, organizations were able to easily collect all the data they needed, which was then displayed in Network Operations Centers to systems administrators. The hard work came from launching monitoring tools from the command line (which required systems administration experts), from sifting through log files to see what was real and what was a false alarm, and from reaching the right people to remediate the problem.

In today’s world, doing APM efficiently is a much greater challenge. Applications are cobbled together, not written in monoliths. Some of those components might be running on-premises while others are likely to be cloud services, written as microservices and running in containers. Data is coming from the application, from containers, Kubernetes, service meshes, mobile and edge devices, APIs and more. The complexities of modern software architectures broaden the definition of what it means to do performance monitoring.

“APM solutions have adapted and adjusted greatly over the last 10 years. You wouldn’t recognize them at all from what they were when this market was first defined,” said Charley Rich, a research director at Gartner and lead author of the APM Magic Quadrant, as well as the lead author on Gartner’s AIOps market guide.

So, although APM is a mature practice, organizations are having to look beyond the application — to multiple clouds and data sources, to the network, to the IT infrastructure — to get the big picture of what’s going on with their applications. And we’re hearing talk of automation, machine learning and being proactive about problem remediation, rather than being reactive.

“APM, a few years ago, started expanding broadly both downstream and upstream to incorporate infrastructure monitoring into the products,” Rich said. “Many times, there’s a problem on a server, or a VM, or a container, and that’s the root cause of the problem. If you don’t have that infrastructure data, you can only infer.”

Rekha Singhal, the Software-Computing Systems Research Area head at Tata Consultancy Services, sees two major monitoring challenges that modern software architectures present.

First, she said, is multi-layered distributed deployment using Big Data technologies, such as Kafka, Hadoop and HDFS. The second is that modern software, also called Software 2.0, is a mix of traditional task-driven programs and data-driven machine learning models. “The distributed deployment brings additional performance monitoring challenges due to cascaded failures, staggered processes and global clock synchronization for correlating events across the cluster,” she explained. “Further, a Software 2.0 architecture may need a tight integrated pipeline from development to production to ensure good accuracy for data-driven models. Performance definition for Software 2.0 architectures are extended to both system performance and model performance.”
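To illustrate why clock synchronization matters when correlating events across a cluster, here is a minimal Python sketch. It is not taken from any tool named in this article: the node names, the timestamps and the per-node clock offsets are all hypothetical, and it assumes the offsets have already been estimated by some other means. It simply shifts each node-local timestamp onto a shared reference clock before ordering the events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node: str
    local_ts: float  # seconds, as reported by the node's own clock
    message: str


# Hypothetical estimated offsets (node clock minus reference clock), in seconds.
CLOCK_OFFSETS = {"kafka-1": 0.0, "hadoop-1": 2.5, "hdfs-1": -1.0}


def to_reference_time(event, offsets):
    """Convert a node-local timestamp to the shared reference clock."""
    return event.local_ts - offsets.get(event.node, 0.0)


def build_timeline(events, offsets):
    """Return (reference_ts, event) pairs ordered on the reference clock."""
    pairs = ((to_reference_time(e, offsets), e) for e in events)
    return sorted(pairs, key=lambda pair: pair[0])


# Hypothetical events; by raw local timestamps, hdfs-1 appears to come first.
events = [
    Event("hadoop-1", 105.0, "job stalled"),
    Event("kafka-1", 100.0, "broker lag rising"),
    Event("hdfs-1", 99.5, "datanode slow"),
]

for ts, e in build_timeline(events, CLOCK_OFFSETS):
    print(f"{ts:7.1f}s  {e.node:9s} {e.message}")
```

With these made-up numbers, the raw timestamps suggest the HDFS event came first, while the corrected timeline puts the Kafka event first. That kind of reordering is what can make a cascaded failure hard to trace when clocks drift.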

Moreover, she added, modern applications are largely deployed on heterogeneous architectures, including CPU, GPU, FPGA and ASICs. “We still do not have mechanisms to monitor performance of these hardware accelerators and the applications executing on them,” she noted.

The new culture of APM
Despite these mechanisms for total monitoring not being available, companies today need to compete to be more responsive to customer needs. And to do so, they have to be proactive. “We’re moving [from] a culture of responding ‘our hair’s on fire,’ to being proactive,” said Joe Butson, co-founder of consulting company Big Deal Digital. “We have a lot more data … and we have to get that information into some sort of a visualization tool. And, we have to prioritize what we’re watching. What this has done is change the culture of the people looking at this information and trying to monitor and trying to move from a reactive to proactive mode.”

In earlier days of APM, when things in an application slowed or broke, people would get paged. Butson said, “It’s fine if it happens from 9 to 5, you have lots of people in the office, but then, some poor person’s got the pager that night, and that just didn’t work because what it meant in the MTTR (mean time to recovery), depending upon when the event occurred, it took a long time to recover. In a very digitized world, if you’re down, it makes it into the press, so you have a lot of risk, from an organizational perspective, and there’s reputation risk.”
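Butson’s point about MTTR depending on when an event occurs can be made concrete with a small Python sketch. The incident records below are invented for illustration, and the definition of office hours (Monday to Friday, 9 to 5) is an assumption. The sketch computes mean time to recovery separately for the two groups.

```python
from datetime import datetime, time
from statistics import mean

# Hypothetical incident records: (detected_at, recovered_at).
INCIDENTS = [
    (datetime(2020, 3, 2, 10, 15), datetime(2020, 3, 2, 10, 45)),  # Monday morning
    (datetime(2020, 3, 4, 14, 0), datetime(2020, 3, 4, 14, 20)),   # Wednesday afternoon
    (datetime(2020, 3, 5, 2, 30), datetime(2020, 3, 5, 5, 0)),     # Thursday overnight
    (datetime(2020, 3, 7, 23, 10), datetime(2020, 3, 8, 1, 40)),   # Saturday late night
]


def in_office_hours(ts):
    """True for Monday-Friday between 09:00 and 17:00 (an assumed definition)."""
    return ts.weekday() < 5 and time(9, 0) <= ts.time() < time(17, 0)


def mttr_minutes(incidents):
    """Mean time to recovery in minutes, or None when there are no incidents."""
    durations = [(end - start).total_seconds() / 60 for start, end in incidents]
    return mean(durations) if durations else None


office = [i for i in INCIDENTS if in_office_hours(i[0])]
off_hours = [i for i in INCIDENTS if not in_office_hours(i[0])]

print("MTTR, office hours:", mttr_minutes(office), "minutes")
print("MTTR, off hours:   ", mttr_minutes(off_hours), "minutes")
```

Splitting the metric this way is only one possible view. A team might just as reasonably group incidents by service or by severity.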

High-performing companies are looking at data and anticipating what could happen. And that’s a really big change, Butson said. “Organizations that do this well are winning in the marketplace.”

Whose job is it, anyway?
With all of this data being generated and collected, more people in more parts of the enterprise need access to this information. “I think the big thing is, 10-15 years ago, there were a lot of app support teams doing monitoring, I&O teams, who were very relegated to this task,” said Stephen Elliot, program vice president for I&O at research firm IDC. “You know, ‘identify the problem, go solve it.’ Then the war rooms were created. Now, with agile and DevOps, we have [site reliability engineers], we have DevOps engineers, there are a lot broader set of people that might own the responsibility, or have to be part of the broader process discussion.”

And that’s a cultural change. “In the NOCs, we would have had operations engineers and sys admins looking at things,” Butson said. “We’re moving across the silos and have the development people and their managers looking at refined views, because they can’t consume it all.”

It’s up to each segment of the organization looking at data to prioritize what they’re looking at. “The dev world comes at it a little differently than the operations people,” Butson continued. “Operations people are looking for stability. The development people really care about speed. And now that you’re bringing security people into it, they look at their own things in their own way. When you’re talking about operations and engineering and the business people getting together, that’s not a natural thing, but it’s far better to have the end-to-end shared vision than to have silos. You want to have a shared understanding. You want people working together in a cross-functional way.”

Enterprises are thinking through the question of who owns responsibility for performance and availability of a service. According to IDC’s Elliot, there is a modern approach to performance and availability. He said at modern companies, the thinking is, “ ‘we’ve got a DevOps team, and when they write the service, they own the service, they have full end-to-end responsibilities, including security, performance and availability.’ That’s a modern, advanced way to think.”

In the vast majority of companies, ownership for performance and availability lies with particular groups having different responsibilities. This can be based on the enterprise’s organizational structure, and the skills and maturity level that each team has. For instance, an infrastructure and operations group might own performance tuning. Elliot said, “We’ve talked to clients who have a cloud COE that actually have responsibility for that particular cloud. While they may be using utilities from a cloud provider, like AWS CloudWatch or CloudTrail, they also have the idea that they have to not only trust their data but then they have to validate it. They might have an additional observability tool to help validate the performance they’re expecting from that public cloud provider.”

In those modern organizations, site reliability engineers (SREs) often have that responsibility. But again, Elliot here stressed skill sets. “When we talk to customers about an SRE, it’s really dependent on, where did these folks come from?” he said. “Were they reallocated internally? Are they a combination of skills from ops and dev and business? Typically, these folks reside more along the lines of IT operations teams, and generally they have operating history with performance management, change management, monitoring. They also start thinking about are these the right tasks for these folks to own? Do they have the skills to execute it properly?”

Organizations also have to balance that out with the notion of applying development practices to traditional I&O principles, and bringing a software engineering mindset to systems admin disciplines. And, according to Elliot, “It’s a hard transition.”

Compound all that with the growing complexity of applications, running in the cloud as containerized microservices, managed by Kubernetes using, say, an Istio service mesh in a multicloud environment.

TCS’ Singhal explained that containers are not permanent, and microservices deployments have shorter execution times. Therefore, any instrumentation in these types of deployment could affect the guarantee of application performance, she said. As for functions as a service, which are stateless, application states need to be maintained explicitly for performance analysis, she continued.

It is these changes in software architectures and infrastructure that are forcing organizations to rethink how they approach performance monitoring from a culture standpoint and from a tooling standpoint.

APM vendors are adding capability to do infrastructure monitoring, which encompasses server monitoring, some amount of log file analysis, and some amount of network performance monitoring, Gartner’s Rich said. Others are adding or have added capabilities to map out business processes and relate the milestones in a business process to what the APM solution is monitoring. “All the data’s there,” Rich said. “It’s in the payloads, it’s accessible through APIs.” He said this ability to bring out and visualize data can show you, for instance, why Boston users have been abandoning their carts at a rate 20% greater than New York users over the last three days, and come up with something in the application that explains that.
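The arithmetic behind a comparison like Rich’s Boston and New York example can be sketched in a few lines of Python. The cart counts below are hypothetical and are not drawn from any real APM product or dataset. The sketch reads “20% greater” as a relative difference between two abandonment rates, which is an assumption about his meaning.

```python
# Hypothetical three-day totals per city: carts created and carts abandoned.
SESSIONS = {
    "Boston": {"carts": 1000, "abandoned": 600},
    "New York": {"carts": 2000, "abandoned": 1000},
}


def abandonment_rate(stats):
    """Fraction of carts that were abandoned."""
    return stats["abandoned"] / stats["carts"]


def relative_increase(rate, baseline):
    """How much higher `rate` is than `baseline`, as a fraction of the baseline."""
    return (rate - baseline) / baseline


boston = abandonment_rate(SESSIONS["Boston"])
new_york = abandonment_rate(SESSIONS["New York"])

print(f"Boston rate:   {boston:.0%}")
print(f"New York rate: {new_york:.0%}")
print(f"Relative gap:  {relative_increase(boston, new_york):.0%}")
```

A gap like this only shows that something differs between the two cities. Explaining it still means relating it to what the APM tool saw in the application over the same period.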

Article Tags

APM, cloud, containers, Kubernetes, microservices, performance monitoring

About David Rubinstein

David Rubinstein is editor-in-chief of SD Times.
