Application performance management (APM) solutions need to adapt now that the age of monolithic applications has evolved into microservice-based architectures, which are innately distributed and complex and therefore harder to monitor.
Collecting vast troves of data on how apps are performing is no longer enough, and APM providers have been adding new ways to analyze that data and quickly expose any bottlenecks or code dependencies. Whether that’s by adding AI, ML, new plugins or new methods of monitoring, reliability and speed are on everyone’s mind.
“It’s not just enough to monitor specific isolated metrics because it’s not enough to just detect that something’s wrong. You need to act fast because the environment is fast. The end user reaction to degradation is catastrophic,” said Daniella Pontes, the senior product marketing manager at InfluxData. “If you are in a big event day, you are talking about hundreds of thousands of dollars per minute or billions per day. So you can’t afford a degradation that can not be quickly identified and most importantly, fixed.”
In 2017, The Economist reported that the world’s most valuable resource is no longer oil, but data.
But data in application monitoring isn’t effective if it can’t be analyzed, which makes it all the more crucial to have easy-to-use and intuitive monitoring to transform that data into outcomes, Pontes added.
Most commonly, teams use APM tools when they find out that their app is running slow, according to Denny LeCompte, the general manager of application management at SolarWinds.
“You’re then trying to find out as rapidly as you can, is it the code? Is it the infrastructure? Is it the network? Is it the database? You’re trying to figure out where in the stack it is. If you can provide an application team a way to reduce the mean time to resolution or mean time to innocence, that’s it,” LeCompte explained.
APM solutions leverage data collected through API gateways, service mesh, business transaction tracking, log analytics and container APIs for two purposes: to determine the performance that end users of an application experience, and to measure computational resources to see whether there is adequate capacity to support a load and to find potential bottlenecks.
Service mesh is a relatively new method that aids APM in microservices.
“Instead of using an API gateway which can be challenging, service meshes are a very new modern way that we can concentrate, be a proxy, and provide a point that all microservices can report to,” said Charley Rich, a senior director analyst at Gartner. “And then a monitoring tool can inquire to the service mesh to capture the collection of data. So it can act as a collection point and you can help in terms of ease of deployment and potentially performance.”
Another trend is the use of OpenTracing. OpenTracing is a CNCF project that includes a set of vendor-neutral APIs and instrumentation that is used for distributed tracing.
“OpenTracing, census telemetry, service mesh and others need to be explored and utilized,” Rich said. “We’re moving from an era of the monitoring solutions go out and collect the data they need to an era where the infrastructure and applications are reporting back that information.”
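To make the idea of distributed tracing concrete, here is a minimal sketch of how one request can be followed across several services. It does not use the OpenTracing API; the `Span` and `Tracer` classes and the service names are hypothetical and exist only for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One operation inside one service (hypothetical structure)."""
    trace_id: str             # shared by every span in the same request
    span_id: str              # unique to this operation
    parent_id: Optional[str]  # span that caused this one, if any
    service: str
    operation: str


@dataclass
class Tracer:
    """Collects spans in memory; a real tracer would export them."""
    finished: List[Span] = field(default_factory=list)

    def start_span(self, service: str, operation: str,
                   parent: Optional[Span] = None) -> Span:
        # A child span reuses its parent's trace ID so that work done in
        # different services can be stitched back into a single request.
        trace_id = parent.trace_id if parent else uuid.uuid4().hex
        parent_id = parent.span_id if parent else None
        span = Span(trace_id, uuid.uuid4().hex, parent_id, service, operation)
        self.finished.append(span)
        return span


# Hypothetical request path: gateway -> cart service -> payment service.
tracer = Tracer()
root = tracer.start_span("gateway", "GET /checkout")
cart = tracer.start_span("cart-service", "load_cart", parent=root)
pay = tracer.start_span("payment-service", "charge", parent=cart)
```

Because all three spans carry the same trace ID and each child records its parent, a monitoring tool could rebuild the call chain for the request and see which service the time was spent in.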
Another major change is in who uses APM tools within an organization: usage has shifted toward developers, according to LeCompte.
“Ten years ago the app dev guys would not have cared. That was not their problem. Whereas now, they’re definitely more involved and when there is a problem, they are more likely to go into the tool and expect the monitor tool to help them understand,” LeCompte said. “It’s getting to the point where any sort of application team would feel naked without a tool to provide them with visibility.”
Meanwhile, Pontes said APM solutions have evolved to a point where all parts of a team are using them. Developers are using APM to understand how fragmented code performs before moving forward with it in the production environment. CI/CD teams are using it to understand what kind of impact a change can have, and IT teams are using it to make sure everything stays as it should.
What used to be one slowly changing monolith is now all of a sudden dozens of quickly changing microservices that get changed on a weekly or even daily cadence, according to Ivo Mägi, the CEO of Plumbr.
“Every change is risky by nature so you need to keep a closer eye on your microservices-based architecture because errors are just more likely to happen in situations where you have really agile release cadences,” Mägi said.
He added that APM helps users with availability metrics so that whenever those metrics drop below tolerable levels, the teams are aware of the issues emerging. Another important aspect is the distributed tracing throughout all the microservices in the back end that allows one to zoom in to the exact service failing and, better yet, into the single line of source code in a particular service failing. These functionalities cut down the time to resolution for every incident.
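The availability check Mägi describes can be sketched roughly as follows. This is an illustration only: the service names, the request counts and the 99.9% threshold are assumptions, not figures from Plumbr or any other vendor.

```python
from typing import Dict, List, Tuple


def availability(successes: int, total: int) -> float:
    """Fraction of requests that succeeded."""
    if total == 0:
        return 1.0  # assumption: a service with no traffic counts as available
    return successes / total


def services_below_threshold(counts: Dict[str, Tuple[int, int]],
                             threshold: float = 0.999) -> List[str]:
    """Return the services whose availability fell below the threshold."""
    return sorted(
        name
        for name, (ok, total) in counts.items()
        if availability(ok, total) < threshold
    )


# Hypothetical (successful, total) request counts for one time window.
counts = {
    "cart-service": (9995, 10000),     # 99.95% available
    "payment-service": (9900, 10000),  # 99.00% available
    "search-service": (0, 0),          # no traffic in this window
}
failing = services_below_threshold(counts)
print(failing)  # ['payment-service']
```

In practice the tolerable level would come from the business impact of an outage rather than a fixed default, which is the point of the step-count analogy that follows.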
“Technical monitoring solutions like APMs are similar to sport watches in the sense that through some sensors they gather data and turn it into information. It would be like monitoring the heart rate or steps done during the day. Now if I just see that I did 3000 steps during the day, I don’t know whether I just broke the world record or am I the laziest guy in the world. I actually haven’t changed my habits nor really gained anything. It’s just a distraction after a while,” Mägi explained. “But if I know that 10,000 steps a day keeps the doctor away and that coupling this with an actual action and doing the remaining 7,000 steps, I have gained quality in my life. And to me this is really similar to what APMs are able to do. If you understand how and why performance and availability can impact your business and know when to respond then you can actually have a significant impact on your business.”
However, despite all of its benefits, creating an effective APM solution comes with a set of challenges. According to Rich, the biggest challenge when monitoring microservices is their ephemerality, and APM vendors have to adapt to work with it.
“Usually agents for most cases are specific, so that’s problematic for a lot of vendors. To package agents in the containers, I need to know in advance what’s going to go into a container image. That’s a lot of work. And it also makes me more static when I’m trying to be agile,” Rich said. “They’re just there for moments, then gone and somewhere else, which makes monitoring challenging. That’s different from the traditional approaches to monitoring within an enterprise in a cloud.”
Another challenge, according to a Gartner report, is that many organizations don’t provide production visibility for the application development and DevOps teams that build microservice-based applications, resulting in an isolation from the IT teams that are responsible for operational deployment.
To fix these problems, Gartner recommends companies adopt a coordinated monitoring strategy between operations, developers and DevOps teams, enabling service discovery by using the API gateway layer, leveraging service mesh and maintaining up-to-date service metrics.
Rich said companies that are undergoing digital transformation are the primary candidates for using APM solutions. Mode 2 applications that emphasize agility and speed need to be monitored the most because these are the ones that change frequently. Sometimes changes occur several times a day; therefore, protecting the money-making applications is most critical.
“Anything that’s built now really does need some sort of APM. I don’t really think there’s an application in modern times that doesn’t do better with some level of monitoring,” SolarWinds’ LeCompte said. “Lots of customers only monitor the most mission-critical things, but if you built it and it’s running part of your business, then if you’re not monitoring it, you’re just going to be surprised.”
LeCompte said this includes things many people would not immediately regard as applications, such as websites. Yet web dev and web operations teams are constantly monitoring how different users perceive them.
He added that users expect an APM solution to work out-of-the-box and to automate agent deployment.
“Customers don’t want to have to spend weeks rolling this thing out. We do not think that a modern product should require some third party to go spend a bunch of time and money to make it work. It should all be a sort of automatic out of the box,” LeCompte said.
Increasing automation to keep up with continuous deployment
In order to keep up with the rapid pace of monitoring, many APM solutions are adding AI and ML capabilities. Manual APMs are no longer equipped to deal with the dynamism and the scale that microservices require, said Pontes.
“You need to feed the data into artificial intelligence and machine learning frameworks to start automating certain aspects of the workflow. Because the human factor is actually the bottleneck,” Pontes said.
These machine learning additions correlate and analyze events to reduce the volume of alerts, which prevents alert storms and cuts down on false alarms. They also detect anomalies, find unusual values and correlate them, and then predict the potential impact, Rich added.
“Machine learning has been embedded in many APM solutions, not necessarily to do anything new but to do what they did before much better,” Rich said.
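As a rough illustration of the anomaly detection Rich mentions, the sketch below flags a latency reading that sits far from a recent baseline. It uses a simple z-score test as a stand-in, not a machine learning model and not any vendor's method; the function name, the sample latencies and the threshold of 3 standard deviations are all assumptions.

```python
from statistics import mean, stdev
from typing import List


def is_anomalous(baseline: List[float], value: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag a value that lies far from the baseline, in standard deviations."""
    if len(baseline) < 2:
        return False  # not enough history to judge
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is unusual.
        return value != mu
    return abs(value - mu) / sigma > z_threshold


# Hypothetical response times in milliseconds from a healthy period.
baseline_ms = [102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100]

print(is_anomalous(baseline_ms, 102))  # False: within normal variation
print(is_anomalous(baseline_ms, 950))  # True: far outside the baseline
```

Alerting only on readings like the second one, rather than on every fluctuation, is one simple way to cut the alert volume and false alarms described above.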