Observability is the latest evolution of application performance monitoring, enabling organizations to get a view into CI/CD pipelines, microservices, Kubernetes, edge devices and cloud and network performance, among other systems.
While this visibility is important, handling all the data these systems throw off can be a huge challenge for organizations. In terms of observability, the three pillars of performance data are logs (records of discrete events), metrics (numeric measurements of the performance indicators an organization decides matter most) and traces (records of how a request moves through the software).
Those data sources are important, but if that is where you stop in terms of what you do with the data, your organization is being passive and not proactive. All you’ve done is collect data. According to Gartner research director Charley Rich, “We think the definition of observability should be expanded in a couple of ways. Certainly, that’s the data you need — logs, metrics and traces. But all of this needs to be placed and correlated into a topology so that we see the relationships between everything, because that’s how you know if it can impact something else.”
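As an illustration of what placing data into a topology can buy you, here is a minimal Python sketch of impact analysis over a dependency map. The service names and the `DEPENDS_ON` structure are hypothetical and not drawn from any vendor's product; the point is only that once relationships are recorded, finding what a failure can affect becomes a simple graph walk.

```python
from collections import deque

# Hypothetical topology: each service lists the services it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "inventory"],
    "payments": ["database"],
    "inventory": ["database"],
    "search": ["index"],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk outward from the failed component.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted


print(sorted(impacted_by("database", DEPENDS_ON)))
# prints ['checkout', 'inventory', 'payments']
```

In this toy map, a database problem reaches checkout through two paths, while search is untouched. Logs, metrics and traces alone would not show that relationship.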
Bob Friday, who leads the AIOps working group at the Open Networking User Group (ONUG) and is CTO at wireless network provider Mist Systems (a Juniper Networks company), said from a network perspective, it’s important to start with the question, “Why is the user having a problem?” and work back from that. That, he said, all starts with the data. “I would say the fundamental change I’ve seen from 15 years ago, when we were in the game of helping enterprises deal with network stuff, is that this time around, the paradigm is we’re trying to manage end-to-end user experience. [Customers] really don’t care if it’s a Juniper box or a Cisco box.”
Part of this need is driven by software development, which has taken services and distributed deployment environments to a whole other level, by deploying more frequently and achieving higher engineering productivity. And, as things speed up, performance and availability management become more critical than ever. “Infrastructure and ops, these app support teams, have to understand that if more applications are coming out of the factory, we better move fast,” said Stephen Elliot, program vice president for I&O at analysis firm IDC. “The key thing is recognizing what type of analytics are the proper ones to the different data sets; what kinds of answers do they want to get out of these analytics.”
In other words, organizations need to match the right type of analytics to each data set and decide what answers they want those analytics to provide.
Elliot explained that enterprises today understand the value of monitoring. “Enterprises are beginning to recognize that with the vast amount of different types of data sources, you sort of have to have [monitoring],” he said. “You have more complexity in the system, in the environment, and what remains is the need for performance availability capabilities. In production, this has been a theme for 20 years. This is a need-to-have, not a nice-to-have.”
Not only are there more data sources now; the type of data being collected has also changed how organizations collect, analyze and act on it. “The big change that happened in data for me from 15 years ago, where we were collecting stats every minute or so, to now, we’re collecting synchronous data as well as asynchronous user state data,” Friday said. “Instead of collecting the status of the box, we’re collecting in-state user data. That’s the beginning of the thing.”
Analyzing that data
To make the data streaming into organizations actionable, graphical data virtualization and visualization are key, according to Joe Butson, co-founder of Big Deal Digital, a consulting firm. “Virtualization,” he said, “has done two things: It’s made it more accessible for those people who are not as well-versed in the information they’re looking at. So the virtualization, when it’s graphical, you can see when performance is going down and you have traffic that’s going up because you can see it on the graph instead of cogitating through numbers. The visualization really aids understanding, leading to deeper knowledge and deeper insights, because in moving from a reactive culture in application monitoring or end-to-end life cycle monitoring, you’ll see patterns over time and you’ll be able to act proactively.
“For instance,” he continued, “if you have a modern e-commerce site, when users are spiking at a certain period that you don’t expect, you’re outside of the holiday season, then you can then look over, ‘Are we spinning up the resources we need to manage that spike?’ It’s easy when you can look at a visual tool and understand that versus going to a command-line environment and query what’s going on and pull back information from a log.”
Another benefit of data virtualization is the ability to view data from multiple sources in the virtualization layer without having to move it. This helps everyone who needs to view the data stay in sync, as there is only one version of the truth. It also means organizations don’t have to move data into large data lakes.
When it comes to data, Mist’s Friday said, “A lot of businesses are doing the same thing. They first of all go to Splunk, and they spend a year just trying to get the data into some bucket they can do something with. At ONUG we’re trying to reverse that. We say, ‘Start with the question,’ figure out what question you’re trying to answer, and then figure out what data you need to answer that question. So, don’t worry about bringing the data into a data lake. Leave the data where it’s at, we will put a virtualized layer across your vendors that have your data, and most of it is in the cloud. So, you virtualize the data and pull out what you need. Don’t waste your time collecting a bunch of data that isn’t going to do you any good.”
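The "start with the question" approach Friday describes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `VirtualLayer` class, the two stand-in vendor functions and the sample records are hypothetical, and real sources would be vendor APIs rather than in-memory lists. The sketch only shows the shape of the idea, which is that data stays with its owner and the layer pulls just what a given question needs.

```python
# Hypothetical stand-ins for vendor-hosted data; nothing is copied centrally.
def wifi_vendor(user):
    records = [
        {"user": "alice", "signal_dbm": -82},
        {"user": "bob", "signal_dbm": -55},
    ]
    return [r for r in records if r["user"] == user]


def wan_vendor(user):
    records = [{"user": "alice", "latency_ms": 240}]
    return [r for r in records if r["user"] == user]


class VirtualLayer:
    """Routes a question to the sources that can answer it, on demand."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch):
        # `fetch` is any callable that queries the source where it lives.
        self._sources[name] = fetch

    def ask(self, user, source_names):
        # Pull only what this question needs, only from the named sources.
        return {name: self._sources[name](user) for name in source_names}


layer = VirtualLayer()
layer.register("wifi", wifi_vendor)
layer.register("wan", wan_vendor)

# The question: "Why is alice having a problem?"
print(layer.ask("alice", ["wifi", "wan"]))
```

The design choice worth noting is that the question determines which sources are touched, rather than a bulk load determining which questions can be asked later.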
Because data comes from so many different sources and must be understood and acted on by many different roles inside a company, some organizations are building multiple monitoring teams, each designed to pull out just the data relevant to its role and present it in a way its members can understand.
Friday said, “If you look at data scientists, they’re the guys who are trying to get the insights. If you have a data science guy trying to get the insight, you need to surround him with about four other support people. There needs to be a data engineering guy who’s going to build the real-time path. There has to be a team of guys to get the data from a sensor to the cloud. That’s the shift we’re seeing to get insights from real-time monitoring. How you get the data from the sensor to the cloud is changing… Once you have the data to the cloud, there needs to be a team of guys — this is like Spark, Flink, Storm — to set up real-time data pipelines, and that’s relatively new technology. How do we process data in real time once we get it to the cloud?”
AI and ML for data science
The use of artificial intelligence and machine learning can help with things like anomaly detection, event correlation and remediation, and APM vendors are starting to build those features into their solutions.
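To give a sense of what the simplest form of anomaly detection looks like, here is a rough Python sketch that flags a measurement when it sits far from the recent average. This is a basic rolling z-score, not the method of any particular APM vendor, and the function name, window size, threshold and sample latencies are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates from the mean of the preceding
    `window` values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 250, 101, 99]
print(find_anomalies(latencies_ms))
# prints [6]
```

A detector this simple produces plenty of false positives on noisy real-world data, which is part of why vendors are turning to more sophisticated models.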
AI and ML are starting to provide more human-like insights into data, and deep learning networks are playing an important role in reducing false positives to a point where network engineers can use the data.
But Gartner’s Rich pointed out that all of this activity has to be related to the digital impact on the business. Observing performance is one thing, but if something goes wrong, you need to understand what it impacts, and Rich said you need to see the causal chain to understand the event. “Putting that together, I have a better understanding of observation. Adding in machine learning to that, I can then analyze, ‘will it impact,’ and now we’re in the future of digital business.”
Beyond that, organizations want to be able to find out what the “unknown unknowns” are. Rich said a true observability solution would have all of those capabilities — AI, ML, digital business impact and querying the system for the unknown unknowns. “For the most part, most of the talk about it has been a marketing term used by younger vendors to differentiate themselves and say the older vendors don’t have this and you should buy us. But in truth, nobody fully delivers what I just described, so it’s much more aspirational in terms of reality. Certainly, a worthwhile thing, but all of the APM solutions are all messaging how they’re delivering this, whether they’re a startup from a year ago or one that’s been around for 10 years. They’re all making efforts to do that, to varying degrees.”
With Jenna Sargent