Observability: It's all about the data

Published: June 2nd, 2020

Observability is the latest evolution of application performance monitoring, enabling organizations to get a view into CI/CD pipelines, microservices, Kubernetes, edge devices and cloud and network performance, among other systems.

That view is important, but handling all the data these systems throw off can be a huge challenge for organizations. In observability, the three pillars of performance data are logs (records of events), metrics (the measurements an organization decides best reflect performance) and traces (views into how requests move through the software and how it is performing).

Those data sources are important, but if that is where you stop in terms of what you do with the data, your organization is being passive and not proactive. All you’ve done is collect data. According to Gartner research director Charley Rich, “We think the definition of observability should be expanded in a couple of ways. Certainly, that’s the data you need — logs, metrics and traces. But all of this needs to be placed and correlated into a topology so that we see the relationships between everything, because that’s how you know if it can impact something else.”
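To make Rich's point about topology concrete, here is a minimal sketch in Python. It assumes a hypothetical dependency map (the component names are invented for illustration and do not come from Gartner or any vendor) and walks it to find everything downstream of a failing component, which is one simple way to ask whether a problem "can impact something else."

```python
from collections import deque

# Hypothetical topology: each key is a component, and its value lists the
# components that depend on it directly. All names are illustrative.
DEPENDENTS = {
    "database": ["orders-service", "inventory-service"],
    "orders-service": ["checkout-api"],
    "inventory-service": ["checkout-api"],
    "checkout-api": ["web-frontend"],
    "web-frontend": [],
}


def impacted_by(component, dependents):
    """Return the set of components reachable downstream of `component`."""
    seen = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # A log, metric or trace that flags "database" can now be tied to
    # every service that could feel the effect.
    print(sorted(impacted_by("database", DEPENDENTS)))
```

A breadth-first walk is only one option; a real topology would likely be discovered automatically and change over time, which this sketch does not attempt to model.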

Bob Friday, who leads the AIOps working group at the Open Networking User Group (ONUG) and is CTO at wireless network provider Mist Systems (a Juniper Networks company), said from a network perspective, it’s important to start with the question, “Why is the user having a problem?” and work back from that. That, he said, all starts with the data. “I would say the fundamental change I’ve seen from 15 years ago, when we were in the game of helping enterprises deal with network stuff, is that this time around, the paradigm is we’re trying to manage end-to-end user experience. [Customers] really don’t care if it’s a Juniper box or a Cisco box.”

Part of this need is driven by software development itself, which has pushed services and distributed deployment environments to a new level through more frequent deployments and higher engineering productivity. As things speed up, performance and availability management become more critical than ever. “Infrastructure and ops, these app support teams, have to understand that if more applications are coming out of the factory, we better move fast,” said Stephen Elliot, program vice president for I&O at analysis firm IDC. “The key thing is recognizing what type of analytics are the proper ones to the different data sets; what kinds of answers do they want to get out of these analytics.”

In other words, organizations need to match the analytics to each data set and decide what answers they want those analytics to provide.

Elliot explained that enterprises today understand the value of monitoring. “Enterprises are beginning to recognize that with the vast amount of different types of data sources, you sort of have to have [monitoring],” he said. “You have more complexity in the system, in the environment, and what remains is the need for performance availability capabilities. In production, this has been a theme for 20 years. This is a need-to-have, not a nice-to-have.”

Not only are there more data sources now; the type of data being collected has also changed how organizations collect, analyze and act on it. “The big change that happened in data for me from 15 years ago, where we were collecting stats every minute or so, to now, we’re collecting synchronous data as well as asynchronous user state data,” Friday said. “Instead of collecting the status of the box, we’re collecting in-state user data. That’s the beginning of the thing.”

Analyzing that data
To make the data streaming into organizations actionable, graphical data virtualization and visualization are key, according to Joe Butson, co-founder of Big Deal Digital, a consulting firm. “Virtualization,” he said, “has done two things: It’s made it more accessible for those people who are not as well-versed in the information they’re looking at. So the virtualization, when it’s graphical, you can see when performance is going down and you have traffic that’s going up because you can see it on the graph instead of cogitating through numbers. The visualization really aids understanding, leading to deeper knowledge and deeper insights, because in moving from a reactive culture in application monitoring or end-to-end life cycle monitoring, you’ll see patterns over time and you’ll be able to act proactively.

“For instance,” he continued, “if you have a modern e-commerce site, when users are spiking at a certain period that you don’t expect, you’re outside of the holiday season, then you can then look over, ‘Are we spinning up the resources we need to manage that spike?’ It’s easy when you can look at a visual tool and understand that versus going to a command-line environment and query what’s going on and pull back information from a log.”

Another benefit of data virtualization is the ability to view data from multiple sources in the virtualization layer, without having to move the data. This helps everyone who needs to view data stay in sync, as there’s but one version of truth. This also means organizations don’t have to move data into big data lakes.

When it comes to data, Mist’s Friday said, “A lot of businesses are doing the same thing. They first of all go to Splunk, and they spend a year just trying to get the data into some bucket they can do something with. At ONUG we’re trying to reverse that. We say, ‘Start with the question,’ figure out what question you’re trying to answer, and then figure out what data you need to answer that question. So, don’t worry about bringing the data into a data lake. Leave the data where it’s at, we will put a virtualized layer across your vendors that have your data, and most of it is in the cloud. So, you virtualize the data and pull out what you need. Don’t waste your time collecting a bunch of data that isn’t going to do you any good.”
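The "start with the question" approach Friday describes can be sketched in a few lines of Python. This is an illustration only, not ONUG's or any vendor's design: the `VirtualDataLayer` class, the two source functions and the sample records are all hypothetical. The idea shown is that each source keeps its own data, and a query pulls back only the records and fields that answer the question.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


class VirtualDataLayer:
    """Hypothetical layer that queries sources in place instead of copying them."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Record]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Record]]) -> None:
        # `fetch` stands in for a call to a vendor's cloud API.
        self._sources[name] = fetch

    def query(self, predicate: Callable[[Record], bool], fields: List[str]) -> List[Record]:
        results = []
        for name, fetch in self._sources.items():
            for record in fetch():
                if predicate(record):
                    # Keep only the fields the question needs, plus provenance.
                    row = {field: record.get(field) for field in fields}
                    row["source"] = name
                    results.append(row)
        return results


# Made-up sample data standing in for two vendors' stores.
def wifi_events() -> List[Record]:
    return [
        {"user": "alice", "latency_ms": 250, "ap": "ap-1"},
        {"user": "bob", "latency_ms": 40, "ap": "ap-2"},
    ]


def wan_events() -> List[Record]:
    return [{"user": "alice", "latency_ms": 310, "link": "wan-1"}]


layer = VirtualDataLayer()
layer.register("wifi", wifi_events)
layer.register("wan", wan_events)

# The question: which users are seeing latency above 200 ms?
slow = layer.query(lambda r: r.get("latency_ms", 0) > 200, ["user", "latency_ms"])

if __name__ == "__main__":
    for row in slow:
        print(row)
```

In practice the fetch functions would push the filter down to each vendor's API rather than filtering locally; the sketch filters in memory only to stay self-contained.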

Because data is coming from so many different sources and needs to be understood and acted on by many different roles inside a company, some organizations are building multiple monitoring teams, each designed to pull out just the data relevant to its role and present it in a way its members can understand.

Friday said, “If you look at data scientists, they’re the guys who are trying to get the insights. If you have a data science guy trying to get the insight, you need to surround him with about four other support people. There needs to be a data engineering guy who’s going to build the real-time path. There has to be a team of guys to get the data from a sensor to the cloud. That’s the shift we’re seeing to get insights from real-time monitoring. How you get the data from the sensor to the cloud is changing… Once you have the data to the cloud, there needs to be a team of guys — this is like Spark, Flink, Storm — to set up real-time data pipelines, and that’s relatively new technology. How do we process data in real time once we get it to the cloud?”

AI and ML for data science
The use of artificial intelligence and machine learning can help with things like anomaly detection, event correlation and remediation, and APM vendors are starting to build those features into their solutions.

AI and ML are starting to provide more human-like insights into data, and deep learning networks are playing an important role in reducing false positives to a point where network engineers can use the data.
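As a rough illustration of the anomaly detection mentioned above, the following Python sketch flags values that sit far from the recent average. It is a plain rolling z-score, not the deep learning approach or any APM vendor's feature; the function name, window size, threshold and sample traffic figures are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up requests-per-minute series with one obvious spike at index 7.
requests_per_minute = [100, 102, 98, 101, 99, 100, 103, 250, 101, 99]

if __name__ == "__main__":
    print(find_anomalies(requests_per_minute))
```

Raising the threshold trades missed anomalies for fewer false positives, which is the same tension the article describes.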

But Gartner’s Rich pointed out that all of this activity has to be related to the digital impact on the business. Observing performance is one thing, but if something goes wrong, you need to understand what it impacts, and Rich said you need to see the causal chain to understand the event. “Putting that together, I have a better understanding of observation. Adding in machine learning to that, I can then analyze, ‘will it impact,’ and now we’re in the future of digital business.”

Beyond that, organizations want to be able to find out what the “unknown unknowns” are. Rich said a true observability solution would have all of those capabilities — AI, ML, digital business impact and querying the system for the unknown unknowns. “For the most part, most of the talk about it has been a marketing term used by younger vendors to differentiate themselves and say the older vendors don’t have this and you should buy us. But in truth, nobody fully delivers what I just described, so it’s much more aspirational in terms of reality. Certainly, a worthwhile thing, but all of the APM solutions are all messaging how they’re delivering this, whether they’re a startup from a year ago or one that’s been around for 10 years. They’re all making efforts to do that, to varying degrees.”

With Jenna Sargent

Article Tags

APM, application stability management, CI/CD pipelines, data analytics, observability

About David Rubinstein

David Rubinstein is editor-in-chief of SD Times.


