Observability: A process change, not a set of tools

Published: April 9th, 2021

If you do a Google search for the phrase “observability tools,” it’ll return about 3.3 million results. As observability is the hot thing right now, every vendor is trying to get aboard the observability train. But observability is not as simple as buying a tool; it’s more of a process change — a way of collecting data and using that data to provide better customer experiences.

“Right now there’s a lot of buzz around observability, observability tools, but it’s not just the tool,” said Mehdi Daoudi, CEO of digital experience monitoring platform Catchpoint. “That’s the key message. It’s really about how can we combine all of these data streams to try to paint a picture.”

If you go back to where observability came from (the term has its roots in control theory, well before it was adopted in software operations), its original definition was about measuring “how well internal states of a system can be inferred from knowledge of its external outputs,” said Daoudi.

Daoudi shared an example of observability in action: one of Catchpoint’s customers noticed that its own users complained a lot on Mondays and Tuesdays, but not on Sundays. The server load was the same, but the services were slower. Through observability, the company determined that backup processes that ran only on weekdays were the culprit and were hurting performance.
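The kind of comparison behind that finding can be sketched in a few lines of Python. This is only an illustration, not Catchpoint's method: the `Sample` type, the two helper functions, and every number in `SAMPLES` are hypothetical, made up to mirror the pattern described (flat load, slower responses on days when a backup runs).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


@dataclass(frozen=True)
class Sample:
    """One hypothetical daily measurement (all fields are assumptions)."""
    day: date
    requests_per_sec: float  # server load
    latency_ms: float        # what end users experience
    backup_running: bool     # internal state to correlate against


def mean_latency_by_weekday(samples):
    """Average latency per weekday name."""
    buckets = defaultdict(list)
    for s in samples:
        buckets[WEEKDAYS[s.day.weekday()]].append(s.latency_ms)
    return {name: mean(values) for name, values in buckets.items()}


def mean_latency_by_backup(samples):
    """Average latency split by whether a backup was running."""
    buckets = defaultdict(list)
    for s in samples:
        buckets[s.backup_running].append(s.latency_ms)
    return {flag: mean(values) for flag, values in buckets.items()}


# Made-up data: load is flat, latency is higher when the backup runs.
SAMPLES = [
    Sample(date(2021, 4, 4), 100.0, 300.0, False),   # Sunday
    Sample(date(2021, 4, 5), 100.0, 900.0, True),    # Monday
    Sample(date(2021, 4, 6), 100.0, 800.0, True),    # Tuesday
    Sample(date(2021, 4, 11), 100.0, 320.0, False),  # Sunday
    Sample(date(2021, 4, 12), 100.0, 860.0, True),   # Monday
]

if __name__ == "__main__":
    print(mean_latency_by_weekday(SAMPLES))
    print(mean_latency_by_backup(SAMPLES))
```

Grouping by weekday surfaces the symptom; grouping by the backup flag is what ties it to an internal state. In practice the second grouping is only possible if that internal state is being collected in the first place.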

“Observability is about triangulation,” said Daoudi. “It’s about being able to answer a very, very complex question, very, very quickly. There is a problem – where is the problem? The reason why this is important is because things have gotten a lot more complex. You’re not dealing with one server anymore, you’re dealing with hundreds of thousands of servers, cloud, CDNs, a lot of moving parts where each one of them can break. And so not having observability into the state of those systems, that makes your triangulation efforts a lot harder, and therefore longer, and therefore has an impact on the end users and your brand and revenue, etc.”

This is why Daoudi firmly believes that observability isn’t just a set of tools. He sees it as a way of working as a company, being aligned, and being able to have a common way to collect data that is needed to answer questions.

The industry has standardized on OpenTelemetry as the common way of collecting telemetry data. OpenTelemetry is an open source tool used for gathering metrics, logs, and traces, which are often referred to as the three pillars of observability.

The three pillars are often referenced in the industry when talking about observability, but Ben Sigelman, CEO and co-founder of monitoring company Lightstep, believes that observability needs to go beyond metrics, logs, and traces. He compared the three pillars to Steve Jobs announcing the first iPhone back in 2007. Jobs started off the presentation by announcing a widescreen iPod with touch controls, a “revolutionary” mobile phone, and a breakthrough internet communications device, making it seem as though they were three separate devices.

“These are not three separate devices,” Jobs went on to clarify. “This is one device, and we are calling it iPhone.” Sigelman said the same is true of telemetry. Metrics, logs, and traces shouldn’t be known as the three pillars because you get all three at once and it’s one thing: telemetry.
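Sigelman's point can be illustrated with a small sketch in which a single recorded event yields all three views. This is not Lightstep's or OpenTelemetry's data model; the `Event` type, the helper functions, and the sample events are hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Event:
    """One hypothetical telemetry record; every field is an assumption."""
    trace_id: str
    span_name: str
    start_ms: int
    duration_ms: int
    message: str


def as_log_line(event):
    """Log view: one human-readable line per event."""
    return f"{event.start_ms} trace={event.trace_id} {event.span_name}: {event.message}"


def as_metric(events, span_name):
    """Metric view: mean duration of all events with the given span name."""
    return mean(e.duration_ms for e in events if e.span_name == span_name)


def as_trace(events, trace_id):
    """Trace view: (span name, duration) pairs of one request, in start order."""
    spans = sorted((e for e in events if e.trace_id == trace_id),
                   key=lambda e: e.start_ms)
    return [(e.span_name, e.duration_ms) for e in spans]


EVENTS = [
    Event("t1", "checkout", 0, 120, "order placed"),
    Event("t1", "charge_card", 10, 90, "card charged"),
    Event("t2", "checkout", 500, 80, "order placed"),
]

if __name__ == "__main__":
    print(as_log_line(EVENTS[0]))
    print(as_metric(EVENTS, "checkout"))
    print(as_trace(EVENTS, "t1"))
```

The same three records feed every function, which is the sense in which the log, the metric, and the trace are projections of one underlying stream rather than separate data sets.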

Michael Fisher, group product manager at AIOps company OpsRamp, broke observability data down further into two signals: symptomatic signals and causal signals. Symptomatic signals are what an end user is experiencing, such as page latency or a 500 Internal Server Error on a website. Causal signals are what cause those symptomatic signals. Examples include CPU, network, and storage metrics, and “things that may be an issue, but you’re not sure because they’re not being tied to any symptom that an end user might be facing.”

Monitoring tools tend to focus mostly on the causal signals, Fisher explained, but he recommends starting with symptomatic signals and working towards causal signals, with the end state being a union of the two.
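One way to picture that symptom-first workflow is the sketch below. It is a minimal illustration under stated assumptions, not OpsRamp's implementation: the `Signal` type, the `SYMPTOMATIC` and `CAUSAL` labels, the `candidate_causes` helper, the signal names, and the five-minute window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SYMPTOMATIC = "symptomatic"  # what the end user experiences
CAUSAL = "causal"            # what may be causing it


@dataclass(frozen=True)
class Signal:
    """A hypothetical observed signal; the fields are assumptions."""
    name: str
    kind: str
    at: datetime


def candidate_causes(symptom, signals, window=timedelta(minutes=5)):
    """Return causal signals seen within `window` before the symptom, oldest first."""
    if symptom.kind != SYMPTOMATIC:
        raise ValueError("start from a symptomatic signal")
    earliest = symptom.at - window
    matches = [s for s in signals
               if s.kind == CAUSAL and earliest <= s.at <= symptom.at]
    return sorted(matches, key=lambda s: s.at)


T0 = datetime(2021, 4, 9, 12, 0)
SIGNALS = [
    Signal("cpu_high", CAUSAL, T0 - timedelta(minutes=2)),
    Signal("disk_latency_high", CAUSAL, T0 - timedelta(minutes=30)),
    Signal("http_500", SYMPTOMATIC, T0),
    Signal("network_saturation", CAUSAL, T0 - timedelta(minutes=4)),
]

if __name__ == "__main__":
    symptom = SIGNALS[2]
    for cause in candidate_causes(symptom, SIGNALS):
        print(cause.name)
```

The disk latency signal falls outside the window and is left out, which corresponds to the case Fisher describes: something that may be an issue but is not tied to any symptom an end user is facing.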

“When something is going wrong [the developer] can search that log, they can search that trace and they can tie it back to the piece of code that’s having an issue,” said Fisher. “The operations team, they may just see the causal symptoms, or maybe there is no causal symptom. Maybe the application is running fine but users are still complaining. Tying those two together is kind of a key part of this shift towards observability. And that’s why I talk about observability as a development principle because I think starting with the symptomatic signals with the people who actually know is a huge paradigm shift for me because I think some of the people you talk to or ITOps teams you talk to is that monitoring is their wheelhouse, whereas many modern shops, OpsRamp included, much more monitoring actually happens on the development team side now.”

Providing a good end-user experience is the ultimate goal of observability. With monitoring, you might focus only on causal signals and miss important symptomatic signals, such as an end user experiencing degraded service or having trouble accessing your application.

“When I talk about using observability to drive end-user outcomes, I’m really talking about focusing on observing the things that would impact end users and taking action on them before they do because traditionally this focus on monitoring has been at a much lower level, layer 3, I care about my network, I care about my switches,” said Fisher. “I’ve talked to customers where that’s all they care about, which is fine but you start to realize those things really matter less once you move up the stack and you have a webpage or you have a SaaS application. The end user will never tell you that their CPU is high, but they will tell you that your webpage is taking 10 seconds to load and they couldn’t use your tool. If an end user can’t use your tool who gives a damn about anything else?”

It’s important that observability not just stay in the hands of developers. In fact, Bernd Greifeneder, CTO of monitoring company Dynatrace, believes that if developers just do observability on their own, then it’s nothing more than a debugging tool. “The reason then for DevOps and SREs needs to come into play is to help with a more consistent approach because these days multiple teams create different microservices that are interconnected and have to interplay. This is sort of a complexity challenge and also a scale challenge that needs to be solved. This is where an SRE and Ops team have to help with standing up proper observability tooling or monitoring if you will, but making sure that all the observability data comes together in a holistic view,” he said.

SRE and Ops teams can help make sure that the observability data that the developers are collecting has the proper analytics on top of it. This will enable them to gain insights from observability data and use those insights to drive automation and further investments into observability. “IT automation means higher availability, it means automatic remediation when services fail, and ultimately means better experiences for customers,” Greifeneder said.

When looking into the tools to put on top of your observability data to do those analytics, Tyler McMullen, CTO of edge cloud platform Fastly, recommends constantly experimenting to see what works for your team. He explained that observability vendors often charge a lot of money, and teams can fall into the trap of buying a solution, putting too much observability data into it, and then being shocked by the bill.

“Are the pieces of information that we’re plugging into our observability, are they actually working for us? If they’re not working for us, we definitely shouldn’t have them in there,” said McMullen. “On the other hand, you only really find out whether or not something is useful after it becomes useful. Figuring out what you need in advance is I think, one of the biggest problems with this thing. You don’t want to put too much in. On the other hand, if you put too little in you don’t know whether or not it is useful.” As a result, your team will need to do lots of experimenting to discover the right process and the right balance.

Daoudi added that it’s also important to answer the question of why you’re doing observability before looking into products. “Like every new thing that when a company goes and decides to implement something, you start with why? Why do you need to implement observability? Why do you need to implement SREs? Why do you need to implement an HR system? If you don’t define the ‘why’ then what typically happens is first it’s a huge distraction to your company and also a lot of resources being wasted and then the end result might not be what you’re looking for,” he said.

And of course, it’s important to remember that observability is more of a process, so looking for a tool that will do observability for you won’t work. The tooling is really about analytics on the observability data you’ve gathered.

“I really don’t think observability is a tool,” said Daoudi. “If there was such a thing as go to Best Buy, aisle 5, or Target, or Walmart and buy an observability tool for like $5 million, it ain’t going to work because if your company is not functioning and aligned, and your processes and everything isn’t aligned around what observability is supposed to do, then you’re just going to have shelfware in your company.”

Article Tags

APM, monitoring, observability

About Jenna Barron

Jenna Barron is News Editor of SD Times.

