Efforts to standardize tracing through OpenTracing

Published: April 5th, 2018

Industry efforts toward distributed tracing have been evolving for decades, and one of the latest initiatives in this arena is OpenTracing, an open distributed tracing standard for apps and OSS packages. APMs like LightStep and Datadog are eagerly pushing forward the emerging specification, as are customer organizations like HomeAway, PayPal and Pinterest, while some other industry leaders – including Dynatrace, New Relic, and AppDynamics – are holding back from full support. Still, contributors to the open-source spec are forging ahead with more and more integrations, and considerable conference activity is in store for later this year.

“Distributed tracing is absolutely essential to building microservices in highly scalable distributed environments,” contended Ben Sigelman, co-creator of OpenTracing and co-founder and CEO at LightStep, in an interview with SD Times. In contrast to other types of tracing familiar to some developers, such as kernel tracing or stack tracing, distributed tracing is all about understanding the complex journeys that transactions take in propagating across distributed systems.

Although academic papers about distributed tracing had started appearing even earlier, Google first began using a distributed tracing system called Dapper some 14 years ago, publishing the Dapper paper online about six years later. As a Google employee during the early phases of his career, Sigelman worked on Dapper, in addition to several other Google projects. He became intrigued with Dapper as a solution to the issues posed when a single user query would hit hundreds of processes and thousands of surfaces, overwhelming existing logging systems. Zipkin, another distributed tracing system, went open source a couple of years after Dapper.

A spec is born
Whereas Dapper was geared to Google’s own internally controlled repository, the OpenTracing specification, launched in 2015, is designed to be a “single, standard mechanism to describe the behavior of [disparate] systems,” according to Sigelman. Tracing contexts are passed to both self-contained OSS services (like Cassandra and NGINX) and OSS packages linked into custom services (such as ORMs and gRPC), as well as “arbitrary application glue and business logic built around the above,” Sigelman wrote in a blog.

As might be expected, among the earliest customer adopters of OpenTracing are many large, cloud-enabled online services dealing with massive numbers of transactions across myriad distributed systems. HomeAway, for example, is a vacation rental marketplace dealing with 2 million vacation homes in 190 countries, across 50 websites around the globe.

“Our system is composed of different services written in different languages,” said Eduardo Solis, architect at HomeAway, in an email to SD Times. “We are also seeing many teams using patterns like CQRS and a lot of streaming where transactions have real-time patterns and asynchronous ones. Being able to visualize and measure all of this is critical!”

Why OpenTracing?
“OpenTracing is a ‘must have’ tool for microservices and cloud-native applications. It is the API to adopt,” Solis continued. “Observability of the system is critical for business success in a containerized cloud world where applications are spinning up and down, having degradation or failure, and there is a very complex dependency graph. Instrumenting code properly is hard. Assuming you have the resources and knowledge to do it you end up using either some proprietary API or getting the system baked to a vendor system. There are APM solutions that auto-instrument but then you end up losing some of the powerful context capabilities. OpenTracing solves all of the above.

“You have the whole open source community instrumenting popular frameworks and libraries,” Solis added, “you get a vendor neutral interface for instrumentation, and you can use that same API to do other more interesting things at the application level without getting married to one single solution.”

How OpenTracing is different
Sigelman, of course, concurs that OpenTracing carries significant advantages for developers. For one thing, developers of application code and OSS packages and services can instrument their own code without binding to any specific tracing vendor. Beyond that, each component of a distributed system can be instrumented in isolation, “and the distributed application maintainer can choose (or switch, or multiplex) a downstream tracing technology with a configuration change,” he said.
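To make the “configuration change” idea concrete, here is a minimal sketch in Python. It does not use the real OpenTracing API: the Tracer interface, the backend classes, the REGISTRY table and the build_tracer helper are all hypothetical names invented for this illustration. The point is only that the instrumented function depends on a neutral interface, while the choice of backend (one, another, or several at once) is made from configuration.

```python
import time
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Tracer(ABC):
    """Hypothetical vendor-neutral interface that instrumented code depends on."""

    @abstractmethod
    def record(self, operation: str, duration_s: float) -> None:
        ...


class NoopTracer(Tracer):
    """Backend that discards everything (tracing switched off)."""

    def record(self, operation: str, duration_s: float) -> None:
        pass


class InMemoryTracer(Tracer):
    """Stand-in for a real vendor backend; keeps records in a list."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float]] = []

    def record(self, operation: str, duration_s: float) -> None:
        self.records.append((operation, duration_s))


class MultiplexTracer(Tracer):
    """Fans every record out to several backends."""

    def __init__(self, backends: List[Tracer]) -> None:
        self.backends = backends

    def record(self, operation: str, duration_s: float) -> None:
        for backend in self.backends:
            backend.record(operation, duration_s)


# Hypothetical mapping from configuration names to backend classes.
REGISTRY = {"noop": NoopTracer, "memory": InMemoryTracer}


def build_tracer(config: Dict[str, List[str]]) -> Tracer:
    """Choose the backend(s) from configuration, not from application code."""
    names = config.get("tracers", ["noop"])
    backends = [REGISTRY[name]() for name in names]
    return backends[0] if len(backends) == 1 else MultiplexTracer(backends)


def handle_request(tracer: Tracer, user_id: str) -> str:
    """Instrumented application code: it only knows the neutral interface."""
    start = time.perf_counter()
    result = f"hello {user_id}"
    tracer.record("handle_request", time.perf_counter() - start)
    return result
```

Switching from {"tracers": ["memory"]} to {"tracers": ["memory", "memory"]} changes where the data goes without touching handle_request, which is the property Sigelman describes.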

Sigelman points to a number of different ways in which distributed tracing can be standardized, such as the following:

Standardized span management. Here, programmatic APIs are used to start, finish, and decorate timed operations, which are called “spans” in the jargon of both Dapper and Zipkin.
Standardized inter-process propagation. Programmatic APIs are used to help in transferring tracing context across process boundaries.
Standardized active span management. In a single process, programmatic APIs store and retrieve the active span across package boundaries.
Standardized in-band context encoding. Specifications are made as to an exact wire-encoding format for tracing context passed alongside application data between processes.
Standardized out-of-band trace data encoding. Specifications are made about how decorated trace and span data should be encoded as it moves toward the distributed tracing vendor.

Earlier standardization efforts in distributed tracing have focused on the last two of these scenarios, meaning the encoding and representation of trace and context data, both in- and out-of-band, as opposed to APIs. In so doing, these earlier efforts have failed to provide several benefits that developers actually need, Sigelman argued.

“Standardization of encoding formats has few benefits for instrumentation-API consistency, tracing vendor lock-in, or the tidiness of dependencies for OSS projects, the very things that stand in the way of turnkey tracing today,” he wrote. “What’s truly needed – and what OpenTracing provides – is standardization of span management APIs, inter-process propagation APIs, and ideally active span management APIs.”

OpenTracing isn’t for everything (or everyone)
Sigelman told SD Times that he sees three main use scenarios for OpenTracing: “The first of these is basic storytelling. What happens to a transaction across processes? The second is root cause analysis. What’s broken?” he noted. “The third main use case scenario is greenfield long-term analysis, to help bring improvements that would prevent the need for engineering changes in the future.”

Still, leading APMs like Dynatrace, New Relic, and AppDynamics are hanging back from full support for OpenTracing. Why is this so?

Alois Reitbauer, chief technology strategist at Dynatrace, agreed that OpenTracing does offer some important benefits to developers.

“There’s a lot going on in the industry right now in terms of creating a standardized way for instrumenting applications, and OpenTracing is one part of that. What it tries to achieve is something really important, and something that the industry needs to solve, in terms of defining what a joint API can look like. Some frameworks are using OpenTracing already today, but it’s mainly targeted for library and some middleware developers. End users will not necessarily have first-hand contact as frameworks and middleware either come already instrumented or instrumentation is handled by the monitoring provider,” Reitbauer told SD Times, in an email.

“It’s a good first step, but it’s in its early stages, and the reality is that OpenTracing doesn’t paint the whole picture. Beyond just traces, systems need metrics and logs to give a comprehensive view of the ecosystem, with a full APM system in the backend as well.”

In a recent blog post, Reitbauer went further to maintain that interoperability has become much more necessary lately with the rise of cloud services apps from third-party vendors, but that the only way to achieve interoperability is to solve two problems that OpenTracing doesn’t address. The problems involve abilities to “create an end-to-end trace with multiple full boundaries” and to “access partial trace data in a well defined way and link it together for end-to-end visibility,” he wrote.

Many APM and cloud providers are well aware of these issues and have started to work on solving them by pursuing two things: a standardized method for propagating trace context information end-to-end across vendors, and a way to ingest trace fragment data from each other, according to Reitbauer.

“The first [of these] is on the way to be resolved within the next year. There is a W3C working group forming that will define a standardized way to deal with trace information referred to as Trace-Context, which basically defines two new HTTP-Headers that can store and propagate trace information. Today every vendor would use their own headers, which means they will very likely get dropped by intermediaries that do not understand them,” said the Dynatrace exec.
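The quote does not name the two headers. In the W3C Trace Context specification as it was later published, they are traceparent and tracestate, and a version-00 traceparent value has four hyphen-separated lowercase hex fields: version, a 32-character trace ID, a 16-character parent ID, and 2-character flags. The sketch below, with hypothetical helper names parse_traceparent and child_traceparent, shows how a service might validate an incoming header and produce the one it forwards. It handles version 00 only and ignores tracestate, so treat it as an illustration rather than a conforming implementation.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version - trace-id - parent-id - trace-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    # Version "ff" and all-zero IDs are defined as invalid.
    if (
        parsed.version == "ff"
        or set(parsed.trace_id) == {"0"}
        or set(parsed.parent_id) == {"0"}
    ):
        return None
    return parsed


def child_traceparent(incoming: Optional[str]) -> str:
    """Build the header this service sends on its own outgoing calls.

    The trace ID is kept when the incoming header is valid, so every hop
    stays in one trace; the parent ID is always replaced with a fresh
    span ID for the current service.
    """
    parent = parse_traceparent(incoming) if incoming else None
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    flags = parent.flags if parent else "01"  # "01" marks the trace as sampled
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because every participant reads and writes the same header, an intermediary from a different vendor can pass the context along instead of dropping it, which is the problem Reitbauer describes.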

“Now let us move on to data formats. Unfortunately, a unified data format for trace data is further away from becoming reality,” he acknowledged. “Today there are practically as many formats available as there are tools. There isn’t even the conceptual agreement whether the data format should be standardized or if there should be a standardized API and everyone can build an exporter that fits their specific needs. There are pros and cons for both approaches and the future will reveal what implementers consider the best approach. The only thing that cannot be debated is that eventually we will need a means to easily collect trace fragments and link them together.”

For his part, though, Sigelman has suggested that one of the big reasons why OpenTracing is progressing so rapidly is precisely due to the narrow, well defined, and manageable focus of the spec.

New support for the spec
Now Datadog, a major monitoring platform for cloud environments, is another force avidly backing OpenTracing. In December of 2017, Datadog announced its support for OpenTracing as well as its membership in the Cloud Native Computing Foundation (CNCF). The vendor also unveiled plans to join the OpenTracing Specification Committee (OTSC) and to invest in developing the standard going forward.

Datadog’s support for OpenTracing will let customers instrument their code for distributed tracing without concerns about getting locked in to a single vendor or making costly modifications to their code in the future, according to Ilan Rabinovitch, VP product and community for Datadog.

“Open source technologies and open standards have long been critical to Datadog’s success. Customers want to emit metrics and traces with the tooling that best fits their own workflows and we want to enable them to do so, rather than force them to specific client-side tooling,” he told SD Times.

“Many of our most popular integrations in infrastructure monitoring, including OpenStack and Docker, started off as community-driven contributions and collaborations around our open-source projects. In the world of OpenTracing we have seen our community build and open source their own OT-based tracers that enable new languages on Datadog, beyond our existing support for Java, Python, Ruby and Go.”

In addition to the Specification Committee, OpenTracing also runs multiple working groups. The Documentation Working Group meets every Thursday, while the Cross Language Working Group – entrusted with maintaining the OpenTracing APIs and ecosystem – meets on Fridays.

Conference fare
Want to find out more about OpenTracing? This year, developers have an opportunity to meet with OpenTracing experts and discuss the emerging spec at a number of different conference venues.

At the end of March, HomeAway held an end user meetup group together with Indeed, PayPal, and Under Armour. Talking with SD Times just before the event in Austin, HomeAway’s Solis said that he planned to give a presentation detailing how his development team is using the new spec.

“As infrastructure groups we are providing platforms and frameworks that deliver instrumentation to developers so they don’t have to do anything to get quality first level (entry/exit) tracing in their applications. We have also worked on an internal standard that developers using other technologies that we don’t support can instrument themselves. OpenTracing gives us this ability to just delegate to standard documentation and open-source forums if developers want to enrich their tracing. We are also doing a slow rollout so we can build capabilities in small but fast iterations,” the architect elaborated.

If you missed the meetup in Austin, you still have several other chances ahead to get together with developers from the OpenTracing community.

KubeCon EU, happening from May 2 to 4 in Copenhagen, will feature two talks about OpenTracing, along with two salons. Salons are breakout sessions where folks interested in learning about distributed tracing can discuss the subject with speakers and mentors.

OSCON, going on from July 17 to 19 in Portland, OR, will include three talks on OpenTracing, along with a workshop and salons. If you’d like to attend an OpenTracing salon at either venue, you can email OpenTracing at hello@opentracing.io to pose questions in advance. OpenTracing would also love to hear from participants who are willing to help out by mentoring.

Recent OpenTracing feats
Sigelman is quick to observe that his co-creators on OpenTracing and his co-founders at LightStep are two distinctly separate groups, and that many OpenTracing adopters are not LightStep customers.

He also cites a large number of recent contributions, both from the OpenTracing project itself and from customers and vendors, including the following.

Core API and official OpenTracing contributions

OpenTracing-C++ has now added support for dynamic loading, meaning that tracing libraries can be loaded at runtime rather than needing to be linked at compile time. Users can use any tracing system that supports OpenTracing. Support currently includes Envoy and NGINX.
OpenTracing-Python 2.0 and OpenTracing-C# v0.12 have both been released. The main addition to each is Scopes and ScopeManager.

Content from the community

Pinterest presented its Pintrace Trace Analyzer at the latest OTSC meeting. “The power of this tool is its ability to compare two batches of traces – displaying stats for each of the two and highlighting the changes,” explained Pinterest’s Naoman Abbas. “An unexpected and significant change in a metric can indicate that something is going wrong in a deployment.”
Red Hat has shared best practices for using OpenTracing with Envoy or Istio. “We have seen that tracing with Istio is very simple to set up. It does not require any additional libraries. However, there are still some actions needed for header propagation. This can be done automatically with OpenTracing, and it also adds more visibility into the monitored process,” according to Red Hat’s Pavol Loffay.
HomeAway presented at the Testing in Production meetup at Heavybit. LightStep’s Priyanka Sharma showed ways to use tracing to lessen the pain when developers are running microservices using CI/CD.
Idit Levine, founder of Solo.io, delivered a presentation at QCon about her OpenTracing-native open-source project, Squash, and how it can be used for debugging containerized microservices.

Community contributions

Software development firm Alibaba has created an application manager called Pandora.js, which integrates capabilities such as monitoring, debugging and resiliency while supplying native OpenTracing support to assist in inspecting applications at runtime.
Xavier Canal from Barcelona has built Opentracing-rails, a distributed tracing instrumentation for Ruby on Rails apps based on OpenTracing. The tool includes examples of how to initialize Zipkin and Jaeger tracers.
Gin, a web framework written in the Go language, has begun to add helpers for request-level tracing.
Daniel Schmidt of Mesosphere has created Zipkin-playground, a repo with examples of Zipkin-OpenTracing-compatible APIs for client-side tracing.
The Akka and Concurrency utilities have both added support for Java and Scala.
Michael Nitschinger of Couchbase is now leading a community exploration into an OpenTracing API to be written in the Rust programming language.

Article Tags

APM, Datadog, distributed tracing, LightStep, OpenTracing, standards

About Jacqueline Emigh

Jacqueline Emigh is a contributing editor for SD Times and ITOPs Times.
