Industry efforts toward distributed tracing have been evolving for decades, and one of the latest initiatives in this arena is OpenTracing, an open, vendor-neutral distributed tracing standard for applications and OSS packages. APM vendors like LightStep and Datadog are eagerly pushing the emerging specification forward, as are customer organizations like HomeAway, PayPal and Pinterest, while some other industry leaders, including Dynatrace, New Relic and AppDynamics, are holding back from full support. Still, contributors to the open-source spec are forging ahead with more and more integrations, and a busy slate of conference activities is in store for later this year.

“Distributed tracing is absolutely essential to building microservices in highly scalable distributed environments,” contended Ben Sigelman, co-creator of OpenTracing and co-founder and CEO at LightStep, in an interview with SD Times. In contrast to other types of tracing familiar to some developers, such as kernel tracing or stack tracing, distributed tracing is all about understanding the complex journeys that transactions take in propagating across distributed systems.

Although academic papers about distributed tracing appeared even earlier, Google first began using a distributed tracing system called Dapper some 14 years ago and published the Dapper paper online about six years later. Early in his career at Google, Sigelman worked on Dapper, along with several other Google projects. He became intrigued with Dapper as a solution to the problems that arose when a single user query would hit hundreds of processes and thousands of servers, overwhelming existing logging systems. Zipkin, a distributed tracing system built at Twitter, went open source a couple of years after the Dapper paper appeared.

A spec is born
Dapper was geared to Google’s own internally controlled repository. The OpenTracing specification, launched in 2015, is instead designed to be a “single, standard mechanism to describe the behavior of [disparate] systems,” according to Sigelman. Tracing contexts are passed to self-contained OSS services (like Cassandra and NGINX), to OSS packages locked into custom services (such as ORMs and gRPC), and to “arbitrary application glue and business logic built around the above,” Sigelman wrote in a blog post.

As might be expected, many of the earliest customer adopters of OpenTracing are large, cloud-enabled online services handling massive numbers of transactions across myriad distributed systems. HomeAway, for example, is a vacation rental marketplace with 2 million vacation homes in 190 countries, across 50 websites around the globe.

“Our system is composed of different services written in different languages,” said Eduardo Solis, architect at HomeAway, in an email to SD Times. “We are also seeing many teams using patterns like CQRS and a lot of streaming where transactions have real-time patterns and asynchronous ones. Being able to visualize and measure all of this is critical!”

Why OpenTracing?
“OpenTracing is a ‘must have’ tool for microservices and cloud-native applications. It is the API to adopt,” Solis continued. “Observability of the system is critical for business success in a containerized cloud world where applications are spinning up and down, having degradation or failure, and there is a very complex dependency graph. Instrumenting code properly is hard. Assuming you have the resources and knowledge to do it, you end up either using some proprietary API or getting the system baked into a vendor system. There are APM solutions that auto-instrument, but then you end up losing some of the powerful context capabilities. OpenTracing solves all of the above.

“You have the whole open source community instrumenting popular frameworks and libraries,” Solis added, “you get a vendor neutral interface for instrumentation, and you can use that same API to do other more interesting things at the application level without getting married to one single solution.”

How OpenTracing is different
Sigelman, of course, concurs that OpenTracing carries significant advantages for developers. For one thing, developers of application code and OSS packages and services can instrument their own code without binding to any specific tracing vendor. Beyond that, each component of a distributed system can be instrumented in isolation, “and the distributed application maintainer can choose (or switch, or multiplex) a downstream tracing technology with a configuration change,” he said.
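To picture what “choose (or switch, or multiplex)” can mean in practice, here is a minimal Python sketch built on loudly stated assumptions: the `Tracer` protocol, `InMemoryBackend`, `MultiplexTracer`, `tracer_from_config` and `TRACER_CONFIG` are hypothetical names invented for illustration, not part of the OpenTracing API. What it shows is the architectural idea: instrumented code depends only on an abstract interface, and which backends receive the data is decided by configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Tracer(Protocol):
    """Hypothetical vendor-neutral interface that instrumented code depends on."""

    def record(self, operation: str, duration_ms: float) -> None: ...


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for one vendor's backend; it just stores spans."""

    name: str
    received: List[str] = field(default_factory=list)

    def record(self, operation: str, duration_ms: float) -> None:
        self.received.append(f"{operation}:{duration_ms:.1f}ms")


@dataclass
class MultiplexTracer:
    """Fans every recorded span out to all configured backends."""

    backends: List[InMemoryBackend]

    def record(self, operation: str, duration_ms: float) -> None:
        for backend in self.backends:
            backend.record(operation, duration_ms)


def tracer_from_config(config: Dict[str, List[str]]) -> MultiplexTracer:
    # Choosing, switching or adding a vendor is a config edit, not a code edit.
    return MultiplexTracer([InMemoryBackend(name) for name in config["backends"]])


def handle_checkout(tracer: Tracer) -> None:
    # Application code only sees the abstract Tracer interface.
    tracer.record("checkout", 12.5)


TRACER_CONFIG = {"backends": ["vendor_a", "vendor_b"]}  # hypothetical config
tracer = tracer_from_config(TRACER_CONFIG)
handle_checkout(tracer)
```

Switching vendors in this sketch means editing the `backends` list; `handle_checkout` never changes. A real tracer carries far more (span context, timestamps, tags), which is deliberately left out here.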

Sigelman points to a number of different ways in which distributed tracing can be standardized, such as the following:

  • Standardized span management. Programmatic APIs are used to start, finish, and decorate timed operations, which are called “spans” in the jargon of both Dapper and Zipkin.
  • Standardized inter-process propagation. Programmatic APIs are used to transfer tracing context across process boundaries.
  • Standardized active span management. Within a single process, programmatic APIs store and retrieve the active span across package boundaries.
  • Standardized in-band context encoding. Specifications define an exact wire-encoding format for the tracing context passed alongside application data between processes.
  • Standardized out-of-band trace data encoding. Specifications define how decorated trace and span data should be encoded as it travels to the distributed tracing vendor.
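The first two items, span management and inter-process propagation, can be sketched in a few dozen lines of Python. Everything below (`SpanContext`, `Span`, `SimpleTracer` and the `x-demo-*` header names) is hypothetical and only loosely echoes OpenTracing’s vocabulary of starting a span `child_of` a parent and calling `inject`/`extract` on a carrier; it illustrates the ideas, not the specification’s API.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SpanContext:
    """The IDs that must cross process boundaries to keep a trace connected."""

    trace_id: str
    span_id: str


@dataclass
class Span:
    """A timed operation that can be decorated with tags."""

    operation: str
    context: SpanContext
    parent_id: Optional[str]
    tags: Dict[str, str] = field(default_factory=dict)
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def set_tag(self, key: str, value: str) -> "Span":
        self.tags[key] = value
        return self


class SimpleTracer:
    """Hypothetical tracer covering span management and propagation."""

    TRACE_HEADER = "x-demo-trace-id"  # made-up header names
    SPAN_HEADER = "x-demo-span-id"

    def __init__(self) -> None:
        self.finished: List[Span] = []

    def start_span(self, operation: str,
                   child_of: Optional[SpanContext] = None) -> Span:
        # Children reuse the parent's trace ID; root spans get a fresh one.
        trace_id = child_of.trace_id if child_of else secrets.token_hex(16)
        parent_id = child_of.span_id if child_of else None
        return Span(operation, SpanContext(trace_id, secrets.token_hex(8)), parent_id)

    def finish(self, span: Span) -> None:
        span.end = time.monotonic()
        self.finished.append(span)  # a real tracer would report to a backend

    def inject(self, context: SpanContext, headers: Dict[str, str]) -> None:
        headers[self.TRACE_HEADER] = context.trace_id
        headers[self.SPAN_HEADER] = context.span_id

    def extract(self, headers: Dict[str, str]) -> Optional[SpanContext]:
        if self.TRACE_HEADER not in headers or self.SPAN_HEADER not in headers:
            return None
        return SpanContext(headers[self.TRACE_HEADER], headers[self.SPAN_HEADER])


# Process A: start a client span and inject its context into outgoing headers.
tracer_a = SimpleTracer()
client = tracer_a.start_span("GET /inventory").set_tag("peer.service", "inventory")
headers: Dict[str, str] = {}
tracer_a.inject(client.context, headers)

# Process B: extract the context and continue the same trace.
tracer_b = SimpleTracer()
server = tracer_b.start_span("lookup_inventory", child_of=tracer_b.extract(headers))
tracer_b.finish(server)
tracer_a.finish(client)
```

Note that only the two IDs travel between processes; the exact header names and encoding are the separate, in-band encoding question from the list above.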

Earlier standardization efforts in distributed tracing have focused on the last two of these scenarios, meaning the encoding and representation of trace and context data, both in-band and out-of-band, rather than on APIs. In so doing, Sigelman argued, these earlier efforts have failed to provide several benefits that developers actually need.

“Standardization of encoding formats has few benefits for instrumentation-API consistency, tracing vendor lock-in, or the tidiness of dependencies for OSS projects, the very things that stand in the way of turnkey tracing today,” he wrote. “What’s truly needed – and what OpenTracing provides – is standardization of span management APIs, inter-process propagation APIs, and ideally active span management APIs.”

OpenTracing isn’t for everything (or everyone)
Sigelman told SD Times that he sees three main use cases for OpenTracing: “The first of these is basic storytelling. What happens to a transaction across processes? The second is root cause analysis. What’s broken?” he noted. “The third main use case scenario is greenfield long-term analysis, to help bring improvements that would prevent the need for engineering changes in the future.”

Still, leading APMs like Dynatrace, New Relic and AppDynamics are hanging back from full support for OpenTracing. Why is this so?

Alois Reitbauer, chief technology strategist at Dynatrace, agreed that OpenTracing does offer some important benefits to developers.

“There’s a lot going on in the industry right now in terms of creating a standardized way for instrumenting applications, and OpenTracing is one part of that. What it tries to achieve is something really important, and something that the industry needs to solve, in terms of defining what a joint API can look like. Some frameworks are using OpenTracing already today, but it’s mainly targeted at library and some middleware developers. End users will not necessarily have first-hand contact as frameworks and middleware either come already instrumented or instrumentation is handled by the monitoring provider,” Reitbauer told SD Times in an email.

“It’s a good first step, but it’s in its early stages, and the reality is that OpenTracing doesn’t paint the whole picture. Beyond just traces, systems need metrics and logs to give a comprehensive view of the ecosystem, with a full APM system in the backend as well.”

In a recent blog post, Reitbauer went further, arguing that interoperability has become much more necessary with the rise of cloud services from third-party vendors, but that the only way to achieve it is to solve two problems that OpenTracing doesn’t address. These involve the ability to “create an end-to-end trace with multiple full boundaries” and to “access partial trace data in a well defined way and link it together for end-to-end visibility,” he wrote.

Many APM and cloud providers are well aware of these issues and have started to work on solving them by agreeing on two things: a standardized method for propagating trace context information end-to-end across vendors, and an approach, still under discussion, for ingesting trace fragment data from one another, according to Reitbauer.

“The first [of these] is on the way to being resolved within the next year. There is a W3C working group forming that will define a standardized way to deal with trace information referred to as Trace-Context, which basically defines two new HTTP headers that can store and propagate trace information. Today every vendor would use their own headers, which means they will very likely get dropped by intermediaries that do not understand them,” said the Dynatrace exec.
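In the form the W3C specification eventually took, the two headers are `traceparent` and `tracestate`. A version-00 `traceparent` value packs a 32-hex-digit trace ID, a 16-hex-digit parent span ID and a flags byte whose lowest bit means “sampled.” The Python sketch below builds and parses that value; the helper names are hypothetical, and the validation is deliberately partial (it ignores `tracestate` and any future header versions).

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version-00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def make_traceparent(trace_id: str, parent_id: str, sampled: bool) -> str:
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are not valid identifiers.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(incoming: TraceParent) -> str:
    # Each hop keeps the trace ID but names its own span as the new parent.
    return make_traceparent(incoming.trace_id, secrets.token_hex(8), incoming.sampled)


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing_header = child_traceparent(incoming) if incoming else None
```

Because every vendor would read and write the same header, an intermediary only has to pass one well-known name through rather than a different proprietary header per tool, which is the problem Reitbauer describes.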

“Now let us move on to data formats. Unfortunately, a unified data format for trace data is further away from becoming reality,” he acknowledged. “Today there are practically as many formats available as there are tools. There isn’t even conceptual agreement on whether the data format should be standardized or if there should be a standardized API and everyone can build an exporter that fits their specific needs. There are pros and cons for both approaches and the future will reveal what implementers consider the best approach. The only thing that cannot be debated is that eventually we will need a means to easily collect trace fragments and link them together.”
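Reitbauer leaves the format question open, but the linking step itself is easy to picture once every fragment carries IDs that were propagated end to end. The Python sketch below assumes a hypothetical common record, `Fragment`, with `trace_id`, `span_id`, `parent_id` and `name` fields; no vendor is implied to emit exactly this. The `link` function groups fragments from different sources by trace ID and rebuilds the parent/child tree, treating any fragment whose parent never arrived as a root.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Fragment:
    """Hypothetical common record for a span reported by any vendor."""

    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str


def link(fragments: Iterable[Fragment]) -> Dict[str, List[str]]:
    """Per trace, return span names in depth-first order, indented by depth."""
    by_trace: Dict[str, List[Fragment]] = defaultdict(list)
    for frag in fragments:
        by_trace[frag.trace_id].append(frag)

    result: Dict[str, List[str]] = {}
    for trace_id, frags in by_trace.items():
        known = {f.span_id for f in frags}
        children: Dict[Optional[str], List[Fragment]] = defaultdict(list)
        for f in frags:
            # A fragment whose parent is missing is treated as a root.
            parent = f.parent_id if f.parent_id in known else None
            children[parent].append(f)

        lines: List[str] = []

        def walk(parent: Optional[str], depth: int) -> None:
            for f in sorted(children[parent], key=lambda x: x.span_id):
                lines.append("  " * depth + f.name)
                walk(f.span_id, depth + 1)

        walk(None, 0)
        result[trace_id] = lines
    return result


# Two hypothetical collectors each saw part of the same request.
vendor_a = [Fragment("t1", "a1", None, "web: GET /book"),
            Fragment("t1", "a2", "a1", "web: call payments")]
vendor_b = [Fragment("t1", "b1", "a2", "payments: charge card")]
tree = link(vendor_a + vendor_b)
```

The hard part Reitbauer points to is agreeing on the record format and on how fragments are collected; once IDs are shared, the linking itself is a simple grouping and tree walk.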

For his part, though, Sigelman has suggested that one of the big reasons OpenTracing is progressing so rapidly is precisely the narrow, well-defined, and manageable focus of the spec.

New support for the spec
Datadog, a major monitoring platform for cloud environments, is another force avidly backing OpenTracing. In December 2017, Datadog announced its support for OpenTracing as well as its membership in the Cloud Native Computing Foundation (CNCF). The vendor also unveiled plans to join the OpenTracing Specification Committee (OTSC) and to invest in developing the standard going forward.

Datadog’s support for OpenTracing will let customers instrument their code for distributed tracing without worrying about getting locked into a single vendor or making costly modifications to their code in the future, according to Ilan Rabinovitch, VP of product and community at Datadog.

“Open source technologies and open standards have long been critical to Datadog’s success. Customers want to emit metrics and traces with the tooling that best fits their own workflows, and [we] want to enable them to do so, rather than force them to [use] specific client-side tooling,” he told SD Times.

“Many of our most popular integrations in infrastructure monitoring, including OpenStack and Docker, started off as community-driven contributions and collaborations around our open-source projects. In the world of OpenTracing, we have seen our community build and open source their own OT-based tracers that enable new languages on Datadog, beyond our existing support for Java, Python, Ruby and Go.”

In addition to the Specification Committee, OpenTracing also runs multiple working groups. The Documentation Working Group meets every Thursday, while the Cross Language Working Group, which is entrusted with maintaining the OpenTracing APIs and ecosystem, meets on Fridays.

Conference fare
Want to find out more about OpenTracing? This year, developers have an opportunity to meet with OpenTracing experts and discuss the emerging spec at a number of different conference venues.

At the end of March, HomeAway held an end-user meetup together with Indeed, PayPal, and Under Armour. Talking with SD Times just before the event in Austin, HomeAway’s Solis said that he planned to give a presentation detailing how his development team is using the new spec.

“As infrastructure groups, we are providing platforms and frameworks that deliver instrumentation to developers so they don’t have to do anything to get quality first-level (entry/exit) tracing in their applications. We have also worked on an internal standard so that developers using other technologies we don’t support can instrument themselves. OpenTracing gives us the ability to just delegate to standard documentation and open-source forums if developers want to enrich their tracing. We are also doing a slow rollout so we can build capabilities in small but fast iterations,” the architect elaborated.
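One common way a platform team can deliver entry/exit tracing without any effort from application developers is to wrap request handlers automatically. The Python decorator below is a hypothetical sketch of that general pattern (`traced` and the `RECORDED` list are invented stand-ins for a real tracer); it is not a description of HomeAway’s internal frameworks.

```python
import functools
import time
from typing import Any, Callable, Dict, List, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical sink standing in for whatever tracer the platform configures.
RECORDED: List[Dict[str, Any]] = []


def traced(operation: str) -> Callable[[F], F]:
    """Record an entry/exit span around a handler without touching its body."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            span: Dict[str, Any] = {"operation": operation, "error": False}
            start = time.perf_counter()  # entry
            try:
                return func(*args, **kwargs)
            except Exception:
                span["error"] = True  # mark the span, then re-raise
                raise
            finally:
                span["duration_s"] = time.perf_counter() - start  # exit
                RECORDED.append(span)

        return cast(F, wrapper)

    return decorator


@traced("GET /listings")
def list_listings(city: str) -> List[str]:
    # The application developer writes only business logic.
    return [f"{city}-cabin", f"{city}-loft"]


listings = list_listings("austin")
```

Anything richer than entry/exit timing (business tags, child spans) is where developers would reach for the tracing API directly, which is the “enrich their tracing” case Solis describes.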

If you missed the meetup in Austin, you still have several chances ahead to get together with developers from the OpenTracing community.

KubeCon EU, happening from May 2 to 4 in Copenhagen, will feature two talks about OpenTracing, along with two salons. Salons are breakout sessions where folks interested in learning about distributed tracing can discuss the subject with speakers and mentors.

OSCON, going on from July 17 to 19 in Portland, OR, will include three talks on OpenTracing, along with a workshop and salons. If you’d like to attend an OpenTracing salon at either venue, you can email OpenTracing at hello@opentracing.io to pose questions in advance. OpenTracing would also love to hear from participants who are willing to help out by mentoring.

Recent OpenTracing feats
Sigelman is quick to observe that his co-creators on OpenTracing and his co-founders on LightStep are two distinctly separate groups, and that many OpenTracing adopters are not LightStep customers.

He also cites a large number of recent contributions, both from the OpenTracing project itself and from customer and vendor contributors, including the following:

Core API and official OpenTracing contributions

  • OpenTracing-C++ has added support for dynamic loading, meaning that applications can load tracing libraries at runtime rather than linking them at compile time, so users can plug in any tracing system that supports OpenTracing. Projects using this support currently include Envoy and NGINX.
  • OpenTracing-Python 2.0 and OpenTracing-C# v0.12 have both been released. The main additions to each are Scopes and the ScopeManager.
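Scopes address the “active span management” item from Sigelman’s list: code deep inside a library can ask which span is currently active instead of having it passed through every call. Below is a standalone Python sketch of the idea built on the standard `contextvars` module. Its method names (`activate`, `active`, `close`) loosely follow those in OpenTracing-Python 2.0, but the classes are a hypothetical re-implementation for illustration, not the library’s code.

```python
import contextvars
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical minimal span: just a name and a finished flag."""

    operation: str
    finished: bool = False

    def finish(self) -> None:
        self.finished = True


class Scope:
    """Keeps a span active until closed, then restores the previous scope."""

    def __init__(self, manager: "ScopeManager", span: Span,
                 finish_on_close: bool) -> None:
        self.manager = manager
        self.span = span
        self._finish_on_close = finish_on_close
        self._token: Optional[contextvars.Token] = None

    def close(self) -> None:
        if self._token is not None:
            self.manager._current.reset(self._token)  # back to previous scope
            self._token = None
        if self._finish_on_close:
            self.span.finish()

    def __enter__(self) -> "Scope":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


class ScopeManager:
    """Tracks the active scope per thread or asyncio task via contextvars."""

    def __init__(self) -> None:
        self._current = contextvars.ContextVar("active_scope", default=None)

    def activate(self, span: Span, finish_on_close: bool) -> Scope:
        scope = Scope(self, span, finish_on_close)
        scope._token = self._current.set(scope)
        return scope

    @property
    def active(self) -> Optional[Scope]:
        return self._current.get()


def active_operation(manager: ScopeManager) -> Optional[str]:
    # What library code would do: look up the active span, no argument passing.
    scope = manager.active
    return scope.span.operation if scope else None


manager = ScopeManager()
seen: List[Optional[str]] = []
outer = Span("handle_request")
with manager.activate(outer, finish_on_close=True):
    seen.append(active_operation(manager))
    with manager.activate(Span("db_query"), finish_on_close=True):
        seen.append(active_operation(manager))
    seen.append(active_operation(manager))  # outer span is active again
seen.append(active_operation(manager))
```

Using `contextvars` rather than a plain global keeps the active span separate per thread and per asyncio task, which is the property a real scope manager needs.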

Content from the community

  • Pinterest presented its Pintrace Trace Analyzer at the latest OTSC meeting. “The power of this tool is its ability to compare two batches of traces – displaying stats for each of the two and highlighting the changes,” explained Pinterest’s Naoman Abbas. “An unexpected and significant change in a metric can indicate that something is going wrong in a deployment.”
  • Red Hat has shared best practices for using OpenTracing with Envoy or Istio. “We have seen that tracing [with Envoy] and with Istio is very simple to set up. It does not require any additional libraries. However, there are still some actions needed for header propagation. This can be done automatically with OpenTracing, and it also adds more visibility into the monitored process,” according to Red Hat’s Pavol Loffay.
  • HomeAway presented at the Testing in Production meetup at Heavybit. LightStep’s Priyanka Sharma showed ways to use tracing to lessen the pain when developers are running microservices using CI/CD.
  • Idit Levine, founder of Solo.io, delivered a presentation at QCon about her OpenTracing-native open-source project, Squash, and how it can be used for debugging containerized microservices.
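Loffay’s point about header propagation can be made concrete. With Istio, the sidecar proxies report spans, but the application itself still has to copy tracing headers from each inbound request onto its outbound calls so that the proxies’ spans join a single trace. The Python sketch below does that by hand. `propagate_trace_headers` is a hypothetical helper, and the header list follows the B3-style set that Istio’s documentation has asked applications to forward (it has changed across Istio versions, so check the docs for yours). An OpenTracing tracer’s `inject`/`extract` calls can automate the same step.

```python
from typing import Dict, Mapping

# B3-style headers Istio's docs have asked applications to forward
# (an assumption to verify against the documentation for your Istio version).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
)


def propagate_trace_headers(incoming: Mapping[str, str],
                            outgoing: Dict[str, str]) -> Dict[str, str]:
    """Copy tracing headers from an inbound request onto an outbound one."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    for header in TRACE_HEADERS:
        if header in lowered:
            outgoing[header] = lowered[header]
    return outgoing


inbound = {"X-Request-Id": "req-42", "X-B3-TraceId": "463ac35c9f6413ad",
           "X-B3-SpanId": "a2fb4a1d1a96d312", "X-B3-Sampled": "1",
           "Content-Type": "application/json"}
outbound = propagate_trace_headers(inbound, {"Accept": "application/json"})
```

Non-tracing headers such as `Content-Type` are left alone; only the listed tracing headers are forwarded.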

Community contributions

  • Alibaba has created Pandora.js, an application manager for Node.js that integrates capabilities such as monitoring, debugging and resiliency while supplying native OpenTracing support to assist in inspecting applications at runtime.
  • Xavier Canal from Barcelona has built Opentracing-rails, an OpenTracing-based distributed tracing instrumentation library for Ruby on Rails apps. The tool includes examples of how to initialize Zipkin and Jaeger tracers.
  • Gin, a web framework written in the Go language, has begun to add helpers for request-level tracing.
  • Daniel Schmidt of Mesosphere has created Zipkin-playground, a repo with examples of Zipkin-OpenTracing-compatible APIs for client-side tracing.
  • The Akka and Concurrency utilities have both added support for Java and Scala.
  • Michael Nitschinger of Couchbase is now leading a community exploration into an OpenTracing API to be written in the Rust programming language.