The last decade has brought a progressive transition from monolithic applications that run on static infrastructure to microservices that run on highly dynamic cloud-native infrastructure. This shift has led to the rapid emergence of lots of new technologies, frameworks, and architectures and a new set of monitoring and observability tools that give engineers full visibility into the health and performance of these new systems.
Visibility is essential to ensure that a system and its dependencies behave as expected and to identify and speed resolution of any issues that may arise. To that end, teams need to gather complete health and performance telemetry data (metrics, logs, and traces) from all those components. This is accomplished through instrumentation.
Why do we need OpenTelemetry?
For many years there has been a wide variety of open-source and proprietary instrumentation tools, such as StatsD, Nagios plugins, Prometheus exporters, Datadog integrations, and New Relic agents. Unfortunately, despite the many open-source tools, the developer community and vendors have never aligned on a specific instrumentation standard, such as StatsD. This makes interoperability a challenge.
The lack of instrumentation standards and interoperability has required every monitoring and observability tool to build its own collection of integrations to instrument the technologies developers use and need visibility into. For example, many monitoring tools have built integrations to instrument widely used databases like MySQL, including Prometheus MySQL Exporter, Datadog MySQL integration, and New Relic MySQL integration.
This is also true for application code instrumentation, where New Relic, Dynatrace, Datadog and other vendors have built complex agents that automatically instrument popular application frameworks and libraries. Developers spend years building instrumentation, and it requires a sizable investment to build a large enough catalog of integrations and maintain it as new versions of the technologies monitored are released. Not only is this a very inefficient use of global developer resources, it also creates vendor lock-in since you need to re-instrument your systems if you want to change your observability tool.
Finally, the value of innovation (and where customers benefit most!) lies not in the instrumentation itself but in what you can do with the data that gets collected. Yet any new tool has to make a large investment in instrumentation, the area that delivers little benefit to end users, before it can enter the market. This has created a big barrier to entry and has severely limited innovation in the space.
This is all about to dramatically change, thanks to OpenTelemetry: an emerging open-source standard that is democratizing instrumentation.
OpenTelemetry has already gained a lot of momentum, with support from all major observability vendors, cloud providers, and many end users contributing to the project. It has become the second most active CNCF project by number of contributions, behind only Kubernetes. (It has also recently been accepted as a CNCF incubating project, which underscores its importance to engineering communities.)
Why is OpenTelemetry so popular?
OpenTelemetry approaches the instrumentation “problem” in a different way. Like other (usually proprietary) attempts, it provides a lot of out-of-the-box instrumentation for application frameworks and infrastructure components, as well as SDKs for developers to add their own instrumentation.
Unlike other instrumentation frameworks, OpenTelemetry covers metrics, traces, and logs, and it defines an API, semantic conventions, and a standard communication protocol (the OpenTelemetry Protocol, or OTLP). Moreover, it is completely vendor agnostic, with a plugin architecture to export data to any backend.
Even more, OpenTelemetry’s goal is for developers who build technologies for others to use (e.g., application frameworks, databases, web servers, and service meshes) to bake instrumentation directly into the code they produce. This will make instrumentation readily available to anyone who uses the code in the future and avoid the need for another developer to learn the technology and figure out how to write instrumentation for it (which in some cases requires the use of complex techniques like bytecode injection).
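To make the idea of a vendor-agnostic API with pluggable exporters concrete, here is a minimal sketch using only the Python standard library. It is not the OpenTelemetry SDK: the names `Span`, `Tracer`, `InMemoryExporter`, `ConsoleExporter`, and `handle_request` are hypothetical and exist only for illustration. The point is that the instrumented code depends on a small API, and the backend is chosen separately by plugging in exporters.

```python
from __future__ import annotations

import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical, simplified span record (not the real OpenTelemetry type)."""
    name: str
    trace_id: str
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class InMemoryExporter:
    """Stand-in for one backend: keeps finished spans in a list."""
    def __init__(self):
        self.spans = []

    def export(self, span):
        self.spans.append(span)


class ConsoleExporter:
    """Stand-in for a second backend: prints finished spans."""
    def export(self, span):
        print(f"{span.name} took {span.end - span.start:.6f}s {span.attributes}")


class Tracer:
    """The 'API' that instrumented code sees; exporters are plugged in."""
    def __init__(self, exporters):
        self._exporters = list(exporters)

    @contextmanager
    def start_span(self, name, attributes=None):
        span = Span(name, uuid.uuid4().hex, dict(attributes or {}),
                    start=time.perf_counter())
        try:
            yield span
        finally:
            # Record the end time and hand the span to every configured backend.
            span.end = time.perf_counter()
            for exporter in self._exporters:
                exporter.export(span)


def handle_request(tracer, user_id):
    """Library code instrumented once, with no knowledge of any backend."""
    with tracer.start_span("handle_request", {"user.id": user_id}):
        return f"hello {user_id}"


memory = InMemoryExporter()
tracer = Tracer([memory, ConsoleExporter()])
result = handle_request(tracer, "42")
```

Switching or adding a backend in this sketch means changing only the list passed to `Tracer`; `handle_request` stays untouched, which is the property the plugin architecture is meant to provide.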
OpenTelemetry unlocks a lot of new value to all developers:
- Interoperability. Analyze the entire flow of requests to your application as they go through your microservices, cloud services, and third-party SaaS in your observability tool of choice. Effortlessly send your observability data to a data warehouse to be analyzed alongside your business data. OpenTelemetry’s common API, data semantics, and protocol make all of the above (and more) possible out of the box.
- Ubiquitous instrumentation. Thanks to a much larger community working together instead of in siloed, duplicative efforts, everyone benefits from the broadest, deepest, and highest-quality instrumentation available.
- Future-proof. You can instrument your code once and use it anywhere since the vendor-agnostic approach enables you to send data to and run analysis in your backend of choice. Before OpenTelemetry, changing observability backends typically required a time-consuming reinstrumentation of your system.
- Lower resource footprint. More and more instrumentation is directly baked into frameworks and technologies instead of injected, resulting in reduced CPU and memory utilization.
- Improved uptime. With OpenTelemetry’s shared metadata, observability tools deliver better correlation between metrics, traces, and logs, so you troubleshoot and resolve production problems faster.
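The correlation described in the last bullet can be sketched in a few lines. This is an illustrative sketch, not an OpenTelemetry API: the records are plain dictionaries with simplified, assumed field names, the sample data is invented, and `logs_for_slow_spans` is a hypothetical helper. It relies only on the fact that spans and log records can carry the same trace ID.

```python
from collections import defaultdict

# Invented sample telemetry; field names are simplified for illustration.
spans = [
    {"trace_id": "a1", "name": "GET /checkout", "duration_ms": 1840},
    {"trace_id": "b2", "name": "GET /cart", "duration_ms": 35},
]
logs = [
    {"trace_id": "a1", "severity": "ERROR", "body": "payment gateway timeout"},
    {"trace_id": "b2", "severity": "INFO", "body": "cart loaded"},
    {"trace_id": "a1", "severity": "WARN", "body": "retrying payment"},
]


def logs_for_slow_spans(spans, logs, threshold_ms):
    """Return the log messages attached to each span slower than threshold_ms."""
    by_trace = defaultdict(list)
    for record in logs:
        by_trace[record["trace_id"]].append(record["body"])
    return {
        span["name"]: by_trace[span["trace_id"]]
        for span in spans
        if span["duration_ms"] > threshold_ms
    }


slow = logs_for_slow_spans(spans, logs, threshold_ms=1000)
print(slow)
```

Because both signals share the trace ID, a slow request leads directly to the log lines written while it was being served, with no timestamp guessing required.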
More importantly, companies no longer have to devote time, people, and money to developing their own product-specific instrumentation and can focus on improving developer experience. With access to a broad, deep, and high-quality observability data set of metrics, traces, and logs that requires no multimillion-dollar investment in instrumentation, a wave of new solutions that leverage observability data is about to come.
Let’s look at some examples of what OpenTelemetry will enable, and is already enabling, developers to do:
- AWS is embedding OpenTelemetry instrumentation across their services. For example, they have released automatic trace instrumentation for Java Lambda functions with no code changes. This gives developers immediate visibility into the performance of their Java code and enables them to send any collected data to their backend of choice. As a result, they’re not tied to a specific vendor and can send the data to multiple backends to solve for different use cases.
- Kubernetes and the popular GraphQL Apollo Server have added initial OpenTelemetry tracing instrumentation to their code. This provides efficient out-of-the-box instrumentation that’s directly embedded in the code through the Go and JavaScript OpenTelemetry libraries, and the instrumentation is written by the experts that have built those technologies.
- Jenkins, the open-source CI/CD server, offers an OpenTelemetry plugin to monitor and troubleshoot jobs using distributed tracing. This gives developers visibility into where time in jobs is spent and where errors are occurring to help troubleshoot and improve those jobs.
- Rookout, a debugger for cloud-native applications, has integrated OpenTelemetry traces to provide additional context within the debugger itself. This helps developers understand the entire flow of the request traversing the code they are troubleshooting, with additional context from tags in the OpenTelemetry data.
- Promscale lets developers store their OpenTelemetry trace data inside Postgres via OTLP. Developers can then use powerful SQL queries to analyze their traces and correlate them with other business data that’s stored in Postgres. For example, if you develop a SaaS service that uses a database, you could analyze database query response time by customer ARR band to ensure that your most valuable customers, who are the most likely to suffer from bad query performance because they store more data in your application, are seeing the best possible performance with your product.
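The ARR-band analysis in the last example might look roughly like the sketch below. It makes several assumptions: it uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs anywhere, the `customers` and `spans` tables and their columns are hypothetical (this is not Promscale's actual schema), and the rows are invented sample data.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, arr_band TEXT);
CREATE TABLE spans (customer_id INTEGER, span_name TEXT, duration_ms REAL);
INSERT INTO customers VALUES (1, 'enterprise'), (2, 'startup');
INSERT INTO spans VALUES
  (1, 'db.query', 120.0), (1, 'db.query', 180.0),
  (2, 'db.query', 20.0),  (2, 'db.query', 40.0);
""")

# Join trace data with business data: average query time per ARR band.
rows = conn.execute("""
SELECT c.arr_band, AVG(s.duration_ms) AS avg_ms, COUNT(*) AS queries
FROM spans AS s
JOIN customers AS c ON c.id = s.customer_id
WHERE s.span_name = 'db.query'
GROUP BY c.arr_band
ORDER BY avg_ms DESC
""").fetchall()

for arr_band, avg_ms, queries in rows:
    print(f"{arr_band}: {avg_ms:.1f} ms average over {queries} queries")
```

The join is the interesting part: once traces live in the same relational database as customer records, a single query can relate performance to business context.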
OpenTelemetry is still being (very!) actively developed, so this is just the beginning. While many of the above products and projects will improve the lives of engineers who operate production environments, there is a greenfield of possibilities. With interoperability and ubiquitous instrumentation, there’s massive potential for established companies to improve their products or develop new tools, and for startups and entrepreneurs to leverage OpenTelemetry instrumentation to solve new problems or to tackle existing problems with innovative approaches.
Learn more about OpenTelemetry at KubeCon + CloudNativeCon Oct. 11-15.