When things break at large companies, there is usually an on-call team or engineer ready to step in and make sure the problem gets fixed. However, manually calling an on-call engineer or contacting a second team member takes too long, and automating the process is more efficient.
LinkedIn wanted to say goodbye to this manual process, so in the summer of 2015 it created a new automated system, Iris, named after the Greek goddess of messages. Iris is an open-source project for incident escalation and reliable messaging, and according to a blog post by Daniel Wang, site reliability engineer at LinkedIn, it “solves the problem of ambiguity by allowing its users to specifically define an escalation plan that it will automatically follow in the event of an incident.”
Another feature of Iris is that no specific modes of contact are hard-coded anywhere. Instead, it defines priorities from low to urgent and lets users map contact modes to those priorities, said Wang.
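To make the idea concrete, here is a minimal Python sketch of an escalation plan whose steps name a priority rather than a contact mode, with each user deciding which mode a priority maps to. This is not Iris's actual code or API: the names `User`, `Step`, and `notifications`, the priority labels between low and urgent, and the mode names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical priority levels, ordered from low to urgent.
PRIORITIES = ("low", "medium", "high", "urgent")


@dataclass
class User:
    name: str
    # Each user chooses which contact mode to use at each priority.
    mode_for_priority: dict


@dataclass
class Step:
    target: str        # key of the user to notify at this step
    priority: str      # how urgently to notify them
    wait_minutes: int  # how long to wait before escalating to the next step


def notifications(plan, users):
    """Return (minutes_from_start, user_name, mode) for every step in the plan,
    assuming nobody acknowledges the incident along the way."""
    elapsed = 0
    sent = []
    for step in plan:
        user = users[step.target]
        # The plan only knows the priority; the user's own mapping picks the mode.
        mode = user.mode_for_priority[step.priority]
        sent.append((elapsed, user.name, mode))
        elapsed += step.wait_minutes
    return sent


users = {
    "alice": User("alice", {"low": "email", "medium": "chat", "high": "sms", "urgent": "call"}),
    "bob": User("bob", {"low": "email", "medium": "email", "high": "sms", "urgent": "sms"}),
}

plan = [
    Step("alice", "medium", 5),
    Step("alice", "urgent", 10),
    Step("bob", "urgent", 10),
]
```

In this sketch the same "urgent" step reaches alice by phone call and bob by SMS, because the plan never names a mode. Changing how a person is contacted means editing that person's mapping, not the plan.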
Another feature worth highlighting is Iris's design and architecture. Everything external to Iris is pluggable, general and abstract, and that reliance on abstraction keeps the system flexible and adaptable.
LinkedIn also introduced another project to provide Iris with a “source of truth” for determining who is on-call for a given team. The product is called Oncall, and it allows managers to define rotating schedules for on-call shifts, and provides a calendar for viewing and changing these shifts as needed, according to Wang.
“Oncall comes with built-in support for follow-the-sun schedules, and provides a clean UI for swapping, editing, and deleting events,” wrote Wang. “It supports a number of different event types, and has built-in shortcuts for overriding an existing shift, should the need arise for a substitution. It acts as a specialized calendar of sorts, making management of on-call schedules fast, clean, and painless.”
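The scheduling idea can be illustrated with a short Python sketch that answers "who is on call right now?" for a fixed rotation, with optional substitutions. It is an assumption-laden illustration rather than Oncall's real data model: the function `on_call`, its parameters, and the roster names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def on_call(roster, rotation_start, shift_length, when, overrides=()):
    """Return the roster entry whose shift covers `when`.

    `overrides` is a sequence of (start, end, substitute) tuples; an override
    that covers `when` takes precedence over the regular rotation.
    """
    for start, end, substitute in overrides:
        if start <= when < end:
            return substitute
    # Count whole shifts since the rotation began, then wrap around the roster.
    shifts_elapsed = (when - rotation_start) // shift_length
    return roster[shifts_elapsed % len(roster)]


utc = timezone.utc
start = datetime(2024, 1, 1, 0, 0, tzinfo=utc)

# A follow-the-sun style rotation: three regional teams, eight hours each.
regions = ["apac", "emea", "amer"]
eight_hours = timedelta(hours=8)

# A one-off substitution covering part of the second shift.
swap = [(
    datetime(2024, 1, 1, 10, 0, tzinfo=utc),
    datetime(2024, 1, 1, 12, 0, tzinfo=utc),
    "amer",
)]
```

For example, `on_call(regions, start, eight_hours, datetime(2024, 1, 1, 9, 0, tzinfo=utc))` falls in the second eight-hour shift and returns `"emea"`, while the same call at 11:00 with `overrides=swap` returns `"amer"`. A weekly rotation among individual engineers works the same way with `timedelta(days=7)` and a roster of names.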
By providing Iris and Oncall as open-source products, LinkedIn hopes the community can have a production-ready escalation system that is “free, open, and growing.” Developers can check out the code and documentation, and LinkedIn welcomes any potential users or contributors.