LinkedIn announced the open-sourcing of two tools for investigating broken hosts and services, among other website navigation issues. Fossor (Latin for gravedigger) is a Python tool and library that automates multiple investigative checks in parallel, while Ascii Etch, another Python library, renders the information Fossor gathers as graphs made of ASCII characters.
“If you’re the oncall engineer when a service breaks at 3 a.m. (as I have been many times before), the ability to automate aspects of diagnosing and repairing the issue is very welcome,” lead developer Steven Callister wrote in a blog post.
Callister borrowed some troubleshooting philosophy from Netflix, which outlined 10 useful commands for website outages in a blog post. “Having experienced the pain of performing the same repetitive steps again and again during my own on-call shifts, I concluded that writing a tool to perform some of these basic checks in parallel would speed up the mean time to resolution,” Callister wrote. “Taking the idea even further, I wanted a tool that could perform checks tailored specifically to my services while still having the flexibility to incorporate newly-developed checks in the future. Fossor was created to do just that.”
Fossor’s design separates the program into two components, the engine and the plugins, so that a bug in one plugin is less likely to take down the whole tool.
“By isolating each plugin in its own process, the main engine is protected from a single plugin failing and crashing the application,” Callister wrote. “This plugin resiliency was specifically built in to allow Fossor to safely manage plugins from many contributors, thereby creating a platform for the bridging of expertise among users.”
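The isolation Callister describes can be illustrated with a minimal sketch. This is not Fossor's code: the plugin names, the `run_plugin` and `run_all` helpers, and the choice to launch each plugin as a separate Python interpreter are all assumptions made for illustration. It only shows how an engine can run checks in parallel while surviving a plugin that crashes or hangs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical plugins, each written as a tiny standalone Python program.
PLUGINS = {
    "uptime_check": "print('host is up')",
    "broken_check": "raise RuntimeError('plugin bug')",
    "slow_check": "import time; time.sleep(30)",
}


def run_plugin(name, source, timeout=5.0):
    """Run one plugin in its own interpreter process and summarize the result."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child process is killed; the engine keeps going.
        return name, "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return name, status, proc.stdout.strip()


def run_all(plugins, timeout=5.0):
    """Run every plugin in parallel; a crash or hang in one does not affect the rest."""
    with ThreadPoolExecutor(max_workers=max(1, len(plugins))) as pool:
        futures = [
            pool.submit(run_plugin, name, source, timeout)
            for name, source in plugins.items()
        ]
        results = [future.result() for future in futures]
    return {name: (status, output) for name, status, output in results}
```

Calling `run_all(PLUGINS, timeout=3.0)` returns a status for all three plugins: the failing plugin is reported as failed and the hanging one as timed out, while the healthy check still delivers its output.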
Plugins come in three flavors: variable gathering, check, and report. Together they let the user specify which of the information gathered by Fossor is of value. Depending on the type of information, a graph can then be drawn with Ascii Etch.
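The article names the three plugin flavors without describing their interfaces, so the sketch below is only one plausible reading of how they might fit together. Every function name, the hard-coded values, and the disk-usage threshold are hypothetical and are not taken from Fossor.

```python
def gather_variables():
    """Variable-gathering step: collect facts that later steps can use.

    The values are hard-coded here; a real plugin would inspect the host.
    """
    return {"hostname": "example-host", "disk_used_percent": 93}


def check_disk(variables, threshold=90):
    """Check step: return a finding, or None when nothing looks wrong."""
    used = variables["disk_used_percent"]
    if used > threshold:
        return f"disk usage is {used}% (threshold {threshold}%)"
    return None


def report(variables, findings):
    """Report step: list only the checks that produced a finding."""
    lines = [f"Report for {variables['hostname']}"]
    lines += [f"- {name}: {text}" for name, text in findings.items() if text]
    return "\n".join(lines)


variables = gather_variables()
findings = {"check_disk": check_disk(variables)}
print(report(variables, findings))
```

Keeping the steps separate means a new check can be added without touching how variables are collected or how the report is formatted.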
Ascii Etch was originally created to draw the results from running Fossor. Callister writes that it proved more helpful than simple text for quickly spotting anomalies in the data.
“The original downstream latency plugin for Fossor displayed latency average, minimum, and maximum,” Callister wrote. “While these are useful stats, a quick graph is much clearer and more informative of whether or not there is actually latency downstream.”
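Callister's point about graphs versus summary statistics can be shown with a small, self-contained sketch. It does not use the Ascii Etch API; the `ascii_graph` function and the sample latencies are invented for illustration.

```python
def ascii_graph(values, height=5):
    """Render values as a column chart, one '#' column per value."""
    if not values:
        return ""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero for a flat series
    # Scale each value to a bar height between 1 and `height`.
    bars = [1 + round((v - low) / span * (height - 1)) for v in values]
    rows = []
    for level in range(height, 0, -1):
        rows.append("".join("#" if bar >= level else " " for bar in bars))
    return "\n".join(rows)


# Hypothetical latency samples in milliseconds.
latencies = [10, 12, 11, 95, 12, 10]
print(ascii_graph(latencies))
```

The average of these samples is 25 ms, which by itself does not show that five of the six requests were fast. The printed graph makes the single spike in the fourth sample obvious at a glance.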
Callister says the development team hopes Fossor’s modular design, which lets checks be tailored to specific services, will benefit site administrators and the open-source community as more plugins are contributed to the automation tool.