This week’s featured open-source project is Lumos, a Python library built to compare metrics between two datasets, accounting for population differences and invariant features.
Lumos was open sourced this month by Microsoft. In a technical paper that shows the results from a real-world deployment of Lumos in Microsoft RTC applications, the Microsoft team wrote: “Regressions in metric values need to be detected and diagnosed as early as possible to reduce the disruption to users and product owners.”
“It has enabled engineering teams to detect 100s of real changes in metrics and reject 1000s of false alarms detected by anomaly detectors,” the team added.
According to Microsoft, metric regressions commonly surface for three reasons: genuine product regressions, changes in user population, and bias due to telemetry loss or processing.
Applying Lumos has freed up as much as 95% of the time allocated to metric-based investigations. Now that the project has been open sourced, it can be coupled with any production system to manage the volume of alerting efficiently, Microsoft explained.
The solution uses a new methodology that incorporates existing, domain-specific anomaly detectors and was found to reduce the false-positive alert rate by over 90%. It also provides insight into locating the root cause of a Key Performance Indicator (KPI) incident.
Lumos uses the principles of A/B testing, comparing the dataset from before a metric anomaly with the dataset from during the anomaly. Its configuration file contains hyper-parameters for running the workflow and specifies which columns in the dataset correspond to the metric, invariant, and hypothesis columns.
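To make the before-versus-during comparison concrete, here is a minimal sketch in plain Python. It does not use Lumos or reproduce its API: the configuration keys, column names, helper functions, and sample rows are all hypothetical, and a simple Welch's t statistic stands in for whatever statistical tests Lumos actually applies. It only illustrates how a metric can appear to regress when the real change is in the population mix across an invariant column.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance

# Hypothetical configuration; key names are illustrative, not Lumos's schema.
CONFIG = {"metric_col": "call_failed", "invariant_cols": ["network"]}


def welch_t(control_values, treatment_values):
    """Welch's t statistic for the difference in means (treatment - control)."""
    se = sqrt(
        variance(control_values) / len(control_values)
        + variance(treatment_values) / len(treatment_values)
    )
    return (mean(treatment_values) - mean(control_values)) / se


def share_shift(control_rows, treatment_rows, col):
    """Change in each category's share of the population (treatment - control)."""
    c = Counter(row[col] for row in control_rows)
    t = Counter(row[col] for row in treatment_rows)
    return {
        key: t[key] / len(treatment_rows) - c[key] / len(control_rows)
        for key in sorted(set(c) | set(t))
    }


def segment_deltas(control_rows, treatment_rows, col, metric):
    """Metric change within each category that appears in both datasets."""
    shared = {r[col] for r in control_rows} & {r[col] for r in treatment_rows}
    deltas = {}
    for key in sorted(shared):
        before = mean(r[metric] for r in control_rows if r[col] == key)
        after = mean(r[metric] for r in treatment_rows if r[col] == key)
        deltas[key] = after - before
    return deltas


def compare(control_rows, treatment_rows, config):
    """Compare the metric overall, then check the invariant columns."""
    metric = config["metric_col"]
    before = [r[metric] for r in control_rows]
    after = [r[metric] for r in treatment_rows]
    cols = config["invariant_cols"]
    return {
        "metric_delta": mean(after) - mean(before),
        "welch_t": welch_t(before, after),
        "population_shift": {
            c: share_shift(control_rows, treatment_rows, c) for c in cols
        },
        "within_segment_delta": {
            c: segment_deltas(control_rows, treatment_rows, c, metric) for c in cols
        },
    }


# Made-up sample data: "control" is before the anomaly, "treatment" is during it.
wifi_ok = {"network": "wifi", "call_failed": 0}
cellular_fail = {"network": "cellular", "call_failed": 1}
control = [wifi_ok] * 8 + [cellular_fail] * 2
treatment = [wifi_ok] * 4 + [cellular_fail] * 6

report = compare(control, treatment, CONFIG)
print(report)
```

In this invented data the overall failure rate rises, yet the rate within each network type is unchanged. The apparent regression is fully explained by a larger share of cellular users, which is the kind of population difference the article says Lumos is meant to account for.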
This project has adopted the Microsoft Open Source Code of Conduct and is welcoming contributions and suggestions.