Earlier this week, Microsoft announced the open-source release of Trill, its streaming data analysis and query tool. Trill is a single-node engine library that can be incorporated into .NET applications to run complex queries over real-time or offline data sets.
In a blog post about the release, principal software engineer James Terwilliger explained that Trill has been used since 2013 in internal projects like Bing Ads, Azure Stream Analytics and the Halo games, where it has proven valuable for processing data over time. The open-source release, Terwilliger wrote, was spurred by Microsoft’s desire to bring the IStreamable abstraction to users to complement the IEnumerable and IObservable capabilities they already have access to, along with the promise of community-involved development going forward.
Terwilliger described Trill as able to process “a trillion events per day,” with filters operating at memory bandwidth speeds of up to “several billions of events per second” and grouped aggregates at “10 to 100 million events per second.”
“Trill has enabled us to process large scale data in petabytes, within a few minutes and near real-time compared to traditional processing that would give us results in 24 plus hours,” Rajesh Nagpal, principal program manager at Bing, said of his team’s use of Trill for Bing Ads. “The key capabilities that differentiate Trill in our view are the ability to do complex event processing, clean APIs for tracking and debugging, and the ability to run the stream processing pipeline continuously using temporal semantics. Without Trill, we would have been struggling to get streaming at scale, especially with the additional complex requirements we have for our specific big data processing needs.”
Terwilliger explained that there are a few other Trill-based projects in the pipeline, including digital signal processing, improvements to the library’s ability to handle “out of order” data and management of operator states with the open-source FASTER framework.