Apache Storm hit a milestone release today. The Apache Software Foundation announced version 1.0 of the open-source distributed real-time computation framework for processing large streams of data. The release adds a new native streaming window API.
“Window-based computations are common among use cases in stream processing, where the unbounded stream of data is split into finite sets based on some criteria (e.g. time) and a computation is applied on each group of events,” wrote Taylor Goetz, vice president of Apache Storm, in a blog post.
Windows are often used for functions such as aggregations, joins and pattern matching. In previous releases, users had to rely on their own windowing logic, and there was no high-level abstraction for defining a window in a topology. With version 1.0’s native streaming window API, users can specify a window by its window length and its sliding interval.
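To make the two parameters concrete, here is a minimal, library-free sketch of a count-based sliding window in Python. It is not Storm's API: the names `sliding_windows` and `windowed_sum` are hypothetical, and the choice to emit a partial window before the buffer has filled is an assumption of this sketch. The window length sets how many recent events each window holds, and the sliding interval sets how often a window is emitted.

```python
from collections import deque
from typing import Iterable, Iterator, List


def sliding_windows(
    events: Iterable[int], window_length: int, sliding_interval: int
) -> Iterator[List[int]]:
    """Yield the most recent `window_length` events every `sliding_interval` events.

    Hypothetical helper for illustration only; not part of Apache Storm.
    """
    if window_length <= 0 or sliding_interval <= 0:
        raise ValueError("window_length and sliding_interval must be positive")

    # A bounded deque evicts the oldest event once the window is full.
    buffer = deque(maxlen=window_length)
    for count, event in enumerate(events, start=1):
        buffer.append(event)
        # Assumption: a window fires on every sliding_interval-th event,
        # even if fewer than window_length events have arrived so far.
        if count % sliding_interval == 0:
            yield list(buffer)


def windowed_sum(
    events: Iterable[int], window_length: int, sliding_interval: int
) -> List[int]:
    """Apply a simple aggregation (sum) to each emitted window."""
    return [sum(w) for w in sliding_windows(events, window_length, sliding_interval)]


if __name__ == "__main__":
    # Sliding window: length 3, slides every 2 events.
    print(windowed_sum(range(1, 7), window_length=3, sliding_interval=2))
    # Tumbling window: the sliding interval equals the window length.
    print(windowed_sum(range(1, 7), window_length=2, sliding_interval=2))
```

With the events 1 through 6, the sliding configuration produces the windows [1, 2], [2, 3, 4] and [4, 5, 6], so the sums are 3, 9 and 15. When the sliding interval equals the window length, the windows no longer overlap (a tumbling window), and the sums are 3, 7 and 11.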
Goetz also claimed that with version 1.0, Apache Storm runs up to 16 times faster than previous releases, with latency reduced by as much as 60%.
Other features of the latest version include:
- Pacemaker: an optional daemon that processes worker heartbeats and acts as an in-memory key/value store
- A distributed cache API that allows users to share files among topologies
- HA Nimbus, designed to eliminate the “soft” point of failure in the Nimbus service
- A new Stateful Bolt API with automatic checkpointing
- The ability for users and administrators to change the log level settings of a running topology
- A new topology debugging capability that samples tuples flowing through a topology, aiming to eliminate the need to add dedicated debugging bolts
- Distributed log search in Storm’s UI, which lets users search across the log files of a specific topology
- A new automatic backpressure mechanism
- A new scheduler implementation
- A dynamic worker-profiling feature
Apache Storm was created in 2011 and graduated to an ASF top-level project in 2014.
“Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, [and] can be used with any programming language,” according to the project’s website.