What makes Flink unique with regard to data streaming and pipelining? What does the technology offer, or what use cases does it enable, that make it stand out among similar Big Data technologies?
Tzoumas: The combination of batch and true low-latency streaming is unique in Flink. Combining these styles of processing, often termed “lambda architecture,” is becoming increasingly popular. Unlike pure batch or pure streaming engines, Flink's hybrid engine can support both sophisticated, high-performance batch programs and real-time streaming programs with low latency and complex streaming semantics.
Ewen: In addition, Flink is unique among Big Data systems (and perhaps unique in the open-source Java world) in how it uses memory. Flink was designed to run very robustly both in clusters with plenty of RAM and in clusters with limited memory resources, or where data vastly exceeds the available memory. To that end, the system manages its own memory and contains sophisticated Java implementations of database-style query processing algorithms, traditionally written in C/C++, that can work on both on-heap and off-heap memory.
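To make the memory point concrete, here is a minimal illustrative sketch in plain Java. It is not Flink's actual implementation: the class `ManagedMemorySketch` and its methods are hypothetical names, and a simple insertion sort stands in for the database-style algorithms mentioned above. It shows only the general idea that an algorithm written against raw bytes in a `java.nio.ByteBuffer` runs unchanged whether the buffer lives on the JVM heap or off it.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ManagedMemorySketch {

    // Sorts the first `count` 4-byte ints stored in the buffer, in place.
    // It reads and writes raw bytes at absolute offsets, so no Integer
    // objects are allocated and the buffer's position is never changed.
    public static void sortInts(ByteBuffer segment, int count) {
        for (int i = 1; i < count; i++) {
            int key = segment.getInt(i * 4);
            int j = i - 1;
            while (j >= 0 && segment.getInt(j * 4) > key) {
                segment.putInt((j + 1) * 4, segment.getInt(j * 4));
                j--;
            }
            segment.putInt((j + 1) * 4, key);
        }
    }

    // Hypothetical helper: writes the values into the buffer, sorts them
    // there, and copies the result back out for inspection.
    public static int[] fillAndSort(ByteBuffer segment, int[] values) {
        for (int i = 0; i < values.length; i++) {
            segment.putInt(i * 4, values[i]);
        }
        sortInts(segment, values.length);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = segment.getInt(i * 4);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3};
        // On-heap: backed by a byte[] managed by the garbage collector.
        ByteBuffer onHeap = ByteBuffer.allocate(data.length * 4);
        // Off-heap: a direct buffer allocated outside the Java heap.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(data.length * 4);
        System.out.println(Arrays.toString(fillAndSort(onHeap, data)));
        System.out.println(Arrays.toString(fillAndSort(offHeap, data)));
    }
}
```

The same `sortInts` code path handles both buffers. A system that manages its own memory this way can bound how much memory it uses by the number and size of the buffers it hands out, which is the kind of control the answer above describes.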
How has the open-source community around Flink grown and developed over the past several years?
Tzoumas: The Flink community came initially from the academic world: [the] Technical University of Berlin, Humboldt University of Berlin, and [the] Hasso Plattner Institute. As the project gained more exposure, more people from industry joined. Recently, a group of Flink committers started Data Artisans, a Berlin-based startup that is committed to developing Flink as open source. We have been very excited about how the community has been developing. The number of contributors to the project has almost doubled since the project entered the Apache Incubator.
What does this milestone of ascension to Top-Level Project mean for Flink? Where does Flink go from here?
Ewen: Graduating to a Top-Level Project is a very important milestone for Flink, as it reflects the maturity of Flink both as a system and as a community. The Flink community is currently discussing, on its public developer mailing list, a developer road map for 2015, which includes several exciting new features in the APIs, optimizer and runtime of the system. In addition, the community has started to put a lot more focus on building libraries on top of Flink, e.g., a machine learning library and a graph processing library. This work will make the system accessible to a wider audience, which will in turn feed back into the community.