The Apache Kudu project is, as of today, a Top-Level Project at the Apache Software Foundation. Originally contributed by Cloudera, the project is an effort to build a highly efficient and fast analytics platform for fast-moving data, such as streams.
Kudu, in practice, is a columnar storage manager for the Hadoop ecosystem. The system is designed to process OLAP workloads quickly, to integrate with Spark, and to work with Apache Impala. This last feature lets the pair fill the same role as HDFS combined with Parquet, another columnar storage format for Hadoop.
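For a concrete sense of what a columnar storage manager looks like from application code, here is a minimal sketch using Kudu's Java client to create and write to a small table. The master address, table name, and schema are illustrative placeholders, not details from the announcement.

import java.util.Arrays;
import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.PartialRow;

public class KuduQuickstart {
    public static void main(String[] args) throws Exception {
        // Connect to a (hypothetical) Kudu master.
        KuduClient client =
            new KuduClient.KuduClientBuilder("kudu-master.example.com:7051").build();
        try {
            // Two-column schema: a primary key plus one value column.
            Schema schema = new Schema(Arrays.asList(
                new ColumnSchema.ColumnSchemaBuilder("id", Type.INT64).key(true).build(),
                new ColumnSchema.ColumnSchemaBuilder("metric", Type.DOUBLE).build()));

            // Hash-partition rows by primary key across four tablets.
            client.createTable("metrics", schema,
                new CreateTableOptions().addHashPartitions(Arrays.asList("id"), 4));

            // Insert a single row through a session.
            KuduTable table = client.openTable("metrics");
            KuduSession session = client.newSession();
            Insert insert = table.newInsert();
            PartialRow row = insert.getRow();
            row.addLong("id", 1L);
            row.addDouble("metric", 42.0);
            session.apply(insert);
            session.close();
        } finally {
            client.shutdown();
        }
    }
}

A table like this can then be queried through Impala or Spark, which is where the comparison to HDFS-plus-Parquet comes in.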
Overall, Kudu emphasizes high availability and strong consistency, meaning it is not designed to span network partitions, but rather to live inside a single Hadoop installation as the analytics back end.
Kudu had, until now, been developed within the Apache Incubator. The first step after its promotion to Top-Level Project is to head toward version 1.0. Todd Lipcon, vice president of Apache Kudu and software engineer at Cloudera, said, “Under the Apache Incubator, the Kudu community has grown to more than 45 developers and hundreds of users. We are excited to be recognized for our strong open-source community and are looking forward to our upcoming 1.0 release.
“Graduation to a top-level project marks an important milestone in the Apache Kudu community, but we are really just beginning to achieve our vision of a hybrid storage engine for analytics and real-time processing. As our community continues to grow, we welcome feedback, use cases, bug reports, patch submissions, documentation, new integrations, and all other contributions.”