Jim Scott, director of enterprise strategy and architecture at MapR, said that SQL still drives the needs of many enterprises. “When it comes down to it, most people need the rudimentary basics of ANSI SQL, and the tiny subset of that in Hive is usually less than adequate,” he said.
“Calcite is just sitting out there waiting to be used. Drill helped open that one up. When it comes down to it, look at the history of SQL on Hadoop technologies. Apache Hive was a great entry into expressing SQL at scale. Apache Impala came along and took a step forward and said, ‘We need to make this faster.’ They didn’t necessarily fix the problems. They just made something run faster, so it has a complete dependency on Hive.”
Scott predicted change will come to the SQL-on-Hadoop market, mainly because existing solutions are not optimal. “I think what it comes down to is the logical model these platforms have been built on [is] not the easiest to adapt to the complexity [that] SQL will support,” he said.
“Idealistically, people are going to put their hands on a tool like Apache Drill [and] say, ‘I can start with this on my laptop and can query every data store in my enterprise.’ Drill supports utilizing the Hive metastore, but does not require Hive to use it. There has been a competitive landscape of SQL on Hadoop.”
Perhaps the best way to describe Apache Calcite is to let the project describe itself. According to the Apache site:
Apache Calcite is a dynamic data-management framework.
It contains many of the pieces that comprise a typical database-management system, but omits some key functions: storage of data, algorithms to process data, and a repository for storing metadata.
Calcite intentionally stays out of the business of storing and processing data. As we shall see, this makes it an excellent choice for mediating between applications and one or more data-storage locations and data-processing engines. It is also a perfect foundation for building a database: Just add data.