When it comes to dealing with big data, the Apache Hadoop project has a head start on the problem. Today, a new company called Datameer announced that it has created tools to make interfacing with Hadoop data easier.
Datameer was founded by CEO Ajay Anand and CTO Stefan Groschupf, both of whom have extensive experience with Hadoop from their previous work at Yahoo and on the Nutch search engine project, respectively. Their new company attempts to solve one of Hadoop’s biggest problems: complexity.
“One big power of Hadoop is the ability to have a common place for collecting data from structured sources and unstructured sources,” said Anand. “Business analysts should be able to get insights into the data without having to do heavy lifting. Our value is the product that sits on top of Hadoop.
“In order to get results [with Hadoop], you have to really dedicate a team that’s putting together the plumbing,” he continued. “It’s not designed for business users; it’s designed for programmers. [Hadoop company] Cloudera has built a training program around it and is providing more expertise to programmers. We say you shouldn’t be a programmer if you want to do analytics.”
To that end, Anand and Groschupf have built a Web-based interface for business analysts that requires no programming knowledge to use. Normally, Hadoop users must write their batch-processing jobs in Java or in a Hadoop-specific query framework such as Hive or Pig. Datameer replaces these approaches with a simple spreadsheet interface.
Analysts can pull a mathematically relevant sample of each data set stored in Hadoop, then arrange and manipulate those samples in the Web-based spreadsheet. Once they have prepared the transformations, comparisons and graphs they want to build from the data, they submit the job to the Hadoop cluster, where the same operations are performed on the full data sets.
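The sample-first workflow described above can be illustrated with a minimal sketch in plain Python. This is not Datameer's implementation and uses no Hadoop API; the function names, the `amount` and `bucket` fields, and the in-memory list standing in for cluster data are all assumptions made for illustration. The idea is only that the steps an analyst tries out on a small sample are recorded and then replayed unchanged on the full data.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]
Step = Callable[[List[Row]], List[Row]]


def draw_sample(rows: List[Row], k: int, seed: int = 0) -> List[Row]:
    """Pick up to k rows at random (a stand-in for sampling data held in a cluster)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each recorded step in order to the given rows."""
    for step in steps:
        rows = step(rows)
    return rows


# Hypothetical transformations an analyst might set up in a spreadsheet.
def keep_large_orders(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["amount"] >= 100]


def add_bucket(rows: List[Row]) -> List[Row]:
    return [{**r, "bucket": int(r["amount"] // 100)} for r in rows]


# Made-up data: amounts 0, 10, 20, ..., 990.
full_data: List[Row] = [{"amount": float(a)} for a in range(0, 1000, 10)]
steps: List[Step] = [keep_large_orders, add_bucket]

# Interactive phase: try the steps on a small sample.
preview = run_pipeline(draw_sample(full_data, 10), steps)

# Batch phase: replay the same steps on everything.
result = run_pipeline(full_data, steps)
```

In a real deployment the batch phase would be a job submitted to the cluster rather than a local function call; the sketch only shows that the same list of steps drives both phases.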
Right time, right place
Datameer is launching into the market at a time when Hadoop has just crested the wave of hype and begun to head toward the shores of value. Ping Li, a partner at investment firm Accel Partners and a member of Cloudera’s board of directors, said that he’s seeing a change in what customers are looking to get from Hadoop.
“It’s not about ROI; it’s about extracting value,” he said. “How can we extract value out of this data? I think the next phase has begun, when people start thinking about value creation. Hadoop is not a commoditization play like MySQL or JBoss. There will be a value creation element to it.
“I’ve seen a bunch of startups talk about building business intelligence stacks on top of Hadoop…If you have a more scalable back end, you can do richer things on the front end.”
That’s what Anand and Groschupf have set out to do with Datameer. “We feel the pain,” said Groschupf. “Everybody is providing developer training and developer tools. Ajay and I always worked for business users, and we had endless meetings where people would get excited.
“But every time they want to do something in Hadoop, they have to have a developer meeting. But what if I want to answer this question right now? We want to provide that capability to a very big market for people empowered to use big data.”