Yelp saved itself US$10 million by building out its Apache Kafka-based Data Pipeline, and now it wants to spread that love to other enterprises. Just before the holidays, Yelp open-sourced its Data Pipeline and assorted utilities used to maintain and build out this streaming data platform.
Data Pipeline is now available on GitHub under the Apache 2.0 license. Using Data Pipeline, developers can tie their applications into the constantly flowing stream of Kafka data. The company detailed this in a blog entry.
Jason Fennell, vice president of engineering at Yelp, said that Data Pipeline coupled with Kafka provides benefits to all data streaming through the company’s systems. “We’ll build a connector from Kafka to our Salesforce instance, and now we have this real-time stream of updates from our core databases into Salesforce. We managed to make a process that could take as long as three weeks to get data, down to a few seconds,” he said.
“We can start plugging in other sorts of things. It’s not just Salesforce, but also Redshift that a lot of our business strategy folks use. As we hook up other things like MySQL so logs are coming into our data pipeline, Kafka forms this central routing layer for us, which means each additional source we add gets multiplicative influence.”
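Fennell's point about "multiplicative influence" can be illustrated with some simple counting. The sketch below is only an illustration under stated assumptions: the source and sink names and the helper functions are hypothetical, and it does not use Kafka or Yelp's Data Pipeline code. It compares wiring every source directly to every sink against attaching each system once to a central hub.

```python
# Illustrative sketch only: names and counts are hypothetical, not Yelp's.

def point_to_point_connectors(num_sources, num_sinks):
    """Connectors needed when every source is wired directly to every sink."""
    return num_sources * num_sinks


def hub_connectors(num_sources, num_sinks):
    """Connectors needed when each system attaches once to a central hub."""
    return num_sources + num_sinks


def reachable_flows(num_sources, num_sinks):
    """Source-to-sink data flows available once everything is on the hub."""
    return num_sources * num_sinks


# Hypothetical systems, loosely named after those mentioned in the article.
sources = ["mysql_tables", "application_logs"]
sinks = ["salesforce", "redshift"]

flows_before = reachable_flows(len(sources), len(sinks))

# Attaching one more source costs a single connector to the hub...
sources.append("new_source")
flows_after = reachable_flows(len(sources), len(sinks))

# ...but it becomes available to every existing sink at once.
flows_gained = flows_after - flows_before

print("connectors via hub:", hub_connectors(len(sources), len(sinks)))
print("connectors point-to-point:",
      point_to_point_connectors(len(sources), len(sinks)))
print("flows gained by one new source:", flows_gained)
```

With larger numbers the gap widens: under the same assumptions, 10 sources and 5 sinks would need 50 direct connectors but only 15 through a hub.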
Fennell went on to say, “The impetus for us was that we were looking at our data warehouse. We stuff all our data together, and business and strategy folks can make data-driven decisions about sales, strategy or product strategy. That process used to be extraordinarily manual. For every table in our MySQL, an engineer would have to do work to get it out to that data warehouse. It was from several days to a couple weeks of work.
“We started by looking at our data warehouse. It would take 10 to 15 years of work to get all our data there, and we needed it there sooner. Even with the amount of time and effort we put into this pipeline, we think we saved $10 million in terms of lower engineering costs by building this system. Once we connected up Salesforce, that starts to push that number up.”