Google this morning removed the beta label from Google Cloud Dataflow. The company also introduced Google Cloud Pub/Sub, as well as integrations between these services and the Google BigQuery service. Together, these services let developers build analytics, stream-processing and batch-processing applications within the same infrastructure.
Cloud Dataflow is built on the company’s experience with MapReduce, FlumeJava and MillWheel. The software allows data to be ingested, filtered and grouped. Cloud Dataflow can also perform transformations, so ETL operations can run inside Google’s Cloud Platform.
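The ingest, filter, group and transform steps mentioned above can be pictured with a small plain-Python sketch. It does not use the Cloud Dataflow SDK; the function names, the `user,action,amount` record layout and the sample data are all hypothetical, chosen only to show the general shape of an ETL flow.

```python
from collections import defaultdict


def extract(lines):
    """Ingest raw 'user,action,amount' lines, skipping malformed ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 3:
            yield {"user": parts[0], "action": parts[1], "amount": parts[2]}


def transform(records):
    """Filter to purchases and convert the amount field to a number."""
    for record in records:
        if record["action"] == "purchase":
            yield {"user": record["user"], "amount": float(record["amount"])}


def load(records):
    """Group amounts by user (a stand-in for writing to a real sink)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["user"]].append(record["amount"])
    return dict(grouped)


# Hypothetical input data for illustration only.
raw = [
    "alice,purchase,9.50",
    "bob,view,0",
    "broken line",
    "alice,purchase,0.50",
    "bob,purchase,3",
]
result = load(transform(extract(raw)))
```

Each stage is a generator, so records flow through one at a time instead of being held in memory as a whole, which loosely mirrors how a pipeline chains its steps.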
With Cloud Dataflow, a single platform can support both batch and streaming operations. Used together, the Google Cloud Platform services can process data according to its timestamps, which makes analytics more accurate and timely.
Cloud Pub/Sub, on the other hand, delivers streaming data that developers can group into specific time windows before transformations are performed. Developers can then perform GroupByKey or Combine operations on the data as it is being ingested.
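To make the idea of time windows with GroupByKey and Combine steps concrete, here is a conceptual sketch in plain Python. It is not the Cloud Dataflow or Cloud Pub/Sub API; the 60-second fixed window, the helper names and the sample events are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical fixed window size


def window_start(event_time, size=WINDOW_SECONDS):
    """Return the start of the fixed window that contains event_time (seconds)."""
    return event_time - (event_time % size)


def group_by_key_and_window(events, size=WINDOW_SECONDS):
    """Group (key, event_time, value) records by (key, window start)."""
    groups = defaultdict(list)
    for key, event_time, value in events:
        groups[(key, window_start(event_time, size))].append(value)
    return dict(groups)


def combine(groups, fn=sum):
    """Reduce each group's values to a single result, here a sum by default."""
    return {group_key: fn(values) for group_key, values in groups.items()}


# Hypothetical events listed in arrival order; the second field is the event timestamp.
events = [
    ("sensor-a", 5, 2),
    ("sensor-b", 20, 7),
    ("sensor-a", 130, 1),
    ("sensor-a", 42, 3),  # arrives late but still belongs to the first window
]
totals = combine(group_by_key_and_window(events))
```

Because records are assigned to windows by their own timestamps and not by arrival order, the late `sensor-a` reading is still counted in the window where it occurred.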
“We’re excited to collaborate with Google Cloud Platform on integrations with Salesforce Wave,” said Olivier Pin, vice president of product management for Wave Analytics for Salesforce.com.
“The integrations with Google Cloud Dataflow further enable Wave to deliver insights to business users. Businesses can now use vast, diverse datasets like machine-generated data to derive customer insights in near-real-time.”
With the now fully integrated BigQuery stack, developers can stream in data, modify it, and use it in analytics workflows with more accurate data windows. Other Google APIs can be used to push data into this new workflow. The Gmail Push API, for example, can push e-mail information into BigQuery for analysis.