Talend, a data integration software maker, has pulled Hadoop into its platform so that map/reduce jobs can be performed within data management projects without requiring separate tooling.
An update to Talend 4.0, released on Tuesday, integrates Cloudera’s distribution of Hadoop for projects that require analysis and transformation of petabytes of data. The update is free for existing customers.
The Talend platform combines data integration, data quality and master data management capabilities into a single Eclipse-based IDE. A repository manages application metadata, business models, business rules, transformation and validation rules, connectors, data validation, and workflows.
Talend uses the platform’s extract, transform and load (ETL) capabilities to read from and write to the Hadoop file system, said cofounder and COO Fabrice Bonan. The platform can also use Hadoop itself to process data, he added.
Developers can mix and match multiple data processing modes and are not constrained to using Hadoop, Bonan said. The data is stored within the same database architecture and does not have to be extracted from the database in order to be processed by Hadoop, he said.
Hadoop jobs are created using a visual designer where components are dragged and dropped onto a canvas. Developers visually design mappings and transformations, and Talend generates jobs to be executed inside Hadoop. “They don’t need to understand its inner workings or the details of Hadoop’s syntax,” Bonan said.
Some Talend customers, including a large online content provider and a telecommunications company, have been early adopters. The content provider uses Hadoop to parse server logs to analyze website traffic patterns and delegation; the telecom provider uses Hadoop for invoice generation, according to Bonan.
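For readers unfamiliar with the map/reduce model behind the log-parsing use case, the sketch below shows the general idea in plain Python. It is not code generated by Talend and does not use any Hadoop API. The log format, the sample lines and the function names (`map_line`, `shuffle`, `reduce_counts`, `count_hits`) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical access-log lines in a common web-server style (assumed format).
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jun/2010:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jun/2010:10:00:01 +0000] "GET /about.html HTTP/1.1" 200 128',
    '10.0.0.1 - - [01/Jun/2010:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512',
    'malformed line without a request field',
]


def map_line(line):
    """Map phase: emit a (path, 1) pair for each well-formed request line."""
    parts = line.split('"')
    if len(parts) < 3:
        return  # no quoted request field, so skip the line
    request = parts[1].split()
    if len(request) != 3:
        return  # expected "METHOD PATH PROTOCOL"
    yield request[1], 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one path."""
    return key, sum(values)


def count_hits(lines):
    """Run map, shuffle and reduce in sequence over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    for path, hits in sorted(count_hits(SAMPLE_LOG).items()):
        print(path, hits)
```

On a real cluster the map and reduce steps would run in parallel across many machines, with the framework handling the grouping step. That distribution is what makes the approach practical for very large log volumes.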