WANdisco today announced the availability of WANdisco Fusion, a tool that can distribute data across multiple Hadoop clusters, keeping them up to date and in sync. Fusion uses active replication to bring updated information from one Hadoop cluster to another, regardless of the distances between them.
Randy DeFauw, director of product marketing at WANdisco, said that this new product will help enterprises roll out production Hadoop servers worldwide.
“The fundamental ability to use the same data from everywhere, as if everyone was running in the same cluster in the same place, this solves a lot of the key challenges the enterprise Hadoop architects were worrying about,” he said.
Fusion is not just about a single type of Hadoop installation, either. The software can bolster processing in the cloud, transferring data to AWS when temporary extra processing power is needed. Fusion can also sync different distributions of Hadoop.
“The new architecture also means it has the ability to replicate between different types of Hadoop distributions,” said DeFauw. “You can not only replicate between two Hortonworks clusters, you can replicate between Hortonworks and Cloudera and EMC’s Isilon storage systems.”
That means data can also be backed up off the Hadoop cluster, he said.
DeFauw added that Fusion works by placing a proxy in front of the Hadoop Distributed File System (HDFS). “We stand a proxy application in front of whatever the underlying file system application is. We have a Fusion URL instead of an HDFS URL. We intercept and coordinate that information,” he said.
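Hadoop already exposes a general mechanism for this kind of proxying: a custom FileSystem implementation can be registered under its own URI scheme and delegate real I/O to the underlying store. The sketch below illustrates that pattern only; it is not WANdisco's code, which the article does not detail, and the class name and the "fs.fusionlike.*" settings are invented for the example.

```java
// Illustrative sketch of a proxy file system behind its own URI scheme,
// assuming Hadoop's standard FileSystem/FilterFileSystem APIs. Not Fusion's
// actual implementation; the coordination hook is a placeholder.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

public class FusionLikeFileSystem extends FilterFileSystem {

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    // Hand real I/O to the underlying cluster file system (e.g. HDFS).
    // "fs.fusionlike.underlying" is a hypothetical setting for this sketch.
    String underlying = conf.get("fs.fusionlike.underlying", "hdfs://namenode:8020/");
    this.fs = FileSystem.get(URI.create(underlying), conf);
    super.initialize(name, conf);
  }

  @Override
  public URI getUri() {
    // Clients address data through fusionlike:/// paths rather than hdfs://.
    return URI.create("fusionlike:///");
  }

  @Override
  public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite,
      int bufferSize, short replication, long blockSize, Progressable progress)
      throws IOException {
    // This is where a replication proxy would intercept the write and
    // coordinate it with remote clusters before letting it proceed locally.
    coordinateWithPeers(f);
    // Strip the proxy scheme so the underlying file system accepts the path.
    // (A complete proxy would do this for every operation, not just create.)
    Path target = new Path(f.toUri().getPath());
    return fs.create(target, permission, overwrite, bufferSize, replication,
        blockSize, progress);
  }

  private void coordinateWithPeers(Path f) {
    // Placeholder: cross-cluster coordination logic would go here.
  }
}
```

Registered through Hadoop's standard fs.&lt;scheme&gt;.impl configuration property, a class along these lines would let clients read and write fusionlike:/// paths while the proxy decides what to coordinate before touching the real file system underneath, which matches the URL swap DeFauw describes.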
Fusion also works to sync HBase servers, though this demanded more technical work from WANdisco than straight HDFS syncing, said DeFauw.
“With HBase, it is more interesting,” he said. “The coordination happens for the writes, and each region server maintains its own write log. When it comes time to flush the memstore onto disk and write an HFile, every region server can have its own HFile. It writes to its local server, but which region server should write to HDFS? We have a coordinated flush, where we choose a specific server that will write the file on the underlying file system.”
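The article does not describe how that writer is chosen, but the core idea of a coordinated flush, every replica independently agreeing on a single server to write the HFile, can be shown with a small sketch. The class below is hypothetical and uses a simple deterministic mapping in place of whatever coordination protocol Fusion actually runs.

```java
// Illustrative sketch only: one simple way replicas could agree on a single
// writer for a flush, by deterministically mapping a region to one of the
// participating region servers. This is not WANdisco's or HBase's actual
// coordination mechanism.
import java.util.List;
import java.util.TreeSet;

public class CoordinatedFlushElector {

  /**
   * Every replica runs the same pure function over the same inputs, so all of
   * them independently reach the same answer about who writes the HFile.
   */
  public static String electWriter(String regionName, List<String> regionServers) {
    // Sort the servers so the input order does not affect the result.
    List<String> ordered = List.copyOf(new TreeSet<>(regionServers));
    int index = Math.floorMod(regionName.hashCode(), ordered.size());
    return ordered.get(index);
  }

  public static void main(String[] args) {
    List<String> servers = List.of("rs-east-1", "rs-west-1", "rs-eu-1");
    String writer = electWriter("orders,0001", servers);
    System.out.println("Region server chosen to write the HFile: " + writer);
    // Each server compares the elected writer to its own identity; only the
    // match flushes the HFile to the underlying file system.
  }
}
```

A deterministic mapping like this only works if every replica shares the same view of the membership list; a production system would sit this decision on top of a consensus or coordination layer so that failures and membership changes do not leave two servers both believing they are the writer.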