WANdisco, a leading provider of continuous-availability software for global enterprises to meet the challenges of Big Data, announced today that it had contributed code to the Apache Hadoop open source project that enables changes to the Hadoop Distributed File System (HDFS) to be undone automatically when a transaction is aborted. This new feature, referred to as TRUNCATE, is a standard capability of transactional systems. Previously, if a user mistakenly appended data to an existing file stored in HDFS, their only recourse was to recreate the file by rewriting the contents. In addition, software engineers developing Hadoop Big Data applications were forced to write code to work around this limitation.
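The semantics TRUNCATE brings to HDFS are analogous to POSIX-style file truncation: rather than rewriting a file from scratch to discard appended data, the file is simply cut back to an earlier length. The following is a minimal local-filesystem sketch of that idea, using Python's `os.truncate` as a stand-in for the HDFS operation (it does not use Hadoop itself; the file path and checkpoint variable are illustrative):

```python
import os
import tempfile

# Illustrative sketch only: this uses POSIX-style truncation on a local
# file to demonstrate the semantics TRUNCATE adds to HDFS -- rolling back
# an append by cutting the file to its previous length instead of
# recreating it by rewriting the contents.

path = os.path.join(tempfile.mkdtemp(), "data.log")  # hypothetical file

with open(path, "wb") as f:
    f.write(b"original contents\n")

checkpoint = os.path.getsize(path)  # record the length before appending

with open(path, "ab") as f:
    f.write(b"mistakenly appended data\n")

# Undo the append: truncate the file back to the checkpointed length.
os.truncate(path, checkpoint)

with open(path, "rb") as f:
    recovered = f.read()
```

After the truncate call, `recovered` holds only the original contents; the mistaken append is gone without any rewrite of the surviving data.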
The TRUNCATE effort was led by Dr. Konstantin Shvachko, the company’s Chief Architect for Big Data, who is also a senior committer on the Hadoop Project Management Committee and one of the original developers of HDFS. Other members of the team included Dr. Konstantin Boudnik, Plamen Jeliazkov, and Byron Wong. Because the code was contributed upstream, all Hadoop distributions will be able to leverage this enhancement, benefiting both users and application developers.
“TRUNCATE represents a significant step forward that all Hadoop users and application developers will benefit from,” said David Richards, CEO and Co-Founder of WANdisco. “WANdisco has been a sponsor of the Apache Software Foundation for many years, with senior committers on staff who have made significant contributions to Apache open source projects. Our work on TRUNCATE further demonstrates WANdisco’s deep and continued commitment to the Apache open source community.”
Further details about HDFS TRUNCATE can be found at: https://issues.apache.org/jira/browse/HDFS-3107.