Datameer, the only proven big data insights platform for rapid data discovery, today announced new data governance capabilities for its native Hadoop environment. As a pioneer in big data analytics, Datameer is helping solidify Hadoop as a mature and transparent platform for production-ready, mission-critical and regulatory-compliant analytics use cases.

While big data analytics is enabling significant new use cases, it’s also becoming increasingly complex. Analysts and administrators alike need an easier way to navigate data pipelines that have been developed by multiple departments and participants, and that involve multiple data sources. Additionally, a number of these use cases now incorporate sensitive data sets such as PII, PCI and PHI, or require legal compliance with specific regulations such as the Basel international banking regulations. As these types of use cases occur more frequently, it’s imperative that five areas are recognized as “must-have” capabilities for enterprises across the board: quality and consistency; data policies and standards; data security and privacy; regulatory compliance; and retention and archiving. Users no longer have to choose between the robust, governable data regime often associated with traditional data warehouse implementations and the ease of use of a self-service Hadoop platform. Now, by adding Datameer’s premium module, businesses have complete transparency into their data pipelines, and can provide IT with the appropriate tools to audit diligently for compliance with internal and external regulations.

“The world of big data, which includes Hadoop, needs to take data governance more seriously in order to become ready for enterprise-grade deployments,” said John L. Myers, managing research director of business intelligence at Enterprise Management Associates. “As more technologies join next-generation data management environments, open architectures such as Datameer’s are going to be critical in meeting both internal and external data governance requirements to make those solutions enterprise ready.”

“Hadoop has been seen as the Wild West in which vendors have been developing different products for the ecosystem without really thinking about data governance and sophisticated security protocols,” said Stefan Groschupf, CEO of Datameer. “With these new features we’re driving home the point that we’re serious about helping enterprises transform their business into data-driven organizations.”

Quality & Consistency
Data quality and consistency are imperative when it comes to ultimately extracting value from big data. If at any point in the data pipeline there is a question about data validity, the overall value of the resulting insights is in question. Datameer’s data profiling tools enable you to check and remediate issues like dirty, inconsistent or invalid data at any stage in a complex analytics pipeline, and provide transparency into every change, from the original dataset all the way through to the final visualization.

Datameer’s capabilities include data profiling, data statistics monitoring, metadata management and impact analysis.
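As a rough illustration of what column-level data profiling can look like, the Python sketch below counts missing, distinct and non-numeric values in a single column. It is not Datameer code: the function name `profile_column` and the sample values are assumptions made purely for this example.

```python
from typing import Dict, List, Optional


def profile_column(values: List[Optional[str]]) -> Dict[str, object]:
    """Summarize one column of raw string values (hypothetical helper)."""
    # Treat None and blank strings as missing.
    present = [v for v in values if v is not None and v.strip() != ""]

    # Collect the values that parse as numbers; the rest are flagged.
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except ValueError:
            pass

    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "non_numeric": len(present) - len(numeric),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


# Made-up sample column containing a blank, a null and a bad value.
ages = ["34", "41", "", None, "abc", "41"]
profile = profile_column(ages)
print(profile)
```

Running the same kind of summary at each stage of a pipeline is one simple way to notice where invalid values first appear.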

Data Policies & Standards
Data access policies are the first line of defense against risk for businesses. For IT, the goal is to implement policies that allow them to manage risk appropriately, while still meeting business needs. Specifically, Datameer supports secure data views and multi-stage analytics pipelines.

Data Security & Privacy
True big data security needs to go beyond the built-in capabilities of the Hadoop Distributed File System. Datameer provides LDAP / Active Directory integration, role-based access control, permissions and sharing, integration with Apache Sentry 1.4, and column- and row-level security and anonymization functions.
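To make the idea of row and column security concrete, here is a minimal Python sketch, assuming a simple model in which a user may see only certain regions (row security) and sensitive columns are replaced by a salted hash (column masking). The names `secure_view` and `pseudonymize`, the `region` column and the sample data are all hypothetical and do not describe Datameer's functions. Note also that salted hashing is pseudonymization rather than full anonymization.

```python
import hashlib
from typing import Dict, Iterable, List, Set

Row = Dict[str, str]


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a short salted SHA-256 token (illustrative only)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def secure_view(rows: Iterable[Row], masked_columns: Set[str],
                allowed_regions: Set[str], salt: str) -> List[Row]:
    """Return only the permitted rows, with sensitive columns masked."""
    view = []
    for row in rows:
        # Row-level security: skip rows outside the user's regions.
        if row["region"] not in allowed_regions:
            continue
        # Column-level security: mask the sensitive columns.
        view.append({
            col: pseudonymize(val, salt) if col in masked_columns else val
            for col, val in row.items()
        })
    return view


# Made-up sample data.
customers = [
    {"email": "ann@example.com", "region": "EU", "plan": "gold"},
    {"email": "bob@example.com", "region": "US", "plan": "basic"},
]
view = secure_view(customers, masked_columns={"email"},
                   allowed_regions={"EU"}, salt="demo-salt")
print(view)
```

Because the same input and salt always produce the same token, masked columns in this sketch can still be used for joins and counts without exposing the underlying value.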

Regulatory Compliance
Across several industries, there are legal imperatives for big data governance, such as Sarbanes-Oxley, Basel, HIPAA and PCI compliance. Datameer’s data lineage features include artifact- and file-level dependency graphs, a dependencies REST API and worksheet lineage. Its auditing functionality includes a user action log and allows external systems to be notified of user and system audit events as they happen.
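A dependency graph of this kind can be pictured as a set of edges from each artifact to the artifacts built from it. The Python sketch below is an assumption-laden illustration, not Datameer's data model or REST API: the artifact names and the `downstream` and `upstream` helpers are invented for the example. It shows how such a graph answers two audit questions: what is affected if a source changes, and where a given result came from.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical artifacts; each key lists the artifacts built directly from it.
DEPENDENTS: Dict[str, List[str]] = {
    "import/customers": ["workbook/clean_customers"],
    "import/orders": ["workbook/order_totals"],
    "workbook/clean_customers": ["workbook/revenue_by_customer"],
    "workbook/order_totals": ["workbook/revenue_by_customer"],
    "workbook/revenue_by_customer": ["dashboard/revenue"],
}


def downstream(artifact: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Impact analysis: everything reachable from the artifact (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([artifact])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(artifact: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Lineage: everything the artifact was derived from."""
    # Invert the edges, then reuse the same traversal.
    parents: Dict[str, List[str]] = {}
    for source, children in graph.items():
        for child in children:
            parents.setdefault(child, []).append(source)
    return downstream(artifact, parents)


print(sorted(downstream("import/orders", DEPENDENTS)))
print(sorted(upstream("dashboard/revenue", DEPENDENTS)))
```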

Retention & Archiving
In Datameer, flexible retention rules allow each imported data set’s retention policy to be configured by an individual set of rules. It is easy to configure Datameer to keep data permanently, or to purge records that are older than a specific time window. Independent of time, retention rules can also be configured based on the number of runs of ELT ingests or analytics workbook executions. Security rules allow retired data to be either instantly removed, retained until a specified time, or manually removed after system administrator approval.
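The time-window and run-count rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ImportRun`, `RetentionRule` and `runs_to_retire` are hypothetical names rather than Datameer settings, and the approval and delayed-removal options are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ImportRun:
    """One execution of an import job (hypothetical record type)."""
    run_id: int
    finished_at: datetime


@dataclass
class RetentionRule:
    """Hypothetical rule: limit by age, by run count, both, or neither."""
    max_age: Optional[timedelta] = None  # None means no time limit
    max_runs: Optional[int] = None       # None means no run-count limit


def runs_to_retire(runs: List[ImportRun], rule: RetentionRule,
                   now: datetime) -> List[int]:
    """Return the IDs of runs that fall outside the retention rule."""
    newest_first = sorted(runs, key=lambda r: r.finished_at, reverse=True)
    retired = []
    for position, run in enumerate(newest_first):
        too_old = rule.max_age is not None and now - run.finished_at > rule.max_age
        too_many = rule.max_runs is not None and position >= rule.max_runs
        if too_old or too_many:
            retired.append(run.run_id)
    return sorted(retired)


# Five made-up runs, finished 10, 20, 30, 40 and 50 days before `now`.
now = datetime(2015, 6, 23)
runs = [ImportRun(i, now - timedelta(days=10 * i)) for i in range(1, 6)]

print(runs_to_retire(runs, RetentionRule(max_age=timedelta(days=30)), now))
print(runs_to_retire(runs, RetentionRule(max_runs=2), now))
print(runs_to_retire(runs, RetentionRule(), now))  # keep everything
```

A rule with neither limit set corresponds to keeping data permanently, while setting both limits retires a run as soon as it violates either one.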

Learn More
For more information, please register for the upcoming Datameer Data Governance webinar, which will be held Tuesday, June 23 at 11 a.m. PT / 2 p.m. ET.