Amazon is releasing a new data management service to enable collaboration on data between different areas of an organization. Amazon DataZone enables cataloging, discovery, analysis, sharing, and governance of data between its producers and consumers.
The data producers provide the data catalog with structured data assets, and then consumers can subscribe to those assets and share them with other collaborators.
“Every enterprise is made up of multiple teams that own and use data across a variety of data stores,” said Shikha Verma, head of product for Amazon DataZone, during a keynote at AWS re:Invent earlier this year, when the service was previewed for the first time. “Data people have to pull this data together but do not have an easy way to access or even have visibility to this data. DataZone provides a unified environment where everyone in an organization, from data producers to consumers, can go to access and share data in a governed manner.”
Data is stored in domains, which are representative of a boundary within the business. These domains include the core components of Amazon DataZone: the data portal, data catalog, projects and environments, and built-in workflows.
The data portal is a web application that offers self-service capabilities for accessing data. For authentication, it uses AWS Identity and Access Management (IAM) credentials or credentials from external identity providers federated through AWS IAM Identity Center.
The data catalog is where data can be sorted and organized according to a predefined taxonomy, such as business context. Organizing data this way makes it easier for users to find what they need quickly, AWS explained.
Users can also create data projects and environments. Projects group people, data assets, and analytics tools by use case. Environments underpin those projects, providing access to the tools and data needed to produce or consume data. “Amazon DataZone projects provide a space where project members can collaborate, exchange data, and share data assets,” AWS wrote in a blog post.
Finally, built-in workflows handle governance and access control. When a consumer requests access to data, the owner must approve the request; Amazon DataZone then grants access according to the permissions set for the data store.
“To get started, consider a scenario where a product marketing team wants to run campaigns to drive product adoption. To do this, they need to analyze product sales data owned by a sales team. In this walkthrough, the sales team, which acts as the data producer, publishes sales data in Amazon DataZone. Then the marketing team, which acts as the data consumer, subscribes to sales data and analyzes it in order to build a campaign strategy,” AWS wrote.
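The request-and-approval workflow, applied to a sales and marketing scenario like the one AWS describes, can be sketched in a few lines of Python. This is an in-memory illustration only, not the Amazon DataZone API: the `Catalog`, `Asset`, and `SubscriptionRequest` classes and every method name are hypothetical, chosen simply to show how a producer publishes an asset, a consumer asks for access, and access is granted only once the owner approves.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Asset:
    """A published data asset (hypothetical model, not a DataZone type)."""
    name: str
    owner: str  # producing team, which must approve access requests
    subscribers: set = field(default_factory=set)


@dataclass
class SubscriptionRequest:
    asset: Asset
    consumer: str
    reason: str
    status: Status = Status.PENDING


class Catalog:
    """Toy catalog: producers publish, consumers request, owners decide."""

    def __init__(self):
        self._assets = {}

    def publish(self, asset):
        self._assets[asset.name] = asset

    def request_access(self, asset_name, consumer, reason):
        # Raises KeyError if the asset has not been published.
        return SubscriptionRequest(self._assets[asset_name], consumer, reason)

    def decide(self, request, approver, approve):
        # Only the owning team may approve or reject a request.
        if approver != request.asset.owner:
            raise PermissionError(f"{approver} does not own {request.asset.name}")
        if request.status is not Status.PENDING:
            raise ValueError("request has already been decided")
        if approve:
            request.status = Status.APPROVED
            # Access is granted only as a consequence of approval.
            request.asset.subscribers.add(request.consumer)
        else:
            request.status = Status.REJECTED
        return request

    def can_read(self, asset_name, team):
        asset = self._assets[asset_name]
        return team == asset.owner or team in asset.subscribers


if __name__ == "__main__":
    catalog = Catalog()
    catalog.publish(Asset("sales_data", owner="sales"))
    req = catalog.request_access("sales_data", "marketing", "campaign analysis")
    catalog.decide(req, approver="sales", approve=True)
    print(catalog.can_read("sales_data", "marketing"))
```

In this sketch, the marketing team cannot read `sales_data` until the sales team approves its request, which mirrors the governed sharing model the article describes. The real service additionally manages the underlying data store permissions.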