MapR Technologies today announced a broad range of improvements coming in MapR Data Platform 6.1, including performance boosts for AI and analytics data processing, a simplified and more accessible platform for developing AI and analytics applications, and streamlined security options.
Greater cross-compatibility between on-premises and remote data stores is a major focus of the update, the company wrote in the announcement. It is accomplishing this by making MapR-stored data more easily accessible through three additions: support for NFSv4, access via the Amazon S3 API, and MapR Data Fabric for Kubernetes, which provides data management for Kubernetes-based containerized applications.
“Many machine learning systems run best in containers managed by Kubernetes,” Ted Dunning, chief application architect at MapR, wrote in a blog post about the release. “In fact, Kubernetes is a candidate for the software having the fastest widespread enterprise adoption in quite some time. Kubernetes, however, only manages the compute side of the problem. That computation is done by programs running in containers, and containers work best when they aren’t bogged down by tons of data…That is, Kubernetes should manage your containers, and MapR should manage your data.”
This cross-compatibility will have benefits for AI and machine learning development using MapR, Dunning explained.
“Taking just one example of how MapR helps with machine learning, large-scale learning often requires that training data be extracted from very large historical records,” Dunning wrote. “The tools that are best at this are typically systems like Spark that read data from S3 or HDFS. Training a model, however, requires the use of special purpose machine learning software that is happiest reading data using standard file access methods and often needs to run on specialized GPU machines. With MapR, Spark, and GPU, programs run in the same cluster and can access the same data. You can deal with the scale of the raw data and the raw speed of the GPUs in the same system.”
The company is also aiming to lower total cost of ownership (TCO) with this release by introducing features that reduce overall storage use. Dunning explained:
“With the 6.1 release, data in a MapR cluster that doesn’t need maximum performance can be stored using erasure coding,” he wrote. “This allows the data security provided by the normal triplication of data to be achieved with about half the storage. The cost of storage can be cut even more by moving cold data entirely out of the cluster into an object storage system. Such systems are optimized for low-cost, so moving data there decreases cost. This process of moving data to less expensive, lower-performance alternative storage is known as object tiering. Performance of these lower-cost tiers can be much lower than that of a MapR system, but if data has been moved to a low-cost tier, reading it will cause the data to be recalled, thus allowing high performance access again.”
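The "about half the storage" figure follows from simple overhead arithmetic. The sketch below is illustrative only: the announcement does not say which erasure coding scheme MapR uses, so the 4+2 layout (four data fragments plus two parity fragments) is an assumption chosen because, like triplication, it survives the loss of any two fragments or copies.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte when every block is fully copied."""
    return float(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per logical byte under a data+parity erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


# Triplication: three full copies of every block.
triple = replication_overhead(3)

# Hypothetical 4+2 scheme (an assumption, not confirmed by the source).
coded = erasure_overhead(4, 2)

# Fraction of the triplicated footprint that the erasure-coded data needs.
ratio = coded / triple
print(f"triplication: {triple}x, 4+2 erasure coding: {coded}x, ratio: {ratio}")
```

Under these assumptions, triplication stores 3 raw bytes per logical byte and the 4+2 scheme stores 1.5, which is half. Wider schemes with proportionally less parity would use even less space.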
Finally, MapR is improving its encryption technology with the ability to automatically generate new keys at intervals for “hot data,” meaning data that is stored at rest and ready to be used rather than actively being accessed. The company’s technique is similar to one used to encrypt network traffic.
“A big difference between wire-level encryption and encryption at rest is that keys for data in motion only need to be maintained while the data is still in motion,” Dunning wrote. “Keys for data at rest, on the other hand, have to be managed for as long as the data might still be read. Ideally, users don’t have to set up any additional services in order to make this work. Encryption is also used in the object tiering system. As files and parts of files are designated as cold data and are written to objects, an additional layer of encryption is done. Again, key rotation is handled completely automatically.”
MapR Data Platform 6.1 is entering beta for customers now, with general availability coming next quarter, the company says.
“Customers have made it clear that traditional approaches to managing and processing data for AI and Analytics leave critical gaps,” Anoop Dawar, senior vice president of product management and marketing at MapR, said in the announcement. “In response, MapR’s newest innovations enable data scientists and developers to power distributed AI and analytics by leveraging all data for more impactful results.”