What’s new in Big Data at Strata + Hadoop World

Published: October 15th, 2014

- Rob Marvin

Cloudera releases Cloudera Enterprise 5.2, Impala 2.0, Cloudera Director product and Cloudera Accelerator Program
Cloudera used the Strata stage to unveil a slew of new products, programs and updates.

The Apache Hadoop software provider released version 5.2 of Cloudera Enterprise, its data analytics management solution for Hadoop. The release brings several security advancements, including simple key management, enhanced auditing and component coverage in Cloudera Navigator, and Sentry policy management support in the open-source Hadoop UI.

Cloudera also announced a new product, the Cloudera Director self-service platform for managing enterprise cloud deployments, and two new programs for real-time streaming innovation: the Cloudera Accelerator Program and Cloudera Labs.

Cloudera Director extends the company’s enterprise data hub architecture to the cloud, allowing for self-service provisioning through a simple interface and foundational support for hybrid deployments across multiple cloud environments. The Cloudera Accelerator Program will work with partners to further advance real-time streaming architectures such as the Apache Spark framework, while Cloudera Labs will serve as a virtual incubator for open-source initiatives the Cloudera engineering team is contributing to, such as the Apache Kafka fault-tolerant messaging system.

Finally, Cloudera announced version 2.0 of its Impala open-source analytics database running natively in Hadoop. Impala is a core component of Cloudera Enterprise 5.2, and the 2.0 release bolsters the platform with SQL 2003 support for standards-based analytics, role-based access controls and Apache Sentry integration, legacy data type migration and vendor-specific SQL extensions.

Hortonworks Data Platform 2.2 released
Hadoop distribution vendor Hortonworks announced version 2.2 of the Hortonworks Data Platform (HDP), the company’s enterprise data platform built around the Hadoop YARN subproject.

HDP 2.2 adds more than 100 new features that integrate with YARN to enable batch, interactive and real-time methods of interacting with a single set of Hadoop data. Highlights include:

A YARN-ready Apache Spark engine for data science and an Apache Kafka engine for Internet of Things data processing.
Enterprise SQL at Hadoop scale with the Stinger.next initiative, adding updated SQL semantics for ACID transactions in Apache Hive, a cost-based optimizer for better SQL query performance and ORC file compression.
Apache Argus for centralized security administration and policy enforcement, integrated with Apache Storm and Apache Knox, with the ability to enforce policy with Hive and HBase.
Management and monitoring improvements: 100% uptime target with cluster rolling upgrades, Ambari Views for custom visualization and Ambari Blueprints to deliver template cluster deployment.
Automated cluster backup to the cloud for Microsoft® Azure and Amazon S3.

An HDP 2.2 preview is currently available, and the release will be generally available in November.

MongoDB announces enhancements to MongoDB Management Service

MongoDB is rolling out upgrades to its database management service, MMS, to improve MongoDB provisioning, monitoring, backup and scaling.
The maker of the popular cross-platform NoSQL database claims the revamped MongoDB Management Service reduces operational overhead by up to 95% for deployments of any size. The key enhanced elements of MMS that MongoDB highlighted include:

Advanced AWS Integration: MMS can provision and optimize Amazon Web Services instances for MongoDB automatically.
Upgrades: MMS manages upgrades and downgrades of deployments in minutes, with no downtime.
Scale Out: Users can rapidly scale deployments, adding capacity without taking the application offline.
Infrastructure Agnostic: MMS works with any internet-connected infrastructure including public or private clouds and laptops, controlled through a single interface.
Continuous Backups: MMS backs up deployments continuously, staying seconds behind the production database, without adding overhead.
Point-in-time Recovery: Users can restore deployments to any point in time.
Performance Alerts: Users can be notified through custom alerts on more than 100 system metrics, via email, SMS, PagerDuty, HipChat and other services.

Pentaho announces Data Refinery Blueprint for automated data modeling
Enterprise Big Data analytics platform Pentaho announced a new architecture blueprint for Streamlined Data Refinery, a design pattern for orchestrating blended data sets for on-demand Hadoop queries.

According to Pentaho, a Streamlined Data Refinery solution can expand automated business user capabilities through secure, blended and on-demand analytics. The blueprint launch supports Streamlined Data Refinery architectures by automating the modeling process and publishing data to large-scale analytical databases such as HP Vertica while still meeting IT requirements.

Predixion Software releases Predixion Insight 4.0
Cloud-based predictive analytics software provider Predixion Software announced the latest version of its predictive analytics platform, Predixion Insight 4.0. The release expands the platform’s predictive analytics capabilities across applications, databases, data stores, real-time engines, devices and machines.

New features and support in Predixion Insight 4.0 include:

Deployment of scripts and packages created with other predictive modeling tools by leveraging machine learning libraries and statistical programming languages such as R and Mahout, along with PMML integration support.
Combination of structured and unstructured data from multiple sources.
Providing visualizations and summaries with immediate feedback.
A single portable object called an MLSM (Machine Learning Semantic Model) package containing all transformations and analytics for deployment anywhere.
Solution Accelerator, a framework for rapid creation of custom web-based predictive applications.

Attunity Replicate 4.0
Attunity has released Attunity Replicate 4.0, the latest version of its Big Data replication solution for Hadoop.

The Big Data management and distribution provider aims to reduce the time, labor and cost of Hadoop implementations with version 4.0, which adds high-performance data loading and extraction for Hadoop through optimized processes and APIs. Other new features include drag-and-drop data configuration, a Web-based performance metrics dashboard and certification with top Hadoop distributions including Hortonworks and Cloudera.

GraphLab Create 1.0 now generally available
GraphLab, the high-performance distributed computation startup behind the parallel machine learning C++ framework of the same name, announced the general availability of GraphLab Create 1.0.

New features in the GraphLab Create 1.0 release include the ability to build predictive, scalable applications that are deployed on AWS and queried in real time through a RESTful API. The release also adds expanded Deep Learning, Boosted Tree algorithm and dashboard visualization capabilities, along with a new Auto-tuning Toolkits API that automatically selects a machine learning model for enterprises.

GraphLab also marked the Create 1.0 release with Hadoop, Apache Spark and Apache Avro integrations.

About Rob Marvin

Rob Marvin has covered the software development and technology industry as Online & Social Media Editor at SD Times since July 2013. He is a 2013 graduate of the S.I. Newhouse School of Public Communications at Syracuse University with dual degrees in Magazine Journalism and Psychology. Rob enjoys writing about everything from features, entertainment, news and culture to his current work covering the software development industry. Reach him on Twitter at @rjmarvin1.


