Hadoop's growth sparks competition

Published: June 29th, 2011

- Alex Handy

When Hadoop first appeared in 2009 as an open-source framework for scalable distributed computing with large data sets, the project was a lone player in a seemingly empty marketplace. But two years later, a dozen startups are all vying for the Hadoop crown. A big player has just joined them: At this year’s Hadoop Summit, Yahoo entered the fray by spinning off its own internal Hadoop group as Hortonworks.

Another sign of growth: This year’s event saw 27 sponsors, all of whom are eager to cash in on this popular open-source ecosystem. By contrast, the 2010 summit had only seven sponsors.

Among last year’s sponsors were Hadoop-specific companies such as Karmasphere and Datameer. This year, however, big names like Dell, IBM, NetApp and Supermicro were all sponsoring the event.

Matt Aslett, senior analyst at The 451 Group, said, “As interest in Hadoop expands from early adopters to mainstream enterprise and government users, we are increasingly seeing the focus shift from development and testing to understanding potential use cases for the core distribution to the value-added tools and services that will enable and accelerate enterprise adoption.”

Interested players
Hortonworks is now just another in a chorus line of Hadoop consulting and services firms. A recent article about Hadoop published on technology news site GigaOM estimated Cloudera’s revenues at a few million dollars, and it pointed out that despite high interest from enterprises, the Hadoop market remains almost exclusively a consultancy-based market, not a product-based market.

And because consulting services don’t scale and rarely bring in the big profits that products can, Hortonworks and other Hadoop firms are facing an uphill battle.

Still, as the Hadoop ecosystem continues to expand and new solutions pop up almost daily, it is the developers who benefit from all of this innovation, even if firms aren’t yet buying Hadoop packages instead of free versions.

And still other firms are spending their time and money on integrating Hadoop into existing process flows, which can often call for packaged software. For these folks, traditional integration and data management firms have stepped up to the plate.

Firms like Pervasive. Joe Dubin, product manager for Pervasive DataRush, said that his company is preparing a new accelerator for Hadoop users, one that will process batch jobs faster than map/reduce.

“We’re releasing at the end of June in early access form,” he said. “It’s a way to make Hive queries run faster on less hardware without changing Hive scripts. It’s the first in a series of big data accelerators that we will be releasing.

“At a high level, normally when you put a Hive query into Hive, it turns that into map/reduce jobs. We now have it produce an alternative. It can produce DataRush jobs. You access the DataRush back end, construct a DataRush data flow, and execute that query.”

Pervasive’s approach speaks to the Wild-West nature of Hadoop. Enterprises may have fallen in love with the software, but they’re all using it in their own way. Some use Hadoop as a big data store, with HDFS as a way to store petabytes of information cheaply. Others are using Hadoop as a way to pull chunks of data out of cold storage so that they can be moved into a relational database and analyzed with traditional methods. Still others are using Hadoop as the front-end database by hosting their live information in HBase, the non-relational, column-oriented data store built on top of HDFS.

And thus it all comes back to the central point that Hadoop, as packaged commercial software, isn’t quite ready yet. Cloudera hopes to change that with the release of Cloudera Enterprise 3.5. With this release, Cloudera has added support for full life-cycle management of Hadoop jobs, as well as a streamlined management console. The suite is a direct response to what Cloudera sees as the pain points for Hadoop users.

Charles Zedlewski, vice president of product management at Cloudera, said that release 3.5 should push Hadoop from the early adopters to mainstream adoption. “This system was designed by engineers for engineers, and that’s not a tenable way for a typical Fortune 500 company to run Hadoop,” he said.

“The new management suite is a big advance, functionally. With the new and old enhancements, we’ve brought it to a stage where you’re able to manage the full life cycle of the Hadoop system and to diagnose the root cause of problems.”

Cloudera Enterprise 3.5 is not Cloudera’s only product. The company releases free distributions of Hadoop to the public with every passing version of Hadoop and its ecosystem of sub-projects. And while Cloudera’s distribution of Hadoop can be found within Amazon Web Services and across other cloud providers, its competitors are hoping to get in on the distribution action as well.

Karmasphere, for example, announced today at the Hadoop Summit the release of its virtual Hadoop appliance for developers. The company’s distribution is targeted at developers instead of administrators, and thus the company hopes to skirt Cloudera’s popular version of Hadoop by offering one targeted at the development process of batch jobs.

Abe Taha, vice president of engineering at Karmasphere, said, “We know that more and more companies are attracted to the power of Hadoop but just don’t know how to get started. With our appliance, developers can now jumpstart their Hadoop projects with or without a cluster installed and immediately prepare to support the needs of the data analyst professionals across the company who are looking to unlock the intelligence inside unstructured data.”

Combine this new offering with Datameer’s Excel-like data manipulation tools, and Hadoop is expanding into an end-to-end ecosystem of data manipulation, analysis and storage.

Article Tags

Hadoop

About Alex Handy

Alex Handy is the Senior Editor of Software Development Times.



