IBM’s M2 corrals massive data sets with Hadoop

Jeff Feinman
October 2, 2009
With 1,386 members making up the two houses of the Parliament of the United Kingdom, there is certainly no shortage of government data flowing from the territories of Great Britain and Northern Ireland. Bills must be voted upon, elections must be carried out, and many other actions must be taken and tracked.

That is one of the reasons why IBM created M2, an enterprise data analysis platform. M2, announced today at Hadoop World in New York, aims to help organizations better gather important government and business data. It was built using Apache Hadoop, an open-source Java framework that enables applications to work with large sets of data.

M2 is IBM’s latest Web 2.0 technology, joining the ranks of the Mashup Center mashup platform and WebSphere sMash Web application development environment.

Rod Smith, vice president of IBM’s emerging technologies unit, said M2 is different from other data analyzers because it is flexible and able to scale to large data sets. It can also integrate with other visualization and analytic engines, such as IBM’s Cognos business intelligence software.

Smith said customers spoke about how they didn’t know how to harvest vast amounts of data properly for business intelligence and analytics. “We scratched our heads about it for a while, and then when the Hadoop project got started up, it looked like a good foundation to build on where we could explore the idea of doing do-it-yourself analytics,” he said.

“It’s about deeper intelligence that’s more exploratory than what you’d think about from a data warehouse.”

In a demo with SD Times, IBM showed a BBC data mashup called “Digital Democracy,” which sifts through government-published data and makes that information easier to access for BBC journalists. The mashup can show which members of Parliament are working on what bills, as well as voting records, demographic trends and many other data points.

M2 has a spider that crawls the Internet to retrieve content, but content can come from other sources, such as internal databases. In the case of the “Digital Democracy” mashup, the spider collected a few million pages of content over four days, according to Stewart Nickolas, a distinguished engineer for IBM’s emerging technologies unit. For a crawl, users identify the URLs they would like to begin with and how broad a search they want to conduct.
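The seed-URL-plus-breadth model Nickolas describes is essentially a bounded breadth-first crawl. The sketch below illustrates that idea in miniature; it is not M2's actual implementation, and the link graph, function name and depth parameter are illustrative assumptions, with page fetching abstracted away into a pre-built dictionary of links.

```python
from collections import deque

def crawl(link_graph, seed_urls, max_depth):
    """Bounded breadth-first crawl over a pre-fetched link graph.

    link_graph: dict mapping a URL to its outgoing links (stands in
                for actually fetching and parsing pages)
    seed_urls:  starting points, as a user would specify for a crawl
    max_depth:  how broad a search to conduct (0 = seeds only)
    """
    visited = set(seed_urls)
    queue = deque((url, 0) for url in seed_urls)
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)  # a real spider would fetch and store the page here
        if depth == max_depth:
            continue  # breadth limit reached; do not follow further links
        for link in link_graph.get(url, []):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order
```

Raising `max_depth` widens the crawl one hop at a time, which matches the trade-off in the article: the wider the search, the more pages (and days of crawling) it takes.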
