Big Data analytics is transforming the way entire industries operate, yet individual organizations are still struggling to unlock the value of their data. Given the speed at which businesses operate today, many tool and platform providers are trying to balance the power and scalability of their offerings with ease of use. Although the choices, and the value the tools provide, vary greatly, there are some commonalities that benefit developers and their organizations.
“Data is coming in from all different sources today, including transactional data, clickstreams, mobile devices, operational log files, public data and sensor data,” said John Fanelli, VP of marketing at DataTorrent, which says it built the first enterprise-grade real-time stream processing platform on Hadoop. “Enterprises have to rethink how they handle and how they process Big Data, and more importantly what value they can extract from the data.”
In many organizations, data remains trapped in disparate systems, which makes even interdepartmental use of it difficult.
“The most powerful and interesting aspect of this is not just having more data, but the links between the different dimensions that allow you to do something to effect a relationship,” said Buddy Brewer, VP of business development at SOASTA, a performance analytics solution provider.
Serving up ‘the right’ data
A growing number of platforms are designed to tie different data sources together, organize them, and provide insights. Over time, they are becoming easier to use, more powerful and more flexible, so enterprises can respond more effectively to change.
“The right data sources means all data sources,” said Fanelli. “We’re fortunate that solutions like Hadoop drive down the cost of storage in a way that allows you to maintain high-fidelity data for when you need to have access to all [that] data.”
Over the past few years, companies have been integrating structured and unstructured data to make better business decisions faster as well as to improve customer relationships and operational effectiveness. As organizations mature in their use of analytics tools, they want the ability to unearth what was previously undiscoverable.
“Traditional analytics requires user-generated queries, so the users must know what to ask,” said Joe Leung, a product marketing manager at HP. “However, there are times that users don’t know what they don’t know, let alone what data sources are needed. This could lead to risky blind spots. The new approach is to let your data tell the story.”
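One common way to let data “tell the story” is to have software flag unusual values automatically instead of waiting for someone to query them. The sketch below is a generic outlier detector, not HP’s method; the function name `find_outliers`, the 3.5 threshold and the sample order counts are illustrative assumptions. It uses the median absolute deviation rather than the standard deviation, because in small samples a single large spike inflates the standard deviation enough to hide itself.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    The median absolute deviation (MAD) is used as the spread estimate
    because, unlike the standard deviation, it is not inflated by the
    very outliers we are trying to find.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values equal the median: flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical daily order counts; nobody asked about day 6, but it stands out.
daily_orders = [120, 118, 125, 122, 119, 121, 310, 117, 123]
print(find_outliers(daily_orders))  # [6]
```

The 0.6745 factor scales the MAD so the score is comparable to a standard z-score for normally distributed data; 3.5 is a commonly used cutoff, but the right value depends on the data.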
Of course, accurate analytics require accurate data. Although some organizations are saving everything, they may not know what to do with the data, how to catalog it, or what value it may provide in the future.
“Many of our customers would just copy files into Hadoop and throw them into the data lake,” said Fanelli. “We help them with ingestion [so that] they can examine the data, clean it up a bit, and tag the data so when they go into Hadoop, they know where the data is, they understand the schema, and they have a mechanism for getting the data out.”
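The ingest, clean and tag flow Fanelli describes can be sketched in a few lines. This is an illustrative sketch, not DataTorrent’s product: `ingest`, `CatalogEntry`, the storage path and the type-inference rules are all hypothetical. It trims values, drops empty rows, infers a simple schema, and records where the data lives so it can be found and read back later.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata recorded at ingestion time so the data can be found later."""
    location: str                              # hypothetical path in the data lake
    schema: dict                               # column name -> inferred type name
    tags: list = field(default_factory=list)   # free-form labels for discovery
    row_count: int = 0


def infer_type(values):
    """Return the narrowest of "int", "float" or "str" that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"
    for cast, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def ingest(raw_csv, location, tags):
    """Clean CSV text, infer its schema and return (rows, catalog entry)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    rows = []
    for row in reader:
        # Trim whitespace; ignore overflow fields (key None) from ragged rows.
        cleaned = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        if any(cleaned.values()):              # drop rows that are entirely empty
            rows.append(cleaned)
    columns = [c.strip() for c in (reader.fieldnames or [])]
    schema = {c: infer_type([r.get(c, "") for r in rows]) for c in columns}
    return rows, CatalogEntry(location, schema, list(tags), len(rows))


raw = "user_id, amount ,region\n1, 9.99,EU\n\n2,15,US\n"
rows, entry = ingest(raw, "lake/raw/orders/2015-06-01.csv", ["orders", "raw"])
print(entry.schema)  # {'user_id': 'int', 'amount': 'float', 'region': 'str'}
```

Keeping the catalog entry alongside the data is what lets a later user know where the data is and what its schema looks like without re-reading every file.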
SOASTA is helping its customers tie together business metrics, third-party data, and user performance data so they can understand how user experience directly affects the business.
“We don’t try to edit in advance the types of data we may need around performance,” said Brewer. “We just collect all of it, as many dimensions as possible and every single interaction so we’re not missing any individual user experience on the site. When we have all that information, we change the problem from being a collection problem into an analytics and search problem—looking inside the haystack and finding the needles that matter to the business.”
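Once every interaction is collected with all its dimensions, “finding the needles” can start as a simple group-and-rank. In this hypothetical sketch (the field names, the `slowest_segments` helper and the choice of median load time are assumptions, not SOASTA’s implementation), each beacon is a dict of dimensions plus a load time, and every dimension value is scored by a nearest-rank percentile of its load times.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile (p from 0 to 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def slowest_segments(beacons, metric="load_ms", p=50, top=3):
    """Score every (dimension, value) pair by the p-th percentile of `metric`."""
    groups = defaultdict(list)
    for beacon in beacons:
        for dim, value in beacon.items():
            if dim != metric:
                groups[(dim, value)].append(beacon[metric])
    scored = [(key, percentile(times, p)) for key, times in groups.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # slowest first
    return scored[:top]


beacons = [
    {"browser": "Chrome",  "country": "US", "page": "/home", "load_ms": 900},
    {"browser": "Chrome",  "country": "DE", "page": "/cart", "load_ms": 1100},
    {"browser": "Safari",  "country": "US", "page": "/cart", "load_ms": 4200},
    {"browser": "Safari",  "country": "DE", "page": "/home", "load_ms": 3900},
    {"browser": "Firefox", "country": "US", "page": "/home", "load_ms": 1000},
]
print(slowest_segments(beacons, top=1))  # [(('browser', 'Safari'), 3900)]
```

Because no dimension is chosen in advance, the slow browser surfaces even though nobody specifically asked about browsers.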
Making information useful
Contextual awareness is becoming a more popular feature in tools and platforms because it enables more precise insights, recommendations and actions. In retail, for example, it means retailers can more accurately predict buying trends based on weather patterns, events, regional preferences and individual customer behavior, and can make their customer interactions more accurate and relevant.
“Data has more value in the context of what’s going on around that data,” said Fanelli. “We see customers regularly enriching data with customer data, support data, marketing events and public data.”
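Enrichment of this kind is, at its core, a join between raw events and context keyed by shared fields. The following is a minimal sketch using in-memory dictionaries; the field names and lookup tables are hypothetical, and a production pipeline would typically perform the same join inside a processing engine rather than in plain Python.

```python
def enrich(events, customers, weather):
    """Attach customer attributes and local weather to each raw event.

    customers: customer_id -> attributes (e.g. exported from a CRM)
    weather:   (region, date) -> conditions (e.g. from a public data feed)
    Events with no matching context are kept with None in the new fields
    rather than dropped, so no raw data is silently lost.
    """
    enriched = []
    for event in events:
        customer = customers.get(event["customer_id"], {})
        region = customer.get("region")
        enriched.append({
            **event,                                   # original fields unchanged
            "segment": customer.get("segment"),
            "region": region,
            "weather": weather.get((region, event["date"])),
        })
    return enriched


events = [
    {"customer_id": 7, "date": "2015-06-01", "sku": "umbrella"},
    {"customer_id": 9, "date": "2015-06-01", "sku": "sunscreen"},
]
customers = {7: {"segment": "loyalty", "region": "Seattle"}}
weather = {("Seattle", "2015-06-01"): "rain"}
for row in enrich(events, customers, weather):
    print(row)
```

Keeping unmatched events, instead of performing an inner join, preserves the high-fidelity raw data while still adding context wherever it is available.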
Context is also driving more sophisticated data visualizations that help users understand data faster and decide what actions to take based on it. Recent improvements include 3D mapping, animations and richer interaction capabilities.
“We try to make sure the visualizations answer some key questions, like what’s happening, do I care about it, and what should I do about it,” said Brewer. “I think a lot of visualizations stop at number one.”
Accommodating the future
No two companies approach Big Data analytics the same way, since their business models and existing infrastructures differ. Regardless of how sophisticated an organization is, its tool investments should address its current state while remaining flexible enough to adapt to a future one.
“Given the proliferation of data sources and exploding volume growth, it’s critical to make sure that platforms can sustain performance, address future requirements, and protect investments by being able to truly address any data source,” said HP’s Leung. “Analytics is just one part. The platform [should] support or integrate with tools that address the rest of the life cycle, such as discovery, capture, preservation and management.”
Some solutions allow developers to focus on business logic while others require them to build things such as connectors or message buses. Increasingly, more products are being designed with ease of use in mind so they are easier to adopt, deploy, customize and use. For example, HP has a rich set of Web service APIs programmers can use to speed up development and explore new opportunities. SOASTA includes pre-developed functions and statistical models, so users can quickly get answers to commonly asked questions out of the box. DataTorrent is simplifying tasks in the data center and beyond. Its latest platform release allows developers and data scientists to easily create real-time streaming Big Data applications.
Given the fast pace of business, developers are continually looking for more efficient ways to build applications, improve site performance, and exploit the value of their company’s ever-expanding data assets. Tools and platforms are evolving rapidly to help developers and their organizations drive more business value from their data, faster and more effectively.
What can Big Data analytics do for you?
We asked DataTorrent, SOASTA and HP how they are helping developers meet their customers’ Big Data analytics requirements. This is what they had to say:
John Fanelli, VP of marketing at DataTorrent
As more organizations look to real-time analytics solutions to take fast action on the data flowing into their business, they often task their developers with complex, platform-level coding of streaming applications. While this might work for an extremely large business with a large staff of platform-level developers, that is the exception rather than the rule. We have taken that heavy lifting off developers’ shoulders by providing visual application-creation and intuitive data visualization tools that let developers quickly create streaming applications and iterate on their hypotheses.
The more quickly that decisions can be made based on data generated by business transactions, market data, sensors, monitors, mobile devices, IoT-connected devices and clickstreams (to name a few), the greater the impact of those decisions. Furthermore, if organizations can base those decisions on easy-to-understand information that speaks to their business logic, decision time can be reduced even further. We provide more than 450 pre-built operators that offer a wide range of analytical capabilities, as well as fast data ingestion and distribution, coupled with real-time visualization dashboards that developers build to aid business analysts in their decision-making.
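Composing pre-built operators into a streaming application can be pictured as chaining small functions, each consuming the previous one’s output. The sketch below uses plain Python generators and hypothetical operator names (`parse`, `filter_type`, `tumbling_sum`); it is not DataTorrent’s operator API, only an illustration of the common parse, filter and windowed-aggregate pattern.

```python
from typing import Iterable, Iterator, Tuple


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Operator: turn 'timestamp,event_type,amount' strings into records."""
    for line in lines:
        ts, kind, amount = line.split(",")
        yield {"ts": int(ts), "type": kind, "amount": float(amount)}


def filter_type(records: Iterable[dict], kind: str) -> Iterator[dict]:
    """Operator: pass through only records of one event type."""
    return (r for r in records if r["type"] == kind)


def tumbling_sum(records: Iterable[dict], window_s: int) -> Iterator[Tuple[int, float]]:
    """Operator: emit (window_start, total amount) whenever a fixed window closes.

    Assumes records arrive in timestamp order.
    """
    current, total = None, 0.0
    for r in records:
        start = r["ts"] - r["ts"] % window_s
        if current is not None and start != current:
            yield current, total          # the previous window is complete
            total = 0.0
        current = start
        total += r["amount"]
    if current is not None:
        yield current, total              # flush the last window when input ends


stream = ["0,purchase,10.0", "3,view,0", "7,purchase,5.5", "12,purchase,2.0"]
pipeline = tumbling_sum(filter_type(parse(stream), "purchase"), window_s=10)
print(list(pipeline))  # [(0, 15.5), (10, 2.0)]
```

Each operator only sees the records passed to it, so operators can be swapped or reused in other pipelines. The tumbling window assumes timestamp-ordered, finite input; a real stream processor also has to handle late or out-of-order events and unbounded streams.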
Businesses of all sizes benefit immensely from customer and transactional analytics, which provide unique insight into areas such as product offerings, campaigns and user behavior. Our tools allow organizations to take advantage of this fast Big Data as it happens.
Buddy Brewer, VP of engineering at SOASTA
The exploration and analysis of current and historical Web and mobile user performance data should be simple, so developers, business analysts, performance engineers and data scientists can gain instant insight into online business success. We allow them to correlate key business metrics and third-party data with user performance data, which is usually stuck in silos, so they can understand how user experience directly affects business outcomes.
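A basic form of this correlation groups sessions by page load time and computes the conversion rate in each bucket. The sketch below is a hypothetical illustration (the session tuples, the 1,000 ms bucket width and the function name are assumptions), not SOASTA’s statistical models. A falling rate across buckets suggests, but does not by itself prove, that slower pages cost conversions.

```python
from collections import defaultdict


def conversion_by_load_time(sessions, bucket_ms=1000):
    """Conversion rate per load-time bucket.

    sessions: iterable of (load_ms, converted) pairs, where converted is a bool.
    Returns {bucket_start_ms: conversion_rate}, ordered by bucket.
    """
    counts = defaultdict(lambda: [0, 0])      # bucket -> [sessions, conversions]
    for load_ms, converted in sessions:
        bucket = (load_ms // bucket_ms) * bucket_ms
        counts[bucket][0] += 1
        counts[bucket][1] += int(converted)
    return {b: conv / n for b, (n, conv) in sorted(counts.items())}


sessions = [
    (800, True), (950, True), (1200, True), (1500, False),
    (2600, False), (2900, True), (3100, False), (3400, False),
]
print(conversion_by_load_time(sessions))
# {0: 1.0, 1000: 0.5, 2000: 0.5, 3000: 0.0}
```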
The data generated by activities such as load testing and monitoring are immensely valuable for establishing patterns and trends, especially when they can be integrated with other types of data to present a complete picture of online business performance. Billions of pieces of data lack meaning until analytics and visualization are applied. If you can pull together all the relevant monitoring and performance data into an infinitely scalable schema, ready to mine for business-specific user experience information, you can improve the effectiveness of your site and the value it provides your business. We provide pre-developed functions and statistical models that answer the most commonly asked questions, so users can rapidly drill down into the metrics that matter to their businesses.
Digital businesses need to understand today’s Web and mobile user behavior. Big Data raises the bar for which data is collected, so that relevant insight can be discovered and acted on. SOASTA’s tools uncover this critical insight and clearly show how the business is directly affected.
Joe Leung, product marketing manager at HP
Today’s developers are rushing to capitalize on the limitless opportunities in the world of Big Data and the Internet of Things. The explosive data growth fueled by proliferation of data sources and rapid technology adoption is resulting in an unprecedented wealth of information locked up in massive volumes of data in diverse formats. Leading developers are using innovative Web services to take advantage of this phenomenon. Examples of Web service APIs used in Big Data applications include:
- Face detection: Groundbreaking high-fidelity image file compression (FPEG, winner of 2015 Developer Week’s Accelerate Hackathon)
- Sentiment analysis: Cyberbullying prevention (Sparky Guardian)
- Find Similar: Content recommendation service (Missum)
- Face recognition: Intelligent Nerf gun that recognizes friend or foe
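The “Find Similar” pattern behind a content recommendation service can be approximated with bag-of-words vectors and cosine similarity. This is a generic sketch, not the HP Haven OnDemand API; the tokenizer, function names and sample documents are assumptions, and production services generally use richer text representations.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_similar(query, documents, top=3):
    """Rank documents (id -> text) by cosine similarity to the query text."""
    q = vectorize(query)
    scored = [(doc_id, cosine(q, vectorize(text))) for doc_id, text in documents.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


docs = {
    "a": "Hadoop cluster tuning for faster batch jobs",
    "b": "Real-time stream processing on Hadoop",
    "c": "Chocolate cake recipe",
}
print(find_similar("stream processing with Hadoop", docs, top=2))
```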
Getting to that killer app may not be easy. First, we need a rich selection of easy-to-use APIs to unlock possibilities and accelerate time to market. Second, cloud-based applications demand massive scale. HP Haven OnDemand requires no compromise on either: it provides the ability to understand and act on all forms of data, including structured, semi-structured and unstructured data.
From image, speech and text analytics to lightning-fast SQL queries, HP Haven OnDemand provides enterprise-class data search, processing and analysis capabilities.
A guide to Big Data analytics tools
Altamira: Lumify is an open-source Big Data fusion, analysis and visualization platform. Its Web-based interface helps users discover connections and explore relationships in their data using 2D and 3D graph visualizations, full-text faceted search, dynamic histograms, interactive geographic maps, and collaborative workspaces. It runs on Amazon Web Services and most on-premises Apache Hadoop stacks.
Ayasdi: Ayasdi Core is an advanced analytics application that helps data scientists and business executives uncover critical business intelligence from highly complex and growing datasets. Its broad range of algorithms and topological data analysis accelerate the discovery of insights that are hidden from, or previously overlooked by, conventional analytical approaches.
ClearStory Data: ClearStory Data’s solution is an integrated Spark-based data-processing platform and simple user application that harmonizes dozens of disparate sources, identifies data relationships among them, and converges data on the fly. It provides an easy way to speed up access to more sources, answer new questions fast, and reduce dependence on IT.
Datameer: Datameer simplifies the Big Data analytics environment into a single application on top of the Hadoop platform. To speed up insights, it combines self-service data integration, analytics and visualizations. Datameer’s Analytics App market includes horizontal use cases such as e-mail and social sentiment analysis as well as vertical market and product-specific apps.
DataTorrent: DataTorrent RTS is a high-performing, highly scalable, fault-tolerant and secure enterprise-grade solution that includes a visual development tool with more than 450 pre-built functions to speed up enterprise insights and action. With its massively scalable architecture, DataTorrent RTS allows enterprises to process, monitor, analyze and act on data instantaneously.
EMC: Pivotal HAWQ is an enterprise SQL-on-Hadoop analytic engine that uses massively parallel processing. It enables discovery-based analysis of large data sets and rapid, iterative development of data analytics applications that apply deep machine learning. Analytic applications written over HAWQ are easily portable to other SQL-compliant data engines, and vice versa. It also supports many data analysis and data visualization tools.
Google: BigQuery enables fast, SQL-like queries against append-only tables using the processing power of Google’s infrastructure. Users can control access to their projects and data based on their business needs. BigQuery is accessible via a Web UI, a command-line tool, and by making calls to the BigQuery REST API using client libraries. Third-party tools are also available.
HP: HP Haven OnDemand provides the ability to understand and act on all forms of data, including structured, semi-structured and unstructured ones. From image, speech and text analytics to lightning-fast SQL queries, HP Haven OnDemand provides enterprise-class data search, processing and analysis capabilities.
Informatica: Informatica Big Data Edition integrates all types of data from cloud and mobile applications, social media, sensor devices, and more on Hadoop at any scale without requiring knowledge of Hadoop. Rather than hand-coding in Java or a scripting language, developers can use the visual development environment, reusable business rules and mapplets, collaboration tools, and flexible deployment models.
KNIME: KNIME Analytics Platform is an open, enterprise-grade analytics platform that allows users to discover the potential hidden in data, mine fresh insights, and predict future outcomes. It provides code-free setup and an intuitive interface, and can be customized with free, commercial or custom applications. Its collaborative capabilities can be used for joint development and analytics as well as the sharing of knowledge, tools and insights.
Metric Insights: Push Intelligence is a platform optimized to deliver KPIs to specific users under specific conditions along with rich metadata that provides context. Because user annotations and commentary are stored at the data-point level, anyone referencing the same data can see associated comment threads and metadata in any chart, report or alert that contains an annotated data point.
Microsoft: HDInsight is a 100% Hadoop-based service in the cloud that scales to petabytes on demand. Capable of processing unstructured and semi-structured data, the service allows users to spin up a Hadoop cluster in minutes and visualize Hadoop data in Excel. It integrates easily with on-premises Hadoop clusters, and allows development in Java, .NET and more.
OpenText (formerly Actuate): BIRT Analytics is a Big Data analytics platform for business analysts who need to access, blend, explore and analyze all of their data quickly without IT or data experts. Its columnar database engine instantly loads billions of records into a visual interface so users can get insights from massive data on the fly by joining disparate data sources and analytical techniques.
Palantir: Gotham is a platform that allows enterprises to integrate, manage, secure and analyze all of their data. It transforms bits and bytes into meaningfully defined objects and relationships, including people, places, things, events and the connections between them. Using applications built on top of the platform, users can visualize relationships, explore divergent hypotheses, discover unknown connections and hidden patterns, and share insights.
Pentaho: Pentaho Big Data Analytics is a platform that provides visual Big Data analytics tools to extract, prepare and blend data regardless of source, analytic requirements or deployment environment. Its open, standards-based architecture integrates with or extends existing infrastructure, providing a full array of analytics from data access and integration to data visualization and predictive analytics.
Platfora: Platfora is an end-to-end software platform that runs natively on Hadoop. It provides raw data preparation, in-memory acceleration, and rich visuals to better share insights. The APIs, flexible workflow options, and plug-in frameworks enable open data access and extensibility while providing enterprise-class security and governance. Platfora natively supports all major Hadoop distributions.
SAP: SAP HANA combines database, data processing and application platform capabilities in a single in-memory platform. It also provides libraries for predictive, planning, text processing, spatial and business analytics. Developers can quickly build highly scalable applications that leverage the speed and scale of the SAP HANA platform.
SOASTA: SOASTA Performance Platform is an advanced platform that allows decision-makers to understand users so that they can continually optimize application performance to meet and exceed expectations. Business insights are unveiled by mining current and historical metrics and performance data to show the real-time impact of user performance on key business metrics.
Splunk: Hunk is a platform that allows users to explore, analyze and visualize data in Hadoop and NoSQL data stores. Developers can use Hunk to build applications on top of data in Hadoop using multiple languages and frameworks. Hunk includes a standards-based Web framework, a documented REST API, and SDKs for C#, Java, JavaScript, Python, PHP and Ruby, as well as libraries that stream data from NoSQL and other data stores.
Talend: Big Data Integration is a Hadoop-based data integration platform that provides graphical tools and wizards so developers don’t have to write and maintain Hadoop code. Talend provided early support and technical previews for MapReduce, YARN, Spark and Storm. As other frameworks are released, developers can take advantage of them without learning new coding languages.
Tamr: Tamr is a data unification platform that catalogs, connects and curates hundreds or thousands of internal and external data sources through a combination of machine-learning algorithms and human expertise. It maps enterprise information, and matches entities and attributes across sources to deliver a consolidated view via RESTful APIs.
Texifter: DiscoverText is a cloud-based text analytics solution that pulls text from diverse sources, combining information and associated structured metadata from unique information channels. It can merge data from text files, e-mail, open-ended answers on surveys, and online sources such as Facebook, G+, blogs and Twitter.
TIBCO: Jaspersoft natively connects and provides data visualizations for Hadoop analytics, MongoDB analytics, Cassandra analytics and more. Users can build reports, dashboards and analytics directly from those stores, without having to move the data to another database. They can also embed visualizations and reports into their apps, or use the insights to optimize business.
Trifacta: Trifacta Data Transformation Platform couples human and machine intelligence so users can easily transform raw data into actionable insights and prepare it for use in analysis tools. The latest version simplifies data preparation for Hadoop by providing advanced visual data profiling capabilities and native support for complex data formats, and by leveraging the multi-workload processing features of Hadoop.