Big Data has introduced some notable challenges to the enterprise world. At first, it was enough to wrangle the data (structured and not) and then to glean actionable information from it, but today, enterprises are shifting their priorities toward real-time analytics and data streams. Early tools couldn’t cope with the scale or speed involved in analyzing data, but today’s databases and analytics tools are equipped to handle these business problems. It’s time to push past the hype of Big Data and home in on real-time insights that actually create technical and business value.
Rather than build out the capabilities themselves, many enterprises are adopting cloud services from third parties to deliver the big, complete analytical picture. Since there are different types of analytics, companies must choose the right service to process the data, analyze it, and communicate the results in a visual format. While there is still a need for batch analysis, enterprises are seeing more interest in the area of real-time data streaming.
How to uncover enterprise data’s value
Many companies still struggle to uncover the value of their data, but they seem to understand that looking at the right data matters more than amassing information that may not serve the organization’s business objectives.
According to Manish Gupta, CMO of Redis Labs, regardless of the business, there is a lot of data that companies will have to deal with one way or another, especially if they want to derive insights and make appropriate decisions.
(Related: What’s better than Big Data? Twice the Big Data)
He added that in many ways, all data has become “Big Data,” and the term is no longer meaningful. One thing is certain: Everyone has a competitive need to figure out what the data is saying and how the company should make decisions, he said.
All of this, said Gupta, is coupled with the fact that data comes in many forms. Structured data is what most companies use since it is organized and easily displayed, but today more companies are using unstructured data, or data not confined in a database or other type of data structure, such as social media posts, documents and images. Additionally, there is data that comes fast, data that arrives ad hoc, and data that needs to be handled in batches.
These data dimensions have “evolved so much that the enterprise and its infrastructure, both the network as well as the application infrastructure, has to keep up with it in order for the enterprise to remain competitive,” said Gupta.
Right now, one of the important things companies can look at is performance metrics and how performance correlates with revenue. The saying “time is money” is an easy way to sum things up, according to James Urquhart, senior vice president of performance analytics at SOASTA.
He described a “time is money” scenario in which a small programming mistake adds a second to page load time. That extra second would have a severe impact on the revenue acquired from customers, and it wouldn’t make for a satisfying user experience.
This is why companies are beginning to ask themselves how performance affects revenue. To answer this question, businesses can use a predictive modeling approach to understand scenarios like trimming 500 milliseconds off a page load time and determining what that would do to the conversion rate. Urquhart said that the ability to correlate performance directly to revenue is just the first big step to uncovering the value of a company’s data.
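A hypothetical, minimal sketch of that kind of what-if analysis: it fits a logistic model of conversion versus page load time on synthetic session data, then compares predicted conversion before and after a 500-millisecond trim. The column names, numbers, and the assumed relationship between load time and conversion are all illustrative assumptions, not figures from the article.

```python
# Minimal sketch (assumed, synthetic data): relate page load time to conversion
# with a logistic regression, then estimate the effect of trimming 500 ms.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic sessions: page load time in seconds, and whether the visit converted.
load_time_s = rng.uniform(0.5, 6.0, size=10_000)
p_convert = 1 / (1 + np.exp(-(1.5 - 0.8 * load_time_s)))  # assumed relationship
converted = rng.binomial(1, p_convert)

model = LogisticRegression()
model.fit(load_time_s.reshape(-1, 1), converted)

def predicted_conversion(seconds: float) -> float:
    """Predicted probability of conversion at a given page load time."""
    return float(model.predict_proba([[seconds]])[0, 1])

baseline = predicted_conversion(3.0)
faster = predicted_conversion(2.5)  # the "trim 500 milliseconds" scenario
print(f"Estimated conversion at 3.0 s: {baseline:.3f}")
print(f"Estimated conversion at 2.5 s: {faster:.3f}")
print(f"Estimated uplift from trimming 500 ms: {faster - baseline:+.3%}")
```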
Urquhart said that the way to look at analytics is by asking, “How does performance affect ‘x,’ and how does ‘x’ affect performance?” Businesses can review things outside of performance analytics, but some of the performance data can be particularly helpful for looking at user paths. It’s also possible for businesses to take all that information and bring it to an external environment, like Hadoop or Spark.
“I see a lot of data integration correlating sets of data that are related mostly around the user—sometimes around the application, but mostly around the user,” said Urquhart. “They correlate across multiple systems that are tracking that user and that user experience, and beginning to understand what’s the effect of performance on the user buying and user behavior.”
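A minimal illustration of that user-centric correlation, under the assumption of two hypothetical sources: real-user performance beacons from one system and order records from another, joined on the user. At real scale this join would typically be pushed into an external environment such as Hadoop or Spark; pandas is used here only to keep the sketch self-contained, and every table and column name is an assumption.

```python
# Hypothetical sketch: join performance data with order data, keyed on the user,
# to see how average experience differs between buyers and non-buyers.
import pandas as pd

# Performance beacons (e.g., from real-user monitoring) -- assumed schema.
beacons = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "page_load_ms": [1200, 900, 3400, 3100, 700],
})

# Orders from the commerce backend -- assumed schema.
orders = pd.DataFrame({
    "user_id": [1, 3],
    "order_value": [59.99, 120.00],
})

# Average experience per user, joined with whether that user bought anything.
per_user = beacons.groupby("user_id", as_index=False)["page_load_ms"].mean()
joined = per_user.merge(orders, on="user_id", how="left")
joined["converted"] = joined["order_value"].notna()

print(joined)
print(joined.groupby("converted")["page_load_ms"].mean())
```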
Moving beyond Big Data
One of the challenges in the world of Big Data comes from a collision between traditional Big Data analytics and real-time analytics, and between the Internet of Things and real-time data gathering across devices. With a real-time system, companies can quickly spot problems, change their campaigns, or correct an issue right away to improve the customer experience, according to Urquhart.
“If you look at it industry-wide, what’s the problem here? It’s really kind of bringing in this ability to understand and trade and share data and put them in different contexts, correlate across different contexts, and do that in a way that you don’t lose data integrity,” he said. “And you don’t negatively affect the experience of the user of your web application, your mobile application or even your devices.”
The challenge for companies is realizing that the industry is moving from a centralized view of analytics run on stale or older data to a real-time view of data with downstream analytics capabilities. On top of that, companies will need to collect data across a large amount of space in a very distributed fashion, according to Urquhart. This is going to be one of the biggest challenges the industry faces in the next five years, he added.
To tackle some of these challenges, companies need to start by thinking about the context of the data, since there are different tools designed for analyzing different aspects of Big Data.
“The context will overlap a little bit, and so that’s the point where you look for companies that are willing to partner, willing to share their data and be able to pass things on where further analytics are needed,” said Urquhart. “To me, it’s increasingly less about the tools that you choose and more about the services that you choose and the capabilities that are brought forward by those services.”
Other challenges come from the fact that business models and the Big Data landscape are changing. Companies are going through a massive shift from being entirely on-premises to running things in the cloud, but some businesses haven’t completed the transition.
This forms a hybrid environment that is creating challenges for companies managing data, especially as the competitive landscape evolves. The companies that move fast, evolve fast, and are agile tend to be the most successful, said Gupta.
“I think agility, timeliness, real-time insights are all challenges, not only because the data is available but because it’s a competitive imperative,” he said.
While real-time streaming isn’t the answer to every Big Data problem, the importance of real-time data analysis is on the rise. Enterprises are no longer collecting as much data as they can, or stashing data in a warehouse with no actual plan for how they would even use it. With the rise of Hadoop and Spark, along with options coming out of the open-source community and from data-management providers, enterprises have plenty of tooling if they want to build out real-time analytics and extract insights from their data.
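As a concrete (and deliberately simplified) example of that tooling, the sketch below uses Spark Structured Streaming in PySpark to count incoming events per one-minute window and print the running totals. The socket source, host, and port are assumptions chosen to keep the example self-contained; a production pipeline would more likely read from a durable source such as Kafka.

```python
# Minimal PySpark Structured Streaming sketch (assumed socket source):
# count events per one-minute window and print running totals to the console.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, window

spark = SparkSession.builder.appName("realtime-sketch").getOrCreate()

# Each line arriving on the socket is treated as one event, stamped on arrival.
events = (
    spark.readStream.format("socket")
    .option("host", "localhost")   # assumed host
    .option("port", 9999)          # assumed port
    .load()
    .withColumn("event_time", current_timestamp())
)

# Count events per one-minute window; the aggregate updates as data streams in.
counts = events.groupBy(window("event_time", "1 minute")).count()

query = (
    counts.writeStream.outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination()
```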
Getting data from the Internet of Things
The Internet of Things will generate large volumes of data, but to uncover the value, companies must find ways to access and understand all of it. According to Paul Miller, a senior analyst for Forrester Research, streaming data capabilities are gaining much interest, and analyzing those data streams is just one of the ways in which businesses can find meaningful insights that can be used in an organization’s IoT strategy.
Miller said the first phase of the Big Data conversation got everyone “a little too excited,” and businesses were more concerned with collecting as much data as they could than with understanding what they were going to do with it. Now, he said, organizations are getting past that mindset, learning to be more selective and growing far more interested in collecting the right data rather than lots of data.
However, the connected devices of today—including consumer products—are generating a lot of data back and forth. Some of it comes in small packages of data, but in the industrial Internet of Things, those packages are much bigger, said Miller.
The IoT has also created security issues. Much of the early development in the Internet of Things, particularly in the consumer space, assumed that all the pieces of the network were “friendly,” said Miller. This opened the door to things like hacking, and for connected devices like baby monitors, you really do not want to assume that everyone on an open public network is a “good person,” he said. Companies both big and small should be aware of these security and network issues so they can protect their data and their users.
Companies looking to build out real-time analytics for the IoT need to consider a few factors. First, they should consider Big Data engines like Apache Spark or frameworks like Apache Hadoop, as well as other enterprise tooling. Companies should also take a look at the business processes that actually need real-time Big Data streaming.
“There is no point streaming data in real time back from a system if all of your internal processes are geared to look at it once a month or once every six months,” said Miller. “There’s a real mismatch there between the technical capability and the way the organization works, so unless you fuse those two together, you are not really going to see much value.”
A guide to Big Data analytics tools
AppDynamics: Application Analytics enables customer-centric enterprises to correlate application performance to user journey and business impact all in real time. Without code changes and the efforts of building and/or maintaining Big Data platforms, enterprises can automatically collect and correlate business transactions, mobile, browser, log and custom data to get insights into IT operations, customer experience and business outcomes. Using its web-based interface, customers can search and query data with a SQL-like query language, utilize prebuilt widgets to create powerful visualizations, set up alerts, and share custom dashboards to influence business outcomes in real time.
BigPanda: BigPanda is a data science platform for automating IT event correlation, which helps IT, NOC and DevOps teams detect and resolve critical issues 90% faster. The platform helps IT teams keep up with the explosion of scale and complexity in their data centers by using data science to correlate massive amounts of daily IT alerts from fragmented clouds, applications, services and servers, and automatically turning them into actionable insights.
Cask: Cask provides the Cask Data Application Platform (CDAP), the first unified integration platform for apps, data and things that cuts down the time to production by 80%. CDAP accelerates time to value from Hadoop and enables IT organizations to empower their business through self-service analytics, removing barriers to innovation with an extensible, future-proof platform.
CA Technologies: Big Data Management solutions from CA Technologies deliver visibility and simplicity for monitoring and managing Big Data projects across all platforms from a single, unified view. Its approach streamlines Big Data management responsibilities for rapidly changing business needs, isolates system problems, detects negative trends, and mitigates them as quickly as possible.
Databricks: Databricks, the company behind Apache Spark, is a just-in-time data platform built on top of Spark for data science. The platform enables data scientists, analysts and engineers to become immediately productive with familiar tools and intuitive interfaces. With Databricks, data scientists and engineers are able to simplify data integration, perform real-time experimentation and share results, in addition to moving their workflows and models to production.
DataStax: DataStax delivers Apache Cassandra to the enterprise, providing a secure, fast, always-on database technology for cloud applications that remains operationally simple when scaled. Its vast capabilities include search, analytics, in-memory computing, advanced security, automated management services, and visual management and monitoring. Additionally, DataStax Enterprise Graph is the only scalable real-time graph database fast enough to power customer-facing applications, capable of scaling to massive datasets and powering deep analytical queries.
DataTorrent: DataTorrent enables you to maximize the value of data-in-motion. Powered by its open-source engine Apache Apex, DataTorrent empowers you with built-in ingestion, transformation, and analytics capabilities to accelerate application development and time-to-production—at a lower cost. With its scalability, fault tolerance, processing guarantees, and monitoring and visualization tools, DataTorrent offers easy operability to let you focus on business results and innovation.
Dynatrace: Dynatrace is the undisputed leader in APM and the pioneer behind Digital Performance Management. The company helps businesses see their cloud, mobile and enterprise applications from the only perspective that matters: their customers’. Dynatrace enables digital confidence by combining Big Data analytics, real-user monitoring, synthetic checks, and mobile app monitoring to make actionable digital performance information visible for everyone across business and IT.
MapR: The MapR Converged Data Platform integrates the power of Hadoop and Spark with global event streaming, real-time database capabilities, and enterprise storage for developing and running innovative data applications. The MapR Platform is powered by the industry’s fastest, most reliable, secure, and open data infrastructure that dramatically lowers TCO and enables global real-time data applications.
New Relic: New Relic provides a software analytics tool suite used by developers, ops, and software companies to understand how applications are performing in development. The tools provide performance analytics for every part of your software environment. You can easily view and analyze massive amounts of data, and gain actionable insights in real time.
Redis Labs: Redis Labs is the open-source home and provider of enterprise-class Redis, an in-memory NoSQL database platform benchmarked as the world’s fastest. Thousands of customers rely on Redis Labs’ high-performance, seamless scalability, true high availability, versatility, and best-in-class expertise to power their cutting-edge applications. Redis Labs’ software and database-as-a-service solutions enhance popular Redis use cases such as real-time analytics, fast high-volume transactions, in-app social functionality, application job management, queuing, and caching.
Sauce Labs: Sauce Labs provides the world’s largest cloud-based platform for automated testing of web and mobile applications. Its service eliminates the time and expense of maintaining an in-house testing infrastructure, freeing development teams of any size to innovate and release better software, faster. Optimized for use in CI and CD environments, and built with an emphasis on security, reliability and scalability, users can run tests written in any language or framework using Selenium or Appium, both widely adopted open-source standards for automating browser and mobile application functionality. Videos, screenshots, and HTML logs help pinpoint issues faster, while Sauce Connect allows users to securely test apps behind their firewall.
SOASTA: SOASTA is the leader in performance analytics. The SOASTA Digital Performance Management Platform enables digital business owners to gain unprecedented and end-to-end performance insights into their real user experiences on mobile and web devices, providing the intelligence needed to continuously measure, optimize and test in production, in real time and at scale.
Splunk: Hunk is a platform that allows users to explore, analyze, and visualize data in Hadoop and NoSQL data stores. Developers can use Hunk to build applications on top of data in Hadoop using multiple languages and frameworks. Hunk includes a standards-based web framework, a documented REST API, and SDKs for C#, Java, JavaScript, Python, PHP and Ruby, as well as libraries that stream data from NoSQL and other data stores.
WSO2: WSO2 Data Analytics Server combines batch and real-time analytics with predictive analytics through machine learning into one comprehensive data analytics platform. This effectively helps enterprises gather insightful information, monitor it via user-friendly dashboards, and make well-informed decisions. It can be used to implement a variety of use cases, and it includes extensible toolboxes for common use cases.