Big Data in the cloud: Making M2M analytics a reality

Published: September 23rd, 2013

M2M and connected devices are experiencing renewed interest lately thanks to a relatively recent moniker, “the Internet of things.” But it isn’t just a new name garnering attention.

While every machine can generate data on its own, that data has largely stayed inside the device, used for self-action or local intelligence. The primary focus to date has been on transferring and sharing data with other devices or systems. Machina Research projects 18 billion M2M connections globally by 2022, growing at an annual rate of 22%. That many connections will generate an astronomical amount of data.

Enterprises invest in M2M to conduct remote monitoring and remote diagnostics of devices. However, they’ve been reluctant to invest in a data warehouse or analytics solution to identify trends for predictive analysis, since most traditional data warehouse solutions are costly and time-consuming. Now, with the availability of Big Data in the cloud, enterprises can get the ROI they need without a huge upfront investment.

While Big Data use cases have revolved around the growing volume, variety and velocity of data, the cloud is more focused on transferring low-byte data in a standard format, with less emphasis on velocity in terms of disk I/O. With Big Data technologies now available in the cloud, enterprises can finally make M2M analytics a reality.

Look before you leap into M2M

However, M2M does pose inherent challenges. The variety of M2M segments, applications and devices, as well as dynamic and unpredictable traffic, low latency and real-time requirements, all present issues when it comes to building a solution with high throughput and robust data security at a low cost.

To overcome these obstacles and build an M2M analytics solution, enterprise development teams need to work out how best to leverage Big Data technologies in the cloud.

M2M analytics building blocks

Understanding the need for M2M analytics: An important first step is justifying the need for building an M2M analytics solution. Find out who the target users would be and determine the benefits they’ll receive, then identify concrete business use cases in order to define the boundary of the analytics solution:
• Remotely monitor the metrics of M2M applications in real time.
• Analyze machine logs as required to optimize performance or find out the root cause of consumer issues.
• Forecast machine failures using trend and predictive analysis.
• Build a recommendation engine that checks usage records so an advertising campaign can recommend new products, services or plans to consumers.
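
The failure-forecasting use case above can be sketched as a simple trend projection. This is a minimal illustration, not a production model: it fits a least-squares line to sensor readings and estimates when the trend would reach a failure threshold. The function names, the hourly sampling and the threshold value are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def fit_trend(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Fit an ordinary least-squares line; return (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def hours_until_threshold(
    times: Sequence[float], values: Sequence[float], threshold: float
) -> Optional[float]:
    """Hours after the last sample until the trend line reaches the threshold.

    Returns None when the trend is flat or falling, so no crossing is projected.
    """
    slope, intercept = fit_trend(times, values)
    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    return max(0.0, crossing_time - times[-1])


# Hypothetical hourly temperature readings from one machine.
hours = [0, 1, 2, 3]
temperature = [60.0, 62.0, 64.0, 66.0]
remaining = hours_until_threshold(hours, temperature, threshold=80.0)
```

A real predictive model would account for noise, seasonality and multiple signals; the linear fit only shows where trend analysis sits in the pipeline.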

Deciding data sources and targets: Identify data sources for use in data collection and analysis, and output data targets to build a visualization layer for target users. During this phase, it’s important to consider the security and legal aspects of data while finalizing the data sources. In M2M, these sources will supply semi-structured or unstructured data from machines or sensors, including logs, voice data, images, videos, e-mails, etc. Other prominent internal data sources are ERP and CRM databases containing customer, usage and device records in a structured format.

While deciding on data targets, think about storage in terms of raw, enriched (transformed) and aggregated data, and about retention policy. Be sure to build reporting interfaces on top of these targets in a form that mobile BI users can easily consume for analytics. Stored data should be available through standard APIs to allow seamless integration with fault-management or alert and notification systems.
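
One way to make the raw, enriched and aggregated split concrete is to attach a retention period to each tier. The sketch below is illustrative only: the tier names follow the text, but the retention periods and the `is_expired` helper are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StorageTier:
    name: str
    retention: timedelta


# Retention periods are placeholder values, not recommendations.
TIERS = {
    "raw": StorageTier("raw", timedelta(days=30)),
    "enriched": StorageTier("enriched", timedelta(days=180)),
    "aggregated": StorageTier("aggregated", timedelta(days=730)),
}


def is_expired(tier_name: str, stored_at: datetime, now: datetime) -> bool:
    """True when a record has outlived the retention period of its tier."""
    return now - stored_at > TIERS[tier_name].retention


stored = datetime(2013, 1, 1)
today = datetime(2013, 3, 1)
raw_expired = is_expired("raw", stored, today)
enriched_expired = is_expired("enriched", stored, today)
```

Keeping bulky raw data for the shortest period and compact aggregates for the longest is a common pattern, but the right periods depend on the legal and business constraints mentioned above.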

Choosing a Big Data platform: M2M communication, with its large amount of sensor, image and video data, and machine logs, will test traditional data warehouses on their scaling capability and cost effectiveness. Big Data technology is much more than MapReduce plus HDFS; it provides techniques and a platform to build an entire analytics solution. Considerations to keep in mind when choosing a Big Data platform are:
• Most open-source technology depends on community support and documentation, and there is still a lack of development tools. Choose a platform with good community support, active contributing members, and development tools.
• Use technology that provides highly distributed in-memory processing to reduce disk I/O.
• Evaluate tools for data profiling and transformation to cleanse raw data before loading it into Big Data storage. This is particularly important because Big Data includes both structured and unstructured data in massive volumes.
• The analytics solution will likely require the ability to connect with different data sources and targets, so you’ll want to choose technology with built-in connectors to a wide range of front-end query and visualization tools, plus back-end data collection and loading.
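
The profiling and cleansing step mentioned in the list above can be sketched as follows. This is a minimal example under assumptions: the record schema (`device_id`, `timestamp`, `reading`) and the function names are hypothetical, and a real pipeline would use a dedicated profiling or transformation tool.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical schema for a machine reading.
REQUIRED_FIELDS = ("device_id", "timestamp", "reading")


def profile_missing(records: Iterable[dict]) -> Dict[str, int]:
    """Count, per required field, how many records lack a usable value."""
    missing: Counter = Counter()
    for record in records:
        for field in REQUIRED_FIELDS:
            if record.get(field) in (None, ""):
                missing[field] += 1
    return dict(missing)


def cleanse(record: dict) -> Optional[dict]:
    """Return a normalized copy, or None if the record cannot be repaired."""
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None
    try:
        reading = float(record["reading"])
    except (TypeError, ValueError):
        return None
    return {
        "device_id": str(record["device_id"]).strip().upper(),
        "timestamp": record["timestamp"],
        "reading": reading,
    }


def cleanse_all(records: Iterable[dict]) -> List[dict]:
    """Keep only the records that survive cleansing."""
    return [clean for clean in map(cleanse, records) if clean is not None]


raw_records = [
    {"device_id": " m-01 ", "timestamp": "2013-09-23T10:00:00Z", "reading": "21.5"},
    {"device_id": "m-02", "timestamp": "", "reading": "19"},
    {"device_id": "m-03", "timestamp": "2013-09-23T10:01:00Z", "reading": "n/a"},
]
report = profile_missing(raw_records)
clean_records = cleanse_all(raw_records)
```

Profiling first and cleansing second lets the team see how much data is being dropped before it silently disappears from the analytics.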

Building a proof of concept for an M2M analytics solution: To decide whether the technology is suited for the project requirements and use cases, start by building a proof of concept. First, evaluate candidate programming languages and frameworks and select the ones best suited to the project. Consider functional, object-oriented programming languages, since code written in them tends to be more concise and elegant, and makes multi-processing logic easier to implement. Most of these languages interoperate seamlessly with Java or .NET.

M2M requires streaming and caching technologies, so it’s best to evaluate them on non-functional requirements like latency and throughput. Don’t forget that businesses will require highly reliable and zero-fault systems, so look for highly scalable and distributable cluster-based solutions.
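
When comparing streaming or caching candidates in a proof of concept, it helps to reduce each test run to a few comparable numbers. The sketch below assumes you have already collected per-message latencies in milliseconds over a known time window; the function names and the choice of the nearest-rank percentile method are assumptions for illustration.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float], window_seconds: float) -> Dict[str, float]:
    """Summarize one test run as throughput plus latency percentiles."""
    return {
        "throughput_per_s": len(latencies_ms) / window_seconds,
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "max_ms": max(latencies_ms),
    }


# Hypothetical run: 100 messages observed over a 10-second window.
run = summarize(list(range(1, 101)), window_seconds=10)
```

Tail percentiles such as p95 usually matter more than averages for real-time requirements, because a few slow messages can break an alerting deadline even when the mean looks fine.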

Deploying infrastructure on the cloud: An M2M analytics solution built on Big Data calls for on-demand, flexible capacity provisioning, and those needs can be short-term. Adding new data centers or hardware to meet them is expensive, which is why the cloud becomes the natural choice for building and deploying M2M analytics. When approaching deployment, keep these tips in mind:
• Build APIs to transfer data from each transaction to the cloud for storage and analysis.
• Store M2M data in enterprise systems or local data centers, then replicate the entire data set in batches to the cloud. This involves high latency but affords more control and a more secure environment for enterprise data.
• Store M2M data in enterprise systems, and do general-purpose, day-to-day analytics in an on-premise environment. Only send the required analysis data to the cloud (to run the algorithm or batch program) when you’re doing batch analytics requiring large numbers of server nodes and processing power. Then send the analysis reports back to the enterprise, where they will be visualized using the on-premise environment. This is useful for a highly secured environment, and it also lowers the cost of analysis.
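
The batch-replication option in the list above can be sketched with a small helper that groups locally stored records into fixed-size batches before sending them. The `upload` callable is a stand-in for whatever cloud storage client you use; it and the function names are assumptions, not a specific provider's API.

```python
from typing import Callable, Iterable, Iterator, List


def batches(records: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def replicate(records: Iterable, upload: Callable[[List], None], size: int) -> int:
    """Send records through `upload` one batch at a time; return batches sent."""
    sent = 0
    for batch in batches(records, size):
        upload(batch)
        sent += 1
    return sent


# Stand-in uploader that just collects batches in memory.
received: List[List[int]] = []
batch_count = replicate(range(5), received.append, size=2)
```

Larger batches reduce per-request overhead but increase the delay before data is visible in the cloud, which is the latency trade-off the text describes.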

Hybrid clouds are ideal for M2M analytics because it requires analyzing highly secured personal customer data alongside publicly available census, social and blog data. Consider using private clouds for customer and billing-related applications, and for storing enterprise data (SQL and NoSQL), as these demand secure, dedicated performance.

At the same time, consider using public clouds to collect public data, and to publish consumer applications and visualization reports on top of analyzed data. Demand for this kind of data fluctuates based on usage and events, which the public cloud can absorb cost-effectively.
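
The private-versus-public split described above amounts to a placement rule per data set. The following sketch shows one possible rule; the `Dataset` fields and the conservative default are assumptions for illustration, and a real policy would also reflect legal and contractual requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    contains_personal_data: bool
    publicly_sourced: bool


def choose_cloud(dataset: Dataset) -> str:
    """Return "private" or "public" for a data set."""
    if dataset.contains_personal_data:
        return "private"
    if dataset.publicly_sourced:
        return "public"
    # Assumption: anything unclassified stays private until reviewed.
    return "private"


billing = Dataset("billing_records", contains_personal_data=True, publicly_sourced=False)
census = Dataset("census_extract", contains_personal_data=False, publicly_sourced=True)
placement = {d.name: choose_cloud(d) for d in (billing, census)}
```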

Thinking through non-functional aspects: Paying attention to overall performance, scalability, reliability and availability requirements is vital when deciding how best to leverage Big Data for M2M analytics. Deploying multi-node clusters that replicate data across multiple nodes leads to a high-performance, reliable system. Since you can expect a large number of clusters, managing them is also very important. Use built-in administrative tools and, if need be, enhance them for your environment rather than managing clusters manually.

For an analytical platform collecting data from multiple enterprises, build a multi-tenancy model. Focus on providing physical databases and logical isolation for consumer and enterprise applications, and don’t forget to define your data retention policy (a necessary step in deciding the required capacity). When it comes to security, consider the data-retention period and the criticality of the data. Define the data security policy for storage, for network data transfer and for reporting access, but be aware of increased network security expenditures.
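
The link between retention policy and required capacity is simple arithmetic. The sketch below is a back-of-the-envelope estimate; every input value (device count, message rate, message size, retention period and replication factor) is a hypothetical figure for illustration, and it ignores compression, indexes and growth.

```python
def required_capacity_bytes(
    devices: int,
    messages_per_device_per_day: int,
    bytes_per_message: int,
    retention_days: int,
    replication_factor: int = 3,
) -> int:
    """Estimate raw storage needed to hold all retained, replicated messages."""
    daily_bytes = devices * messages_per_device_per_day * bytes_per_message
    return daily_bytes * retention_days * replication_factor


# Hypothetical fleet: 10,000 devices, one 200-byte message per minute,
# kept for 90 days on a cluster that stores three copies of each block.
capacity = required_capacity_bytes(
    devices=10_000,
    messages_per_device_per_day=1_440,
    bytes_per_message=200,
    retention_days=90,
    replication_factor=3,
)
capacity_gb = capacity / 1e9  # decimal gigabytes
```

Doubling the retention period doubles the estimate, which is why the retention policy has to be settled before capacity is sized.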

Sunil Agrawal is chief architect of Big Data analytics at Persistent Systems.

Article Tags

Big Data, cloud, M2M


