“One thing Spark does for streaming is micro-batch,” said Fanelli. “They grab events and batch them. They ultimately do MapReduce on that, and if an individual batch fails, they’ll rerun it. But if batch No. 3 fails, it may not rerun until after batch No. 6 runs, so you can’t do anything that’s predictive: if this occurred, and then that occurred, then do this.
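Fanelli’s ordering point can be illustrated with a toy scheduler (a hypothetical sketch for illustration only, not Spark’s actual implementation): a batch that fails is retried later, so batches can complete out of order, which breaks any logic that depends on seeing events in sequence.

```python
# Toy micro-batch scheduler (hypothetical sketch, not Spark's real code):
# a batch that fails once is queued for a rerun, so it finishes only
# after later batches have already completed.

def run_batches(batches, fails_once):
    completed = []     # order in which batches actually finish
    retry_queue = []
    failed = set(fails_once)
    for batch_id in batches:
        if batch_id in failed:
            failed.discard(batch_id)   # fail once, schedule a rerun
            retry_queue.append(batch_id)
        else:
            completed.append(batch_id)
    completed.extend(retry_queue)      # reruns land after later batches
    return completed

order = run_batches([1, 2, 3, 4, 5, 6], fails_once={3})
print(order)  # [1, 2, 4, 5, 6, 3] -- batch 3 completes after batch 6
```

Any sequential pattern rule (“if this occurred, and then that occurred, then do this”) would misfire here, because batch 3’s events arrive downstream after batch 6’s.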
“Also, high availability is a big requirement for enterprises. In both Spark Streaming and Storm, the developer has to code in fault tolerance. The developer has to decide what to save, how often to save it, and what to do on recovery. With Apex, the platform takes care of all that. The developers only write business logic; they don’t write operational code.”
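The “operational code” Fanelli describes can be sketched in miniature (a hypothetical illustration, not the actual Spark Streaming or Storm API): the developer must hand-write what state to checkpoint, how often to write it, and how to restore it after a crash, alongside the business logic itself.

```python
import json
import os
import tempfile

# Hypothetical sketch of developer-managed fault tolerance: the
# checkpoint path, save frequency, and recovery logic are all the
# developer's responsibility, mixed in with the business logic.

def load_state(path):
    # Recovery logic the developer must write by hand.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def process_batch(counts, events, path):
    # Business logic: a running word count.
    for word in events:
        counts[word] = counts.get(word, 0) + 1
    # Checkpointing logic: the developer chose to save after every batch.
    with open(path, "w") as f:
        json.dump(counts, f)
    return counts

ckpt = os.path.join(tempfile.mkdtemp(), "wordcount.json")
process_batch(load_state(ckpt), ["a", "b", "a"], ckpt)
# Simulate a crash and restart: state survives in the checkpoint file.
print(load_state(ckpt))  # {'a': 2, 'b': 1}
```

The claim for Apex is that everything here except the word-counting loop would be handled by the platform.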
Snowflake Computing announced a cloud-based enterprise data-warehousing solution. Its service handles scaling and provisioning, allowing developers and administrators to simply pour in their data and have it quickly replicate globally in AWS or on premises.
DBSH demonstrated its continuous data integration system for NoSQL and SQL databases. It uses models to match datasets stored in relational and non-relational systems, allowing developers and DBAs to sync data across the two disparate types of databases.
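The model-driven approach described here can be sketched in a simplified form (a hypothetical illustration in the spirit of the description, not DBSH’s actual product; all names and fields are invented): a mapping “model” pairs relational columns with document fields so rows can be mirrored into a NoSQL-style store.

```python
# Hypothetical model-driven sync sketch: a declarative model maps
# relational columns to document fields, and sync() upserts each row's
# document form into a dict standing in for a NoSQL store.

MODEL = {
    "table": "customers",          # invented example table
    "key": "id",                   # column used as the document key
    "fields": {"id": "customer_id", "name": "full_name"},
}

def row_to_document(row, model):
    # Translate one relational row into a document per the model.
    return {doc_field: row[col] for col, doc_field in model["fields"].items()}

def sync(rows, doc_store, model):
    # Upsert the document form of each row, keyed by the model's key column.
    for row in rows:
        doc_store[row[model["key"]]] = row_to_document(row, model)
    return doc_store

store = sync([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}], {}, MODEL)
print(store[1])  # {'customer_id': 1, 'full_name': 'Ada'}
```

A production system would also have to sync in the opposite direction and detect changes continuously; this sketch shows only the mapping step the “model” makes possible.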
LexisNexis showed HPCC, its large-scale data-processing platform. As the platform has grown, it has added numerous integrations with storage systems (such as Apache Cassandra and HDFS), allowing it to process data from existing data lakes.
Linoma Software gave a look at its managed file transfer products. While the company offers ways to migrate large stores of data around the world, it promoted its reverse-proxy capabilities most heavily. Using this reverse proxy, data can be pushed into the cloud without opening a port in the corporate firewall.
Texifter provided text-mining tools that focus on both traditional mining applications and the shallower, outsourced social-media variety. DiscoverText is the company’s multilingual machine-learning text-analytics cloud platform, which can be used to generate insights, clean messy data, or evaluate the success of marketing campaigns.
Striim demonstrated its end-to-end streaming analytics platform: an in-memory platform that can process and transform data from multiple sources, providing instant access to clean data for processing queues.