Apache Spark is the enterprise data orchestration layer of choice, particularly for complex data pipelines for machine learning applications and predictive data analytics. The Neo4j Connector for Apache Spark provides easy, bidirectional access between Neo4j graph datasets and many other data sources – including relational databases and semistructured and unstructured (NoSQL) repositories – transforming data from tables to graphs and back as needed. The new connector is available at no cost and is fully supported for Neo4j customers.
The Neo4j Connector for Apache Spark comes in response to high demand from the Spark and Neo4j communities to apply graphs to machine learning pipelines, unify data silos and derive greater value from existing data stores. According to “Technology Executive Priorities for Knowledge Graphs,” an independent survey recently conducted by Pulse, the top three reasons motivating enterprise IT decision makers to expand their use of knowledge graphs are to improve machine learning and artificial intelligence systems (60%), open new revenue streams (50%) and connect data silos to make information more accessible (50%).
For Neo4j Customers: Neo4j graphs can be connected to any other system or data source via Spark. The Spark Connector transforms tabular data sources into graph data to reveal more context and insight inside Neo4j. The bidirectional integration means that Spark can clean and transform the data that drives Neo4j graph applications, and Neo4j can feed graph data into any Spark workflow.
For Spark Users: The Neo4j Connector for Apache Spark brings advanced graph capabilities to the Spark ecosystem so businesses can use contextual information to improve forecasting, analytics and predictions. This connector enables teams to easily add Neo4j graph data to improve high-value processes, like machine learning, without reworking existing pipelines.
Amy E. Hodler, Director of Graph Analytics and AI Programs at Neo4j, shared why customers are excited to connect Neo4j and Spark.
“The vast majority of Neo4j’s enterprise customer base has Apache Spark in their data environment,” Hodler said. “With the Neo4j Connector for Apache Spark, our customers can consolidate their data pipelines and supercharge their Neo4j Graphs with access to the massive Spark ecosystem. The connector allows data scientists and application developers to easily meld Neo4j graph data and Spark data to answer more questions, gain new insights and create new solutions.”
In January, Gartner published An Introduction to and Evaluation of Apache Spark for Modern Data Architectures*. The report states, “Spark has evolved into a viable production platform to meet enterprise needs. It is easy for developers to learn and use to develop solutions. Spark has also cultivated a vibrant community of committers and solutions. Spark’s architecture and its applicability to ingest, process and analyze both operational and analytical workloads allow it to reduce the time between obtaining data and delivering insights.”