A new open-source project wants to take analytics to the next level. BlazingSQL is a GPU-accelerated SQL engine built on the RAPIDS ecosystem. RAPIDS is an open-source suite of software libraries for executing end-to-end data science and analytics pipelines entirely on GPUs.
According to the team, BlazingSQL was built to address the expense, complexity, and sluggish pace users face when working with large datasets.
“BlazingSQL addresses these customer concerns not only with an incredibly fast, distributed GPU SQL engine, but also a zealous focus on simplicity,” Rodrigo Aramburu, CEO of BlazingSQL, wrote in a blog post. “With a few lines of code, BlazingSQL can query your raw data, wherever it resides and interoperate with your existing analytics stack and RAPIDS.”
BlazingSQL lets users query datasets in enterprise data lakes and load the results directly into GPU memory as a GPU DataFrame (GDF). The GDF project supports interoperability between GPU applications and defines a common GPU in-memory data layer.
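To make the "few lines of code" claim concrete, here is a minimal sketch of that workflow in Python. It assumes the `blazingsql` package exposes a `BlazingContext` with `create_table` and `sql` methods, which may vary between releases. The table name, file path, and query are hypothetical placeholders, not taken from the project's documentation.

```python
def load_and_query(bc, table_name, path, query):
    """Register a data file as a SQL table on a context, then run a query.

    `bc` is expected to behave like a BlazingSQL context: it needs a
    `create_table(name, path)` method and a `sql(query)` method.
    """
    bc.create_table(table_name, path)
    # With BlazingSQL, the result is expected to be a GPU DataFrame.
    return bc.sql(query)


def run_on_gpu():
    """Example call; needs a CUDA-capable GPU and the blazingsql package."""
    # Imported lazily so the helper above can be used without a GPU present.
    from blazingsql import BlazingContext

    bc = BlazingContext()
    # Hypothetical table name, file path, and query, for illustration only.
    return load_and_query(
        bc,
        "taxi",
        "/data/taxi.parquet",
        "SELECT COUNT(*) AS trips FROM taxi",
    )
```

Passing the context in as an argument keeps the helper independent of the hardware, so the same function can be exercised with a stand-in object on a machine without a GPU.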
Open-sourcing the SQL engine was part of a joint strategy between NVIDIA’s RAPIDS team and BlazingSQL that brought more than 100 developers to work on the project, according to BlazingSQL.
“NVIDIA and the RAPIDS ecosystem are delighted that BlazingSQL is open-sourcing their SQL engine built on RAPIDS,” said Josh Patterson, GM of data science at NVIDIA. “By leveraging Apache Arrow on GPUs and integrating with Dask, BlazingSQL will extend open-source functionality, and drive the next wave of interoperability in the accelerated data science ecosystem.”
The SQL engine aims to solve the major challenges of analytics pipelines in three ways: it cuts costs by requiring only a small fraction of the infrastructure to run at an equivalent scale; it delivers GPU-accelerated results in seconds, so data scientists can iterate over new models quickly; and it lets users write code once and change the scale of distribution dynamically with a single line of code.
“BlazingSQL, in addition to contributing heavily to the RAPIDS ecosystem, will focus on the services and support agreements necessary to make RAPIDS + BlazingSQL deployments successful and accessible to all,” wrote Aramburu.