Varada, a data lake query acceleration company, open sourced its Workload Analyzer for Presto this week. The tool aims to help data engineers gain holistic visibility into the performance of Presto clusters.
Varada originally built the tool because its own query acceleration engine, the Varada Data Platform, is built on the distributed SQL query engine Presto.
The Workload Analyzer is compatible with both Trino, the distributed SQL query engine formerly known as PrestoSQL, and PrestoDB, the distributed SQL query engine for running interactive analytic queries against heterogeneous data sources.
“As part of our deep commitment to the PrestoDB and Trino communities, Varada decided to release a standalone, open source version of our Workload Analyzer tool so that any Presto user can evaluate potential performance improvements in their cluster,” said Eran Vanounou, the CEO of Varada.
The Workload Analyzer collects details and metrics on queries, aggregates and extracts information from them, and delivers charts that describe a cluster’s performance. The tool’s script runs within the Presto cluster in a user’s Virtual Private Cloud (VPC) to collect and analyze query statistics, which are stored as JSON.
Data teams can use it to see how resources are used on an hourly or weekly basis and to define scaling rules. They can also use it to find opportunities to improve predicate pushdown and significantly reduce I/O and CPU usage.
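To give a rough sense of what hourly aggregation of query statistics can look like, here is a minimal Python sketch. It is not the Workload Analyzer's code and does not use Presto's actual query JSON schema: the field names `create_time` and `cpu_seconds`, the function `hourly_cpu_seconds`, and the sample records are all hypothetical, chosen only for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified query records, one JSON object per line.
# Real Presto query JSON is far richer and uses different field names.
SAMPLE_LINES = [
    '{"query_id": "q1", "create_time": "2021-03-01T09:15:00", "cpu_seconds": 12.0}',
    '{"query_id": "q2", "create_time": "2021-03-01T09:40:00", "cpu_seconds": 3.5}',
    '{"query_id": "q3", "create_time": "2021-03-01T14:05:00", "cpu_seconds": 7.0}',
]


def hourly_cpu_seconds(records):
    """Sum CPU seconds per hour of day (0-23) across query records."""
    totals = defaultdict(float)
    for rec in records:
        created = datetime.fromisoformat(rec["create_time"])
        totals[created.hour] += rec["cpu_seconds"]
    # Return a plain dict ordered by hour for easy charting or printing.
    return dict(sorted(totals.items()))


sample_records = [json.loads(line) for line in SAMPLE_LINES]
usage_by_hour = hourly_cpu_seconds(sample_records)

for hour, cpu in usage_by_hour.items():
    print(f"{hour:02d}:00  {cpu:.1f} CPU seconds")
```

A per-hour summary of this kind is the sort of input a team might use when deciding at which times of day a cluster should scale up or down.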
“Presto democratized Big Data, exponentially expanding the number of business users that can ask questions to a Big Data infrastructure and enlarging the number of underlying data sources they can query,” Varada wrote in an announcement.