Cloud-based data warehouse platforms are making it easier for organizations to enable self-service analytics capabilities that tap disparate data sources. Similarly, modern data lakes offered by the large public cloud providers allow developers to create or customize data models on an ad-hoc basis for machine learning, which enables artificial intelligence and automation.

In either scenario, the proliferation of information silos is typically the villain in any large data warehouse integration project. Bringing those silos together so users can gather information from distributed and proprietary data sources remains the key challenge in these projects, and various data warehouse platforms and tools are designed to eliminate those data silos. Hence, the last thing a data architect should want to do is create new silos, but a tool from startup Matillion takes that counterintuitive approach. Matillion ETL allows developers to create new data silos.

Matillion’s founders believe that workgroups should feel empowered to create silos and keep data where it belongs, yet information should be accessible without transforming the data to get to it. Matillion ETL extracts data from its source repositories and loads it into a cloud-based data lake of the customer’s choice. “The reality is, it’s never been easier to create data silos inside of the business,” said Ed Thompson, Matillion’s CTO.

In a traditional pipeline, data is transformed and cleansed after it is extracted from its source but before it is loaded into the target repository. With Matillion ETL, the data is loaded first, and the cloud data warehouse itself performs the transformation and joins the data sources. “We use the underlying data warehouse as the transformation engine,” Thompson said. “You use the cloud data warehouse technology to transform that data into what you want your actual output to be. And that output also lives in the cloud data warehouse. So, it’s really easy to query.”
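The load-then-transform pattern Thompson describes can be illustrated with a minimal sketch. It is not Matillion's implementation: Python's built-in SQLite stands in for a cloud data warehouse, and the table names, sample rows and function names are all hypothetical. The point is only that the raw data lands unchanged and the database engine does the cleansing and joining.

```python
import sqlite3


def load_raw(conn, orders, customers):
    """Load step: copy source rows into staging tables without changing them."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount TEXT)"
    )
    conn.execute("CREATE TABLE raw_customers (customer_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", orders)
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?)", customers)


def transform_in_warehouse(conn):
    """Transform step: the engine cleans, casts and joins, then stores the output."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT TRIM(c.name) AS customer,
               SUM(CAST(o.amount AS REAL)) AS revenue
        FROM raw_orders AS o
        JOIN raw_customers AS c ON c.customer_id = o.customer_id
        GROUP BY TRIM(c.name)
        """
    )


def run_elt():
    # An in-memory SQLite database is an assumed stand-in for the warehouse.
    conn = sqlite3.connect(":memory:")
    orders = [(1, 10, "19.50"), (2, 10, "5.50"), (3, 20, "100")]  # amounts arrive as text
    customers = [(10, " Acme "), (20, "Globex")]  # names arrive with stray whitespace
    load_raw(conn, orders, customers)
    transform_in_warehouse(conn)
    # The output lives in the same engine, so it can be queried directly.
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
```

Calling `run_elt()` returns one cleaned row per customer. In a real deployment the SQL would run on the warehouse's own compute, which is the part of the design that avoids a separate transformation server.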

The company offers versions of Matillion ETL designed with cloud-native integration to Amazon Redshift, Snowflake on Azure and Google BigQuery, with connectors designed for each source and target repository. Matillion, which has more than 550 enterprise customers such as Bose, Cardinal Health, DocuSign and Siemens, currently has about 80 connectors with native API integrations to popular data sources such as Oracle, SQL Server, PostgreSQL and SaaS applications from the likes of Salesforce.

Matillion also allows customers and partners to build their own custom integrations by making its API available to developers. “There is a very long tail of niche APIs out there,” Thompson said. “Every company worth its salt has some sort of data access API to get the data in its system. Some of those are really niche as they pertain to particular industry verticals and for those, we provide users with the technology to build a custom connector themselves. And we see an awful lot of customers doing that for either internal systems, things that are specific to their industry, with all manner of different use cases. So we see a lot of companies building connectors in Matillion for specific systems.”
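As a rough idea of what a custom connector for a niche data access API has to do, the sketch below pages through a source and flattens nested records into rows that could be loaded into a staging table. It does not use Matillion's connector tooling or API. The paginated response shape (`records`, `next`), the `fake_api` stand-in and all function names are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested JSON-style objects into a single flat row of columns."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def extract_rows(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Follow the (assumed) 'next' cursor until the source has no more pages."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        for record in page["records"]:
            yield flatten(record)
        cursor = page.get("next")
        if cursor is None:
            break


def fake_api(cursor: Optional[str]) -> Dict[str, Any]:
    """Hypothetical in-memory stand-in for a niche REST API with two pages."""
    pages = {
        None: {"records": [{"id": 1, "site": {"city": "Denver"}}], "next": "p2"},
        "p2": {"records": [{"id": 2, "site": {"city": "Manchester"}}], "next": None},
    }
    return pages[cursor]
```

Passing the page fetcher in as a function keeps the paging and flattening logic independent of any one API, so `list(extract_rows(fake_api))` can be swapped for a real HTTP client without changing the rest.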

At the same time, Matillion is looking to build more custom connectors for various data environments. Matillion recently raised $35 million from Battery Ventures, which provided the Series C investment. It builds on last year’s Series B investment of $20 million, when the U.K.-based company opened a U.S. office in Denver. Thompson said the latest investment will help seed the development of additional connectors. “We’ve doubled the size of our connectors teams,” Thompson said. “The funding thus far is actually being used to improve the internals of the product. And in the future, we will help our customers to develop connectors from scratch with a view that we’ll be able to just crank them out more quickly.”

One connector currently under development is for the NetSuite business applications suite, Thompson said; Matillion has partnered with NetSuite to build it. Matillion has partnered with others as well, according to Thompson. “The more partnerships we can create the better,” Thompson said. “Our goal is to be cranking out more and more connectors.”

While Matillion ETL is designed for data lakes and cloud data warehouses, it also works in hybrid scenarios, Thompson added. As with all data transformation projects, though, Thompson underscored that the biggest challenge isn’t technical but organizational. Line-of-business management remains reluctant to share its data.

“That’s perhaps the more politically difficult issue to navigate, because quite often, there will be people there who have invested quite a lot of time and effort in what they’ve done,” he said. “And if you’re going in saying, I want to replace it with our products, all of the usual human traits. Well, your product’s not as good as what I’ve done, and it’s not as tailored and not, you know, what sort of stuff.”


Content provided by SD Times and Matillion