As mobile, Internet of Things and Big Data applications loom on the horizon, NoSQL databases have increased in popularity for real-time data analytics, data archiving and batch data analytics. While you can use NoSQL to replace relational databases, Progress’s 2016 Data Connectivity Outlook Survey found that relational databases are still going strong. SD Times spoke with Mike Johnson, senior director of research and development for Progress, about what developers and ISVs need to know about the changing database tool landscape and how to overcome the challenges posed by the explosion of data sources.
Is now an exciting time in the data community?
It’s probably one of the most exciting times I’ve seen in my career. When I started, there were a large number of databases out there and software companies were struggling to support all of them. Then, in the ’90s, came consolidation. Most people used the Oracles, the Microsoft SQL Servers and the IBM DB2s of the world. In the last few years, we’ve seen not only an explosion of NoSQL databases but also, on the SaaS (software-as-a-service) side, every SaaS application instantly becoming a database. That’s where it’s up to us to provide some sort of standard way to connect to the data.
If all your lead information is in Eloqua (Oracle Marketing Cloud) and your sales information is in Salesforce, you can access and put that information together with DataDirect connectivity.
What problem does your typical customer need to solve?
We have people coming in and saying, “I need access to 300 to 400 data sources.” Our bread and butter is ISVs. ETL (extract, transform, load) and BI (business intelligence) are probably the most common applications.
You run Progress R&D. What is your current focus?
There are three large areas of growth for us this year. First, grow our offerings on the NoSQL side: document, wide-column and key-value stores, all your Mongos, Cassandras, DynamoDBs and Oracle NoSQLs. Second, grow our offerings for SaaS apps. We’re investing heavily in making it faster to build connectivity for these types of sources. Finally, as apps move from on-premise to the cloud, getting to your data has become an issue. The app sits outside the firewall, but there is still a lot of data inside the firewall. We’re bridging that gap between the cloud app and the on-premise database, with a seamless, easy-to-use interface for talking through that firewall.
What kinds of transformations among data vendors do you see?
We’ve always told our customers, “Our commitment to you is we’re going to try to make every source look and act like your traditional database that you know and love.”
Many customers wrote their apps years ago against the Oracles and DB2s. When they get into the world of NoSQL data or SaaS data, they find that those sources of data don’t behave in the same way. They offer only a fraction of the functionality that relational databases do. We’re trying to up-level these sources to look and act like traditional databases.
The same thing happened in NoSQL: originally it meant no SQL; now it means “not only SQL.” All our applications talk SQL, and so do most programmers. We see a transformation from old-school NoSQL to “We’re going to have full SQL support.” Some vendors go so far as to do it all in-memory. The transformations we’re seeing in the industry are really about getting back to the confidence in data that relational databases offered.
Our 2016 survey supports this: SQL-based interfaces (ODBC, JDBC and ADO.NET) are the most popular data access standards, which is consistent with our finding that relational database usage continues to dominate. But there are signs of increasing popularity for other data access standards: SOAP web services, XQuery or XPath, and REST APIs like OData for mobile and hybrid environments.
So what do your drivers do to up-level NoSQL data?
The most obvious is NoSQL normalization. We came up with some tooling, and we’ve got some patent-pending algorithms that can infer a schema on top of any data set. With data sources like MongoDB, people don’t have to deal with a fixed schema, but that’s also their Achilles’ heel, because any application written against traditional databases requires one.
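To make the idea of inferring a schema concrete, here is a minimal Python sketch of one generic approach. It is not Progress’s patent-pending algorithm: it flattens nested sub-documents into dotted column names, records the SQL type seen for each field across a sample of documents, and marks a column nullable if some documents lack it. The function names, the Python-to-SQL type mapping and the widening rules are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical mapping from Python value types to SQL column types (an assumption).
SQL_TYPES: dict[type, str] = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "VARCHAR"}


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested sub-documents into dotted column names: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def infer_schema(docs: list[dict[str, Any]]) -> dict[str, tuple[str, bool]]:
    """Infer {column: (sql_type, nullable)} from a sample of schemaless documents."""
    types_seen: dict[str, set[str]] = {}
    present: dict[str, int] = {}
    for doc in docs:
        for column, value in flatten(doc).items():
            if value is None:
                continue  # an explicit null tells us nothing about the type
            # Arrays and other values fall back to VARCHAR in this sketch; a real
            # normalizer might instead split arrays out into child tables.
            types_seen.setdefault(column, set()).add(SQL_TYPES.get(type(value), "VARCHAR"))
            present[column] = present.get(column, 0) + 1

    schema: dict[str, tuple[str, bool]] = {}
    for column, types in types_seen.items():
        if len(types) == 1:
            sql_type = next(iter(types))
        elif types == {"BIGINT", "DOUBLE"}:
            sql_type = "DOUBLE"  # widen mixed integers and floats
        else:
            sql_type = "VARCHAR"  # conflicting types: fall back to text
        # Nullable if any sampled document lacked a non-null value for the column.
        schema[column] = (sql_type, present[column] < len(docs))
    return schema
```

For example, a sample in which `score` is `4.5` in one document, `3` in another and absent from a third yields a nullable `DOUBLE` column, which an ODBC or JDBC application can then query like any other relational column.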
Our chief data evangelist Sumit Sarkar’s join example is another great one (“MongoDB and SQL: Bridging the divide,” SD Times, March 2016). The data source vendors say, “Here are these predefined relationships, and that’s all you’ll ever need.” As a customer, I might want to join data that doesn’t have one of these relationships set up. We make this possible.
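The kind of ad-hoc join described here, on fields the vendor never declared as related, can be illustrated with a classic hash join. The Python sketch below is a generic illustration, not how Progress’s drivers implement joins; the record layout, the `hash_join` name and the column-prefix parameters are assumptions. It joins two lists of MongoDB-style documents on any pair of fields by building a hash index over one side and probing it with the other, which is roughly what a SQL engine does for `SELECT ... FROM a JOIN b ON a.x = b.y`.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def hash_join(left: list[Record], right: list[Record], left_key: str, right_key: str,
              left_name: str = "l", right_name: str = "r") -> list[Record]:
    """Inner equi-join of two record lists on arbitrary fields, using a hash join."""
    # Build phase: index the right-hand records by their join value.
    index: defaultdict[Any, list[Record]] = defaultdict(list)
    for row in right:
        value = row.get(right_key)
        if value is not None:  # as in SQL, a missing or NULL key never matches
            index[value].append(row)

    # Probe phase: look up each left-hand record's join value in the index.
    joined: list[Record] = []
    for row in left:
        value = row.get(left_key)
        if value is None:
            continue
        for match in index.get(value, []):
            # Prefix column names so fields from the two sides cannot collide.
            merged = {f"{left_name}.{k}": v for k, v in row.items()}
            merged.update({f"{right_name}.{k}": v for k, v in match.items()})
            joined.append(merged)
    return joined
```

For instance, joining hypothetical lead records on `email` to sales records on `contact` returns one merged row per matching pair. Join values must be hashable, so this sketch does not handle array or sub-document keys.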
Is security a big issue in the data world?
From a security standpoint, the thing that’s impacted ISVs recently is the frequency of vulnerabilities that get announced and resolved. We’ve been using a library called OpenSSL for 10 to 12 years, and in the past you had to update it maybe once a year because of a security issue. It’s happened four or five times this year, and it’s not even June yet. Looking at my software team, as we monitor what’s happening in the security community, we’re certainly spending more time on it than we did in the past. There’s simply too much going on for a lot of software vendors to keep up.
Another trending technology is Hadoop. What’s new in that space?
We expect someone to win the Hadoop race at some point, but for now the field is doing nothing but expanding. We started with Hive, and Hive’s still out there. Over the past few years, newer SQL technologies for Hadoop have been released, such as Impala, HAWQ and Spark. Each new one says, “This is a better way to do SQL against Hadoop.” Most people expect an overall winner, like Red Hat in Linux. Some days it’s Cloudera, other days it’s Hortonworks. No one is really dropping out.
What’s the challenge now with Big Data?
People are still struggling with that Big Data promise: variety, volume, velocity. Petabytes of data. In the Hadoop space, you hear about people trying to deal with aging out data: bring a ton of information in, stage it, push it out to a data lake, and then summarize it to get to something actionable. That’s why you see the rise of the data scientist.
In an organization like yours, do you need a data scientist?
Sure. We’ve got a team in the U.S. and a team in India. They’ve got their finger on the pulse of what’s changing in the industry. A few years ago I would never have thought that we’d need a different type of person to look at our own data. We’re data people; it’s almost offensive! We’ve been doing this for 25 years, so why would we need someone else to tell us what all this means? Now that we have a cloud system, we see that we ourselves have unstructured data, log data and events coming off the system, and we need different types of training to understand this data and different types of tools to analyze it.
So what have you learned by “eating your own dog food”?
Take DataDirect Cloud connectivity. We’ve got a large corporate customer in the financial services industry that uses our cloud service to expose on-premise data from Oracle servers on Salesforce screens, because when a customer calls in, the account representative needs to know the order history. But that history is nowhere near Salesforce when the representative pulls up the customer screen.
We’ve had several situations where all of a sudden we say, “Hey, everything’s going red. What’s going on?” Then we realize that something bad has happened to their Oracle system. In a lot of cases, we’ve been able to let them know before they even knew there was a problem. When something like that goes right, that’s a customer we’re going to keep.
Does Progress compete with middleware vendors?
Honestly, most middleware vendors are our customers. Occasionally we end up doing the same thing they do. Our role is connectivity.
But the breadth of connectivity you offer is impressive.
The more sources of information out there, the more important we are, and the more we can standardize that data.
Ten years ago, we would have had to support 14 sources of data. Today, there are hundreds out there. We started in the old days with relational databases. When things got crazy with NoSQL, we moved into some of the NoSQL databases. In SaaS, such as Salesforce and Oracle Cloud, we again have the opportunity to go into that segment and provide the standards-based connectivity that’s missing.
And the number of digital artifacts continues to expand, right?
Yes. Our Digital Factory concept is all about giving companies the ability to transform themselves for the digital age. Many companies still have a lot of paper and manual processes. The whole Digital Factory process is about how you automate things, whether it’s producing web content or generating mobile apps. We recently acquired Telerik, so we have a unified platform, and Progress DataDirect provides the data connectivity piece of that. It’s very exciting.
Progress is 35 years old. Tabulating machines date to 1896. What does that imply about the data market?
The same issues and problems recur over the years. First we solved all the problems on the relational side, and then we came to the NoSQL side and did the same thing. It’s important to have a good base of knowledge so you can ask, “How have we solved all these problems over the last 25 years?” I think the whole NewSQL movement is about putting the best of the relational world and the best of the NoSQL world together and coming up with something for the next generation.