“Facts are stubborn, but statistics are more pliable,” wrote Mark Twain. Never has this been more true than when it comes to data connectivity. If you don’t have good connectivity to your data wherever it may reside, then it’s hard to build applications such as artificial intelligence (AI) or analytics, which are very data-dependent technologies. Connectivity is a huge, largely unrecognized problem, according to Amit Sharma, CEO of CData. “If you don’t have good data, you won’t be able to have good AI solutions or analytics, or big data solutions,” he said. “I think it’s a problem that’s going to keep being more challenging and relevant in the market in the future.”
A seismic shift
There’s a dramatic shift from data for the sake of data to data for the sake of business, and this manifests itself in many different ways. Services described by terms like “self-service analytics” provide enough utility to business users that they can get the data they need and manipulate it in a very non-technical fashion. Tony Fisher, general manager of Magnitude Software, said, “Data for the sake of data is a very technical thing, and data for the business is a very business-oriented thing. Business users are more concerned about orders and customers or things that are more business-oriented than they are about table or column names. They want to access their data and manipulate it in terms of business needs.”
Fisher believes that it’s very important for technical staff to grasp the concept that data is really just an artifact that’s providing the business analyst with a business-oriented view into their data. “I think that’s one of the big shifts that’s going on now, and we’ll continue to see that for some time to come.”
Roi Avinoam, CTO and co-founder of Panoply Software, said he believes that adaptability plays an important role in how companies master their data connectivity. “I think what some people might miss is that really the way to master data and get insights comes from being adaptable,” he said. “All the time, I notice that when people talk or think about data, the solutions proposed are always the ones where you have to review all the data you have, and then review all the business questions, and ideas for insight requirements, and then after you map these out you come up with solutions. It’s great for about two to three months.”
He added that the “problem is, after you’ve done all of that, the business changes, the industry changes, the market changes, the APIs and the data that you have change, and then you have to do it all over again. And that’s the kind of state of mind that I think we need as an industry to evolve.”
Avinoam is an engineer, so he compares it to software engineering’s Waterfall development process. “Basically we have to design everything up front, and we have to figure out how it needs to work, and then we go and develop it, and it was so rigid. You can’t make changes, and it’s not adaptable. Now development teams are incredibly agile, right?” He pointed out that now an idea can be brainstormed in the morning and it’s shipped to production that night. He’s trying to impose this agility on his team and wants the industry to follow. “One day is too much, in my opinion. We need to be able to think up ideas and try them out and make drastic changes overnight, without having a big price to pay for it. It should be encouraged, it should be a positive experience that we’ll rotate our entire state of mind to think of completely different types of data or different connections that we might do. And execute on it in a day or two.”
He emphasized that the issue really isn’t how you solve your current problem; that’s easy. What’s important is thinking about how you are going to solve an endless stream of problems, challenges and opportunities that may hit you on a daily basis, and ensuring that every system keeps up.
Maintaining data consistency
Ensuring data consistency when new technologies like microservices and cloud-based distributed applications come into play is no easy task. There’s been a pretty significant shift in the overall approach over the past couple of years, according to Dion Picco, vice president of product management and product marketing for Progress. He uses data warehousing as an example. “The data warehouse approach of more of a record-oriented, relational or star schema approach, has really served its purpose well. It’s certainly going to continue as a pretty prominent standard, since it’s the whole process behind Extract, Transform, Load (ETL). What I’m seeing that’s largely driven in some ways by the whole data science, big data movement, is a move away from ETL and a move more towards ELT style.”
He explained, “Instead of extracting data from systems, transforming it into the format you want for long term storage, and then putting it into your data warehouse, the ultimate move from just a pure warehousing data perspective has been just dump everything into a data lake. There is no qualified set of records necessarily. You basically have a data lake of stuff that you load and transform on demand. This has really given rise to things like data prep tools, as an example of something that previously was part of the ETL process and driven by IT. Now it’s driven in terms of data scientists, and various folks on the business side who need access to the data when they want, leveraging more citizen-oriented tools to do the data transformation, data access piece.” Using this new approach preserves the fidelity of the original data set in a way that your typical ETL process doesn’t. This represents a fundamental change.
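The load-then-transform pattern Picco describes can be sketched in a few lines of Python. This is a minimal illustration, not a depiction of any vendor's product: the event records, the `raw_events` table, and the `revenue_by_customer` helper are all hypothetical, and an in-memory SQLite database stands in for the data lake.

```python
import json
import sqlite3

# Hypothetical raw records, stored exactly as a source system might emit them.
RAW_EVENTS = [
    '{"order_id": 1, "customer": "acme", "amount": "19.99", "currency": "USD"}',
    '{"order_id": 2, "customer": "globex", "amount": "5.00", "currency": "USD"}',
    '{"order_id": 3, "customer": "acme", "amount": "42.50", "currency": "EUR"}',
]


def load_raw(conn, events):
    """Load step: keep every record untouched so the original is preserved."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_events (payload) VALUES (?)",
        [(event,) for event in events],
    )


def revenue_by_customer(conn, currency):
    """Transform step: parse and aggregate on demand, at query time."""
    totals = {}
    for (payload,) in conn.execute("SELECT payload FROM raw_events"):
        record = json.loads(payload)
        if record["currency"] != currency:
            continue
        customer = record["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(record["amount"])
    return totals


conn = sqlite3.connect(":memory:")  # stand-in for the data lake
load_raw(conn, RAW_EVENTS)
usd_totals = revenue_by_customer(conn, "USD")
raw_count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
```

Because nothing is filtered or reshaped at load time, a different question next week (say, totals in EUR) needs only a new transform, not a new pipeline. In an ETL design the EUR order might already have been discarded before storage.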
He described the microservices landscape in general and hybrid architectures. “What’s typically happening here is every service often has its own database. So a customer service might have a customer database. A product service might have a product database, and as you scale these services up, the horizontal scalability of that service needs to make sure that they have a consistent view of that data.” It’s not as simple as it sounds because you may hit a level of scalability that you can’t grow beyond.
On the other hand, he said he believes the simplest pattern is still the most dominant one, which is to not have one database per microservice. The end game, according to him, is this: “You end up with a shared database architecture behind all of the microservices much as in the old style architecture, but it still removes a lot of that need for dealing with the nuances of this because you ultimately defer to the database system. So if you have a clustered database system, you’re gonna hit a certain level of scalability, and because everybody’s using a shared database, you don’t have those issues of consistency to worry about in the same way.”
Picco pointed out a third approach: “Obviously you can relax certain constraints, and so if you’re not dealing with fully transactional environments and you can deal with eventually consistent sort of scenarios, there’s a wealth of new databases to choose from like Apache Cassandra and Spark. There’s a lot of infrastructure built on the Hadoop ecosystem today. I mean, we just had an explosion of different databases that are really fit for purpose, and so if your purpose isn’t high-scale online transaction processing (OLTP), then likely there’s a database to fit your need.”
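To make “eventually consistent” concrete, here is a toy model in Python. It is a sketch under stated assumptions, not how Cassandra or any other product is implemented: the `Replica` class, its last-write-wins rule, and the caller-supplied timestamps are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy key-value replica that resolves conflicts by last write wins."""

    name: str
    store: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Accept the write only if it is newer than what we already hold.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Anti-entropy pass: replay the other replica's entries locally.
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


a = Replica("a")
b = Replica("b")
a.write("cart:42", ["book"], timestamp=1)
b.write("cart:42", ["book", "pen"], timestamp=2)

stale = a.read("cart:42")  # replica a has not yet seen the newer write

# Once the replicas exchange state, they converge on the same value.
a.sync_from(b)
b.sync_from(a)
```

Between the second write and the sync, a reader hitting replica `a` sees an older cart. That window is the constraint being relaxed, and it is why this style suits workloads that are not strictly transactional.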
He listed several NewSQL vendors that also achieve a different level of performance but still enable a fully ACID database. “Google Spanner is a great example. You’ve got ClustrixDB and VoltDB that are out there. They’re combining the best of in-memory along with new architectural patterns from an OLTP perspective that I think a lot of transactional-style applications need.”
APIs vs drivers in the new world
The difference between an API and a data driver is that an API is a specification that describes what to do, whereas a driver is an implementation that describes how to do it. They’re both still relevant in the modern services world. JDBC and ODBC are standards that have been around for more than a decade. According to CData’s Sharma, they’re technologies that are going to stay around. He said, “The choice of connectivity or any of these driver technologies is dependent on the platform choice that people make. While ODBC is still very popular, I see a little bit of a trend, not much, of people moving away from the native driver technology. ODBC is C/C++ based technology, and I see people moving to JDBC and ADO.NET instead of that. In some enterprises, I also see resistance to JDBC, just because of how Oracle is handling Java. Separate issue, but still Java is very popular. I see a little bit of a trend with people moving away from native. People had that impression that native technologies were required for performance reasons, but I don’t think that’s true anymore.”
The Java and .NET runtimes have matured so much that they are now comparable to native technologies, and they offer other advantages on top of that.
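The specification-versus-implementation split between an API and a driver can be seen in miniature in Python, where the DB-API 2.0 specification (PEP 249) plays a role loosely analogous to JDBC or ODBC and the standard library’s `sqlite3` module is one driver that implements it. The sketch below is only an illustration: the `customers` table and the `top_customers` function are hypothetical.

```python
import sqlite3  # one driver implementing the DB-API 2.0 specification


def top_customers(conn, limit):
    """Uses only calls the DB-API defines: cursor(), execute(), fetchmany()."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM customers ORDER BY total DESC")
    rows = cur.fetchmany(limit)
    cur.close()
    return [row[0] for row in rows]


# Setup with the sqlite3 driver. Note that the "?" placeholder style is
# driver-specific (sqlite3.paramstyle is "qmark"); other drivers may differ.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (name TEXT, total REAL)")
cur.executemany(
    "INSERT INTO customers (name, total) VALUES (?, ?)",
    [("acme", 120.0), ("globex", 75.5), ("initech", 300.0)],
)
conn.commit()

result = top_customers(conn, 2)
conn.close()
```

`top_customers` depends on the specification rather than on SQLite, so in principle it could be handed a connection from another DB-API driver without changes. In practice, details such as placeholder style and SQL dialect still vary between drivers.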
Mobile platforms are taking off in popularity. Sharma noted, “Our driver technologies offer direct connectivity from the driver to the data source. What’s more popular with the mobile platforms is to go through an intermediary. If you’re building a mobile application, what you would do is, that application would talk to some server somewhere, which might be, again, JDBC or ADO.NET, or something, and that’s where the connectivity would happen, instead of building the connectivity right into the application on the device itself.”
The challenges that data connectivity presents are multi-faceted, and they require key players on both the business and technical sides to collaborate and come up with innovative solutions. Sharma predicted, “People think connectivity’s easy, but it’s going to take a lot of effort to actually get it right.”