This month’s column is a transcript of a fascinating conversation I had on microservices and scaling Big Data with two executives at Hadoop solution provider MapR: Jim Scott, director of enterprise strategy and architecture, and Jack Norris, senior VP of data and applications.
SD Times: So, we know that scaling data can be a hassle. What is the impact of microservices on this issue?
Jack Norris: There are some complementary technologies that really are game-changing in terms of how to take advantage of [microservices]. The underlying data layer is an incredible enabler of microservices. If you’re doing microservices that are ephemeral and don’t require a lot of stateful data, then I think it’s pretty well understood and people can be quite successful with it. But the data issues drive a lot of complexity for the developers and for the administrators, and that’s an area that Jim has championed for quite a while, and his experience as an architect and a developer allowed him to grasp this and see it early on.
Jim Scott: There are two different ways to look at the more ephemeral services. If you were to take a general front-end service that’s handling the primary load of a consumer-facing application, it’s probably not doing a lot of work itself. It’s probably handing the workload off to other services sitting behind it, and those services are the ones more likely to fall into this model. So, if you imagine companies building websites like Amazon, where rendering a single page involves 100-plus calls to different back-end services, there’s the need to compile all that information to bring back and build a user experience.
When you start looking at those services, being able to have a linearly scalable back-end data flow is pretty important. As you scale out your services, which are going to be doing some of the work, they need to figure out who the user is and what information is relevant to them, and then give that information back to the front end to render a page for the user. The compilation of those different data sets is pretty important. Being able to scale out that intelligent tier, where it’s clearly doing some level of computational work, is one thing. But in the same vein, without the data that it depends on, it can’t really do anything.
So, as you scale that service up, you will see how much work each instance of that microservice can perform. You know your scaling factors, and based on how many different services you have, you know what your workloads are on your back-end data platform; when you exceed the total capacity, you just add another server to that cluster. The same goes whether it’s a streaming capability, a database capability or a file system capability.
When you start deploying microservices, if you are the software engineer, you need visibility into your services. That is to say: How fast are they performing? Are there bottlenecks? Are certain types of incoming requests causing errors? So, when you look at performance and application monitoring, you must be able to emit data from these instances of microservices so that you can troubleshoot. In the old troubleshooting model, we typically did that through complete isolation on different servers; each server had its own logs, and you could just trace it that way. The trouble is, that doesn’t scale very well from a cost perspective.
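As a concrete illustration of emitting that data, here is a minimal sketch of a service publishing per-request metrics onto a shared stream rather than a local log file, using the Kafka producer API (which MapR Streams also supports). The broker address, topic name and JSON shape are illustrative assumptions, not MapR specifics.

```java
import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class MetricsEmitter {
    private final KafkaProducer<String, String> producer;
    private final String instanceId;

    public MetricsEmitter(String instanceId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");  // assumed address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
        this.instanceId = instanceId;
    }

    // Call around each request so latency and errors are visible centrally,
    // instead of being trapped in per-server log files.
    public void record(String endpoint, long latencyMs, boolean error) {
        String metric = String.format(
                "{\"instance\":\"%s\",\"endpoint\":\"%s\",\"latencyMs\":%d,\"error\":%b}",
                instanceId, endpoint, latencyMs, error);
        producer.send(new ProducerRecord<>("service-metrics", instanceId, metric));
    }

    public void close() { producer.close(); }

    public static void main(String[] args) {
        MetricsEmitter metrics = new MetricsEmitter("cart-svc-1");  // hypothetical instance
        metrics.record("/checkout", 42, false);
        metrics.close();
    }
}
```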
The great thing is, if you imagine how it was done last year, or five years ago, or 10 years ago, however far back you want to go, the applications were the equivalent of multipurpose applications, monolithic if you will, and they had a long life cycle for getting updates into them. And the scaling factors for them were all or nothing. You typically put one instance on a server, and as soon as you realized that one instance couldn’t consume all the CPU, you went back and figured out how to run multiple instances, doing some nice DevOps types of procedures that are much easier nowadays, making sure things were listening on the proper ports so they could be load-balanced.
And then from a microservice perspective, if you take a much more granular approach, I can now say, “OK, I know exactly what I have in each of these services, and I can monitor and measure their performance, because I’ve decoupled the communication between the components.” If you don’t decouple the communication, you basically will not have a microservice model, because everything will be tightly coupled and will just fall over.
A decoupled communication model suddenly means this front-end service can say, “Hey, I’ve got a user that just showed up.” Here’s the message; you drop it on a stream, and then the application (or a cluster of instances of that application, all working as a group) comes in and says, “Give me the next message sitting here waiting for me to operate on.” It picks up that message, it does its work, and it puts the return message on another stream for the front end to listen to.
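Here is a minimal sketch of the worker side of that pattern in Kafka client terms (MapR Streams exposes the same API): take the next request off one stream, do the work, and put the result on a response stream. The topic names, group id, broker address and buildResponse function are illustrative assumptions.

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ProfileWorker {
    public static void main(String[] args) {
        Properties cprops = new Properties();
        cprops.put("bootstrap.servers", "broker:9092");  // assumed address
        cprops.put("group.id", "profile-workers");       // instances in this group share the stream
        cprops.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cprops.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pprops = new Properties();
        pprops.put("bootstrap.servers", "broker:9092");
        pprops.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pprops.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cprops);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pprops)) {
            consumer.subscribe(Collections.singletonList("user-requests"));
            while (true) {
                // "Give me the next message sitting here waiting for me to operate on."
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(100))) {
                    String result = buildResponse(rec.value());  // the actual work
                    // Put the return message on another stream for the front end to listen to.
                    producer.send(new ProducerRecord<>("user-responses", rec.key(), result));
                }
            }
        }
    }

    static String buildResponse(String request) { return "result-for:" + request; }
}
```

Because neither side knows the other’s address, only the stream names, either side can be scaled, redeployed or replaced independently, which is exactly the decoupling Scott describes next.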
That decoupling is absolutely critical to getting real scale. Once the application is decoupled, I can come in and say, “Oh man, I had a bug in this service.” Well, you don’t have to redeploy every application in the entire stack; you just redeploy that one microservice. As long as you’re not changing your API, as long as you’re just fixing bugs, it’s not a big deal. It’s very easy, it’s very fast and it’s very fluid.
What’s different from SOA?
Scott: I think the first thing that pops out is when you compare traditional message queues with the log-based messaging model that Kafka and MapR Streams implement. That model is high-speed, high-throughput and persistent, whereas with the message queues of yesteryear, people would be happy to get 50,000 or 60,000 messages per second through them. They’d be cheering. Now it’s kind of an embarrassment to talk about numbers like that, because you have access to tools like Kafka and MapR Streams. The general expectation is that if you’re getting less than probably half a million messages a second, you probably don’t want to tell your friends about it. You’re probably messing something up somewhere.
So fundamentally, the scaling factor changed. With the old technology, if I needed one server to handle 50,000 to 60,000 events per second, and I need to be able to handle 100,000 events per second today and 150,000 the day after that, suddenly that’s a pretty expensive scaling factor. That’s pretty big. And so, going to microservices, every component you break out can be a single-purpose service that receives and sends its own events, which means each one effectively has an input and an output. Multiply that across every service you break out, and it grows very rapidly.
If your monolithic application had 20 or 30 different individual pieces that you pull out into microservices, take your total events in and multiply by 30, and then multiply by 2, because it’s all decoupled and every one of those has an input and an output; now you’re up to 60x. Then think about the fact that you want monitoring, and you want to get metrics out of all of these, and each one of those is going to have its own event stream. So you’re effectively up to 100x of what you originally started with. That’s not even creating anything new; it’s just shifting from one architecture to another.
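A quick back-of-the-envelope version of that multiplier, using the figures from Scott’s example (the counts are his illustration, not a fixed formula):

```java
public class StreamMultiplier {
    public static void main(String[] args) {
        int services = 30;               // pieces pulled out of the monolith
        int dataStreams = services * 2;  // an input and an output stream each: 60
        int metricStreams = services;    // one monitoring stream per service: 30
        // Roughly 90 event flows where the monolith had one: "effectively up to 100x"
        System.out.println(dataStreams + metricStreams);
    }
}
```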
Fundamentally, when you look at the microservices model in terms of scaling cost, it’s ridiculous on the old technology. It really doesn’t work well. It’s kind of illogical to even conceive of a microservices model on that old SOA/ESB type of architecture; those were just much more heavyweight architectures.
The concepts are still valid and true; it’s just that the technology implementations of the time couldn’t keep up with the speed, which made this type of model too expensive to implement. But the value of a message-driven or event-driven architecture is there. People have seen the value, they understand the value, and now the costs of the technologies have finally come down to where they can support these models.
What does this mean for developers?
Scott: For software developers, number one, it’ll mean a pretty significant reduction in the workload of getting changes into production, because a single service can be monitored, edited and put back into production with easy bug fixes. When you have API changes, that’s going to be about the same as under any old architecture, because you have to deploy and release multiple components for different versions. But fundamentally it takes the burden of making things scale off software developers and architects.
And I will add, I like to point out to people that not everything is rainbows and unicorns. The fundamental point is that this is a new technology stack for most people, and they still have to get comfortable with the technologies. Using a microservices model, an event-driven architecture, requires a little bit of discipline and requires getting used to some technologies that people aren’t accustomed to using.
But I don’t see that as a big hurdle or even a steep learning curve. I just see it as people needing constant reminding that this is a new technology; don’t just expect to be up and running on day one. Give yourself some time, set yourself up for success, give yourself the ability to prove that it works the way you need it to, and learn how to use the tools and technologies to support your use cases.
What is MapR doing in this space?
Scott: The first is MapR Streams. MapR Streams is extremely fast. On a properly specified hardware stack (and by that I mean extremely fast networking), we’ve had benchmark tests done showing that MapR, with 200-byte messages, can push through 3.5GB per second of throughput. That’s 18 million events per second sustained, or one and a half trillion events per day on a five-node cluster. Most people would never even come close to that. The reason I think it’s important to point that out is that it puts the proof out there that MapR is not going to be the bottleneck; the technology will not cap your capability, so you can focus on solving the problems. You know exactly what your scaling factor is once you start getting your payloads moving through.
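Those figures hang together under simple arithmetic; here is a quick check of the quoted numbers:

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        long eventsPerSec = 18_000_000L;  // sustained rate quoted above
        long bytesPerEvent = 200L;        // message size in the benchmark
        double gbPerSec = eventsPerSec * bytesPerEvent / 1e9;  // ~3.6, in line with the ~3.5GB/s figure
        double perDay = eventsPerSec * 86_400L / 1e12;         // ~1.56 trillion, "one and a half trillion"
        System.out.printf("%.1f GB/s, %.2f trillion events/day%n", gbPerSec, perDay);
    }
}
```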
In the data platform itself, we have certain capabilities that really enable users to get much more creative and think outside the traditional boxes they’ve been put into, predominantly relational databases and such. When you look at MapR DB, it has the same underlying capabilities that MapR Streams and the file system have: the ability to do things like snapshots and to organize your data into volumes.
So, when you think about a volume of data and a microservice, you have a stream, and you potentially have MapR DB for data persistence. Think about a full-fledged application stack, and say, “OK, I have users coming in through my front end and I want to capture every event.” You could have all of those events come in, put them on a stream, and locate that stream in volume A.
Over time, you consume that stream of data coming into volume A with your purpose-built application stack, and you write all of your data into volume B. Maybe you’re creating some user profiles, you’re bringing in third-party data sets, and you’re creating software to make recommendations back to your user. So you get this built, you’re happy, everything’s great. Tomorrow, someone says, “Hey, we have to try an alternate approach, because we think we can squeeze more out on the front end or increase our revenue stream here, because we have a better user profile we can build.” So you could create your alternate implementation of this codebase, use volume C, and replay the entire history of the volume A stream. From the time you started up until now, you could rerun it all through and regenerate all of those user profiles that you had.
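A minimal sketch of that replay step in Kafka client terms (MapR Streams exposes the same API): a fresh consumer group, with no committed offsets, starts from the beginning of the stream’s history and rebuilds profiles into the new location. The topic name, group id and sink function are illustrative assumptions.

```java
import org.apache.kafka.clients.consumer.*;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ProfileReplayer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // assumed address
        props.put("group.id", "profile-rebuild-v2");   // new group: no offsets committed yet
        props.put("auto.offset.reset", "earliest");    // so it starts from the beginning of history
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("user-events-volA"));
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    // Rerun the new profile-building logic against the old events,
                    // writing into volume C instead of volume B (hypothetical sink).
                    writeProfileToVolumeC(rec.key(), rec.value());
                }
            }
        }
    }

    static void writeProfileToVolumeC(String user, String event) { /* persist to the new volume */ }
}
```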
You could do profile-by-profile comparisons, and then you could use that to create something like an A/B test for your application, to see how your user profiles perform in the new implementation versus the old, and decide whether to switch over to it or trash it. And if you want to switch over, you already have it over there; you don’t have to migrate any data. You just switch off the old. Or instead of doing an A/B test, you just switch over and send all of your traffic to that instance of the data set. It opens the door to a plethora of opportunities for how you pick and choose to use the data platform to support all of your business’s innovation goals.
The last thing I would mention as a benefit here is Project Spyglass, which is part of our product offering. It’s a single-pane monitoring stack: we ingest metrics from the services running on top of MapR, and we also ingest all of the logs coming from the processes running on MapR. Over time, Project Spyglass will mature, and in the future it will be fully capable of supporting the entire microservices architectures that people will want, so they don’t have to figure out how to monitor their microservices themselves. It will just become part of the same single pane of glass they use to monitor the rest of the data platform.
Norris: To take it back up to the 60,000-foot level, this is all about how you drive agility within an organization. It’s about the types of applications that can take data and analytics and bring them to bear on operations, so that analytics move from a reporting function to actually impacting the business as it happens. That’s a huge area to exploit. Microservices and a converged data platform make that easier for organizations to do.
That insight…in the past was, how does an analyst do a query and somehow get better informed? The insight that we’re seeing now pertains more to automated actions, and how do you kind of bake that into the process. That’s a whole new frontier.