More of today’s backend developers are embracing microservices so they can iterate faster and avoid single points of application or website failure. However, what they gain in speed can be at least partially offset by the complexity of operating, debugging, and coordinating changes in a microservices system. Microservices communicate over the network, and that brings a great deal of uncertainty.

“When you break your application or a website into microservices, those microservices need to be able to talk over the network, but networks are unpredictable by definition,” said Christian Posta, chief architect, Cloud Application Development at Red Hat. “Traditionally, developers have solved network resiliency issues in an application-centric way using circuit breaking, retries, timeouts, and certain types of exception handling and reporting. Service Mesh provides a much more elegant way of solving the problem.”
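To make the application-centric approach Posta describes concrete, here is a minimal sketch of a retry loop wrapped in a circuit breaker, written directly into the service code. The class name, thresholds, and error messages are hypothetical illustrations, not any particular library’s API; this is the kind of per-service logic a Service Mesh aims to pull out of the application.

```python
import time

class CircuitBreaker:
    """Illustrative application-level resiliency: retries, a failure
    counter, and a breaker that 'opens' to stop hammering a sick service."""

    def __init__(self, failure_threshold=3, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, func, retries=2):
        # Fail fast while the circuit is open (too many recent failures).
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: skipping remote call")
            # Half-open: the reset window elapsed, allow a trial call.
            self.opened_at = None
            self.failures = 0
        last_error = None
        for attempt in range(retries + 1):
            try:
                result = func()
                self.failures = 0  # success resets the failure count
                return result
            except Exception as exc:
                last_error = exc
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()  # trip the breaker
                    break
        raise RuntimeError("remote call failed") from last_error
```

Every service in every language needs its own copy of logic like this, which is exactly the duplication a Service Mesh moves out of the application and into the infrastructure layer.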

In the past, big web companies like Twitter, Netflix, Google and others solved these challenges with language-specific solutions. However, a general-purpose solution would be preferable.

“Container platforms like Kubernetes provide a new means of implementing resiliency solutions. That’s where the Service Mesh comes in,” said Posta. “The Service Mesh sits in between individual services written in Java, Ruby, Python and other languages, addressing distributed-systems concerns, among other things.”

Shape, control microservices traffic
Service resiliency is the most likely outcome of initial Service Mesh efforts, although some of the other possibilities are just as, if not more, intriguing.

“The resiliency aspect is what developers really want to see and get their hands on right now,” said Posta. “However, once their applications have the Service Mesh in between them, they’ll be able to do a lot of interesting things that traditional enterprise developers and operations folks are not as used to or familiar with, such as shaping and routing the traffic between services.”

For example, if Service A needs to talk to Service B, but Service B also has to talk to Services C and D simultaneously, then network traffic fans out from a single request. Using Service Mesh, developers would be able to see the traffic flow and control it.
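The fan-out above can be sketched in a few lines. The service names and payloads below are hypothetical stand-ins; in a real deployment each call would cross the network, where a Service Mesh could observe, route, and shape the traffic.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for Services C and D; in production these
# would be remote calls visible to the mesh.
def service_c(payload):
    return {"c": payload.upper()}

def service_d(payload):
    return {"d": len(payload)}

def service_b(payload):
    # Service B fans out to C and D simultaneously, so one inbound
    # request produces multiple downstream requests.
    with ThreadPoolExecutor(max_workers=2) as pool:
        c_future = pool.submit(service_c, payload)
        d_future = pool.submit(service_d, payload)
        return {**c_future.result(), **d_future.result()}

def service_a(payload):
    # A single request to Service A triggers the whole fan-out.
    return service_b(payload)
```

Inside a single process this fan-out is easy to follow; spread across containers and hosts, it is invisible without the traffic-level view a Service Mesh provides.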

Lower production risks
Service Mesh could also be used to reduce the scope of unpredictable network issues that adversely impact applications and websites running in production.

“When we deploy code, we do testing in the lower environments, hoping that those lower environments and the production environment are identical. If the tests pass, we assume the new version works and we replace the older version,” said Posta. “However, in big microservices systems where there’s a lot of uncertainty in the network, there’s no such thing as having exact parity between production and those lower environments, so getting new code into production still involves a lot of risks.”

If Service Mesh were used to gain control over the traffic, developers could achieve lower-risk deployments to production.

“Service Mesh would also allow you to do things like shadow or canary testing so you could do phased or graduated rollouts,” said Posta. “You could also build complicated A/B and cohort testing with this type of control that reduces the risk of bringing code changes to production.”
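The heart of a canary or graduated rollout is weighted traffic splitting. The sketch below shows the idea in plain Python; the version names and weights are illustrative, not a real mesh API, where this would typically be declared as routing configuration rather than written in code.

```python
import random

def weighted_router(routes, rng=random.random):
    """Pick a backend version in proportion to its traffic weight,
    e.g. send 90% of requests to v1 and 10% to a v2 canary."""
    total = sum(weight for _, weight in routes)
    threshold = rng() * total
    cumulative = 0.0
    for version, weight in routes:
        cumulative += weight
        if threshold < cumulative:
            return version
    return routes[-1][0]  # guard against floating-point edge cases

# Graduated rollout: start the canary at 10% of traffic, then raise
# the weight step by step as confidence in the new version grows.
canary_routes = [("v1", 90), ("v2", 10)]
```

Because the split lives at the traffic layer rather than in application code, operators can widen or roll back the canary without redeploying either version.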

Red Hat is currently investing in bringing Service Mesh technology to its OpenShift platform, and is actively discussing the possibilities with customers and partners who are eagerly looking for simpler and more elegant solutions than have been available to date.

“The continuous evolution of container platforms means we must continually reimagine how best to build microservices and implement resiliency,” said Posta. “Container platforms provide a foundation that will allow us to do these things more elegantly. Service Mesh is just one opportunity to reimagine historically difficult problems as a more elegant, polyglot, solution.”

Learn more at