His first presentation may have ended inauspiciously with the moderator cutting him off mid-sentence at the five-minute mark, but the developers sitting in Solomon Hykes’ 2013 PyCon session knew that the founder of Docker, an open-source application container engine, had unveiled a tool that would bring revolutionary simplicity to deployment. They began texting their respective motherships with news of the discovery almost immediately. John Wetherill, technology/PaaS evangelist for the Vancouver-based ActiveState, was one of them.
“When we first shipped the Stackato PaaS in early 2012, we used plain vanilla [Linux] containers and our own tool, called Fence, to orchestrate them,” said Brent Smithurst, vice president of product management for ActiveState. “John [Wetherill] was at the Python conference where Solomon Hykes did his presentation on Docker containers. He texted me from the session, ‘Wow, this guy is showing a great service for us.’ Solomon shared the source code with us before the release on GitHub, and we were able to work it into Stackato.”
“Containers are 15 years old—we were doing containers before they were cool,” said John Gossman, an architect on Microsoft’s Azure core team. “But the industry, until Docker came along, didn’t recognize containers could be a great developer experience. That’s what Solomon did.”
At the nexus of microservices, DevOps, Continuous Delivery and service-oriented architectures (SOAs), San Francisco-based Docker may have set a land-speed record for the fastest formation of a vibrant technology ecosystem around a single tool. Since launching in March 2013, the company has come, in just two years, to claim 4 million developers using Docker to deploy platform-independent apps packaged with all the components, libraries and operating systems they need to run.
“They have downloaded the container format close to 500 million times,” said David Messina, vice president of enterprise marketing at Docker. “That points to incredible traction. At DockerCon in June last year, the number of container downloads was 3 million.”
Adrian Cockroft, a technology fellow at Battery Ventures best known for leading Netflix’s cloud migration, concurs: “This has taken off faster than every other ecosystem I’ve ever seen. Hadoop took five years or so to start to grow. Docker took six months to do what most ecosystems do in years.”
Hadoop is an apt comparison, because just as that technology has made crunching Big Data economical, Docker slashes the cost of deployment, according to Cockroft. “In the old days, you had a data center full of machines, and most were idle,” he said.
“What virtual machines did was consolidate the CPU power. Docker takes it an extra step because it consolidates the memory, and that is more expensive.”
Consolidating all the containers into a smaller memory footprint saves money because it uses less RAM to run an equivalent amount of work. “Typically, you’re running inside a VM, but instead of 10 VMs, now you’ve got one,” said Cockroft. “Or on Amazon Web Services, you can have an 8xlarge instead of a 10xlarge. So you have the same memory footprint, but now you’re running eight containers. That could be a 50% to 70% cost savings.”
Docker: What, who and why
In addition to its low memory overhead, Docker wins points for isolation, fast boots and shutdowns, and cloud-deployment elegance. Written in Google’s highly portable Go programming language, the Docker engine comprises a daemon/server that manages the containers, plus a client that controls the daemon. A container, according to Docker’s Web-based command-line tutorial, is a “process in a box. The box contains everything the process might need, so it has the file system, system libraries, shell and such, but by default none of these are running. You ‘start’ a container by running a process in it.”
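The client/daemon split and the “process in a box” model are visible from the command line. A minimal sketch, assuming a local Docker daemon and the public nginx image (the container name is illustrative):

```shell
# The docker CLI is the client; it sends every command to the local daemon,
# which actually creates and manages the containers.
docker run -d --name web nginx    # daemon starts nginx as the "process in a box"

docker ps                         # client asks the daemon what is running
docker exec web ls /etc/nginx     # start a second process inside the same box

# Stopping the main process stops the container; its filesystem persists
# until the container is removed.
docker stop web
docker rm web
```

These commands require a running Docker daemon and pull the image from the public registry on first use.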
Docker has revitalized DevOps in three ways:
- By adding image-management and deployment services to longstanding, difficult-to-use Linux container technology
- By launching a vital developer community around the open-source Docker engine
- By assembling an ecosystem of complementary technologies that make deploying microservices and monolithic apps alike a push-button affair
“The Docker Hub has 100,000+ Dockerized services in the hub that I as a developer can pull from,” said Docker’s Messina. “If I have an app that has a Linux distribution, a language stack, a database like MongoDB and a Web server like nginx, I can orchestrate these services from my desktop.”
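The desktop-orchestration scenario Messina describes maps naturally onto a Compose file. A hypothetical sketch in the Compose v1 syntax current at the time, pulling the official mongo and nginx images from Docker Hub alongside an application image (service names and ports are illustrative):

```yaml
# docker-compose.yml
web:
  image: nginx        # official Web server image from Docker Hub
  ports:
    - "80:80"
  links:
    - app
app:
  build: .            # the language stack is defined in this directory's Dockerfile
  links:
    - db
db:
  image: mongo        # official MongoDB image from Docker Hub
```

Running `docker-compose up` from the desktop then pulls, wires together and starts all three services.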
While much of the DevOps tooling (such as Puppet or Ansible) has focused on the Ops end of the pipeline, Docker owes much of its popularity to a massive developer following. “The driving force for Docker are Dev teams, but Ops is also a critical stakeholder,” said Messina. “In the majority of organizations, development has led the process of containerization. But in others, like Yelp or Groupon, Ops set up a framework around Docker, then began marketing the productivity improvement to development.”
(Related: Docker containers a topic at AnsibleFest)
Not surprisingly, Docker’s rapid rise has sparked some controversy. While software companies like Microsoft, Amazon, ActiveState, JFrog, ClusterHQ and more jockey to ride the containerization wave, criticisms have been lobbed. One is that the tool is too simplistic for enterprise use. That’s the argument made in a December 2014 manifesto by CoreOS, which launched a competitor, Rocket, as a container runtime “designed for composability, security and speed.”
According to Alex Polvi, CEO of CoreOS, the Docker repository’s original manifesto described a simple container standard. “Unfortunately, a simple reusable component is not how things are playing out,” he said.
“Docker now is building tools for launching cloud servers, systems for clustering, and a wide range of functions: building images, running images, uploading, downloading, and eventually even overlay networking, all compiled into one monolithic binary running primarily as root on your server. The standard container manifesto was removed. We should stop talking about Docker containers, and start talking about the Docker Platform. It is not becoming the simple composable building block we had envisioned.”
One of Rocket’s core design principles is security, and Docker’s approach to security has been the other main controversy facing the young company.
Security: What’s in that image?
No one denies there are risks associated with using images downloaded from the public Docker registry, as noted in a May 15, 2015 blog post by the container startup Banyan Ops. The report, by Jayanth Gummaraju, Tarun Desikan and Yoshio Turner, was titled “Over 30% of Official Images in Docker Hub Contain High Priority Security Vulnerabilities.” Known exploits such as Shellshock, Heartbleed and POODLE were found in images the company pulled from Docker Hub. But is the claim as damning as it seems?
“It’s inaccurate. The official repositories are the 70-plus repos that we work very specifically with the ISVs to create,” said Docker’s Messina. “There is parity with what they have and what the ISVs have.
“We go through a very rigorous process ourselves. Before they make the official repo, we go through the vulnerabilities ourselves. What that [report] did was take a set of raw numbers that don’t reflect how developers use the images. We don’t remove images from Hub. Also, what they scanned for was inaccurate: They just looked for the release level, just the numbers, as opposed to scanning for vulnerabilities. Debian has much deeper level of code numbering scheme… So basically, their counting is wrong.”
Cockroft added: “The container provides some isolation, but not as much as a VM. When VMs came out, people weren’t happy about VM security. People were saying you could break out and control the host machine. In fact that’s happened very rarely. The isolation that Docker gives you is improving over time.”
Via Docker’s layered image model, it’s easier to push patches and updates across a codebase than in the non-containerized model, according to a Docker white paper on security best practices. Further, the paper concludes that “The simple deployment of Docker increases the overall system security levels by default, through isolation, confinement, and by implicitly implementing a number of best practices that would otherwise require explicit configuration in every OS used within the organization.”
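The layered model is why patches propagate by rebuild rather than by touching each host. A minimal Dockerfile sketch (base image and file names are illustrative):

```dockerfile
# Each instruction adds a read-only layer on top of the one below it.
FROM debian:wheezy
# Application layers sit on top of the base layer:
RUN apt-get update && apt-get install -y nginx
COPY site.conf /etc/nginx/conf.d/site.conf
CMD ["nginx", "-g", "daemon off;"]
```

When the debian base image is republished with a security fix, rebuilding this image pulls in the patched base layer, and the application layers above it are reapplied unchanged — one rebuild updates every deployment of the image.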
That facility has a downside, however. “If there is a container that has a flaw or security issue, people will get it automatically, so the pipeline needs to be secure,” said Fred Simon, cofounder and chief architect for JFrog, maker of the Artifactory binary repository. “You can’t secure one container at a time; it’s going to be too painful.”
The limits are starkest in a bare-metal scenario, deployed without x86 virtualization. As Docker’s security best-practices white paper notes, “Containers do not provide ring-1 hardware isolation, given that it cannot take full advantage of Intel’s VT-d and VT-x technologies. In this scenario, containerization is not a complete replacement of virtualization for host isolation levels.”
Could security concerns around Docker images be overstated? Microsoft’s Gossman doesn’t think so. “We are super paranoid about security here. It’s one of our highest goals. I don’t ever dismiss any sort of security questions. You need good security practices. You don’t download an image without running any tools on it to make it secure. That may be fine for dev and test, but not for production. In the multi-tenant cloud, we assume there’s a very sophisticated hacker there. They can sign up for Azure and they’re going to be sitting right next to customer data.”
Calling for containers at Microsoft and Amazon
As the cloud market matures, it’s interesting to observe the relative positions and philosophies of Microsoft and Amazon: two cloud providers with distinctly different offerings, but both scrambling to react to an onslaught of customers who began asking to run Docker on their respective platforms in 2013.
In Amazon’s case, much of the action was around batch processing: encapsulating tasks in Linux containers, then running those containers on a fleet of instances. The biggest challenges? Cluster management, scaling, configuration management, container sprawl, availability, security (enforcing isolation) and scheduling. Their answer? Amazon EC2 Container Service, which provides cluster-management infrastructure for Docker containers and ties into existing features like security groups, Elastic Load Balancing, Elastic Block Store volumes and Identity and Access Management roles.
“Amazon is a strange animal,” said JFrog’s Simon. “They are Infrastructure-as-a-Service, most popular for public cloud, but they already solve quite a lot of the issues of virtualization: The ability to create a VM, spawn a new exact copy of VM, orchestration. They’ve already met the appeal for containers. I don’t know how many people will actually use containers on top of Amazon.” Similar questions abound for Microsoft. The answer, in both cases, is portability.
“The big thing we hear is that people don’t mind running on AWS, but they don’t want to use native tools because what if they want to move it? You can just move Stackato over to Azure, HP Cloud or in-house,” said ActiveState’s Smithurst.
Cross-technology compatibility is definitely a motivation for Microsoft. “I wrote the original thought piece on containers at Microsoft,” said Gossman. “Our strategy is pretty simple. If we wanted to be, two or three years back, the Windows and .NET cloud, we wouldn’t even have succeeded at that. People want to run Java and Oracle on Windows. Customers have asked us to run Docker on Azure, and they’re also asking to run it on Windows. Windows is incredibly popular in private data centers and local clouds and competitive public clouds. Developers really like using Docker. We don’t want to have people choose.”
Getting the Docker command-line interface to run in a Linux VM on Azure wasn’t hard, Gossman said. A bigger effort was needed for the Docker extension, which makes it easy to install Docker and images. Microsoft is working on integrating Docker Compose, and is working on Docker Swarm for Azure, as well as Mesos and CoreOS on Azure. Nano Server, a minimal-footprint installation option of Windows Server optimized for cloud and container-based deployment, is also being prepped for release.
As for orchestration, “We don’t have an exact plan there,” said Gossman. “If you look at the tools, in most cases they haven’t even reached 1.0. We could build our own service, but it’s not clear which version is what the customer wants. We want to expand the Service Fabric that we announced recently to Linux and other languages.”
Finally, with regard to porting the Docker management experience, Gossman said, “There will also be a native API because other people will want other management experiences—even though we believe all the action is for Docker.”
Ecosystem: Orchestration, monitoring, data and more
Orchestration is a major concern with containerization, which has the tendency to produce sprawl. “Microservices are really interesting, but now you have a classic configuration-management issue. Now, instead of single executable, you have a swarm of things,” said James Creasy, vice president of engineering at SKUR, a robotic measurement startup for the construction industry. Creasy and others like him represent a wave of future adopters who are waiting for containers to mature before they hop on board.
A full-fledged production ecosystem will be part of the attraction for Creasy. Luckily for him, Docker’s API for automating and fine-tuning container creation and deployment has led to integrations for deployment, multi-node deployment, dashboards, configuration management, and Continuous Integration.
While Docker works as a simple, one-machine PaaS, managing a fleet of containers takes a different level of automation. That’s where Kubernetes, CloudFoundry, TerraForm, Mesos, CoreOS, Dokku, Deis, Flynn, Docker Swarm and others can add a scheduling layer, creating a mid-point between IaaS and PaaS in what some call “Containers-as-a-Service,” or CaaS.
On the monitoring side, a raft of technologies claim to make peeking into Docker containers and tracking their performance and behavior easy: DockerUI, OpenStack Horizon, Shipyard, cAdvisor, New Relic, ClusterUP, BoxSpy and more. For storage, Flocker is attempting to answer the question of how to add state to containers.
“Containers have been driven by very advanced, forward-thinking developers who say, ‘We don’t put state in our apps. We make sure we don’t have anything that needs to persist, with the data that those apps talk to being external to the app layer,’” said Mark Davis, CEO of ClusterHQ, provider of the new Flocker 1.0 container data-management software.
“While that’s fine, one thing we’ve discovered is that there’s no such thing as an app that is actually stateless. An app without data is useless. Even the most trivial app has data. So how can we deal with these stateless microservices? Is there a way for us to build databases, queues and key-value stores?”
Swisscom, Switzerland’s leading telco provider, has implemented Flocker with EMC ScaleIO as part of a PaaS initiative. With Flocker, Swisscom uses EMC ScaleIO as its persistent storage back end, gaining both scale-out storage for its microservices and data portability between physical and virtual servers, improving operational management and increasing density of distributed server-hosted applications.
“Swisscom has been watching Linux-based containers since their beginning,” said Marco Hochstrasser, head of Swisscom Application Cloud. “We believe that lightweight containers can provide significant benefits to major service providers, including dramatically higher density of applications per server. That means greater efficiency, decreased costs and higher flexibility.
“We decided to use Docker containers for a major new initiative, but as we investigated options, we realized that without a solution for persistent container data management, we wouldn’t be able to achieve the benefits we sought. When we saw Flocker from ClusterHQ, we knew we had found a compelling open-source solution.”
Prediction: Docker moves from dev and test into production
What will the next six months hold for Docker? “It’s evolving extremely quickly. I’m hoping that at DockerCon we’ll see more production-ready case studies,” said Battery Ventures’ Cockroft. “People running Docker in production, they’ve hand-crafted a lot of things. But it’s a matter of months to having more tooling: Products making it easy for test and dev workloads to switch over to production.”
Docker’s own ecosystem tools include Swarm, Machine and Compose. “They’re near 1.0 versions,” said Cockroft. “Mesos announced Mesosphere 1.0. Amazon’s EC2 container service is becoming a solid product. You have Google’s container service coming out and [Docker coming on] Azure. So the public cloud services are moving from evaluation to production phase. HP, IBM, the people that support OpenStack, they are all rolling out containerized ways to do it. They’re just coming together. By the end of year, we should see them. People should be using Docker for test and dev all the way to deployment.”
Will the drawbacks of containers slow them down? Not likely. Yes, the container abstractions add some network overhead, but the productivity gains are worth it. Yes, security is an afterthought, but the industry will patch that. Yes, microservices are harder to herd, but Continuous Delivery demands them. Docker’s utility seems guaranteed to last.
Microservices: ‘SOA for hipsters’?
In all the excitement around containers, the question of what defines a microservice has an elusive answer. While a number of executives were hard-pressed to make a distinction between microservices and components, they may be reassured to know that even software architecture guru Martin Fowler notes in a 2014 paper entitled “Microservices,” “While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.” He goes on to describe microservices as suites of small services, communicating via lightweight mechanisms such as an HTTP resource API, that together form a single application.
But haven’t component-based development, and later service-oriented architectures, been software development’s goal for decades?
“I see it as next-level SOA, for sure—SOA for hipsters. Because it’s a useful way to architect things if you need it, but not necessarily a new, new badge,” said ActiveState’s Smithurst.
Fowler is optimistic about the new term, however. In the paper, he explains SOA’s spotty record: “When we’ve talked about microservices, a common question is whether this is just service oriented architecture that we saw a decade ago. There is merit to this point, because the microservices style is very similar to what some advocates of SOA have been in favor of.
“The problem, however, is that SOA means too many different things, and that most of the time that we come across something called ‘SOA,’ it’s significantly different to the style we’re describing here, usually due to a focus on ESBs used to integrate monolithic applications. In particular we have seen so many botched implementations of service orientation—from the tendency to hide complexity away in ESBs, to failed multi-year initiatives that cost millions and deliver no value, to centralized governance models that actively inhibit change, that it is sometimes difficult to see past these problems.”
As a result, Fowler writes, microservices might finally mean “service orientation done right.”
Five security tips for container-crazed coders
To avoid kernel exploits, denial of service attacks, cracked database passwords, poisoned images and more, these basics are a must (but by no means a complete security strategy):
- The usual rules of Internet hygiene apply: Verify image quality and provenance.
- Set boundaries: Containers are safest when segregated within VMs.
- Check your privilege: Don’t run containers with the “--privileged” flag, and drop privileges ASAP.
- Stay lean: Containers shouldn’t include anything the application doesn’t need to run.
- Protect the host: Run as non-root whenever possible.
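In docker run terms, the tips above correspond to flags like the following — a hardening sketch, not a complete policy, with the capability set, UID, memory cap and registry name all illustrative:

```shell
# Drop all Linux capabilities, then add back only what the app needs;
# run as a non-root UID with a read-only root filesystem and a memory cap.
docker run -d \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  -u 1000 \
  --read-only \
  -m 512m \
  trusted-registry.example.com/myapp:1.0
```

Pulling from a private, verified registry rather than an unvetted public image covers the provenance rule; the remaining flags keep the container lean and its privileges minimal.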