Lorenzo Villani

(╯°□°)╯︵ ┻━┻

A Gentle Introduction To Docker

Jan 6, 2016

Brief History of Containers

In recent times there has been a lot of talk about containers being the definitive solution to simplify application deployments, especially thanks to all the hype around Docker which has become synonym with “container”.

We may be under the impression that containers are a new technology, but we are far from the truth.

We can see containers like the natural evolution of “chroots” and FreeBSD “jails” readily available, respectively, from the ’80s and 2000. On Linux we find that precursors to Docker were already in use since 2005, thanks to commercial solutions like Virtuozzo (which was later open sourced under the OpenVZ brand). An honorable mention goes to LXC which appeared in 2008 and was the foundation for Docker until 2014.

Containers Versus Virtual Machines

What are the main differences between a container and a virtual machine? Why should we choose one or the other?

A “classic” solution like VMware, HyperV or qemu/KVM assumes the creation of a complete virtual machine that simulates a CPU with all its peripherals, disks and network cards. This allows us, for example, to let Windows run on a Linux server. This is clearly heavyweight, even though it can be accelerated by support provided by the most modern CPUs.

Containers instead leverage support offered directly by the operating system’s kernel that allows partitioning and segregation of process groups. Since this mechanism works at the syscall level, without emulation, it has little to no overhead and there are no virtual machines involved. This allows us to greatly increment the “population density” of a server.

Naturally, each solution has up- and downsides. A standard hypervisor gives us maximum flexibility to choose the runtime environment: we can choose the operating system, run custom kernels and manage resources as if we are working on real PC. A container can run applications written for the same operating system and architecture the host is using and doesn’t allow us to run a custom kernel, but in change it gives us finer control over resources.

We also have to consider the security aspects of both solutions: virtual machines offer stronger insulation between each other when compared to containers even though the latter are rapidly improving on this front.

The Rise Of Docker

In 2013 Docker became the synonym with container, thanks to features that made it easier to use than lower-level solutions like LXC and a few technical innovations.

Docker is founded on the concept of “layers”: immutable file-system images that can be stacked on top of one another and can be instanced into containers. The stacking allows developers to compose the final image up of multiple parts and save space, since layers common to many images are shared between them.

Creation of an image happens through a so-called “Dockerfile”, a recipe that contains all steps needed to build a certain image and allows others to reproduce it (other than being a form of “executable documentation”). In the following example you can see how easy it is to create an image to run a simple web application written in Node.JS:

FROM ubuntu:14.04
ENV DEBIAN_FRONTEND noninteractive
RUN sed 's/archive./it.archive./g' -i /etc/apt/sources.list
RUN apt-get update && apt-get install -y curl
RUN curl -sL https://deb.nodesource.com/setup_4.x | sudo -E bash -
RUN apt-get install -y nodejs
RUN mkdir /var/www
ADD app.js /var/www/app.js
EXPOSE 8080
CMD ["/usr/bin/node", "/var/www/app.js"]

Once we have created our image, we can launch one, ten or a thousand times by running:

docker run -P nodejs-example

But let’s remember that images are immutable: containers will use it as a base and changes persist only as long as the container is still alive. Once we destroy it, its changes will be lost. As we expect, Docker allows us to “mount” a non-volatile directory within a container thanks to “data volumes”.

Once we have created an image, we can publish it to a central registry called Docker Hub to share it with the community, thus simplifying the deployment process, since Docker will automatically try to pull an image from the registry if it can’t find it locally. It is also possible to run a private registry or export and subsequently import images transferred via USB drives, if you are so inclined.

All of this greatly simplifies the process of installing an application. To make an example: a typical installation of Mattermost requires the separate configuration of PostgreSQL and Nginx, in addition to Mattermost itself, a process that can take at least half an hour for an expert system administrator.

Thanks to Docker, though, I can run:

docker run --name mattermost-dev -d --publish 8065:80 mattermost/platform

And then, two minutes later, I can point my browser to port 8065 to complete the configuration process.

The developer didn’t have to make packages for each and every Linux distribution and I didn’t have to become crazy configuring all the moving parts. This is a simple example, though. A complex application might require more than configuring just a database and a web server.

Docker as the alternative to package managers

Since images are really easy to create and they contain all the libraries and dependencies that an application needs, it’s easy to think of Docker as an alternative to a package manager, at least for web applications.

Docker allows us to not bother which version of Ubuntu, Python or Apache the user is running. A developer can start off with the base image of its favorite distribution, bundle all files together and make it available through Docker Hub. Said image can be launched as easily on CentOS, Fedora, SUSE, Arch Linux or whatever Linux distribution is trendy these days.

It’s not a coincidence that the Open Container Initiative was recently formed to standardize a packaging format that is compatible across all containerization solutions.

Summing up

In the end Docker and other containerization technologies bring some advantages that we should not overlook: