A few weeks ago a coworker asked me to do an introductory presentation on the way we use Docker. This is an attempt at distilling my thoughts into a more consumable form. In this post I’ll cover everything from setup, to a basic Dockerfile, to deployment and advanced usage.

History and Motivation

In the early days of computing, the hardware and environment that an application ran in were largely predictable and unchanging. Since then, computers and servers have evolved into commoditized systems with many users and manifold possible hardware and software configurations. With a modern public cloud architecture, the developer has very little control over the hardware and software that will ultimately be used to deploy production software; in many cases, the developer has no control at all. As environments became more complex, tools popped up to automate provisioning and deployment so that an environment could be as predictable as possible, but failures still occur often, and when you least expect them.

A team where I work recently had a problem where their automated deployment script suddenly started pulling a new version of a JAR that was incompatible with the version their application was developed against, leading to a failure that appeared out of nowhere and was difficult to debug. The team has since fixed this, but the key takeaway is important: the fundamental problem was not that the deployment script pulled the wrong version of a package, but that this was even possible in the first place.

Process Containers

This may seem like a simple problem, but what developer can honestly say that they have control over every single layer of their deployment stack? For those who think they can: do you really? What version of node/java/go are you running? Do you control it on every production system, all the way down to the security patch level? The answer to at least one of these questions is overwhelmingly likely to be no, and this is the fundamental problem that cgroups (process containers), LXC, and Docker set out to solve.

Cgroups, together with kernel namespaces, form a layer of abstraction over the operating system similar to virtualization, but without requiring a fully virtualized guest operating system, and LXC was the first popular tool for managing them. A process in a container can be given limited or unlimited memory, hard disk space, network bandwidth, or access to arbitrary hardware. It may think it is bound to listen on port 8080 when it is actually bound to port 3232 on the host system, or not bound at all. It may think the system has only 2GB of memory when it actually has 10GB, or that it has 10GB when it only has 2GB. A process in a container is entirely isolated from the underlying hardware and the rest of the operating system unless it is explicitly given permission and access.
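As a rough illustration of what this looks like underneath, here is a sketch of limiting a process’s memory with the cgroup v1 memory controller directly. Paths and controller names vary by distribution, and cgroup v2 systems lay this out differently, so treat it as an example rather than a recipe.

$ sudo mkdir /sys/fs/cgroup/memory/demo                             # create a new memory cgroup
$ echo 2G | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
$ echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs        # move the current shell into it

Every process started from that shell now sees a 2GB memory ceiling no matter how much RAM the host has. LXC and Docker automate this kind of bookkeeping, and add namespaces for network and filesystem isolation on top.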

Docker

Docker is a container management system just like LXC; in fact, Docker started out as a wrapper around LXC. Docker takes container management to the next level with a layered filesystem, a public image registry, and a simple method for fast, reproducible image builds. It works by building an image in a series of layers, then running that image in a container entirely isolated from the rest of the system. The only components that may differ between deployments are the hardware, the Linux kernel, and Docker itself.

Dockerfiles and the Layered Filesystem

Docker image layers are described by commands in a Dockerfile. Each command in a Dockerfile creates another immutable layer on top of the existing image. This is important to understand: once a layer is committed, it is permanent. Layers can only be added on top, never removed or modified. Each layer added creates a new Docker image, and the image deployed to production is simply a tagged image representing a particular layer on top of all the layers that came before it. Because of this, you can build new Docker images on top of Docker images that you’ve already built. Below is an example of a simple Dockerfile built on top of an image from the Docker Hub image repository.

FROM node:8.4-alpine
WORKDIR /opt/helloworld
RUN apk add --no-cache make gcc g++ python git
COPY . .
RUN npm install
EXPOSE 3000
CMD ["npm", "start"]

A basic Dockerfile

This Dockerfile begins with the node:8.4-alpine image, which is itself simply a layered Docker image. Next, it sets a working directory and installs the build dependencies. After the dependencies are installed, it copies the project files from the local directory into the image’s working directory. The COPY command (like ADD) is special: Docker hashes the contents of the files being copied and only invalidates the cache when those contents change, which speeds up builds and helps with reproducibility. The Docker cache is covered in more detail later. After the files are copied, the node dependencies are installed, the EXPOSE instruction tells Docker which port the application listens on, and the CMD instruction tells Docker how to start the application.
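One practical note: COPY . . copies everything in the build context into the image, so it is common to add a .dockerignore file next to the Dockerfile to keep local artifacts out of the image and out of the cache calculation. A minimal example for a project like this might look like:

node_modules
npm-debug.log
.git

Without it, a locally installed node_modules directory would be copied in only to be overwritten by npm install, bloating the build context and invalidating the COPY layer far more often than necessary.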

Building

The next step after writing a Dockerfile is building the image. The docker build command runs each command in the Dockerfile in succession, each one producing a new layer. The final layer is the Docker image that you will deploy to production.

$ docker build -t helloworld .
Step 1/7 : FROM node:8.4-alpine
 ---> 016382f39a51
Step 2/7 : WORKDIR /opt/helloworld
 ---> 487d815acfa8
Step 3/7 : RUN apk add --no-cache make gcc g++ python git
 ---> 8396e01c1abc
Step 4/7 : COPY . .
 ---> 1b41e980e14e
Step 5/7 : RUN npm install
 ---> d46bf266b41b
Step 6/7 : EXPOSE 3000
 ---> b4a93ac05152
Step 7/7 : CMD ["npm", "start"]
 ---> 58664ba3b045
Successfully built 58664ba3b045
Successfully tagged helloworld:latest

$ docker images
REPOSITORY   TAG           IMAGE ID       CREATED             SIZE
helloworld   latest        58664ba3b045   About a minute ago  273MB
node         8.4-alpine    016382f39a51   7 weeks ago         66.3MB

In the above build you can see a few things. First, a hash was generated for each step in the build. This is the ID of the Docker image as it existed at that layer, and each of these intermediate images can be run just like any other Docker container. Second, only two images show up in the docker images output. By default, docker images lists only tagged images; with a simple -a you can display all of the images on the system.

$ docker images -a
REPOSITORY   TAG           IMAGE ID       CREATED             SIZE
helloworld   latest        58664ba3b045   About a minute ago  273MB
<none>       <none>        b4a93ac05152   About a minute ago  273MB
<none>       <none>        d46bf266b41b   About a minute ago  273MB
<none>       <none>        1b41e980e14e   About a minute ago  273MB
<none>       <none>        8396e01c1abc   About a minute ago  272MB
<none>       <none>        487d815acfa8   About a minute ago  66.3MB
node         8.4-alpine    016382f39a51   7 weeks ago         66.3MB
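Because every one of those intermediate images is a complete, runnable image, you can start a shell inside one to see the filesystem exactly as it existed at that point in the build, which is handy when a later step fails. For example, using the ID from Step 3 above:

$ docker run -it --rm 8396e01c1abc sh

This drops you into a shell with the build tools installed but before the project files have been copied in.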

Caching

If you look closely at the docker images output above, you can see that the image IDs exactly match the layer hashes from the build command. These hashes are what power Docker’s caching system. The caching system is one of Docker’s hallmark features, but it is also a point of friction for many developers new to Docker. In broad strokes, it works by hashing the next command in the Dockerfile along with the layers used to build the current state, then comparing that hash against the images already on the system. If a match is found, the existing image is reused rather than rebuilt, so expensive steps like compilation are skipped. This can be a huge time saver when build dependencies are built from source and rarely change, but it can also be the cause of major headaches when the cache doesn’t rebuild something that you want it to rebuild. Commands like ADD and COPY break the cache when their file contents change, and in the advanced usage section we will go over some methods of manually breaking the cache.
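A common way to work with the cache rather than against it is to order Dockerfile commands from least to most frequently changing. For a node project that usually means copying package.json (and the lockfile, if you have one) and running npm install before copying the rest of the source, so that editing application code does not invalidate the cached dependency layer. A sketch of the earlier Dockerfile reordered this way:

FROM node:8.4-alpine
WORKDIR /opt/helloworld
RUN apk add --no-cache make gcc g++ python git
# assumes a package-lock.json exists; drop it from the COPY otherwise
COPY package.json package-lock.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "start"]

Now a change to index.js only invalidates the layers from COPY . . onward, and the npm install layer is reused from the cache.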

The caching system works because of the layered file system described above. Each command is simply layered on top of the previous image, without mutating the underlying image. This means that any build dependencies present when a layer is committed will be in the image forever. Even if they are deleted in another layer later on, they still take up space in a lower layer of the image. In the case of tools like compilers and build toolchains, this can add up to hundreds of MBs if not GBs. In the advanced usage section we will discuss some strategies for reducing the docker image size.
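If you want to see exactly where that space goes, docker history prints the size contributed by each layer of an image:

$ docker history helloworld

For this image, the apk add layer alone accounts for roughly 200MB of the 273MB total, as you can already see from the intermediate image sizes in the docker images -a output above.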

Deployment

Tagging

The first step in a Docker deployment is to push your image to an image registry. This can be a private registry or a public one like Docker Hub. First, tag your image with the repository/name:tag format (the tag is optional and defaults to latest). In my case I’m going to tag the image as dyladan/helloworld.

$ docker tag helloworld dyladan/helloworld
$ docker images
REPOSITORY         TAG           IMAGE ID       CREATED        SIZE
dyladan/helloworld latest        58664ba3b045   7 minutes ago  273MB
helloworld         latest        58664ba3b045   7 minutes ago  273MB
node               8.4-alpine    016382f39a51   7 weeks ago    66.3MB
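In practice you will usually also want an explicit version tag alongside latest so that production hosts can pin a known build. The version number here is just illustrative:

$ docker tag helloworld dyladan/helloworld:1.0.0

Both tags point at the same image ID, so this costs nothing extra, and pushing the versioned tag works exactly like the push below.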

Pushing

The next step is to push the image.

$ docker push dyladan/helloworld:latest
The push refers to a repository [docker.io/dyladan/helloworld]
388bbf91205d: Pushed
6516b07acf4f: Pushed
2e03be0e8df6: Pushed
a1548a35ad87: Pushed
0b3e54ee2e85: Mounted from dyladan/beer-guardian
ad77849d4540: Mounted from dyladan/beer-guardian
5bef08742407: Mounted from dyladan/beer-guardian
latest: digest: sha256:c2731a19724bf48f0026b400a3e0f1678e4095cb6a3b44ba88690ee65e0c6db9 size: 1786

This pushes not only the image that we’ve built, but all of the layers used to build it. You may notice that the last three lines don’t show Pushed, but Mounted from dyladan/beer-guardian. This is because the Docker caching system works for image registries as well: Docker knows those layers already exist in the registry under a different repository, so it skips sending them over the network. Again, this is enabled by the layered filesystem and the immutability of images.

Pulling

Next, on your production host, you pull the image from the repository.

$ docker pull dyladan/helloworld
Using default tag: latest
latest: Pulling from dyladan/helloworld
6d987f6f4279: Pull complete
23922eede9ea: Pull complete
4c0008704272: Pull complete
119c0fce894a: Pull complete
de2f7dd96f79: Pull complete
d3a44d47c6b4: Pull complete
5131935ce65c: Pull complete
Digest: sha256:c2731a19724bf48f0026b400a3e0f1678e4095cb6a3b44ba88690ee65e0c6db9
Status: Downloaded newer image for dyladan/helloworld:latest

$ docker images -a
REPOSITORY         TAG         IMAGE ID       CREATED         SIZE
dyladan/helloworld latest      58664ba3b045   20 minutes ago  273MB

Running

That’s it! Since we built a fully self-contained image with no external dependencies, we can simply run the image we pulled. No more downloading, installing, and configuring hundreds of dependencies on every host; just the lightweight Docker engine and the Linux kernel. The best part is that you can run one container or thousands on a host, and the running containers don’t know or care. In the example below, the application listens on port 3000, but that is just the container port. On the host, it is actually exposed on port 80.

$ docker run -t -p 80:3000 dyladan/helloworld
npm info using npm@5.3.0
npm info using node@v8.4.0
npm info lifecycle helloworld@1.0.0~prestart: helloworld@1.0.0
npm info lifecycle helloworld@1.0.0~start: helloworld@1.0.0
> helloworld@1.0.0 start /opt/helloworld
> node index.js
Listening on port 3000
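For a real deployment you would typically run the container detached, give it a name, and let Docker restart it automatically if it crashes. The flags below are standard docker run options:

$ docker run -d --name helloworld --restart unless-stopped -p 80:3000 dyladan/helloworld
$ docker ps

docker ps then shows the running container along with its 0.0.0.0:80->3000/tcp port mapping.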

Advanced Usage

Reducing Image Size

Breaking the Cache

Multi-Step Builds

Docker Compose

Kubernetes

TODO

  • Advanced Usage
    • Reducing image size
    • Breaking the cache
    • Multi-step builds
  • Docker Compose
    • One host
    • Multiple hosts
    • Docker machine
  • Kubernetes and ECS