A few weeks ago a coworker asked me to do an introductory presentation on the way we use Docker. This is an attempt at distilling my thoughts into a more consumable form. In this post I’ll cover everything from setup, to a basic Dockerfile, to deployment and advanced usage.
History and Motivation
In the early days of computing, the hardware and environment that an application ran in were largely predictable and unchanging. Since then, computers and servers have evolved into commoditized systems with many users and manifold possible hardware and software configurations. With a modern public cloud architecture, the developer has very little control over the hardware and software that will ultimately be used to deploy production software. In many cases, the developer has no control at all. As environments became more complex, tools popped up to automate provisioning and deployment so that an environment could be as predictable as possible, but failures still occur often, and when you least expect them.
A team where I work recently had a problem where their automated deployment script suddenly started pulling a new version of a JAR that was incompatible with the version their application was developed against, leading to a problem that occurred seemingly out of nowhere and was difficult to debug. The team has since fixed this, but the key takeaway is important. The fundamental nature of their problem was not that the deployment script pulled the wrong version of a package, but that this was even possible in the first place.
This may seem like a simple problem, but what developer can honestly say that they have control over every single layer of their deployment stack? For those that can, do you? What version of node/java/go are you running? Do you control it on every production system? Even all the way down to the security patch level? The answer to one of these questions is overwhelmingly likely to be no, and this is the fundamental problem that cgroups (process containers), LXC, and Docker set out to solve.
Cgroups, combined with Linux kernel namespaces, provide a layer of abstraction over an operating system similar to virtualization, but without requiring a fully virtualized operating system, and LXC was the first popular tool for managing them. A process contained this way can be given limited or unlimited memory, hard disk space, network bandwidth, or access to arbitrary hardware. It may think it is bound to listen on port 8080, when in fact it is bound on port 3232 of the host system, or not bound at all. It may think the system has only 2GB of memory when it actually has 10, or it may think it has 10 when it only has 2. Such a process is entirely isolated from the underlying hardware and the rest of the operating system unless it is explicitly given permission and access.
Docker is a management system for cgroups just like LXC. In fact, Docker used to be a wrapper around LXC. Docker takes container management to the next level with a layering file system, a public image repository, and a simple method for reproducible and fast image builds. It works by building an image in a series of layers, then it runs that image in a cgroup entirely isolated from the rest of the system. The only components that may be different between deployments are the hardware, the Linux kernel, and Docker itself.
Dockerfiles and LayerFS
Docker image layers are described by instructions in a Dockerfile. Each instruction in a Dockerfile creates another immutable layer on top of the Docker image. This is important to understand: once a layer is committed, it is permanent. Layers can only be added on top, not removed or modified. Each layer added to the Docker image creates a new Docker image. The Docker image deployed to production is simply a tagged image representing that particular layer on top of all layers that came before it. Because of this, you can build new Docker images on top of Docker images that you’ve already built. Below is an example of a simple Dockerfile built on top of an image from the Docker Hub image repository.
    FROM node:8.4-alpine
    WORKDIR /opt/helloworld
    RUN apk add --no-cache make gcc g++ python git
    COPY . .
    RUN npm install
    EXPOSE 3000
    CMD ["npm", "start"]
A basic Dockerfile
This Dockerfile begins with the `node:8.4-alpine` image, which is itself simply a layered Docker image. Next, it creates a working directory and installs the build dependencies. After the dependencies are installed, it copies the project files from the local directory into the image working directory. The `COPY` instruction is special, because Docker computes a checksum of the contents of the files it copies and only breaks the Docker cache if those contents have changed. The Docker cache will be covered later. This speeds up builds and helps with reproducibility. After these files are copied, the node dependencies are installed, the `EXPOSE` instruction tells Docker what port our application listens on, and the `CMD` instruction tells Docker how to start our application.
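As a minimal sketch of the earlier point that new images can be built on top of images you have already built, the Dockerfile below uses the image from this example as its base. It assumes that image has already been built and tagged `helloworld`, as it is in the build step that follows. The `NODE_ENV` variable is only an illustrative addition and is not part of the original project.

```dockerfile
# Assumption: the image described above has already been built locally
# and tagged "helloworld" (docker build -t helloworld .).
FROM helloworld

# Illustrative extra layer: an environment variable added on top of the
# existing layers. Every layer of the base image is reused unchanged.
ENV NODE_ENV=production
```

Because the base layers are immutable, building this only adds one small layer on top of them. The `WORKDIR`, `EXPOSE`, and `CMD` settings are inherited from the base image.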
The next step after creating a Dockerfile is building it into an image. The `docker build` command runs each instruction in the Dockerfile in succession, with each instruction building a new layer. The final layer is the Docker image that you will deploy to production.
    $ docker build -t helloworld .
    Step 1/7 : FROM node:8.4-alpine
     ---> 016382f39a51
    Step 2/7 : WORKDIR /opt/helloworld
     ---> 487d815acfa8
    Step 3/7 : RUN apk add --no-cache make gcc g++ python git
     ---> 8396e01c1abc
    Step 4/7 : COPY . .
     ---> 1b41e980e14e
    Step 5/7 : RUN npm install
     ---> d46bf266b41b
    Step 6/7 : EXPOSE 3000
     ---> b4a93ac05152
    Step 7/7 : CMD ["npm", "start"]
     ---> 58664ba3b045
    Successfully built 58664ba3b045
    Successfully tagged helloworld:latest

    $ docker images
    REPOSITORY    TAG          IMAGE ID       CREATED              SIZE
    helloworld    latest       58664ba3b045   About a minute ago   273MB
    node          8.4-alpine   016382f39a51   7 weeks ago          66.3MB
In the above build you can see a few things. First, a hash was generated for each step in the build. This is the hash representing the Docker image as it existed at that layer. Each of these intermediate images can be run just like any other Docker image. Second, you can see that only two images show up in the `docker images` output. By default, Docker only shows tagged images and hides the intermediate ones. With a simple `-a`, you can display all of the images on the system.
    $ docker images -a
    REPOSITORY    TAG          IMAGE ID       CREATED              SIZE
    helloworld    latest       58664ba3b045   About a minute ago   273MB
    <none>        <none>       b4a93ac05152   About a minute ago   273MB
    <none>        <none>       d46bf266b41b   About a minute ago   273MB
    <none>        <none>       1b41e980e14e   About a minute ago   273MB
    <none>        <none>       8396e01c1abc   About a minute ago   272MB
    <none>        <none>       487d815acfa8   About a minute ago   66.3MB
    node          8.4-alpine   016382f39a51   7 weeks ago          66.3MB
If you look closely at the above output, you can see that the image IDs exactly match the image hashes from the build command. This is used for Docker’s caching system. The caching system is one of Docker’s hallmark features, but it is also a point of friction for many new to Docker. In broad strokes, the caching system works by hashing the next instruction in the Dockerfile. It then compares the hash with the hashes of all other images in the system, as well as the hashes of all layers used to build the current state. If the hashes match, that image is used rather than being rebuilt. This means that steps like compilation are skipped if the hashes match. This can be a huge time saver in the many cases where build dependencies are built from source and rarely change, but it can also be the cause of major headaches when the cache doesn’t rebuild something that you want it to rebuild. Some instructions like `COPY` break the cache when it is appropriate, and in the advanced usage section we will go over some methods of manually breaking the cache.
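One practical consequence of this behavior can be sketched as a reordering of the example Dockerfile. This is a minimal sketch, not the file used for the build output shown above, and it assumes the project keeps its dependency manifest in `package.json` (and optionally `package-lock.json`) at the project root. Copying the manifest and installing dependencies before copying the rest of the source lets the `npm install` layer be reused from cache whenever only application code changes.

```dockerfile
FROM node:8.4-alpine
WORKDIR /opt/helloworld
RUN apk add --no-cache make gcc g++ python git

# Copy only the dependency manifests first. This layer's cache is
# invalidated only when package.json or package-lock.json changes.
# Assumption: these files live at the project root.
COPY package*.json ./
RUN npm install

# Copy the rest of the project. Source edits invalidate the cache from
# here down, leaving the npm install layer above untouched.
COPY . .

EXPOSE 3000
CMD ["npm", "start"]
```

If a local `node_modules` directory exists, listing it in a `.dockerignore` file keeps the later `COPY . .` from overwriting the dependencies installed inside the image.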
The caching system works because of the layered file system described above. Each command is simply layered on top of the previous image, without mutating the underlying image. This means that any build dependencies present when a layer is committed will be in the image forever. Even if they are deleted in another layer later on, they still take up space in a lower layer of the image. In the case of tools like compilers and build toolchains, this can add up to hundreds of MBs if not GBs. In the advanced usage section we will discuss some strategies for reducing the docker image size.
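As a rough illustration of the point about deleted files, one common way to keep build tools out of the final image is to install them, use them, and remove them within a single `RUN` instruction, so that they are never committed to any layer. The sketch below assumes that `make`, `gcc`, `g++`, `python`, and `git` are needed only while `npm install` runs and not at runtime. `.build-deps` is an arbitrary name chosen here for the temporary package group.

```dockerfile
FROM node:8.4-alpine
WORKDIR /opt/helloworld
COPY . .

# Install the build tools as a named virtual package, run the install
# that needs them, then delete them again. Because all three steps run
# in one RUN instruction, the tools never appear in a committed layer.
# Assumption: none of these tools are required at runtime.
RUN apk add --no-cache --virtual .build-deps make gcc g++ python git \
    && npm install \
    && apk del .build-deps

EXPOSE 3000
CMD ["npm", "start"]
```

The trade-off is that this single layer is rebuilt in full whenever the copied files change, so a smaller image comes at the cost of less cache reuse.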
The first step in a Docker deployment is to push your image to an image host. This can be a private image repository or a public one like Docker Hub. First, tag your image with the `repository/name:tag` format (the tag is optional and will be set to `latest` by default). In my case I’m going to tag the image as `dyladan/helloworld`.
    $ docker tag helloworld dyladan/helloworld
    $ docker images
    REPOSITORY           TAG          IMAGE ID       CREATED         SIZE
    dyladan/helloworld   latest       58664ba3b045   7 minutes ago   273MB
    helloworld           latest       58664ba3b045   7 minutes ago   273MB
    node                 8.4-alpine   016382f39a51   7 weeks ago     66.3MB
The next step is to push the image.
    $ docker push dyladan/helloworld:latest
    The push refers to a repository [docker.io/dyladan/helloworld]
    388bbf91205d: Pushed
    6516b07acf4f: Pushed
    2e03be0e8df6: Pushed
    a1548a35ad87: Pushed
    0b3e54ee2e85: Mounted from dyladan/beer-guardian
    ad77849d4540: Mounted from dyladan/beer-guardian
    5bef08742407: Mounted from dyladan/beer-guardian
    latest: digest: sha256:c2731a19724bf48f0026b400a3e0f1678e4095cb6a3b44ba88690ee65e0c6db9 size: 1786
This pushes not only the image that we’ve built, but all of the layers used to build it. You may notice that the last three layer lines don’t show `Pushed`, but show `Mounted from dyladan/beer-guardian`. This is because the Docker caching system works for image repositories as well. It knows that the first three layers of the image are identical to layers already stored in the registry under a different repository, and skips sending them over the network. Again, this is enabled by the layering file system and the immutability of images.
Next, on your production host, you pull the image from the repository.
    $ docker pull dyladan/helloworld
    Using default tag: latest
    latest: Pulling from dyladan/helloworld
    6d987f6f4279: Pull complete
    23922eede9ea: Pull complete
    4c0008704272: Pull complete
    119c0fce894a: Pull complete
    de2f7dd96f79: Pull complete
    d3a44d47c6b4: Pull complete
    5131935ce65c: Pull complete
    Digest: sha256:c2731a19724bf48f0026b400a3e0f1678e4095cb6a3b44ba88690ee65e0c6db9
    Status: Downloaded newer image for dyladan/helloworld:latest

    $ docker images -a
    REPOSITORY           TAG      IMAGE ID       CREATED          SIZE
    dyladan/helloworld   latest   58664ba3b045   20 minutes ago   273MB
That’s it! Since we built a fully self-contained binary image with no external dependencies, we can simply run the image we pulled. No more downloading, installing, and configuring hundreds of dependencies on every host. Just the lightweight Docker environment and the Linux kernel. The best part is that you can run just one container or thousands of them on a host, and the running containers don’t even know or care. In the example below, the application listens on port 3000, but that is just the container port. On the host, it is actually listening on port 80.
    $ docker run -t -p 80:3000 helloworld
    npm info using ~email@example.com~
    npm info using ~firstname.lastname@example.org~
    npm info lifecycle email@example.com~prestart: ~firstname.lastname@example.org~
    npm info lifecycle email@example.com~start: ~firstname.lastname@example.org~
    email@example.com start /opt/helloworld
    > node index.js
    Listening on port 3000
Reducing Image Size
Breaking the Cache
- [ ] Advanced Usage
- [ ] Reducing image size
- [ ] Breaking the cache
- [ ] Multi-step builds
- [ ] Docker Compose
- [ ] One host
- [ ] Multiple hosts
- [ ] Docker machine
- [ ] Kubernetes and ECS