Keeping container image sizes small is one of the most common "best practice" tips out there. There is good reason for this: it's very easy to let a complex Dockerfile and a large application turn into a large container image.
A large container image can eventually become troublesome if left unchecked. When deploying a container into production, that production system must download the image from your registry. Ideally this process should be quick; however, network latency (e.g., downloading an image in London from a server in San Francisco) can cause it to take a long time.
If you are using services to build your containers, a large container could easily cause those services to time out. This is also true of deployment automation such as Puppet, SaltStack, or Ansible. Each of these services has a maximum execution time; a large image and a slow network connection could make for a messy or failed deployment.
With that said, there are several techniques for keeping images small. In today's article, we will explore an often-ignored technique: using the .dockerignore file.
Exploring the .dockerignore File
The .dockerignore file is a special file that can be placed within the build context directory. The build context directory is the directory that we specify at the end of a docker build command. The file itself is a simple text file that contains a list of glob patterns, one per line, for files and directories to exclude from the build context, and therefore from the final image.
By leveraging the .dockerignore file, we can exclude files and directories we do not need within our final image. To explain this better, let's walk through a real-world example.
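To make the "simple text file" idea concrete, here is a minimal Python sketch of reading such a file into a list of patterns. The helper name parse_dockerignore is hypothetical, and the sketch only assumes the documented basics (one pattern per line, blank lines ignored, lines starting with # treated as comments); it is not Docker's actual parser.

```python
def parse_dockerignore(text):
    """Return the patterns found in the text of a .dockerignore file.

    Illustrative only: assumes one pattern per line, with blank lines
    and lines starting with '#' (comments) skipped.
    """
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        patterns.append(line)
    return patterns


sample = """\
# version control metadata
.git

# documentation
docs
"""
print(parse_dockerignore(sample))
```

Each entry in the returned list is one pattern that would be checked against the files in the build context.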
Adding a .dockerignore File
One of the most common locations to store a Dockerfile is the top level of the application's code repository. My personal projects are no exception. For this article, we will go ahead and add a .dockerignore file to one of my personal open-source projects that leverages Docker.
To get started, let's first clone the project.
$ git clone https://github.com/madflojo/automatron.git
With the project cloned, let's take a look at the files included in this repository.
$ cd automatron/
$ ls -la
total 208
drwxr-xr-x 24 madflojo users 816 Apr 15 23:00 .
drwxrwxrwt 7 root users 238 Apr 15 23:00 ..
-rw-r--r-- 1 madflojo users 29 Apr 15 23:00 .coveragerc
drwxr-xr-x 13 madflojo users 442 Apr 15 23:00 .git
-rw-r--r-- 1 madflojo users 834 Apr 15 23:00 .gitignore
-rw-r--r-- 1 madflojo users 5760 Apr 15 23:00 CONTRIBUTING.md
-rw-r--r-- 1 madflojo users 408 Apr 15 23:00 Dockerfile
-rw-r--r-- 1 madflojo users 11343 Apr 15 23:00 LICENSE
-rw-r--r-- 1 madflojo users 220 Apr 15 23:00 Procfile
-rw-r--r-- 1 madflojo users 4538 Apr 15 23:00 README.md
-rw-r--r-- 1 madflojo users 9411 Apr 15 23:00 actioning.py
drwxr-xr-x 4 madflojo users 136 Apr 15 23:00 config
drwxr-xr-x 8 madflojo users 272 Apr 15 23:00 core
-rw-r--r-- 1 madflojo users 7842 Apr 15 23:00 discovery.py
-rw-r--r-- 1 madflojo users 415 Apr 15 23:00 docker-compose.yml
drwxr-xr-x 10 madflojo users 340 Apr 15 23:00 docs
-rw-r--r-- 1 madflojo users 2208 Apr 15 23:00 mkdocs.yml
-rw-r--r-- 1 madflojo users 8609 Apr 15 23:00 monitoring.py
drwxr-xr-x 9 madflojo users 306 Apr 15 23:00 plugins
-rw-r--r-- 1 madflojo users 114 Apr 15 23:00 requirements.txt
-rw-r--r-- 1 madflojo users 5992 Apr 15 23:00 runbooks.py
drwxr-xr-x 5 madflojo users 170 Apr 15 23:00 tests
-rw-r--r-- 1 madflojo users 1018 Apr 15 23:00 tests.py
With just a quick look, we can see several files and directories that could be omitted from a production Docker image, such as .git/, tests/, mkdocs.yml, and even the CONTRIBUTING.md file.
Let's see if these files are included when we perform a docker build.
$ docker build -t automatron .
The Dockerfile within this repository adds files using the following instruction.
ADD . /
This instruction essentially adds all of the files located within the build directory to the / directory within the container. We can see this if we run the container and have it execute the ls -la command.
$ docker run automatron ls -la / | grep 2017
-rw-r--r-- 1 root root 29 Apr 16 2017 .coveragerc
drwxr-xr-x 8 root root 4096 Apr 16 2017 .git
-rw-r--r-- 1 root root 834 Apr 16 2017 .gitignore
-rw-r--r-- 1 root root 5760 Apr 16 2017 CONTRIBUTING.md
-rw-r--r-- 1 root root 408 Apr 16 2017 Dockerfile
-rw-r--r-- 1 root root 11343 Apr 16 2017 LICENSE
-rw-r--r-- 1 root root 220 Apr 16 2017 Procfile
-rw-r--r-- 1 root root 4538 Apr 16 2017 README.md
-rw-r--r-- 1 root root 9411 Apr 16 2017 actioning.py
drwxr-xr-x 3 root root 4096 Apr 16 2017 config
drwxr-xr-x 2 root root 4096 Apr 16 2017 core
-rw-r--r-- 1 root root 7842 Apr 16 2017 discovery.py
-rw-r--r-- 1 root root 415 Apr 16 2017 docker-compose.yml
drwxr-xr-x 6 root root 4096 Apr 16 2017 docs
-rw-r--r-- 1 root root 2208 Apr 16 2017 mkdocs.yml
-rw-r--r-- 1 root root 8609 Apr 16 2017 monitoring.py
-rw-r--r-- 1 root root 114 Apr 16 2017 requirements.txt
-rw-r--r-- 1 root root 5992 Apr 16 2017 runbooks.py
drwxr-xr-x 5 root root 4096 Apr 16 2017 tests
-rw-r--r-- 1 root root 1018 Apr 16 2017 tests.py
If we look above, we can see that all of the files from the build directory have been added to the container. Let's start excluding some of these files, starting with the .git/ directory. I am starting with the .git/ directory because it is often large and easily overlooked.
The .git/ directory is a special directory that git uses to store all of its version control metadata, including the full history of every commit.
This means the more active a project, the larger the .git/ directory will be. For my project, the .git/ directory is only 1MB in size. However, the Apache Cassandra project's .git/ directory is over 200MB in size. This is due to both the size of the codebase and the active nature of the project.
For our example, excluding the .git/ directory might not save much, but if we were building a container from the Cassandra project's repository, removing the .git/ directory would greatly reduce the size of the resulting container image.
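One rough way to judge whether a directory is worth excluding is to total the size of the files inside it before building. The Python sketch below does that; dir_size_bytes is a hypothetical helper name, and the candidate directory names are simply the ones from this repository.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files beneath path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    # Report directories that are candidates for a .dockerignore entry.
    for candidate in (".git", "docs", "tests"):
        if os.path.isdir(candidate):
            size_mb = dir_size_bytes(candidate) / 1_000_000
            print(f"{candidate}: {size_mb:.1f} MB")
```

Run from the top of the repository, this prints one line per directory that exists, which makes large directories such as .git/ easy to spot.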
With that said, let's go ahead and add the .git/ directory to a newly created .dockerignore file. We can do this by adding the following:
.git
Once this line is added, let's build the container again and check the resulting contents.
$ docker build -t automatron .
$ docker run automatron ls -la / | grep 2017
-rw-r--r-- 1 root root 29 Apr 16 2017 .coveragerc
-rw-r--r-- 1 root root 5 Apr 16 2017 .dockerignore
-rw-r--r-- 1 root root 834 Apr 16 2017 .gitignore
-rw-r--r-- 1 root root 5760 Apr 16 2017 CONTRIBUTING.md
-rw-r--r-- 1 root root 408 Apr 16 2017 Dockerfile
-rw-r--r-- 1 root root 11343 Apr 16 2017 LICENSE
-rw-r--r-- 1 root root 220 Apr 16 2017 Procfile
-rw-r--r-- 1 root root 4538 Apr 16 2017 README.md
-rw-r--r-- 1 root root 9411 Apr 16 2017 actioning.py
drwxr-xr-x 3 root root 4096 Apr 16 2017 config
drwxr-xr-x 2 root root 4096 Apr 16 2017 core
-rw-r--r-- 1 root root 7842 Apr 16 2017 discovery.py
-rw-r--r-- 1 root root 415 Apr 16 2017 docker-compose.yml
drwxr-xr-x 6 root root 4096 Apr 16 2017 docs
-rw-r--r-- 1 root root 2208 Apr 16 2017 mkdocs.yml
-rw-r--r-- 1 root root 8609 Apr 16 2017 monitoring.py
-rw-r--r-- 1 root root 114 Apr 16 2017 requirements.txt
-rw-r--r-- 1 root root 5992 Apr 16 2017 runbooks.py
drwxr-xr-x 5 root root 4096 Apr 16 2017 tests
-rw-r--r-- 1 root root 1018 Apr 16 2017 tests.py
As we can see from the resulting output, the container is now missing the .git/ directory.
The above is a simple example of using the .dockerignore file. At this point, we could simply add a similar entry for each file and directory we wish to omit and we would have a smaller resulting image. There is, however, an easier way.
As I mentioned earlier, the .dockerignore file understands Unix glob patterns. If, for example, we wanted to omit all files that started with a ., we could simply add .* to the file.
It is important to note that Unix-style glob patterns are not regular expressions, and .* is a prime example of this. In a glob pattern, it matches every name that starts with a literal dot. In a regular expression, it would match any sequence of characters, essentially matching every file and directory.
Since the .dockerignore file uses Unix-style glob patterns, we can safely add .* and only dot-files will be excluded.
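The difference between the two readings of .* is easy to check. The Python sketch below uses the standard library's fnmatch module as a stand-in for glob matching (Docker itself uses Go's matching rules, so treat this as an approximation for simple top-level names) and the re module for the regular-expression reading.

```python
import fnmatch
import re

names = [".git", ".gitignore", "README.md", "tests.py"]

# As a glob, ".*" means: a literal dot followed by anything.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, ".*")]

# As a regular expression, ".*" means: any run of characters at all.
regex_hits = [n for n in names if re.fullmatch(".*", n)]

print(glob_hits)   # only the names that begin with a dot
print(regex_hits)  # every name in the list
```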
In addition to .*, let's go ahead and add a few more items to omit.
.*
docs
mkdocs.yml
docker-compose.yml
test*
*.md
In the above, we have some clearly specified items such as docs/, docker-compose.yml, and mkdocs.yml. We also have some glob patterns such as test*, which will cause us to omit tests/ and tests.py. We also have another interesting one: *.md, which will cause Docker to omit any markdown file such as README.md and CONTRIBUTING.md.
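Before rebuilding, we can predict which top-level names should survive this list. The following Python sketch applies the six patterns to the names from the earlier container listings. It uses fnmatch as an approximation of Docker's matcher, and the names CONTEXT, PATTERNS, and kept are made up for this illustration.

```python
from fnmatch import fnmatchcase

# Top-level names taken from the container listings in this article.
CONTEXT = [
    ".coveragerc", ".dockerignore", ".git", ".gitignore", "CONTRIBUTING.md",
    "Dockerfile", "LICENSE", "Procfile", "README.md", "actioning.py",
    "config", "core", "discovery.py", "docker-compose.yml", "docs",
    "mkdocs.yml", "monitoring.py", "requirements.txt", "runbooks.py",
    "tests", "tests.py",
]

# The patterns from the .dockerignore file, one per line in the real file.
PATTERNS = [".*", "docs", "mkdocs.yml", "docker-compose.yml", "test*", "*.md"]


def kept(names, patterns):
    """Return the names that match none of the exclusion patterns."""
    return [n for n in names if not any(fnmatchcase(n, p) for p in patterns)]


for name in kept(CONTEXT, PATTERNS):
    print(name)
```

This only considers top-level names; Docker's own matching has additional rules for nested paths that the sketch does not attempt to reproduce.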
Let's see how this comes together by running another build and ls -la.
$ docker build -t automatron .
$ docker run automatron ls -la / | grep 2017
-rw-r--r-- 1 root root 408 Apr 16 2017 Dockerfile
-rw-r--r-- 1 root root 11343 Apr 16 2017 LICENSE
-rw-r--r-- 1 root root 220 Apr 16 2017 Procfile
-rw-r--r-- 1 root root 9411 Apr 16 2017 actioning.py
drwxr-xr-x 3 root root 4096 Apr 16 2017 config
drwxr-xr-x 2 root root 4096 Apr 16 2017 core
-rw-r--r-- 1 root root 7842 Apr 16 2017 discovery.py
-rw-r--r-- 1 root root 8609 Apr 16 2017 monitoring.py
-rw-r--r-- 1 root root 114 Apr 16 2017 requirements.txt
-rw-r--r-- 1 root root 5992 Apr 16 2017 runbooks.py
The output this time is quite a bit less than our previous run. We can see that we are now missing the files we wanted to omit.
At this point, we have achieved our goal: we eliminated files that were not needed within our final image. There is, however, one file missing that I wanted to include.
Using ! to include files
The missing file is the README.md file. In the .dockerignore file, we added a line *.md to omit all markdown files. My project has a few markdown files already and I fully expect more to pop up in the future.
The problem is I'd like to include only the README.md and no other markdown files. I'd also like to not have to specify each and every markdown file to accomplish this. Luckily, Docker provides this ability.
By adding the following, we can keep our removal of all markdown files but still retain our README.md:
.*
docs
mkdocs.yml
docker-compose.yml
test*
*.md
!README.md
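With the above, we simply added the README.md file with the ! character in front of it. This tells Docker to include README.md, or rather to exempt it from the exclusions listed before it. Order matters here: the last matching line wins, so !README.md has to come after *.md.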
Let's go ahead and see what files our container includes with our changes applied.
$ docker build -t automatron .
$ docker run automatron ls -la / | grep 2017
-rw-r--r-- 1 root root 408 Apr 16 2017 Dockerfile
-rw-r--r-- 1 root root 11343 Apr 16 2017 LICENSE
-rw-r--r-- 1 root root 220 Apr 16 2017 Procfile
-rw-r--r-- 1 root root 4538 Apr 16 2017 README.md
-rw-r--r-- 1 root root 9411 Apr 16 2017 actioning.py
drwxr-xr-x 3 root root 4096 Apr 16 2017 config
drwxr-xr-x 2 root root 4096 Apr 16 2017 core
-rw-r--r-- 1 root root 7842 Apr 16 2017 discovery.py
-rw-r--r-- 1 root root 8609 Apr 16 2017 monitoring.py
-rw-r--r-- 1 root root 114 Apr 16 2017 requirements.txt
-rw-r--r-- 1 root root 5992 Apr 16 2017 runbooks.py
With the !README.md entry added, we can now see our README.md was included but not our CONTRIBUTING.md. This means our instruction to omit all markdown files (*.md) was applied to all except the README.md.
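The way an exception line interacts with the rules around it can be modelled in a few lines of Python. The sketch below assumes the last matching line decides a file's fate, which is how Docker documents the behavior; is_excluded is a hypothetical helper, and fnmatch again stands in for Docker's own matcher.

```python
from fnmatch import fnmatchcase


def is_excluded(name, patterns):
    """Apply patterns in order; the last one that matches decides."""
    excluded = False
    for pattern in patterns:
        if pattern.startswith("!"):
            # An exception line: a match re-includes the file.
            if fnmatchcase(name, pattern[1:]):
                excluded = False
        elif fnmatchcase(name, pattern):
            excluded = True
    return excluded


rules = ["*.md", "!README.md"]
print(is_excluded("README.md", rules))        # the exception applies
print(is_excluded("CONTRIBUTING.md", rules))  # still excluded by *.md

# An exception placed before the rule it is meant to override has no effect.
print(is_excluded("README.md", ["!README.md", "*.md"]))
```

The final call shows why the !README.md line belongs at the bottom of the file: placed above *.md, it would be overridden.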
Summary
In this article, we covered how to leverage the .dockerignore file to exclude unnecessary files and directories from the container build. As we found out, using the .dockerignore file is very simple. Do you have any .dockerignore tips or tricks? Add them to the comments or tweet them to us.