As you may already know, Docker doesn’t automatically remove images. An image is composed of layers that are cached and reused as long as they haven’t changed between builds. When you update an image or pull a new one from Docker Hub, Docker creates a new cache for those layers and disk usage grows. During development we modify something in our code and then build a new image, which means our development PC’s disk fills up easily.
Download my source code first from here
This is one of the posts in my Docker learning series. If you want to learn Docker deeply, I highly recommend Learn Docker in a Month of Lunches.
- Start Docker from scratch
- Docker volume
- Bind host directory to Docker container for dev-env
- Communication with other Docker containers
- Run multi Docker containers with compose file
- Container’s dependency check and health check
- Override Docker compose file to have different environments
- Creating a cluster with Docker swarm and handling secrets
- Update and rollback without downtime in swarm mode
- Container optimization
- Visualizing log info with Fluentd, Elasticsearch and Kibana
How to check Docker’s disk usage
Let’s check the disk usage first. The following is my result.
$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 45 4 3.914GB 3.885GB (99%)
Containers 5 0 844B 844B (100%)
Local Volumes 14 0 48.49MB 48.49MB (100%)
Build Cache 0 0 0B 0B
It shows that there are 45 images on my PC, but that includes obsolete images because I updated the same image multiple times. When we rebuild an image with the same tag, Docker changes the repository and tag of the old image to <none>.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
poke-app v2 457c8be9a52d 8 days ago 960MB
poke-app v1 531c51873cb5 8 days ago 960MB
<none> <none> 91c7e42ff112 9 days ago 960MB
<none> <none> 875fff8d24a5 9 days ago 960MB
<none> <none> ae60a82f9ad0 9 days ago 960MB
I recommend running docker system prune regularly to remove unnecessary data. If you run the command you will see the following output. When you enter y, Docker starts deleting data (I didn’t copy the deletion output here).
$ docker system prune
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all dangling images
- all dangling build cache
Are you sure you want to continue? [y/N] y
When I check the disk usage again, it shows that the number of images is now 25 and the unused containers have been removed.
$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 25 0 3.506GB 3.506GB (100%)
Containers 0 0 0B 0B
Local Volumes 14 0 48.49MB 48.49MB (100%)
Build Cache 0 0 0B 0B
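Note that 25 images remain because by default docker system prune only removes dangling images. If you also want to remove every image not used by a container, the -a flag does that; use it with care, since your next build may have to pull base images again.
$ docker system prune -a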
Optimize Dockerfile
In a Dockerfile we use the COPY command to copy the necessary files into a container. Do you think the following Dockerfiles create images of exactly the same size? There are three files in the src directory: app.js, big-image.jpg and README.md.
# Dockerfile.v1
FROM alpine:latest
COPY ./src /src/
# Dockerfile.v2
FROM alpine:latest
COPY ./src /src/
WORKDIR /src
RUN rm README.md big-image.jpg
# Dockerfile.v3
FROM alpine:latest
COPY ./src/app.js /src/
The result is as follows. The first two images are the same size but the third is not. Even though v2 removes the big-image.jpg file, the disk size stays the same because Docker stores the data in a layer. Since an image is composed of those layers, the total size doesn’t change even if a file is removed in a later layer.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
opt v2 5428ba82c105 8 seconds ago 8.63MB
opt v1 19177113b2c0 12 seconds ago 8.63MB
opt v3 a0297eb93b4a 6 minutes ago 5.57MB
We learnt how to reduce the disk usage from this simple example, but we don’t want to list every necessary file in COPY commands. So let’s place a .dockerignore file in the directory where the Dockerfile exists. Its format is the same as .gitignore and it excludes the specified files, as in the sketch below.
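For example, with the three files above, a .dockerignore like this (a minimal sketch) would make Dockerfile.v1 copy only app.js, effectively producing the same content as Dockerfile.v3:
# .dockerignore
# Excluded from the build context, so COPY never sees them
src/README.md
src/big-image.jpg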
Choose proper base image
In my blog posts I use two Node.js images: one for development and another for production. As you can see in the following result, the disk sizes are really different. The yutona/nodejs-dev image includes lots of development tools like npm, but those tools are not actually necessary in the production container. Choosing the right image reduces not only the disk size but also the security risk: the more unnecessary tools exist in a container, the more vulnerable it is. I don’t write about vulnerability-scanning software in this post, but I will try to use one in the future; Anchore, Clair and Aqua are some examples. If a new version of our base image contains vulnerabilities we can’t overlook, we should skip that update. That means we should pin the base image to a fixed version in our Dockerfile so it isn’t updated automatically, and decide ourselves when to update it, as shown after the listing below.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
yutona/nodejs latest 975fff6bc723 About an hour ago 203MB
yutona/nodejs-dev latest ee0c97e21dc0 3 weeks ago 960MB
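For example, pinning the tag in the Dockerfile keeps the base image from changing underneath us (the version below is only an illustration):
# Pinned: the base image only changes when we edit this tag ourselves
FROM node:18.19-alpine
# Unpinned: "latest" may silently change on the next pull
# FROM node:latest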
Minimize layer size
As mentioned above, Docker generates a cache layer for each command. Even if we download a compressed file, unpack it and remove some files in subsequent commands, the total disk usage doesn’t change because the subsequent layers just hide the deleted files. However, if all of those commands are written as a single RUN command, only one layer is created and its size is reduced.
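Here is a minimal sketch of that idea (the archive URL is a placeholder): the download, unpack and cleanup all happen in one RUN command, so the archive never survives into a layer of its own.
# Dockerfile: download, unpack and clean up in a single layer
FROM alpine:latest
# archive.tar.gz is deleted within the same layer it was created in,
# so it is never stored in the final image
RUN wget https://example.com/archive.tar.gz \
    && mkdir -p /opt \
    && tar -xzf archive.tar.gz -C /opt \
    && rm archive.tar.gz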
Multi stage build
Putting multiple commands into one command works, but we can also use a multi-stage build for optimization. The Dockerfile looks like the one below. It’s a really small example, but in a real project the prepare stage would produce several artifacts. The final image created by this Dockerfile is the same as Dockerfile.v3 listed above.
FROM alpine:latest as prepare
COPY ./src /src/
FROM alpine:latest
COPY --from=prepare /src/app.js /src/
Let’s create two images from this Dockerfile and compare the disk size.
# Create an image from prepare stage
$ docker image build -t opt:multi-stage-prepare -f Dockerfile-multi-stage --target prepare .
# Create the final image
$ docker image build -t opt:multi-stage -f Dockerfile-multi-stage .
$ docker images -f reference=opt
REPOSITORY TAG IMAGE ID CREATED SIZE
opt v2 107dcabe64c5 38 hours ago 8.63MB
opt multi-stage-prepare 19177113b2c0 38 hours ago 8.63MB
opt v1 19177113b2c0 38 hours ago 8.63MB
opt multi-stage a0297eb93b4a 38 hours ago 5.57MB
opt v3 a0297eb93b4a 38 hours ago 5.57MB
As you can see from the result, the disk size of the prepare stage is the same as v1/v2, but the final version is smaller than those images. The interesting thing is that the image IDs are exactly the same: the prepare stage matches v1 and the final image matches v3. A multi-stage build is a very good approach because it looks simple and is easy to maintain. In addition, Docker generates a cache for each command, which means our build process becomes faster because the cache can be reused for some of the layers.
Conclusion
Image size optimization matters when a container runs with limited resources. Reusing the cache as much as possible reduces development time as well: we build our images frequently during development and we don’t want to wait long if we can avoid it. Whether the cache can be used depends on the order of the commands in the Dockerfile, so we should consider how to optimize it, as in the final sketch below.
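As a closing sketch (assuming a hypothetical Node.js app with a package.json; the base image tag is only an illustration), copying the rarely changing dependency manifest before the frequently changing source keeps the expensive npm install layer cached across code edits:
# Dockerfile: cache-friendly command order (hypothetical Node.js app)
FROM node:18.19-alpine
WORKDIR /app
# This layer is rebuilt only when the dependency manifest changes
COPY package.json package-lock.json ./
RUN npm install
# Source code changes often, so copy it last
COPY ./src ./src
CMD ["node", "src/app.js"]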