Basic concepts of writing a Dockerfile
The very first step in writing a Dockerfile is defining the environment stack we actually need. It is similar to how we install all the necessary tools, packages, and source code on a computer and then run our application as expected.
We therefore define an environment stack as the image on the left shows: the base image is the Ubuntu 18.04 operating system; on top of it we install Python 3.6 and the Flask web framework, and finally the Apache HTTP server.
This is a very common Python web stack.
Images & layers
Let’s convert our Python web stack into a Dockerfile as the following image shows, and save it with the filename Dockerfile.
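Since the Dockerfile in the original article is shown only as a screenshot, here is a minimal sketch of a Dockerfile matching the stack described above. The exact package names and versions are assumptions and may differ from the original:

```dockerfile
# base image: Ubuntu 18.04 operating system
FROM ubuntu:18.04

# install Python 3.6, pip, and the Apache HTTP server
# (package names are assumptions, not taken from the original screenshot)
RUN apt-get update && \
    apt-get install -y python3.6 python3-pip apache2

# install the Flask web framework
RUN pip3 install flask
```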
Run docker build -t first_image . to build our first Docker image. Once the image builds successfully, we can list all images with the command
docker images, and the result should look similar to the following.
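The original output is shown as a screenshot; as an illustration, docker images prints a table with the columns below. The ID, timestamp, and size values here are placeholders, not real output:

```
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
first_image   latest    <image-id>     <created-at>    <size>
ubuntu        18.04     <image-id>     <created-at>    <size>
```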
As a next step, we can use the command
docker history first_image to view all the stacked layers of our first Docker image.
Each command we wrote in the Dockerfile appears in the docker history output, in sequence. Each line represents the changes that command made; every image layer stores only the difference from the layer below it.
Every Dockerfile command generates an intermediate image that serves as a cache. When rebuilding from a Dockerfile, Docker looks for an existing image in its cache that it can reuse. Once the cache is invalidated or missing for a step, all subsequent Dockerfile commands generate new images and the cache is no longer used.
Let’s rebuild the previous Dockerfile; it shows “Using cache” for each layer.
Now make a small change: add one more package, requests, to the pip3 install at Step 3. Then observe the behavior of Docker’s caching.
Step 1/4 starts from the base image
Step 2/4 uses the cache
Step 3/4 does not use the cache because the command changed
Step 4/4 does not use the cache because the previous step was not cached
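As a sketch, the change at Step 3 amounts to something like the following. The surrounding instructions are assumptions based on the stack described earlier, not the original screenshot:

```dockerfile
# Step 2/4: this instruction is unchanged, so Docker reuses the cached layer
RUN apt-get update && apt-get install -y python3.6 python3-pip

# Step 3/4: "requests" was added, so the instruction text changed,
# the cached layer no longer matches, and this step and every
# step after it are rebuilt
RUN pip3 install flask requests
```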
Example with an interpreted language: Python
The following shows an intuitive version for a Python application. It works exactly as we expect, but it gains no benefit from Docker’s caching: the step COPY . /python/workplace/ invalidates the cache for all subsequent steps whenever anything under this directory changes.
# base image
FROM python:3.6.10-alpine3.11

WORKDIR /python/workplace/

# copy source code
COPY . /python/workplace/

# install dependencies
RUN pip install -r requirements.txt

ENTRYPOINT ["/usr/local/bin/python", "main.py"]
Following the caching behavior, we fine-tune the Dockerfile as in the example below. We copy requirements.txt and install the dependencies first; this prevents reinstalling all dependencies when only the Python source code changes.
# base image
FROM python:3.6.10-alpine3.11

WORKDIR /python/workplace/

# copy python requirements and install dependencies
COPY ./requirements.txt /python/workplace/
RUN pip install -r requirements.txt

# copy source code
COPY . /python/workplace/

ENTRYPOINT ["/usr/local/bin/python", "main.py"]
Example with a compiled language: Go
Unlike an interpreted language, a compiled language does not need the source code at execution time. We can therefore set up a build environment that contains the libraries and source code needed to compile the executable binary, and then copy the binary into another Docker image to minimize the image size. The following example separates the Dockerfile into two parts: a build image and an execution image.
The build image is in charge of compiling the executable binary; it is reusable as long as nothing relevant changes.
The execution image is the target image we want to produce; it wastes no unnecessary disk space and maximizes the speed of starting containers from it.
# Build environment
FROM golang:1.14.2-alpine3.11 AS build

# copy source file
WORKDIR /go/src/project/
COPY ./src /go/src/project/

# build executable file
RUN go build -o /bin/go-app

# =============================================
# The resulting image, without source code
FROM alpine:3.11

COPY --from=build /bin/go-app /bin/go-app

ENTRYPOINT ["/bin/go-app"]
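Both stages are built with a single command. A sketch of building and running the final image follows; the tag name go-app is an assumption, not from the original article:

```shell
# builds the "build" stage first, then produces the small alpine-based image
docker build -t go-app .

# a container from this image starts /bin/go-app directly via the ENTRYPOINT
docker run --rm go-app
```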