How to reduce Docker image size for IoT devices

2024-01-15

IoT devices sometimes have too few resources to pull and run heavyweight Docker images. In this article we show how to reduce the image size by 36-91% using the patchelf and strace tools without recompiling containerized applications. We also show how to build minimal images for your own Rust, Go and C/C++ applications.

«Underwater sponge oil painting» by DALL-E.

Why reduce Docker image size?
Patchelf.
Strace.
Your own images.
Conclusion.

Why reduce Docker image size?

Docker image size and the number of layers affect how much memory and disk space a device needs to pull and unpack the image. Devices like the Raspberry Pi Zero have too few resources to pull and unpack, for example, the Home Assistant image, yet they have more than enough resources to actually run it. Reducing the size improves Docker performance in such use cases. In addition, including only the files that the application actually uses helps reduce the attack surface. This benefit goes beyond IoT devices and applies to servers as well.

It is easy to reduce the image size of containerized applications that you developed yourself: just compile a static binary and include only this file in the final image. For third-party applications there are several approaches that do not require recompilation. Last but not least, there are approaches that reduce the image size of containerized scripts.

Patchelf

If the application in question is compiled into an ELF binary (this is usually the case for C, C++, Fortran, Rust, Go etc.), you can find all the libraries that the application uses, copy them into the final image and use the patchelf tool to point the binary at the copies.

ELF (Executable and Linkable Format) specifies the program interpreter path (e.g. /lib64/ld-linux-x86-64.so.2 on the x86_64 platform) and the runtime search path, abbreviated as rpath (e.g. /lib64), among a multitude of other metadata.

The program interpreter dynamically loads the ELF file and all its dependencies (libraries) into memory and executes it. On Linux you can invoke it manually: running /bin/sh is effectively a «shortcut» for /lib64/ld-linux-x86-64.so.2 /bin/sh.

The runtime search path (rpath) is used by the program interpreter to find the dependencies. On most Linux distributions (Guix and Nix are the only exceptions that I know of) this path is empty and the interpreter searches for dependencies in hard-coded paths (e.g. /lib64).

We use the patchelf tool to modify the interpreter and rpath, and readelf to inspect the ELF file. Another useful tool is ldd: it shows both the interpreter and all the dependencies.

# Debian
$ readelf --headers /bin/sh | grep -A2 INTERP
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
$ readelf --dynamic /bin/sh | grep RUNPATH
$ patchelf --set-interpreter /lib/ld-linux-x86-64.so.2 --set-rpath /lib /path/to/some/elf/binary
$ ldd /bin/sh
        linux-vdso.so.1 (0x00007ffce0f91000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fedf9b66000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fedf9d6d000)

As you can see from the output, rpath is empty on Debian and /bin/sh depends only on libc. The output of the same commands on Guix is quite different. This is just an example; we will not dive into why Guix uses a non-empty rpath.

# Guix
$ readelf --headers /bin/sh | grep -A2 INTERP
  INTERP         0x0000000000000318 0x0000000000400318 0x0000000000400318
                 0x0000000000000050 0x0000000000000050  R      0x1
      [Requesting program interpreter: /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2]
$ readelf --dynamic /bin/sh | grep RUNPATH
 0x000000000000001d (RUNPATH)            Library runpath: [/gnu/store/lxfc2a05ysi7vlaq0m3w5wsfsy0drdlw-readline-8.1.2/lib:/gnu/store/bcc053jvsbspdjr17gnnd9dg85b3a0gy-ncurses-6.2.20210619/lib:/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib:/gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib:/gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/gcc/x86_64-unknown-linux-gnu/11.3.0/../../..]
$ ldd /bin/sh
        linux-vdso.so.1 (0x00007ffe777f6000)
        libreadline.so.8 => /gnu/store/lxfc2a05ysi7vlaq0m3w5wsfsy0drdlw-readline-8.1.2/lib/libreadline.so.8 (0x00007efca9070000)
        libhistory.so.8 => /gnu/store/lxfc2a05ysi7vlaq0m3w5wsfsy0drdlw-readline-8.1.2/lib/libhistory.so.8 (0x00007efca9063000)
        libncursesw.so.6 => /gnu/store/bcc053jvsbspdjr17gnnd9dg85b3a0gy-ncurses-6.2.20210619/lib/libncursesw.so.6 (0x00007efca8ff1000)
        libgcc_s.so.1 => /gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1 (0x00007efca8fd7000)
        libc.so.6 => /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6 (0x00007efca8dd9000)
        /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2 (0x00007efca90c9000)
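
The manual checks above can also be scripted. The following is a minimal sketch, not part of the original tooling: two hypothetical helper functions, elf_interp and elf_runpath, read readelf output on standard input and print the interpreter path and the runtime search path. Detecting the interpreter this way avoids hard-coding an x86_64 path when the same script runs on another architecture, which matters for ARM-based IoT boards.

```shell
# Hypothetical helpers (the names are illustrative, not from any tool).
# Both read readelf output on standard input.

# Print the program interpreter path found in the output of
# `readelf --program-headers FILE`.
elf_interp() {
    sed -nE 's/.*\[Requesting program interpreter: (.*)]/\1/p'
}

# Print the runtime search path found in the output of
# `readelf --dynamic FILE`. Matches both RPATH and RUNPATH entries.
elf_runpath() {
    sed -nE 's/.*\((RPATH|RUNPATH)\).*\[(.*)]/\2/p'
}

# Usage on a machine with binutils installed:
#   readelf --program-headers /bin/sh | elf_interp
#   readelf --dynamic /bin/sh | elf_runpath
```

On the Debian system shown earlier the second command would print nothing, because neither entry is present there.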

Motivating example: Stubby

Let's use patchelf to reduce the Docker image size for Stubby — a name resolver that supports DNS-over-TLS. We will use Debian as the base image but the process is specific neither to this Linux distribution nor to this application.

First we write a Dockerfile that, in the first stage, installs Stubby and the required packages from the Debian repositories and, in the second stage, copies only the necessary files into the final image created from scratch.

# Dockerfile
FROM debian:latest AS builder

# install stubby and patchelf
RUN apt-get update && apt-get install -y stubby ca-certificates patchelf

# copy and run patchelf script
COPY patchelf.sh /tmp/patchelf.sh
RUN /tmp/patchelf.sh

# create the final image from scratch (i.e. without the base image)
FROM scratch

# copy only the /out directory that contains the files that are actually used by stubby
COPY --from=builder /out /

EXPOSE 53/udp
EXPOSE 53/tcp

CMD ["/bin/stubby"]

Second we write the patchelf script that determines which files need to be copied. The script copies all the dependencies, the program interpreter, the binary itself and its configuration file, and finally the OpenSSL library's configuration files and the list of trusted SSL certificates.

#!/bin/sh
set -ex
mkdir -p /out/lib /out/bin /out/etc /out/var/cache/stubby /out/var/run /out/usr/lib
# copy the libraries that stubby uses
ldd /usr/bin/stubby |
    sed -rne 's/.*=> (.*) \(.*\)$/\1/p' |
    while read -r path; do
        cp "$path" /out/lib
    done
# copy the interpreter
cp /lib64/ld-linux-x86-64.so.2 /out/lib
# copy stubby and its configuration file
cp /usr/bin/stubby /out/bin/stubby
# make stubby listen on all addresses to access it from outside the container
sed -i 's/127\.0\.0\.1/0.0.0.0/g' /etc/stubby/stubby.yml
cp -r /etc/stubby /out/etc/stubby
# copy openssl library configuration and certificates
cp -r /etc/ssl /out/etc/ssl
cp -r /usr/lib/ssl /out/usr/lib/ssl
find /out/etc/ssl/certs -not -type d -not -name ca-certificates.crt -delete
rm -rf /out/usr/lib/ssl/misc
# patch stubby binary to use the copied interpreter and libraries
patchelf --set-interpreter /lib/ld-linux-x86-64.so.2 --set-rpath /lib /out/bin/stubby
ldd /out/bin/stubby
find /out
# check that stubby works
chroot /out /bin/stubby -V
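
The sed expression in the script above keeps only the ldd lines that contain "=>" and prints the path on the right-hand side. The sketch below isolates that step in a hypothetical function, ldd_paths, so that it can be tried on sample output. It is slightly stricter than the original expression: it assumes that every resolved path is absolute and therefore also skips libraries that ldd reports as "not found".

```shell
# Hypothetical helper: read ldd output on standard input and print the
# resolved path of every library, one per line.
# Lines without "=>" (linux-vdso and the interpreter itself) are skipped,
# and so are entries without an absolute path (e.g. "not found").
ldd_paths() {
    sed -nE 's#.*=> (/.*) \(0x[0-9a-f]+\)$#\1#p'
}

# Copy every dependency of the binary $1 into the directory $2.
copy_deps() {
    ldd "$1" | ldd_paths | while read -r path; do
        cp "$path" "$2"
    done
}

# Usage inside the builder stage:
#   copy_deps /usr/bin/stubby /out/lib
```

The interpreter line has no "=>" and is not copied by this function, which is why the script copies the interpreter in a separate step.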

Now we build the image and check that it runs correctly.

$ docker build --tag stubby:debian-patchelf .
$ docker inspect -f "{{ .Size }}" stubby:debian-patchelf
13120030
$ docker run --init --rm --publish 53:53/udp stubby:debian-patchelf stubby -l
# in the other terminal window
$ dig @127.0.0.1 +short google.com
142.251.220.206

Results

We compare the resulting image sizes using the docker inspect command. The competing images are Debian-based and Alpine-based images created without the patchelf script.

Image	Size, MiB	Comment
`stubby:debian-patchelf`	12.5	9% of `stubby:debian`
`stubby:debian`	143.4
`stubby:alpine-patchelf`	9.0	64% of `stubby:alpine`
`stubby:alpine`	14.1

The results speak for themselves. We reduced the size of Debian-based Stubby image by 91% and Alpine-based Stubby image by 36% by including only the files that Stubby actually uses. Impressive.
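
The figures in the table follow from the byte counts that docker inspect prints. Here is a small sketch of the arithmetic. The helper names to_mib and reduction_pct are hypothetical, and awk is used only because the shell itself has no floating-point math.

```shell
# Convert a size in bytes ($1) to MiB with one decimal place.
to_mib() {
    awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / (1024 * 1024) }'
}

# Print by how many percent the new size ($1) is smaller than the old size ($2).
reduction_pct() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.0f\n", 100 * (1 - new / old) }'
}

to_mib 13120030            # stubby:debian-patchelf, prints 12.5
reduction_pct 12.5 143.4   # Debian-based images, prints 91
reduction_pct 9.0 14.1     # Alpine-based images, prints 36
```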

Limitations

The patchelf script fully automates copying the dependencies and the program interpreter; however, any other files need to be copied manually. Also, if your program is not compiled into an ELF binary (e.g. Node.js or Python applications), then you're out of luck with patchelf. This is where strace can help.

Strace

This tool intercepts the system calls that a binary makes and prints their arguments. Strace uses the same kernel API as debuggers (ptrace) and may considerably slow down the traced program. Luckily we use this tool only at the Docker image build stage.

Motivating example: Home Assistant

This is the application that I failed to install on a Raspberry Pi Zero using the official Docker image. When you pull this image, Docker downloads its many layers in parallel and then fails to extract them due to a lack of disk space. I had to temporarily attach an external USB drive and move the /var/lib/docker directory there, then pull the image and move the directory back to the Raspberry Pi to successfully pull and run this image.

Now we create a new Docker image for Home Assistant that has only one layer and consumes only a fraction of disk space of the original image.

First we create Dockerfile with the official image as the base.

# Dockerfile
FROM ghcr.io/home-assistant/home-assistant:stable AS builder

RUN apk update && apk add strace

COPY strace.sh /tmp/strace.sh
RUN /tmp/strace.sh

FROM scratch

COPY --from=builder /out /

# default Home Assistant port
EXPOSE 8123/tcp

# default Home Assistant command
CMD ["/usr/local/bin/python3", "-m", "homeassistant", "--config", "/config"]

Then we write the strace script that finds all the files accessed by Home Assistant and copies them into the final image.

#!/bin/sh
set -ex
mkdir -p /out/lib /out/usr/local/bin /out/usr/bin /out/usr/local/lib
# copy ffmpeg and its dependencies
ldd /usr/bin/ffmpeg |
    sed -rne 's/.*=> (.*) \(.*\)$/\1/p' |
    while read -r path; do
        cp "$path" /out/lib
    done
cp /lib/ld-musl-x86_64.so.1 /out/lib
cp /usr/local/bin/python3 /out/usr/local/bin/python3
cp /usr/bin/ffmpeg /out/usr/bin/ffmpeg
# copy frontend files manually
mkdir -p /out/usr/local/lib/python3.11/site-packages
cp -r /usr/local/lib/python3.11/site-packages/hass_frontend /out/usr/local/lib/python3.11/site-packages/hass_frontend
# copy all the files that home assistant actually opens
strace -f -e open,stat,lstat timeout 30s python3 -m homeassistant --config /config 2>&1 |
    sed -rne 's/.*(open|stat)\(.*"([^"]+)".*/\2/p' |
    grep -vE '^/(dev|proc|sys|tmp)' |
    sort -u |
    while read -r path; do
        if ! test -e "$path"; then
            continue
        fi
        if test -d "$path"; then
            # create directories
            mkdir -p /out/"$path"
        else
            # copy files
            mkdir -p /out/"$(dirname "$path")"
            cp -n "$path" /out/"$path" 2>/dev/null || true
        fi
    done
# recreate config directory
rm -rf /out/config
mkdir /out/config
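
The core of the script above is the step that turns strace output into a list of paths. The sketch below isolates it in a hypothetical function, paths_from_strace, so that it can be tried on sample lines. Unlike the original expression it also matches openat, newfstatat and statx, on the assumption that some platforms and libc versions use these system calls instead of open and stat. Check which calls appear in the trace on your target.

```shell
# Hypothetical helper: read strace output on standard input and print the
# unique paths passed to the file-related system calls listed below.
# Paths under /dev, /proc, /sys and /tmp are dropped because they should
# not be copied into the image.
paths_from_strace() {
    sed -nE 's/.*(open|openat|stat|lstat|newfstatat|statx)\(.*"([^"]+)".*/\2/p' |
        grep -vE '^/(dev|proc|sys|tmp)(/|$)' |
        LC_ALL=C sort -u
}

# Possible usage inside the builder stage (COMMAND is a placeholder);
# %file asks strace for all system calls that take a file name:
#   strace -f -e trace=%file timeout 30s COMMAND 2>&1 | paths_from_strace
```

The function only lists candidate paths. The check that a path still exists, and the distinction between files and directories, stay in the copy loop as in the original script.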

Now we build the image and check that it runs correctly.

$ docker build --tag home-assistant:strace .
$ docker run --rm --publish=8123:8123/tcp home-assistant:strace \
    python3 -m homeassistant --config /config
# now open http://127.0.0.1:8123/ in the browser

Results

We compare the resulting image size using docker inspect command to the original image.

Image	Size, MiB	Size, %
`home-assistant:strace`	590	31
`ghcr.io/home-assistant/home-assistant:stable`	1886	100

We were able to reduce the image size by 69%. Most importantly now Raspberry Pi Zero can pull and run the new image without hitting the disk space limit.

Limitations

The obvious limitation of strace is that frontend files are not copied automatically because they are read only when an HTTP request is made. Of course we could make some HTTP requests with curl during the build, but usually all frontend files are needed anyway. It is much easier to just copy them all into the final image.

Your own images

Dealing with your own Docker images is much easier than with third-party ones. You either compile your program into a static binary or compile to a dynamically linked binary and use patchelf tool to copy the dependencies and the interpreter. In this section we show how to compile static binaries for Rust, Go and C/C++. The general approach is to use musl library and the accompanying musl-gcc tool to build your project, but some languages make it simpler.

Rust static binaries

In order to use the musl library in your project you need to install a toolchain with the musl target and then compile for that target.

$ rustup toolchain add stable --target x86_64-unknown-linux-musl 
# here we remove debugging information and optimize for size
$ env RUSTFLAGS='-Copt-level=z -Cstrip=symbols' \
    cargo build --release --target x86_64-unknown-linux-musl

Now we build a Docker image that includes only the resulting binary file.

FROM scratch
COPY target/x86_64-unknown-linux-musl/release/app /bin/app
CMD ["/bin/app"]

As you can see, the resulting image contains only the binary: its dependencies are linked into the binary itself, so no separate library files are needed. In this setup Docker is merely a distribution format for static binaries.

Go static binaries

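Go does not use musl. With cgo disabled, the Go toolchain does not link against libc at all: the runtime and the standard library are written in Go and make system calls directly. This makes compiling static binaries even simpler.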

$ env CGO_ENABLED=0 go build -ldflags '-s -w' -o app ./cmd/app

Now we build a Docker image in the same way as for the Rust static binary.

FROM scratch
COPY app /bin/app
CMD ["/bin/app"]

C/C++ static binaries

The idea here is to replace the C/C++ compiler with musl-gcc and enable static linking in GCC via the -static linker flag. All the dependencies have to be recompiled this way as well. This makes the approach especially problematic for dependencies that prefer dynamic linking for whatever reason (e.g. they use features of GNU libc that do not support static linking, dynamically load other libraries, or use sophisticated build instructions that a mere human cannot modify to enable static linking). That's why the general approach for C/C++ binaries is to use patchelf.

The following snippet shows how to compile a static binary for a CMake-based project.

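$ cat > CMakeLists.txt << 'EOF'
cmake_minimum_required (VERSION 3.10)
# C is the only language used, so no C++ compiler is required
project (HelloWorld C)
add_executable (app app.c)
EOF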

$ cat > app.c << 'EOF'
#include <stdio.h>
int main() {
    printf("Hello world\n");
    return 0;
}
EOF

$ mkdir build-musl
$ cd build-musl
$ env CC=musl-gcc LDFLAGS='-static' cmake -DCMAKE_BUILD_TYPE=Release ..
$ make
[ 50%] Building C object CMakeFiles/app.dir/app.c.o
[100%] Linking C executable app
[100%] Built target app
$ ldd ./app
        not a dynamic executable
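
Whichever language you use, it is worth checking that the binary really is self-contained before copying it into an image created from scratch. The sketch below is one possible check and rests on an assumption: a binary that requests no program interpreter can start without a dynamic loader in the image. The function name is_self_contained is hypothetical. It reads readelf output on standard input.

```shell
# Hypothetical helper: read the output of `readelf --program-headers FILE`
# on standard input and succeed only if the binary requests no program
# interpreter, i.e. it does not need a dynamic loader to start.
is_self_contained() {
    ! grep -q 'Requesting program interpreter'
}

# Usage in a build script:
#   readelf --program-headers ./app | is_self_contained ||
#       echo "app needs a dynamic loader in the image"
```

This check does not detect libraries that a program loads at run time by itself, so a quick test run of the final image is still a good idea.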

Conclusion

There are multiple approaches that help reduce Docker image size:

including only the required dependencies with patchelf,
including only the required files with strace,
compiling your own program into a static binary that includes all the dependencies.

In our experiments the image size shrank by 36% to 91%, depending on the base image and the application. The smaller size improves Docker performance on resource-constrained devices such as the Raspberry Pi Zero. However, the major benefit for any platform is that the attack surface is much smaller when your image does not contain tools like wget, curl and shell interpreters.
