How to reduce Docker image size for IoT devices
2024-01-15
IoT devices sometimes have too few resources to pull and run heavyweight Docker images. In this article we show how to reduce the image size by 36-91% using the patchelf and strace tools without recompiling containerized applications. We also show how to build minimal images for your own Rust, Go and C/C++ applications.
Why reduce Docker image size?
Docker image size and the number of layers affect how much memory and disk space a device needs to pull and unpack the image. Devices like the Raspberry Pi Zero have too few resources to pull and unpack, for example, the Home Assistant image, yet have more than enough resources to actually run it. Reducing the size improves Docker performance in such use cases. Including only the files that the application actually uses also helps reduce the attack surface. This benefit goes beyond IoT devices and applies to servers as well.
It is easy to reduce the image size of containerized applications that you developed yourself: just compile a static binary and include only this file in the final image. For third-party applications there are several approaches that do not require recompilation. Last but not least, there are approaches to reduce the image size of containerized scripts.
Patchelf
If the application in question is compiled into an ELF binary (usually this is the case for C, C++, Fortran, Rust, Go etc.), you can use the patchelf tool to find all the libraries that the application uses and copy them into the final image.
ELF (Executable and Linkable Format) specifies the program interpreter path (e.g. /lib64/ld-linux-x86-64.so.2 on the x86_64 platform) and the runtime search path, abbreviated as rpath (e.g. /lib64), among a multitude of other metadata.
The program interpreter dynamically loads an ELF file and all its dependencies (libraries) into memory and executes it. On Linux you can do this manually: running /bin/sh is effectively a shortcut for running /lib64/ld-linux-x86-64.so.2 /bin/sh.
The runtime search path (rpath) is used by the program interpreter to find the dependencies. On most Linux distributions (Guix and Nix are the only exceptions that I know of) this path is empty and the interpreter searches for dependencies in hard-coded paths (e.g. /lib64).
We use the patchelf tool to modify the interpreter and rpath, and readelf to inspect ELF files. Another useful tool is ldd, which shows both the interpreter and all the dependencies.
# Debian
$ readelf --headers /bin/sh | grep -A2 INTERP
INTERP 0x0000000000000318 0x0000000000000318 0x0000000000000318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
$ readelf --dynamic /bin/sh | grep RUNPATH
$ patchelf --set-interpreter /lib/ld-linux-x86-64.so.2 --set-rpath /lib /path/to/some/elf/binary
$ ldd /bin/sh
linux-vdso.so.1 (0x00007ffce0f91000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fedf9b66000)
/lib64/ld-linux-x86-64.so.2 (0x00007fedf9d6d000)
As you can see from the output, rpath is empty on Debian and /bin/sh depends only on libc. The output of the same commands on Guix is quite different. This is just an example; we will not dive into why Guix uses a non-empty rpath.
# Guix
$ readelf --headers /bin/sh | grep -A2 INTERP
INTERP 0x0000000000000318 0x0000000000400318 0x0000000000400318
0x0000000000000050 0x0000000000000050 R 0x1
[Requesting program interpreter: /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2]
$ readelf --dynamic /bin/sh | grep RUNPATH
0x000000000000001d (RUNPATH) Library runpath: [/gnu/store/lxfc2a05ysi7vlaq0m3w5wsfsy0drdlw-readline-8.1.2/lib:/gnu/store/bcc053jvsbspdjr17gnnd9dg85b3a0gy-ncurses-6.2.20210619/lib:/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib:/gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib:/gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/gcc/x86_64-unknown-linux-gnu/11.3.0/../../..]
$ ldd /bin/sh
linux-vdso.so.1 (0x00007ffe777f6000)
libreadline.so.8 => /gnu/store/lxfc2a05ysi7vlaq0m3w5wsfsy0drdlw-readline-8.1.2/lib/libreadline.so.8 (0x00007efca9070000)
libhistory.so.8 => /gnu/store/lxfc2a05ysi7vlaq0m3w5wsfsy0drdlw-readline-8.1.2/lib/libhistory.so.8 (0x00007efca9063000)
libncursesw.so.6 => /gnu/store/bcc053jvsbspdjr17gnnd9dg85b3a0gy-ncurses-6.2.20210619/lib/libncursesw.so.6 (0x00007efca8ff1000)
libgcc_s.so.1 => /gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1 (0x00007efca8fd7000)
libc.so.6 => /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6 (0x00007efca8dd9000)
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2 (0x00007efca90c9000)
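The two pieces of metadata discussed above can also be pulled out of readelf output in a script. The following is a minimal sketch, not part of the original workflow: the helper names elf_interp and elf_runpath are hypothetical, and the sketch assumes the readelf output format shown in the listings above.

```shell
#!/bin/sh
# elf_interp: read `readelf --headers` output on stdin and print the
# program interpreter path (hypothetical helper name).
elf_interp() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)]/\1/p'
}

# elf_runpath: read `readelf --dynamic` output on stdin and print one
# search directory per line; prints nothing if there is no RUNPATH entry.
elf_runpath() {
    sed -n 's/.*(RUNPATH).*\[\(.*\)]/\1/p' | tr ':' '\n'
}

# Demo on a line copied from the Debian listing above.
printf '%s\n' '      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]' | elf_interp
# prints: /lib64/ld-linux-x86-64.so.2

# With binutils installed, the same helpers can read a real binary:
#   readelf --headers /bin/sh | elf_interp
#   readelf --dynamic /bin/sh | elf_runpath
```

On Debian the second helper should print nothing for /bin/sh, which matches the empty RUNPATH seen above.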
Motivating example: Stubby
Let's use patchelf to reduce the Docker image size for Stubby, a name resolver that supports DNS-over-TLS. We will use Debian as the base image but the process is specific neither to this Linux distribution nor to this application.
First we write a Dockerfile that installs Stubby and the required packages from Debian repositories in the first stage, and in the second stage copies only the necessary files into the final image created from scratch.
# Dockerfile
FROM debian:latest AS builder
# install stubby and patchelf
RUN apt-get update && apt-get install -y stubby ca-certificates patchelf
# copy and run patchelf script
COPY patchelf.sh /tmp/patchelf.sh
RUN /tmp/patchelf.sh
# create the final image from scratch (i.e. without the base image)
FROM scratch
# copy only the /out directory that contains the files that are actually used by stubby
COPY --from=builder /out /
EXPOSE 53/udp
EXPOSE 53/tcp
CMD ["/bin/stubby"]
Second, we write the patchelf script that determines which files need to be copied. The script copies all the dependencies, the program interpreter, the binary itself and the configuration file, and finally the OpenSSL library's configuration files and the list of trusted SSL certificates.
#!/bin/sh
set -ex
mkdir -p /out/lib /out/bin /out/etc /out/var/cache/stubby /out/var/run /out/usr/lib
# copy the libraries that stubby uses
ldd /usr/bin/stubby |
sed -rne 's/.*=> (.*) \(.*\)$/\1/p' |
while read -r path; do
cp "$path" /out/lib
done
# copy the interpreter
cp /lib64/ld-linux-x86-64.so.2 /out/lib
# copy stubby and its configuration file
cp /usr/bin/stubby /out/bin/stubby
# make stubby listen on all addresses to access it from outside the container
sed -i 's/127\.0\.0\.1/0.0.0.0/g' /etc/stubby/stubby.yml
cp -r /etc/stubby /out/etc/stubby
# copy openssl library configuration and certificates
cp -r /etc/ssl /out/etc/ssl
cp -r /usr/lib/ssl /out/usr/lib/ssl
find /out/etc/ssl/certs -not -type d -not -name ca-certificates.crt -delete
rm -rf /out/usr/lib/ssl/misc
# patch stubby binary to use the copied interpreter and libraries
patchelf --set-interpreter /lib/ld-linux-x86-64.so.2 --set-rpath /lib /out/bin/stubby
ldd /out/bin/stubby
find /out
# check that stubby works
chroot /out /bin/stubby -V
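To see what the ldd filter in the script above actually selects, the sketch below runs the same sed expression on the Debian ldd output shown earlier. The function name ldd_paths is hypothetical, and -E is used as the portable spelling of -r.

```shell
#!/bin/sh
# ldd_paths: read `ldd` output on stdin and print the resolved library paths.
# Only lines of the form "name => /path (address)" match.
ldd_paths() {
    sed -En 's/.*=> (.*) \(.*\)$/\1/p'
}

# Sample input: the Debian `ldd /bin/sh` output from this article.
ldd_paths << 'EOF'
	linux-vdso.so.1 (0x00007ffce0f91000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fedf9b66000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fedf9d6d000)
EOF
# prints: /lib/x86_64-linux-gnu/libc.so.6
```

The vDSO and interpreter lines contain no "=>" and are skipped, which is why the script copies the interpreter in a separate step.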
Now we build the image and check that it runs correctly.
$ docker build --tag stubby:debian-patchelf .
$ docker inspect -f "{{ .Size }}" stubby:debian-patchelf
13120030
$ docker run --init --rm --publish 53:53/udp stubby:debian-patchelf stubby -l
# in the other terminal window
$ dig @127.0.0.1 +short google.com
142.251.220.206
Results
We compare the resulting image sizes using the docker inspect command. The competing images are the Debian-based and Alpine-based images created without the patchelf script.
Image | Size, MiB | Comment |
---|---|---|
stubby:debian-patchelf | 12.5 | 9% of stubby:debian |
stubby:debian | 143.4 | |
stubby:alpine-patchelf | 9.0 | 64% of stubby:alpine |
stubby:alpine | 14.1 | |
The results speak for themselves. We reduced the size of Debian-based Stubby image by 91% and Alpine-based Stubby image by 36% by including only the files that Stubby actually uses. Impressive.
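The numbers in the table can be reproduced from the byte counts that docker inspect prints. This is a small sketch of the arithmetic; the helper names to_mib and reduction are hypothetical, and the inputs are the sizes reported in this article.

```shell
#!/bin/sh
# to_mib BYTES: convert a byte count to MiB with one decimal place.
to_mib() {
    awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / 1048576 }'
}

# reduction SMALL LARGE: percentage by which SMALL is smaller than LARGE.
reduction() {
    awk -v small="$1" -v large="$2" 'BEGIN { printf "%.0f\n", 100 - 100 * small / large }'
}

to_mib 13120030        # stubby:debian-patchelf in bytes; prints 12.5
reduction 12.5 143.4   # Debian-based images; prints 91
reduction 9.0 14.1     # Alpine-based images; prints 36
```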
Limitations
The patchelf approach automates copying the dependencies and the program interpreter; any other files, however, need to be copied manually. Also, if your program is not compiled to an ELF binary (e.g. Node.js or Python scripts) then you're out of luck. This is where strace can help.
Strace
This tool intercepts the system calls that a binary makes and prints their arguments. Strace uses the same kernel API as debuggers and may considerably slow down the traced program. Luckily, we will use this tool only at the Docker image build stage.
Motivating example: Home Assistant
This is the image that I failed to install on a Raspberry Pi Zero using the official Docker image. When you pull this image, Docker downloads many layers in parallel and then fails to extract them due to a lack of disk space. To successfully pull and run this image I had to temporarily attach an external USB drive, move the /var/lib/docker directory there, pull the image and then move the directory back to the Raspberry Pi.
Now we create a new Docker image for Home Assistant that has only one layer and consumes only a fraction of disk space of the original image.
First we create Dockerfile with the official image as the base.
# Dockerfile
FROM ghcr.io/home-assistant/home-assistant:stable AS builder
RUN apk update && apk add strace
COPY strace.sh /tmp/strace.sh
RUN /tmp/strace.sh
FROM scratch
COPY --from=builder /out /
# default Home Assistant port
EXPOSE 8123/tcp
# default Home Assistant command
CMD ["/usr/local/bin/python3", "-m", "homeassistant", "--config", "/config"]
Then we write the strace script that finds all the files accessed by Home Assistant and copies them into the final image.
#!/bin/sh
set -ex
mkdir -p /out/lib /out/usr/local/bin /out/usr/bin /out/usr/local/lib
# copy ffmpeg and its dependencies
ldd /usr/bin/ffmpeg |
sed -rne 's/.*=> (.*) \(.*\)$/\1/p' |
while read -r path; do
cp "$path" /out/lib
done
cp /lib/ld-musl-x86_64.so.1 /out/lib
cp /usr/local/bin/python3 /out/usr/local/bin/python3
cp /usr/bin/ffmpeg /out/usr/bin/ffmpeg
# copy frontend files manually
mkdir -p /out/usr/local/lib/python3.11/site-packages
cp -r /usr/local/lib/python3.11/site-packages/hass_frontend /out/usr/local/lib/python3.11/site-packages/hass_frontend
# copy all the files that home assistant actually opens
strace -f -e open,stat,lstat timeout 30s python3 -m homeassistant --config /config 2>&1 |
sed -rne 's/.*(open|stat)\(.*"([^"]+)".*/\2/p' |
grep -vE '^/(dev|proc|sys|tmp)' |
sort -u |
while read -r path; do
if ! test -e "$path"; then
continue
fi
if test -d "$path"; then
# create directories
mkdir -p /out/"$path"
else
# copy files
mkdir -p /out/"$(dirname "$path")"
cp -n "$path" /out/"$path" 2>/dev/null || true
fi
done
# recreate config directory
rm -rf /out/config
mkdir /out/config
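The filtering stage of the script above can be tried without running Home Assistant. The sketch below applies the same sed and grep expressions to a few strace lines; the sample lines are made up for illustration (they are not real Home Assistant output) and the function name traced_paths is hypothetical.

```shell
#!/bin/sh
# traced_paths: read strace output on stdin and print the unique paths passed
# to open/stat/lstat, skipping pseudo-filesystems and temporary files.
# -E is the portable spelling of the -r flag used in the script above.
traced_paths() {
    sed -En 's/.*(open|stat)\(.*"([^"]+)".*/\2/p' |
        grep -vE '^/(dev|proc|sys|tmp)' |
        LC_ALL=C sort -u
}

# Illustrative input only; real strace output will differ.
traced_paths << 'EOF'
open("/usr/local/lib/python3.11/os.py", O_RDONLY|O_CLOEXEC) = 3
[pid    42] stat("/usr/local/lib/python3.11", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/etc/localtime", {st_mode=S_IFLNK|0777, st_size=27, ...}) = 0
open("/proc/self/status", O_RDONLY) = 4
open("/usr/local/lib/python3.11/os.py", O_RDONLY|O_CLOEXEC) = 3
EOF
# prints:
# /etc/localtime
# /usr/local/lib/python3.11
# /usr/local/lib/python3.11/os.py
```

Note that the expression matches only calls spelled open(, stat( or lstat(. A libc that issues openat instead (glibc does this for open) would need both the strace filter and the sed expression adjusted; that is an assumption to verify against your own base image.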
Now we build the image and check that it runs correctly.
$ docker build --tag home-assistant:strace .
$ docker run --rm --publish=8123:8123/tcp home-assistant:strace \
python3 -m homeassistant --config /config
# now open http://127.0.0.1:8123/ in the browser
Results
We compare the resulting image size to that of the original image using the docker inspect command.
Image | Size, MiB | Size, % |
---|---|---|
home-assistant:strace | 590 | 31 |
ghcr.io/home-assistant/home-assistant:stable | 1886 | 100 |
We were able to reduce the image size by 69%. Most importantly now Raspberry Pi Zero can pull and run the new image without hitting the disk space limit.
Limitations
The obvious limitation of strace is that frontend files are not copied automatically because they are read only when an HTTP request is made. Of course we can make some HTTP requests with curl, but usually all frontend files are needed. It is much easier to just copy them all to the final image.
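One way to spot such gaps is to compare a package directory with the list of traced paths. This is a rough sketch under stated assumptions: the helper name untraced and the file /tmp/traced.txt are hypothetical, and the traced list is assumed to hold one absolute path per line.

```shell
#!/bin/sh
# untraced DIR TRACED_LIST: print regular files under DIR whose paths do not
# appear in TRACED_LIST (a text file with one traced path per line).
untraced() {
    all=$(mktemp) && seen=$(mktemp) || return 1
    find "$1" -type f | LC_ALL=C sort -u > "$all"
    LC_ALL=C sort -u "$2" > "$seen"
    # comm -23 keeps the lines that occur only in the first file
    LC_ALL=C comm -23 "$all" "$seen"
    rm -f "$all" "$seen"
}

# Hypothetical usage inside the builder stage, assuming the traced paths
# were saved to /tmp/traced.txt:
#   untraced /usr/local/lib/python3.11/site-packages/hass_frontend /tmp/traced.txt
```

A long list for a directory is a hint that the whole directory is better copied manually, as was done for the frontend files.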
Your own images
Dealing with your own Docker images is much easier than with third-party ones. You either compile your program into a static binary, or compile it into a dynamically linked binary and use the patchelf tool to copy the dependencies and the interpreter. In this section we show how to compile static binaries for Rust, Go and C/C++. The general approach is to use the musl library and the accompanying musl-gcc tool to build your project, but some languages make it simpler.
Rust static binaries
To use the musl library in your project you need to install a toolchain with the musl target and then compile for that target.
$ rustup toolchain add stable --target x86_64-unknown-linux-musl
# here we remove debugging information and optimize for size
$ env RUSTFLAGS='-Copt-level=z -Cstrip=symbols' \
cargo build --release --target x86_64-unknown-linux-musl
Now we build Docker image that includes only the resulting binary file.
FROM scratch
COPY target/x86_64-unknown-linux-musl/release/app /bin/app
CMD ["/bin/app"]
As you can see the resulting image contains only the binary but not the dependencies. This means that Docker is merely a distribution format for static binaries.
Go static binaries
Go does not use musl. With cgo disabled, the Go toolchain does not link against libc at all: the runtime makes system calls directly, so the result is a static binary. This makes compiling static binaries even simpler.
$ env CGO_ENABLED=0 go build -ldflags '-s -w' -o app ./cmd/app
Now we build Docker image similar to Rust static binary.
FROM scratch
COPY app /bin/app
CMD ["/bin/app"]
C/C++ static binaries
The idea here is to replace the C/C++ compiler with musl-gcc and enable static linking in GCC via the -static linker flag. All the dependencies have to be recompiled this way as well. This makes the approach especially problematic for dependencies that prefer dynamic linking for whatever reason (e.g. they use features of GNU libc that do not support static linking, dynamically load other libraries, or have build instructions so sophisticated that a mere human cannot modify them to enable static linking). That's why the general approach for C/C++ binaries is to use patchelf.
The following snippet shows how to compile a static binary for a cmake-based project.
$ cat > CMakeLists.txt << 'EOF'
project (HelloWorld)
add_executable (app app.c)
EOF
$ cat > app.c << 'EOF'
#include <stdio.h>
int main() {
printf("Hello world\n");
return 0;
}
EOF
$ mkdir build-musl
$ cd build-musl
$ env CC=musl-gcc LDFLAGS='-static' cmake -DCMAKE_BUILD_TYPE=Release ..
$ make
[ 50%] Building C object CMakeFiles/app.dir/app.c.o
[100%] Linking C executable app
[100%] Built target app
$ ldd ./app
not a dynamic executable
Conclusion
There are multiple approaches that help reduce Docker image size:
- including only the required dependencies with patchelf,
- including only the required files with strace,
- compiling your own program into a static binary that includes all the dependencies.
In our experiments the image size shrank by 36% to 91%. The smaller size improves Docker performance on resource-constrained devices such as the Raspberry Pi Zero. However, the major benefit for any platform is that the attack surface is much smaller when your image does not contain tools like wget, curl and shell interpreters.