Undocking - Containers without Docker
Creating images locally
I have spent a few weeks figuring out how to change my container development environment from one that uses Docker to one that (almost) completely avoids it.1
Why avoid Docker
It all started with Docker splitting into free and paid variants. While this made sense to me, it started me worrying about the future of Docker for the free and open community. That worry grew as the project split further and was renamed.
It grew again when Docker Desktop began asking for ratings and whether I would recommend it. These are the sorts of intrusive, annoying, metrics-for-money behaviors we’ve come to loathe in the app space on our phones, and I was not happy to see them come to the developer tools I use.
I finally reached my breaking point when downloads of Docker Desktop were placed behind an email wall.2 It wasn’t very difficult to circumvent the wall, but it was a meaningless nuisance seemingly driven by Docker starting to care more about the business than about the developer.
On top of that, Dan Walsh’s points about Docker being a bloated daemon are absolutely accurate. Docker is a massive, invasive install, and comes fraught with potential security implications.
Requirements
For the migration to count as a success, I wanted the result to closely resemble how I intend to deploy containerized applications to production. I strongly believe that local development should closely mirror production (but not require resources outside the local machine).
I also didn’t want to install VM software or manage my own VMs just to have somewhere to run programs during development. I prefer my developer environments to be native experiences.
This leads to a complete set of requirements:
Must
- ✅ Cross platform (Windows, macOS, and Linux)
- ✅ Not require installing Docker into the native OS
- ✅ Use native VM hypervisor if VMs are necessary
  - ✅ Hyper-V and/or WSL 2 on Windows
  - ✅ hypervisor.framework on macOS
  - ❓ No VM in Linux
- ✅ Provide a means to run containers
- ✅ Provide a means for creating container images
- ✅ Be easy and consistent for developers to set up
- ✅ Be (reasonably) fast
Should
- 🚧 Not use Docker at all
Dan Walsh is an inspiration to us all
Dan Walsh of RedHat has been railing against “big, fat daemons” for a while. This has led to the creation of several good alternatives to Docker for various pieces of the pipeline.
One of the alternatives is the CRI-O runtime for Kubernetes, which makes Kubernetes the first piece of our new container stack. While not fantastic for one-off things, Kubernetes has become the go-to standard for deploying container-based applications. Kubernetes also supports hypervisor.framework via the hyperkit driver, Hyper-V via the hyperv driver, and native Linux via the (experimental) podman driver. Additionally, with Minikube, Kubernetes is easy to install locally for development. This meets several of our requirements and gets us well on our way.
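To make that concrete (these are my illustrations, not commands from the original setup), picking the hypervisor is a single flag when starting Minikube:
$ minikube start --driver=hyperkit   # macOS, uses hypervisor.framework via hyperkit
$ minikube start --driver=hyperv     # Windows, uses Hyper-V
$ minikube start --driver=podman     # Linux, experimental, no VM required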
Unfortunately, Kubernetes does not provide a way to create images, so we need another tool. Dan Walsh helps us out again, with Buildah.
Getting started with Buildah
Buildah is only available for Linux, so we can’t use a local build process on all of our target platforms. Instead, we’ll need to run our build process inside of Kubernetes. But for now, we need to get familiar with Buildah and figure out how to make it build our images as quickly as we can.
Since the end goal is to run Buildah in Kubernetes, we don’t want to just run it locally, as that won’t replicate anything like the final environment. Instead, we want to run Buildah in a container. Fortunately, in addition to Buildah for building containers, there’s also Podman for running them.
Podman, like Buildah, is designed around Linux containers (at least for now). It requires a working Linux OS, whether in a VM, natively, or on a remote machine. To get started, I’m going to use WSL 2 (with Ubuntu 20.04, but it should work with any distribution, as well as with Linux as the host OS or in a VM). After installing Podman, we’re ready to run our first Buildah container.
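If Podman isn’t installed yet, this is roughly what the install looked like on Ubuntu 20.04 at the time of writing. The packages came from the Kubic repository, so treat this as a sketch and check the current Podman install documentation, since the repository location may have changed:
$ . /etc/os-release
$ echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
$ curl -L "https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/Release.key" | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install -y podman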
$ podman run quay.io/buildah/stable:v1.14.8 echo "Hello world!"
Trying to pull quay.io/buildah/stable:v1.14.8...
Getting image source signatures
Copying blob d15499bd65d4 done
Copying blob 5796af10c83e done
Copying blob b70dbda2c312 done
Copying blob 7c43a36ba5ed done
Copying blob f40340f463b1 done
Copying config 0e58e549e4 done
Writing manifest to image destination
Storing signatures
Hello world!
Using a bash script to build
Buildah supports building from Containerfiles (Dockerfiles) using the bud command, but it also supports running individual commands to modify the image it’s building. After doing some reading, I chose to use scripts to build images. It seems a lot less complicated, and more flexible. The only downsides seem to be the loss of portability back to Docker for builds and the loss of caching and layers. Portability back to Docker is not something we’re interested in, as we’re moving away from Docker. The loss of caching and layers is unfortunate, but is being worked on.
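For comparison only (my sketch, not the route we take here), a Containerfile equivalent of the image we’re about to build would look roughly like this; note that it starts from a Fedora base image rather than from scratch:
FROM registry.fedoraproject.org/fedora:32
RUN dnf install --assumeyes nginx && dnf clean all
ENTRYPOINT ["nginx"]
CMD ["-g", "daemon off;"]
EXPOSE 80
It would then be built with buildah bud --tag buildah/nginx:latest . instead of the script below.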
To test out our new workflow, we’re going to build a custom Nginx image. This will let us exercise the primary actions performed when building an image: creating the image, installing software, and configuring that software. We get started with a very simple script, which we save as buildah.sh:
#!/bin/bash -ue
container=$(buildah from scratch) # 1
mount=$(buildah mount ${container}) # 2
dnf --installroot ${mount} install --assumeyes nginx # 3
buildah config --entrypoint '["nginx"]' ${container} # 4
buildah config --cmd '-g "daemon off;"' ${container} # 5
buildah config --port 80 ${container} # 6
buildah commit ${container} buildah/nginx:latest # 7
First we create a container to serve as the basis for our new image. We make it from scratch to start with an empty filesystem, which allows us to make a minimally-sized image (1). Then we mount the container’s file system into our own, which will allow us to easily run commands on it (2). With the container’s file system mounted, we can run dnf install to install Nginx (3). We tell DNF to install to the container’s file system by setting --installroot to the recorded mount point. Then we configure the entrypoint and command for the image (4 and 5). We expose port 80 so it can be forwarded easily when we run the resulting image (6). Finally, we commit the container to turn it into an image (7).
To run our build, we need to mount our build script3 into the Buildah container and run it. Using Podman, we run the build.
$ podman run -v ./buildah.sh:/buildah.sh buildah/stable:v1.14.8 /buildah.sh
mount /var/lib/containers/storage/overlay:/var/lib/containers/storage/overlay, flags: 0x1000: operation not permitted
That “operation not permitted” error came from trying to mount the container’s file system. A quick search will find us a blog post on using Buildah with Podman which tells us
Because both the container and the container within the container will be using fuse-overlayfs, they won’t be happy trying to mount their respective directories over each other. So, the first step is to create a directory for the container within the container to use, and I’ve named it /var/lib/mycontainer"
…
mounts the host’s mycontainer to the container’s containers directory
So let’s make that directory, but we’ll put it in ~/.local/share/containers,4 and then mount it into our Buildah container as /var/lib/containers.
$ mkdir -p ~/.local/share/containers
$ podman run -v ~/.local/share/containers:/var/lib/containers \
-v ./buildah.sh:/buildah.sh buildah/stable:v1.14.8 /buildah.sh
mount /var/lib/containers/storage/overlay:/var/lib/containers/storage/overlay, flags: 0x1000: operation not permitted
Huh. We get the same error. If we reread our reference blog posts carefully, we find something that seems related.
One thing to remember is in rootless mode all commands have to be done in the user namespace of the user. You can enter the user namespace using the buildah unshare command. If you don’t do this, the buildah mount, command will fail. After entering the user namespace the user is allowed access to the containers root file system as a non-root user. To execute the script as a non root user, you can execute buildah unshare build_buildah_upstream.sh.
While the name of the user account we’re running as inside the container is root, we’re sharing a kernel with the host system, and the root inside the container is not the root account for the kernel. So, from the kernel’s perspective, we are running rootless.
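If you’re curious, you can see this mapping directly (a quick check of my own, not part of the original walkthrough). The exact IDs will differ on your machine, but container UID 0 maps to our unprivileged host user rather than to the host’s real root:
$ podman run --rm quay.io/buildah/stable:v1.14.8 cat /proc/self/uid_map
         0       1000          1
         1     100000      65536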
Let’s try using buildah unshare to run our script and see if that helps.
$ mkdir -p ~/.local/share/containers
$ podman run -v ~/.local/share/containers:/var/lib/containers \
-v ./buildah.sh:/buildah.sh buildah/stable:v1.14.8 \
buildah unshare /buildah.sh
level=error msg="error unmounting /var/lib/containers/storage/overlay/87a46046716b780de38e3b8346a6cbc1a15bb2b0f315738921e0dfdeb16aed94/merged: invalid argument"
error mounting "working-container-5" container "working-container-5": error mounting build container "29f3587b5f53fdf0d996b9bedf4b962bc96349cc5774972d156816db3e990ed1": error creating overlay mount to /var/lib/containers/storage/overlay/87a46046716b780de38e3b8346a6cbc1a15bb2b0f315738921e0dfdeb16aed94/merged: using mount program /usr/bin/fuse-overlayfs: fuse: device not found, try 'modprobe fuse' first
fuse-overlayfs: cannot mount: No such file or directory
: exit status 1
level=error msg="exit status 1"
level=error msg="exit status 1"
Progress! We have a new error. A bit of searching leads to another helpful blog post
Note that using Fuse requires people running the Buildah container to provide the /dev/fuse device.
Passing the /dev/fuse device along to our Buildah container gives us
$ mkdir -p ~/.local/share/containers
$ podman run --device /dev/fuse \
-v ~/.local/share/containers:/var/lib/containers \
-v ./buildah.sh:/buildah.sh buildah/stable:v1.14.8 \
buildah unshare /buildah.sh
Unable to detect release version (use '--releasever' to specify release version)
Fedora $releasever openh264 (From Cisco) - x86_ 27 kB/s | 63 kB 00:02
Errors during downloading metadata for repository 'fedora-cisco-openh264':
- Status code: 404 for https://mirrors.fedoraproject.org/metalink?repo=fedora-cisco-openh264-$releasever&arch=x86_64 (IP: 152.19.134.198)
Error: Failed to download metadata for repo 'fedora-cisco-openh264': Cannot prepare internal mirrorlist: Status code: 404 for https://mirrors.fedoraproject.org/metalink?repo=fedora-cisco-openh264-$releasever&arch=x86_64 (IP: 152.19.134.198)
Fedora Modular $releasever - x86_64 29 kB/s | 63 kB 00:02
Errors during downloading metadata for repository 'fedora-modular':
- Status code: 404 for https://mirrors.fedoraproject.org/metalink?repo=fedora-modular-$releasever&arch=x86_64&countme=1 (IP: 152.19.134.198)
- Status code: 404 for https://mirrors.fedoraproject.org/metalink?repo=fedora-modular-$releasever&arch=x86_64 (IP: 152.19.134.198)
Error: Failed to download metadata for repo 'fedora-modular': Cannot prepare internal mirrorlist: Status code: 404 for https://mirrors.fedoraproject.org/metalink?repo=fedora-modular-$releasever&arch=x86_64 (IP: 152.19.134.198)
level=error msg="exit status 1"
level=error msg="exit status 1"
More progress, with another new error! This error is from DNF. DNF needs to know which release of the OS (distribution) we’re using so it can find and use the correct RPM repository. The error message tells us it can’t do that automatically, which makes sense because the empty file system we’re installing into doesn’t have the files DNF reads to figure it out. The error message also tells us we can provide the release with --releasever. Fedora 32 is the latest Fedora release at the time of this writing, so let’s use that release version. We’ll need to modify the dnf line of our build script.
--- buildah.sh.orig 2020-07-08 20:30:47.777000000 -0700
+++ buildah.sh 2020-07-08 20:55:26.839106100 -0700
@@ -3,7 +3,7 @@
container=$(buildah from scratch) # 1
mount=$(buildah mount ${container}) # 2
-dnf --installroot ${mount} install --assumeyes nginx # 3
+dnf --installroot ${mount} --releasever 32 install --assumeyes nginx # 3
buildah config --entrypoint '["nginx"]' ${container} # 4
buildah config --cmd '-g "daemon off;"' ${container} # 5
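As an aside (my suggestion, not something the original script does), the release could be derived from the Buildah image itself instead of being hardcoded, assuming the image ships the usual Fedora dist macros. That way the script keeps working when the stable image moves to a newer Fedora:
dnf --installroot ${mount} --releasever "$(rpm --eval '%fedora')" install --assumeyes nginx # 3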
Rerunning our build yields
$ mkdir -p ~/.local/share/containers
$ podman run --device /dev/fuse \
-v ~/.local/share/containers:/var/lib/containers \
-v ./buildah.sh:/buildah.sh buildah/stable:v1.14.8 \
buildah unshare /buildah.sh
Fedora 32 openh264 (From Cisco) - x86_64 4.8 kB/s | 5.1 kB 00:01
Fedora Modular 32 - x86_64 1.9 MB/s | 4.9 MB 00:02
Fedora Modular 32 - x86_64 - Updates 2.2 MB/s | 3.5 MB 00:01
Fedora 32 - x86_64 - Updates 2.4 MB/s | 18 MB 00:07
Fedora 32 - x86_64 8.1 MB/s | 70 MB 00:08
Dependencies resolved.
======================================================================================
Package Arch Version Repo Size
======================================================================================
Installing:
nginx x86_64 1:1.18.0-1.fc32 updates 571 k
...
Complete!
Getting image source signatures
Copying blob sha256:2a871e5a9e7da9e807ac0b16daa365b5a6248dc8566abb25e0808eaa92674bfb
Copying config sha256:e24833ba33cab60c450b27b4a078c22a4a1b7a10d246276a56d615fbcabc0ab8
Writing manifest to image destination
Storing signatures
e24833ba33cab60c450b27b4a078c22a4a1b7a10d246276a56d615fbcabc0ab8
Excellent! We successfully built our image. Let’s try to run it to ensure everything works.
We run with -ti so we can see the output, with --rm to clean up the container when we’re done, and with --pull never because we just built the image locally, so it shouldn’t need pulling; we also don’t want to accidentally pull it from a remote registry and end up running who knows what. We use the --publish-all option to have the OS pick a free port and forward it to port 80. This makes the command more portable and avoids conflicts with any other servers that may be running.
$ podman run -ti --rm --pull never --publish-all buildah/nginx:latest
Error: unable to find a name and tag match for buildah/nginx:latest in repotags: no such image
Welp, that’s not what we wanted5. Podman can’t find the image we just built, which we know we built successfully. Let’s see if we can find our newly built image in the file system and see what’s going wrong.
$ ls ~/.local/share/containers/
cache storage
$ ls ~/.local/share/containers/storage/
cache mounts overlay-containers overlay-layers tmp vfs vfs-images
libpod overlay overlay-images storage.lock userns.lock vfs-containers vfs-layers
Interesting. It looks like we have storage layouts for both OverlayFS and VFS storage drivers. We know our Buildah container is using OverlayFS, so maybe Podman is using VFS. That might explain why Podman can’t see our new container. Let’s try using OverlayFS with Podman to find our image and see if that helps.
$ podman --storage-driver overlay images
Error: database storage graph driver "vfs" does not match our storage graph driver "overlay": database configuration mismatch
I think that means that Podman is not happy that we’re changing from the VFS driver we had been using to OverlayFS. To the internet to see what we can do about this! Searching for the error message turns up an issue on GitHub, where one of the comments says
The current solution is to delete ~/.share/containers to wipe the libpod DB and start fresh.
So let’s give that a shot.6
$ sudo rm -rf ~/.local/share/containers/
$ podman --storage-driver overlay images
Error: kernel does not support overlay fs: 'overlay' is not supported over extfs at "/home/zeffron/.local/share/containers/storage/overlay": backing file system is unsupported for this graph driver
That’s an interesting error. We know it can’t be entirely correct, because our Buildah container used OverlayFS just fine, and it was using the same kernel. Again, a quick search for the error finds us a useful GitHub issue. While this issue doesn’t cover our exact situation, and is resolved, it does shed some light on things. The issue repeatedly mentions fuse-overlayfs and fuse3, which Ubuntu 20.04 does not install by default (at least, not in WSL 2). So let’s install them and see if that fixes anything.
$ sudo apt install fuse-overlayfs
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
fuse3 libfuse3-3
...
Processing triggers for initramfs-tools (0.136ubuntu6.2) ...
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
$ ls ~/.local/share/containers/storage/
libpod mounts overlay overlay-containers overlay-images overlay-layers storage.lock tmp userns.lock
That looks to have switched the storage driver used by Podman to OverlayFS, which is what we want. Unfortunately, we have no images, because we wiped everything out. We need to re-run our build script and then try running the resulting container.
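Before rebuilding, an optional sanity check (my addition, not one of the original steps) is to ask Podman which graph driver it now reports; the output should mention overlay rather than vfs:
$ podman info | grep -i graphdriver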
$ podman run --device /dev/fuse \
-v ~/.local/share/containers:/var/lib/containers \
-v ./buildah.sh:/buildah.sh buildah/stable:v1.14.8 \
buildah unshare /buildah.sh
...
Complete!
Getting image source signatures
Copying blob sha256:a91104507a0d54fa8d76e73ab7a8168327475ded5928debf661bf7014a7e0606
Copying config sha256:c00695ae86d811e66f04238bc953532f559d2849a8a1e0ded5fcfd1aae21fafc
Writing manifest to image destination
Storing signatures
c00695ae86d811e66f04238bc953532f559d2849a8a1e0ded5fcfd1aae21fafc
$ podman run -ti --rm --pull never --publish-all buildah/nginx:latest
This time the container starts and stays running, with no output. To confirm it is working, we can (in a new terminal) look at which port was forwarded, and then use our web browser to connect to localhost on that port.
$ podman ps --format '{{ .Ports }}'
0.0.0.0:37345->80/tcp
For me, port 37345 has been forwarded, so I need to connect to localhost:37345 in my browser.7 When navigating there, I see a successful test page.
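If you prefer checking from the command line (my addition), a quick cURL against the forwarded port works too; substitute whatever port podman ps reported for you:
$ curl -sI http://localhost:37345 | head -n 1
HTTP/1.1 200 OK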
Now that we have a working build, we should make it smaller and not take so long to build, but we’ll explore that next time.
1. So far, some of the features required for a functional environment require Kubernetes to use the docker container runtime. I am working to identify the remaining issues with cri-o and/or containerd, and then solve them or raise them with the appropriate projects. Once either of them works (with a preference for containerd), Docker can be removed entirely. ↩︎
2. It seems like Docker has recently undergone some changes and they no longer have an email signup wall. They may have also stopped the other behaviors I find objectionable, but I’ve been working to replace them for long enough that I don’t see a reason to switch back just because they’re no longer doing objectionable things. Also, it’s still a big, bulky daemon. ↩︎
3. The build script also needs to be executable. If it’s not already, you can make it so by running chmod a+x buildah.sh. ↩︎
4. This seems like a magic directory, but it comes from the documentation for the container storage configuration. The documentation has a rootless_storage_path key, which is the path to container storage when running rootless (as a non-root user), which, as we’re about to cover, we are. The default value for that key is $XDG_DATA_HOME/containers/storage, if XDG_DATA_HOME is set. Otherwise $HOME/.local/share/containers/storage is used. According to the specification, "If $XDG_DATA_HOME is either not set or empty, a default equal to $HOME/.local/share should be used." These two defaults make the effective default location for the containers directory ~/.local/share/containers. If this does not work for you, please use $XDG_DATA_HOME/containers or check where your rootless_storage_path is configured to place the storage. ↩︎
5. It’s entirely possible that you may not have encountered an error. If you keep reading, you’ll see the error is caused by Ubuntu not installing fuse-overlayfs by default, resulting in a fallback to the VFS storage driver, while the Buildah container uses the OverlayFS storage driver. If your Linux environment already has fuse-overlayfs, you should not experience this issue. ↩︎
6. The use of sudo to delete the containers directory is required because we used different namespaces for building containers. This gives the files different UIDs, which means we can’t delete them as our user. ↩︎
7. For some reason I don’t understand, in order for a connection from a Windows-native web browser to reach my Nginx container running in WSL 2, I need to use "localhost" as the host. Using "127.0.0.1" will not work. If I’m making the request from within WSL 2 (i.e. using a web browser or cURL installed in WSL 2), then "127.0.0.1" works. ↩︎