
Importing GPG keys when building Fedora-based containers

tl;dr: When using gpg in Containerfile build steps for container images based on Fedora 39 or later [1], ensure you also run gpgconf --kill keyboxd by the end of the same build step. Alternatively, and probably better, create a ~/.gnupg directory before the first invocation of gpg.

Trying to verify MediaWiki

My discovery of today’s issue occurred when I was trying to install MediaWiki from a release tarball into a container image. MediaWiki provides GPG signatures for its release bundles, so we can use the project’s public GPG keys to verify that what we have downloaded is, in fact, from Wikimedia [2].

Downloading the keys

If you’d like to follow along, you can download the Wikimedia public key list. The Containerfile expects it to be at the root of the build context as keys.txt.

A naive attempt to import the keys and verify the signature

Typically, at least as I understand it, the steps for verifying a download with GPG keys are

  1. Import the keys
  2. Download the file to be verified
  3. Download the signature for the file
  4. Use GPG to verify the package against the signature

Putting those steps into a Containerfile gives us

FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz

which we can build with podman build . to get an error.

...
gpg: Note: database_open 134217901 waiting for lock (held by 3) ...
gpg: keydb_search failed: Connection timed out
gpg: Can't check signature: No public key
...

That’s very odd. Why is there no public key? We just imported it. And what’s this about a connection timing out and a lock being held?

Finding the lock

I know nothing about GPG [3], so I don’t really know where to begin with this error. I’m assuming something is holding some lock file, because files are the only things that carry over from one Containerfile step to another, but which lock file it is and why it wasn’t released are a mystery.

Fortunately, u/No_Feedback_8594 in r/debian knows a lot more than I do. Modifying the Containerfile slightly lets us figure out which lock file was left behind.

FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && find ~/.gnupg/public-keys.d
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
# RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz

Building again gives us a small list of files:

One of them (/root/.gnupg/public-keys.d/pubring.db.lock) looks like a lock file! Great, so now let’s see what happens if we remove the lock file.

FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && rm /root/.gnupg/public-keys.d/pubring.db.lock
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
gpg: Signature made Thu Mar 28 22:16:05 2024 UTC
gpg:                using DSA key 1D98867E82982C8FE0ABC25F9B69B3109D3BB7B0
gpg: Good signature from "Sam Reed <[email protected]>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D98 867E 8298 2C8F E0AB  C25F 9B69 B310 9D3B B7B0
...

Look at that! Removing the lock file allowed us to verify the signature just fine! So we’ve confirmed that the errant lock file is the problem.

But why is the lock held?

Now we know which file represents the lock, and we know that we can verify signatures by removing the lock file. But that seems like a bad solution. If the lock file is still in place, it means something was holding the lock. A lock being held usually means state is not consistent. We could be causing corruption by deleting the lock file. So let’s see if we can find out why the lock file isn’t already being cleaned up.
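
To make the worry about stale locks concrete, here is a minimal sketch of a lock that is taken by creating a file at a known path and released by deleting it. This is not GnuPG’s actual locking code; acquire_lock and release_lock are hypothetical helpers that only illustrate why a lock file can outlive the process that created it.

```shell
#!/bin/sh
# Illustrative path-based lock; NOT GnuPG's real implementation.

# acquire_lock PATH: take the lock by creating PATH; fail if it already exists.
acquire_lock() {
    # With noclobber set, ">" refuses to overwrite an existing file,
    # so only one caller can create the lock file.
    ( set -o noclobber; echo "$$" > "$1" ) 2>/dev/null
}

# release_lock PATH: give up the lock by deleting the file.
release_lock() {
    rm -f "$1"
}

lock="$(mktemp -d)/pubring.db.lock"

# A short-lived "daemon" takes the lock and exits without releasing it.
( acquire_lock "$lock" )

# The holder is gone, but the file remains, so the lock still looks taken.
if acquire_lock "$lock"; then
    echo "lock acquired"
else
    echo "lock still held"
fi
```

Nothing in the file itself says whether the holder finished its work, which is why simply deleting it feels risky.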

First, let’s see who’s holding the lock file. Another small modification [4] to our Containerfile will allow us to check that.

FROM fedora:latest
COPY keys.txt keys.txt
RUN dnf install --assumeyes procps-ng
RUN gpg --import keys.txt && ps $(pgrep gpg)
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
# RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
    PID TTY      STAT   TIME COMMAND
      8 ?        Ssl    0:00 gpg-agent --homedir /root/.gnupg --use-standard-socket --daemon
...

Okay, so it looks like after we finish importing the keys, we still have gpg-agent hanging around. I’ve never heard of gpg-agent, but let’s see if we can find some information on it. According to its man page

gpg-agent is a daemon to manage secret (private) keys independently from any protocol. It is used as a backend for gpg and gpgsm as well as for a couple of other utilities.

The agent is automatically started on demand by gpg, gpgsm, gpgconf, or gpg-connect-agent.

Sure, that lines up with what we’re seeing. We’re running gpg, so gpg-agent is started automatically.

Releasing the lock

Let’s see if there’s a way to prevent that. If we search the gpg man page for “gpg-agent”, we quickly come across the --no-autostart option, which sounds like it might do what we want. The documentation says

Do not start the gpg-agent or the dirmngr if it has not yet been started and its service is required.

Let’s try it out!

FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --no-autostart --import keys.txt
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
gpg: directory '/root/.gnupg' created
gpg: no keyboxd running in this session
gpg: error opening key DB: No Keybox daemon running
gpg: key 73F146FECF9D333C: public key not found: Input/output error
gpg: error reading 'keys.txt': Input/output error
gpg: import from 'keys.txt' failed: Input/output error
gpg: Total number processed: 0
...

Huh. Looks like when they used the word “required”, they meant it. Unfortunately for us, nothing else in the gpg man page looks like it will help.

If we take another look at the gpg-agent man page, just beneath where it told us it was automatically started are instructions for safely shutting it down.

If you want to manually terminate the currently-running agent, you can safely do so with:

gpgconf --kill gpg-agent

Perfect! Let’s give that a shot.

FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && gpgconf --kill gpg-agent
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
gpg: Note: database_open 134217901 waiting for lock (held by 4) ...
gpg: keydb_search failed: Connection timed out
gpg: Can't check signature: No public key
...

Interesting… something is still holding the lock [5]. Let’s cast a wider net and see what else could be responsible for that.

FROM fedora:latest
COPY keys.txt keys.txt
RUN dnf install --assumeyes procps-ng
RUN gpg --import keys.txt && gpgconf --kill gpg-agent && ps $(pgrep gpg)
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
# RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
    PID TTY          TIME CMD
      1 ?        00:00:00 ps
      4 ?        00:00:00 keyboxd
...

Okay, so we still have a keyboxd daemon running. Never heard of it. A quick search just returns page after page of GPG-related results, so I’m betting it’s what’s responsible for the lock. Which means it was probably started automatically along with or by gpg-agent. I can’t find any good documentation for it, though, so I’m not sure of its true purpose, why it started up, why it’s holding the lock, or why it didn’t shut down.
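
A side note on the check itself: pgrep gpg only matches processes whose name contains “gpg”, so it cannot find keyboxd. In the run above it most likely matched nothing once gpg-agent was gone, and ps fell back to its default listing. A slightly more explicit check could look like the sketch below. The function names are hypothetical, and the list of daemon names is an assumption based on the components seen in this post and in the gpgconf man page.

```shell
#!/bin/sh
# filter_gnupg_daemons: read "PID COMMAND" lines on stdin and keep only
# the GnuPG daemons (assumed names: gpg-agent, keyboxd, dirmngr, scdaemon).
filter_gnupg_daemons() {
    awk '$2 ~ /^(gpg-agent|keyboxd|dirmngr|scdaemon)$/ { print $1, $2 }'
}

# list_gnupg_daemons: print the GnuPG daemons that are still running.
# Needs ps, which on the Fedora base image comes from procps-ng.
list_gnupg_daemons() {
    ps -e -o pid= -o comm= | filter_gnupg_daemons
}
```

In a Containerfile, this would replace the ps $(pgrep gpg) part of the RUN step, provided the functions are defined in the same shell invocation.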

Actually releasing the lock

Because I like to understand what commands are going to do before I run them, I didn’t blindly run the gpgconf command that was recommended for shutting down gpg-agent. Instead, I read its man page, including the documentation for the --kill option.

--kill [component]
-K
Kill the given component that runs as a daemon, including gpg-agent, dirmngr, and scdaemon. A component which does not run as a daemon will be ignored. Using “all” for component kills all components running as daemons. Note that as of now reload and kill have the same effect for scdaemon.

keyboxd is not mentioned explicitly, but it does appear to be a GPG component and it is certainly running as a daemon. Maybe we need to replace gpg-agent with all as the component?

FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && gpgconf --kill all
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
gpg: Signature made Thu Mar 28 22:16:05 2024 UTC
gpg:                using DSA key 1D98867E82982C8FE0ABC25F9B69B3109D3BB7B0
gpg: Good signature from "Sam Reed <[email protected]>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D98 867E 8298 2C8F E0AB  C25F 9B69 B310 9D3B B7B0
...

Hey! We did it! We verified the signature again, and this time we can be confident we are not in an inconsistent state!

One last bit of cleanup

What we’ve essentially learned is that, if they’re not already running, invoking gpg will leave both gpg-agent and keyboxd running. When we verify the signature, we invoke gpg, which means we probably left gpg-agent and keyboxd running, as well as the lock file. So while we were able to verify our signature, future invocations will probably fail, including future key imports. Let’s confirm.

FROM fedora:latest
COPY keys.txt keys.txt
RUN dnf install --assumeyes procps-ng
RUN gpg --import keys.txt && gpgconf --kill all
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz && \
    ps && \
    find ~/.gnupg/public-keys.d
...
    PID TTY          TIME CMD
      1 ?        00:00:00 sh
      8 ?        00:00:00 keyboxd
     11 ?        00:00:00 ps
/root/.gnupg/public-keys.d
/root/.gnupg/public-keys.d/pubring.db
/root/.gnupg/public-keys.d/.#lk0x00007f52e4002270.c8260bcc644b.8
/root/.gnupg/public-keys.d/pubring.db.lock
...

It looks like we did, in fact, leave things behind that will break future builds. Surprisingly, we did not leave behind gpg-agent, but we should still be good neighbors for future steps and perform the GPG daemon shutdown when we’re done verifying the signature. So our final Containerfile looks like

FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && gpgconf --kill all
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz && \
    gpgconf --kill all
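
The same pattern can be wrapped in a small shell helper so that the daemons are stopped even when the import or the verification fails. This is only a sketch: verify_release is a hypothetical name, and --batch is added on the assumption that the build is non-interactive.

```shell
#!/bin/sh
# verify_release KEYS TARBALL SIGNATURE
# Import KEYS, verify TARBALL against SIGNATURE, and always stop the
# GnuPG daemons afterwards so no lock is left behind for later steps.
verify_release() {
    keys=$1
    tarball=$2
    sig=$3
    status=0

    # Remember the first failure instead of returning immediately,
    # so the cleanup below still runs.
    { gpg --batch --import "$keys" &&
      gpg --batch --verify "$sig" "$tarball"; } || status=$?

    # Same cleanup as in the Containerfile above.
    gpgconf --kill all

    return "$status"
}
```

With the function saved in a script that is copied into the image, the import and the verification could share a single RUN step, at the cost of importing the keys in the same layer as the download.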

But wait, there’s more!

After discovering that only Fedora 39+ images were affected [1], I decided to see what was different between Fedora 38 and Fedora 39. The fedora:38 base image has gpg 2.4.0-3, and the fedora:39 base image has gpg 2.4.4-1. When looking at the changelog for the gpg RPM in koji, I suspected that 2.4.3-3, which restored systemd units and sockets, was the specific RPM version that broke things. This turned out to most likely be incorrect.

While filing a bug against the Fedora Container, I came across another bug against the gnupg2 RPM talking about keyboxd. It linked to the 2.4.1 release notes [6], which discuss the purpose of keyboxd as well as how to disable it.

The important points are that keyboxd was introduced in 2.3.0 as a daemon that stores keys in a SQLite database, instead of the previous keyring, to improve performance and allow higher concurrency. Prior to being made the default in 2.4.1, it required a configuration option to be enabled. Since version 2.4.0 came after the introduction of keyboxd but before it became the default, we can assume that keyboxd was available but not enabled in Fedora 38, and that the change that resulted in the leaked lock file was its use by default in Fedora 39.

If that’s the case, then all we need to do is stop keyboxd, not everything.

FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && gpgconf --kill keyboxd
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz && \
    gpgconf --kill keyboxd
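
To check whether a given GnuPG home directory is set up to use keyboxd, one option is to look for the use-keyboxd option in its common.conf. This is a sketch based on my understanding that gpg writes that option into common.conf when it creates a fresh home directory; treat the file name and the option name as assumptions to confirm against the release notes. uses_keyboxd is a hypothetical helper.

```shell
#!/bin/sh
# uses_keyboxd [HOMEDIR]
# Succeed if HOMEDIR/common.conf contains an uncommented use-keyboxd line.
# HOMEDIR defaults to $GNUPGHOME, then to ~/.gnupg.
uses_keyboxd() {
    home=${1:-${GNUPGHOME:-$HOME/.gnupg}}
    [ -f "$home/common.conf" ] &&
        grep -Eq '^[[:space:]]*use-keyboxd([[:space:]]|$)' "$home/common.conf"
}
```

For example, uses_keyboxd && echo "keyboxd enabled" could be appended to the import step while debugging.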

There’s another little tidbit in the release notes that’s of interest. keyboxd is only the default for a fresh installation, and an installation is not considered fresh if the ~/.gnupg directory already exists. So a more permanent solution that does not require changing every invocation is to create the ~/.gnupg directory before the first invocation of gpg. This reverts the behavior to using a keyring instead of the more performant database, but does not require changes to multiple RUN steps.

FROM fedora:latest
COPY keys.txt keys.txt
RUN mkdir ~/.gnupg
RUN gpg --import keys.txt
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
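
One detail worth considering with this approach is the directory’s mode. gpg can warn about unsafe permissions on its home directory when group or others have access to it, and a plain mkdir under a typical umask creates the directory as 755. The sketch below creates the directory and restricts it to its owner. prepare_gnupg_home is a hypothetical helper, and the 700 mode is my assumption of what gpg expects.

```shell
#!/bin/sh
# prepare_gnupg_home [HOMEDIR]
# Create the GnuPG home directory ahead of the first gpg invocation and
# make it accessible to the owner only. HOMEDIR defaults to ~/.gnupg.
prepare_gnupg_home() {
    home=${1:-$HOME/.gnupg}
    # chmod also fixes the mode if the directory already existed.
    mkdir -p "$home" && chmod 700 "$home"
}
```

In the Containerfile above, the equivalent one-liner would be RUN mkdir -m 700 ~/.gnupg.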

  1. I tried to replicate this issue on Ubuntu by changing the base image to ubuntu:latest. The problem did not replicate. I then tried to replicate it for Red Hat by using redhat/ubi9:latest as the base image. Again, the problem did not replicate. So I tried with Amazon Linux 2023 (amazonlinux:2023 base image), which is based on Fedora. It spewed warnings about the keys having preferences for unavailable algorithms and failed to start gpg-agent, so it completely failed to even import the keys. The issue covered by this post seems to be a problem exclusively with Fedora.

    Specifically, it is a problem with recent versions of Fedora. Binary searching Fedora versions between 40 (latest) and 35 found that Fedora 39 is the first to exhibit the issue. ↩︎ ↩︎

  2. We probably don’t have to use GPG keys to do this, as we’re downloading directly from MediaWiki, and we’re downloading over HTTPS. This is probably proof enough that we’re downloading a valid release. Yes, if someone does manage to upload altered tarballs to the MediaWiki releases page, they can almost certainly upload new signatures and new keys, meaning if we download the keys right before verifying, verifying gives us nothing. However, because keys change so rarely, we can pre-download and hang on to the keys and reuse them. That way, if someone can gain access to publish altered tarballs, the signatures will not validate against the old keys, and we will have an opportunity to detect the tampering. ↩︎

  3. It’s not really true that I know nothing about GPG. But what little I do know is mostly about how annoying those who use it to sign every email are, how terrible its UX is, and how much that leads to people accidentally not providing correct verifiability or privacy. Essentially, I know only what I need to know to have decided I hate it and never want to use it. ↩︎

  4. You might be wondering why, if we’re trying to identify who is hanging on to a lock file, we’re not using lsof. The answer is that I tried, and lsof didn’t find anyone with the lock file open. I’m assuming that GPG creates the lock file by path to take the lock, and then deletes it by path to release the lock, but does not hold on to it as an open file. ↩︎

  5. Technically, all we know is that something is holding a lock. We don’t know that it’s the same lock. But using find to list all of the files in ~/.gnupg/public-keys.d confirms that, yes, it is the same lock. ↩︎

  6. The bug actually linked to the 2.4.3 release notes, but since the relevant change was introduced in 2.4.1, and the 2.4.1 release notes also have the relevant documentation, I chose to link to the 2.4.1 release notes. ↩︎
