Importing GPG keys when building Fedora-based containers
tl;dr: When using gpg in Containerfile build steps for container images based on Fedora[1] (39+), make sure you also run gpgconf --kill keyboxd by the end of the same build step. Alternatively, and probably better, create a ~/.gnupg directory before the first invocation of gpg.
Trying to verify MediaWiki
I discovered today’s issue while trying to install MediaWiki from a release tarball into a container image. MediaWiki provides GPG signatures for their release bundles, so we can use their public GPG keys to verify that what we have downloaded is, in fact, from Wikimedia[2].
Downloading the keys
If you’d like to follow along, you can download the Wikimedia public key list. The Containerfile expects it to be at the root of the build context as keys.txt.
A naive attempt to import the keys and verify the signature
Typically, at least as I understand it, the steps for verifying a download with GPG keys are:
- Import the keys
- Download the file to be verified
- Download the signature for the file
- Use GPG to verify the package against the signature
Putting those steps into a Containerfile gives us
FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
which we can build with podman build . to get an error.
...
gpg: Note: database_open 134217901 waiting for lock (held by 3) ...
gpg: keydb_search failed: Connection timed out
gpg: Can't check signature: No public key
...
That’s very odd. Why is there no public key? We just imported it. And what’s this about a connection timing out and a lock being held?
Finding the lock
I know nothing about GPG[3], so I don’t really know where to begin with this error. I’m assuming something is holding a lock file, because files are the only things that carry over from one Containerfile step to the next, but which lock file it is and why it wasn’t released are a mystery.
Fortunately, u/No_Feedback_8594 in r/debian knows a lot more than I do. Modifying the Containerfile slightly lets us figure out which lock file was left behind.
FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && find ~/.gnupg/public-keys.d
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
# RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
Building again gives us a small list of files:
- /root/.gnupg/public-keys.d
- /root/.gnupg/public-keys.d/.#lk0x00007f9628002270.6f073117b3f2.4
- /root/.gnupg/public-keys.d/pubring.db.lock
- /root/.gnupg/public-keys.d/pubring.db
One of them (/root/.gnupg/public-keys.d/pubring.db.lock) looks like a lock file! Great, so now let’s see what happens if we remove the lock file.
FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && rm /root/.gnupg/public-keys.d/pubring.db.lock
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
gpg: Signature made Thu Mar 28 22:16:05 2024 UTC
gpg: using DSA key 1D98867E82982C8FE0ABC25F9B69B3109D3BB7B0
gpg: Good signature from "Sam Reed <[email protected]>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D98 867E 8298 2C8F E0AB C25F 9B69 B310 9D3B B7B0
...
Look at that! Removing the lock file allowed us to verify the signature just fine! So we’ve confirmed that the errant lock file is the problem.
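Since a leftover lock file is what breaks later steps, it could be handy to have a guard that fails loudly when one is left behind. The following is a minimal sketch, not anything GnuPG ships: the function names find_gpg_locks and assert_no_gpg_locks are hypothetical, and the two name patterns are simply the ones seen in the file listing above.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of GnuPG).

# Print any leftover lock files under a GnuPG key directory.
# The patterns match the two names observed earlier:
# "pubring.db.lock" and ".#lk0x....".
find_gpg_locks() {
    dir=${1:-"$HOME/.gnupg/public-keys.d"}
    # Nothing to report if the directory was never created.
    [ -d "$dir" ] || return 0
    find "$dir" \( -name '*.lock' -o -name '.#lk*' \) -print
}

# Succeed only when no lock files remain; otherwise list them on stderr.
assert_no_gpg_locks() {
    locks=$(find_gpg_locks "$1")
    if [ -n "$locks" ]; then
        printf 'leftover GnuPG lock files:\n%s\n' "$locks" >&2
        return 1
    fi
}
```

Assuming the script has been copied into the image and sourced, ending a RUN step with assert_no_gpg_locks would make the build fail in the step that leaked the lock, rather than in a later step that trips over it.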
But why is the lock held?
Now we know which file represents the lock, and we know that we can verify signatures by removing the lock file. But that seems like a bad solution. If the lock file is still in place, it means something was holding the lock. A lock being held usually means state is not consistent. We could be causing corruption by deleting the lock file. So let’s see if we can find out why the lock file isn’t already being cleaned up.
First, let’s see who’s holding the lock file. Another small modification[4] to our Containerfile will allow us to check that.
FROM fedora:latest
COPY keys.txt keys.txt
RUN dnf install --assumeyes procps-ng
RUN gpg --import keys.txt && ps $(pgrep gpg)
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
# RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
PID TTY STAT TIME COMMAND
8 ? Ssl 0:00 gpg-agent --homedir /root/.gnupg --use-standard-socket --daemon
...
Okay, so it looks like after we finish importing the keys, we still have gpg-agent hanging around. I’ve never heard of gpg-agent, but let’s see if we can find some information on it. According to its man page:
gpg-agent is a daemon to manage secret (private) keys independently from any protocol. It is used as a backend for gpg and gpgsm as well as for a couple of other utilities.
The agent is automatically started on demand by gpg, gpgsm, gpgconf, or gpg-connect-agent.
Sure, that lines up with what we’re seeing. We’re running gpg, so gpg-agent is started automatically.
Releasing the lock
Let’s see if there’s a way to prevent that. If we search the gpg man page for “gpg-agent”, we quickly come across the --no-autostart option, which sounds like it might do what we want. The documentation says:
Do not start the gpg-agent or the dirmngr if it has not yet been started and its service is required.
Let’s try it out!
FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --no-autostart --import keys.txt
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
gpg: directory '/root/.gnupg' created
gpg: no keyboxd running in this session
gpg: error opening key DB: No Keybox daemon running
gpg: key 73F146FECF9D333C: public key not found: Input/output error
gpg: error reading 'keys.txt': Input/output error
gpg: import from 'keys.txt' failed: Input/output error
gpg: Total number processed: 0
...
Huh. Looks like when they used the word “required”, they meant it. Unfortunately for us, there doesn’t seem to be anything else in the gpg man page that looks like it will help.
If we take another look at the gpg-agent man page, just beneath where it told us the agent is started automatically, there are instructions for safely shutting it down.
If you want to manually terminate the currently-running agent, you can safely do so with:
gpgconf --kill gpg-agent
Perfect! Let’s give that a shot.
FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && gpgconf --kill gpg-agent
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
gpg: Note: database_open 134217901 waiting for lock (held by 4) ...
gpg: keydb_search failed: Connection timed out
gpg: Can't check signature: No public key
...
Interesting… something is still holding the lock[5]. Let’s cast a wider net and see what else could be responsible for that.
FROM fedora:latest
COPY keys.txt keys.txt
RUN dnf install --assumeyes procps-ng
RUN gpg --import keys.txt && gpgconf --kill gpg-agent && ps $(pgrep gpg)
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
# RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
# RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
PID TTY TIME CMD
1 ? 00:00:00 ps
4 ? 00:00:00 keyboxd
...
Okay, so we still have a keyboxd daemon running. Never heard of it. A quick search just returns page after page of GPG-related results, so I’m betting it’s what’s responsible for the lock. Which means it was probably started automatically along with or by gpg-agent. I can’t find any good documentation for it, though, so I’m not sure of its true purpose, why it started up, why it’s holding the lock, or why it didn’t shut down.
Actually releasing the lock
Because I like to understand what commands are going to do before I run them, I didn’t blindly run the gpgconf command that was recommended for shutting down gpg-agent. Instead, I read its man page, including the documentation for the --kill option.
--kill [component]
-K
Kill the given component that runs as a daemon, including gpg-agent, dirmngr, and scdaemon. A component which does not run as a daemon will be ignored. Using “all” for component kills all components running as daemons. Note that as of now reload and kill have the same effect for scdaemon.
keyboxd is not mentioned explicitly, but it does appear to be a GPG component and it is certainly running as a daemon. Maybe we need to replace gpg-agent with all as the component?
FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && gpgconf --kill all
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
...
gpg: Signature made Thu Mar 28 22:16:05 2024 UTC
gpg: using DSA key 1D98867E82982C8FE0ABC25F9B69B3109D3BB7B0
gpg: Good signature from "Sam Reed <[email protected]>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D98 867E 8298 2C8F E0AB C25F 9B69 B310 9D3B B7B0
...
Hey! We did it! We verified the signature again, and this time we can be confident we are not in an inconsistent state!
One last bit of cleanup
What we’ve essentially learned is that, if they’re not already running, invoking gpg will start gpg-agent and keyboxd and leave them running. When we verify the signature, we invoke gpg, which means we probably left gpg-agent and keyboxd running, as well as the lock file. So while we were able to verify our signature, future invocations will probably fail, including future key imports. Let’s confirm.
FROM fedora:latest
COPY keys.txt keys.txt
RUN dnf install --assumeyes procps-ng
RUN gpg --import keys.txt && gpgconf --kill all
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz && \
ps && \
find ~/.gnupg/public-keys.d
...
PID TTY TIME CMD
1 ? 00:00:00 sh
8 ? 00:00:00 keyboxd
11 ? 00:00:00 ps
/root/.gnupg/public-keys.d
/root/.gnupg/public-keys.d/pubring.db
/root/.gnupg/public-keys.d/.#lk0x00007f52e4002270.c8260bcc644b.8
/root/.gnupg/public-keys.d/pubring.db.lock
...
It looks like we did, in fact, leave things behind that will break future builds. Surprisingly, we did not leave behind gpg-agent, but we should still be good neighbors for future steps and perform the GPG daemon shutdown when we’re done verifying the signature. So our final Containerfile looks like
FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && gpgconf --kill all
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz && \
gpgconf --kill all
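If several steps need this treatment, the pattern of "run gpg, then always shut the daemons down" can be captured once. This is only a sketch under assumptions: run_gpg_step is a hypothetical name, and the GPG_CLEANUP variable is an invented override hook (it is not a GnuPG setting) so the helper can be tried without GnuPG installed. By default it runs the same gpgconf --kill all used above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of GnuPG): run a command, then always run
# the daemon cleanup, and report the wrapped command's own exit status.
# GPG_CLEANUP is an assumed override for experimentation; when unset, the
# cleanup is the "gpgconf --kill all" command from the Containerfile above.
run_gpg_step() {
    status=0
    "$@" || status=$?
    # Intentionally unquoted so the cleanup command splits into words.
    # A cleanup failure is ignored so it cannot mask the real result.
    ${GPG_CLEANUP:-gpgconf --kill all} || true
    return "$status"
}
```

With the script sourced inside a RUN step, something like run_gpg_step gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz would behave like the && form above, except that the cleanup also runs when verification fails.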
But wait, there’s more!
After discovering that only Fedora 39+ images were affected[1], I decided to see what had changed between Fedora 38 and Fedora 39. The fedora:38 base image has gpg 2.4.0-3, and the fedora:39 base image has gpg 2.4.4-1. When looking at the changelog for the gpg RPM in koji, I suspected that 2.4.3-3, which restored systemd units and sockets, was the specific RPM version that broke things. This turned out to most likely be incorrect.
While filing a bug against the Fedora Container, I came across another bug against the gnupg2 RPM talking about keyboxd. It linked to the 2.4.1 release notes[6], which discuss the purpose of keyboxd as well as how to disable it.
The important points are that keyboxd was introduced in 2.3.0 as a daemon that stores keys in a SQLite database, instead of the previous keyring, to improve performance and allow higher concurrency. Prior to being made the default in 2.4.1, it required a configuration option to be enabled. Since version 2.4.0 (what Fedora 38 ships) came after the introduction of keyboxd but before it became the default, we can assume that keyboxd was not enabled in Fedora 38, and that the change that resulted in the leaked lock file was its use by default.
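The version reasoning above can be written down as a small check. This is a sketch under assumptions: the function names are hypothetical, it relies on sort -V (version sort, as provided by GNU coreutils) being available in the image, and it only encodes the upstream default described in the release notes. A distribution could still ship a different configuration.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the version cutoff discussed above.

# True when version $1 is greater than or equal to version $2.
# Assumes "sort -V" (GNU coreutils version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Per the release notes, keyboxd is the default for fresh installations
# starting with GnuPG 2.4.1.
keyboxd_is_default() {
    version_ge "$1" 2.4.1
}
```

For example, keyboxd_is_default 2.4.0 fails while keyboxd_is_default 2.4.4 succeeds, which matches the fedora:38 and fedora:39 behavior described here. The upstream version can be read from the first line of gpg --version; the RPM release suffix (the -3 in 2.4.0-3) is not part of it.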
If that’s the case, then all we need to do is stop keyboxd, not everything.
FROM fedora:latest
COPY keys.txt keys.txt
RUN gpg --import keys.txt && gpgconf --kill keyboxd
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz && \
gpgconf --kill keyboxd
There’s another little tidbit in the release notes that’s of interest. keyboxd is only the default for a fresh installation, and an installation is only considered fresh if the ~/.gnupg directory does not yet exist. So a more permanent solution that does not require changing every invocation is to create the ~/.gnupg directory before the first invocation of gpg. This reverts the behavior to using a keyring instead of the more performant database, but does not require changes to multiple RUN steps.
FROM fedora:latest
COPY keys.txt keys.txt
RUN mkdir ~/.gnupg
RUN gpg --import keys.txt
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz
RUN curl -fSLO https://releases.wikimedia.org/mediawiki/1.41/mediawiki-core-1.41.1.tar.gz.sig
RUN gpg --verify mediawiki-core-1.41.1.tar.gz.sig mediawiki-core-1.41.1.tar.gz
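The directory-creation workaround can also be wrapped in a small function, for instance when the same setup script is shared between images. This is a sketch with two additions that are not in the Containerfile above, both labeled as my assumptions: it honors the GNUPGHOME environment variable if it is set, and it restricts the directory to mode 700, since gpg warns about unsafe permissions on its home directory otherwise. The name ensure_gnupg_home is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: make sure the GnuPG home directory exists before the
# first gpg invocation, so gpg does not treat this as a fresh installation.
# Assumptions beyond the post: GNUPGHOME is honored when set, and the
# directory is restricted to the owner (mode 700).
ensure_gnupg_home() {
    dir=${GNUPGHOME:-"$HOME/.gnupg"}
    mkdir -p "$dir" && chmod 700 "$dir"
}
```

Calling ensure_gnupg_home before gpg --import in the same or an earlier RUN step should have the same effect as the RUN mkdir ~/.gnupg line, and it is safe to call more than once.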
[1] I tried to replicate this issue on Ubuntu by changing the base image to ubuntu:latest. The problem did not replicate. I then tried to replicate it for Red Hat by using redhat/ubi9:latest as the base image. Again, the problem did not replicate. So I tried with Amazon Linux 2023 (the amazonlinux:2023 base image), which is based on Fedora. It spewed warnings about the keys having preferences for unavailable algorithms and failed to start gpg-agent, so it completely failed to even import the keys. The issue covered by this post seems to be a problem exclusively with Fedora. Specifically, it is a problem with recent versions of Fedora. Binary searching Fedora versions between 40 (latest) and 35 found that Fedora 39 is the first to exhibit the issue.

[2] We probably don’t have to use GPG keys to do this, as we’re downloading directly from MediaWiki, and we’re downloading over HTTPS. This is probably proof enough that we’re downloading a valid release. Yes, if someone does manage to upload altered tarballs to the MediaWiki releases page, they can almost certainly upload new signatures and new keys, meaning if we download the keys right before verifying, verifying gives us nothing. However, because keys change so rarely, we can pre-download and hang on to the keys and reuse them. That way, if someone can gain access to publish altered tarballs, the signatures will not validate against the old keys, and we will have an opportunity to detect the tampering.

[3] It’s not really true that I know nothing about GPG. But what little I do know is mostly about how annoying those who use it to sign every email are, how terrible its UX is, and how much that leads to people accidentally not providing correct verifiability or privacy. Essentially, I know only what I need to know to have decided I hate it and never want to use it.

[4] You might be wondering why, if we’re trying to identify who is hanging on to a lock file, we’re not using lsof. The answer is that I tried, and lsof didn’t find anyone with the lock file open. I’m assuming that GPG creates the lock file by path to take the lock, and then deletes it by path to release the lock, but does not hold on to it as an open file.

[5] Technically, all we know is that something is holding a lock. We don’t know that it’s the same lock. But using
find to list all of the files in ~/.gnupg/public-keys.d confirms that, yes, it is the same lock.

[6] The bug actually linked to the 2.4.3 release notes, but since the relevant change was introduced in 2.4.1, and the 2.4.1 release notes also have the relevant documentation, I chose to link to the 2.4.1 release notes.