Patching a Proxmox host and its LXCs for Copy Fail (CVE-2026-31431)

CVE-2026-31431 (“Copy Fail”) is a local privilege escalation in algif_aead, the AEAD interface of the kernel's AF_ALG crypto sockets. An unprivileged user can write four controlled bytes into the page cache of any readable file and pivot that into root by corrupting a setuid binary. The mainline fix landed in 6.18.22 / 6.19.12 / 7.0, but every distro and hypervisor backports on its own schedule, so “I’m patched” is a per-environment question.

The short version

  • LXCs share the host kernel. Patch the host once and all seven containers are covered.
  • The modprobe /bin/false mitigation is a one-minute, no-reboot fix. Apply it before scheduling kernel work, not after.
  • apt-cache policy proxmox-kernel-6.17 is the fastest way to tell if your APT repo is healthy. If the candidate equals the installed version while a fixed build is known to exist, the repo is broken, not the kernel.

The diagnosis narrows to three axes

1. Does uname -r clear the patch cutoff?
2. Are AF_ALG / algif_* loaded — or, more importantly, autoload-able? (lsmod, modprobe.d)
3. Does APT actually see new kernels? (apt-cache policy)

The host was running 6.17.2-1-pve, built 2025-10-21. That’s six months before the public fix even existed, so backport status was the only thing that could save it.
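
The first axis can be scripted. This is a minimal sketch, assuming GNU sort -V orders these strings the way the package manager would, and taking 6.17.13-5 (the fixed build named in the Proxmox changelog quoted later in this post) as the cutoff for the 6.17 line. version_ge and CUTOFF are hypothetical names, and dpkg --compare-versions is the more faithful comparison on a real host.

```shell
# version_ge A B: succeed when A sorts at or after B under GNU `sort -V`.
# Hypothetical helper; only an approximation of Debian version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed cutoff for the 6.17 line, taken from the Proxmox changelog.
CUTOFF="6.17.13-5-pve"

# Example (run on the host):
#   version_ge "$(uname -r)" "$CUTOFF" && echo "at or above cutoff" || echo "below cutoff"
```

The cutoff only means something within the 6.17 line; other kernel lines carry their own backport versions.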

apt-cache policy proxmox-kernel-6.17 came back with the candidate equal to the installed version. Translation: APT couldn’t see anything newer. Root cause: only pve-enterprise.sources was active, and without a subscription it 401s on enterprise.proxmox.com. The pve-no-subscription repo was missing.
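
That comparison is easy to automate. A rough sketch, assuming the usual "Installed:" and "Candidate:" lines in apt-cache policy output; policy_stale is a hypothetical helper name. A match alone cannot tell "fully up to date" from "repo unreachable", so it only signals trouble when you already know a newer build has shipped.

```shell
# policy_stale: read `apt-cache policy <pkg>` output on stdin and succeed
# when Installed and Candidate are the same non-empty version.
policy_stale() {
    awk '
        $1 == "Installed:" { installed = $2 }
        $1 == "Candidate:" { candidate = $2 }
        END { exit !(installed != "" && installed == candidate) }
    '
}

# Example (run on the host):
#   apt-cache policy proxmox-kernel-6.17 | policy_stale \
#       && echo "APT sees nothing newer than what is installed"
```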

Mitigation closes the attack surface in under a minute

I closed the attack surface before touching the kernel. AF_ALG modules are autoloaded on the first socket(AF_ALG, ...) call, so “not in lsmod” is not safety. You have to block the load itself.

cat > /etc/modprobe.d/disable-algif-cve-2026-31431.conf <<'EOF'
install af_alg /bin/false
install algif_aead /bin/false
install algif_skcipher /bin/false
install algif_hash /bin/false
install algif_rng /bin/false
EOF
# unload anything currently loaded
modprobe -r algif_aead algif_skcipher algif_hash algif_rng af_alg 2>/dev/null
# verify: load attempts must fail
modprobe algif_aead
# modprobe: ERROR: Error running install command '/bin/false' for module af_alg: retcode 1
# modprobe: ERROR: could not insert 'algif_aead': Invalid argument

The two error lines are the proof that the block works. Because the rules live in /etc/modprobe.d, the block survives a reboot.
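
To audit the same file later without attempting a load, something like the following could work. It is a sketch only: missing_blocks is a hypothetical helper that looks for a literal "install <module> /bin/false" line and does not handle the dash/underscore aliasing that modprobe itself applies to module names.

```shell
# missing_blocks CONF MODULE...: print every MODULE that has no
# "install <module> /bin/false" line in CONF. Hypothetical helper.
missing_blocks() {
    conf=$1
    shift
    for mod in "$@"; do
        grep -Eq "^[[:space:]]*install[[:space:]]+${mod}[[:space:]]+/bin/false[[:space:]]*\$" \
            "$conf" 2>/dev/null || printf '%s\n' "$mod"
    done
}

# Example (run on the host); empty output means every module is blocked:
#   missing_blocks /etc/modprobe.d/disable-algif-cve-2026-31431.conf \
#       af_alg algif_aead algif_skcipher algif_hash algif_rng
```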

Side effects to think about: anything that reaches the kernel crypto API through AF_ALG sockets will break. LUKS / cryptsetup, fscrypt, and a handful of userspace crypto fallbacks can hit this. On a stock Proxmox host nothing noticed. On a host doing cryptsetup luksOpen inside containers, validate before applying.
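
One way to look for live users before applying the block is to read the use counts in lsmod. This sketch assumes that a nonzero count on an algif_* module means some process currently holds a socket bound to it. Treat that as a heuristic: it says nothing about software that opens AF_ALG only occasionally. algif_in_use is a hypothetical helper name.

```shell
# algif_in_use: read `lsmod` output on stdin and print each algif_* module
# whose use count (third column) is above zero, followed by that count.
algif_in_use() {
    awk 'NR > 1 && $1 ~ /^algif_/ && $3 + 0 > 0 { print $1, $3 }'
}

# Example (run on the host); empty output means no algif_* module is held open:
#   lsmod | algif_in_use
```

This matters because a modprobe -r whose errors are discarded leaves an in-use module loaded without saying so.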

Fixing the APT repos

The host had pve-enterprise.sources active without a subscription, plus ceph.sources pointed at the enterprise channel even though Ceph wasn’t installed (pveceph status returned binary not installed: /usr/bin/ceph-mon). Both got disabled, pve-no-subscription added.

mv /etc/apt/sources.list.d/pve-enterprise.sources \
   /etc/apt/sources.list.d/pve-enterprise.sources.disabled
mv /etc/apt/sources.list.d/ceph.sources \
   /etc/apt/sources.list.d/ceph.sources.disabled
cat > /etc/apt/sources.list.d/pve-no-subscription.sources <<'EOF'
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF
apt update

After that, apt-cache policy proxmox-kernel-6.17 started reporting candidates like 6.17.13-6 — exactly what I needed.
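
A quick way to confirm which repositories are still active is to list the Components of every enabled .sources file. This sketch assumes one stanza per file, which matches the files shown here but is not guaranteed by the deb822 format; active_components is a hypothetical helper name. It also honours the "Enabled: no" field, the in-file alternative to renaming a file out of the way.

```shell
# active_components DIR: for each *.sources file in DIR that is not marked
# "Enabled: no", print "<file>: <components>". Assumes one stanza per file.
active_components() {
    dir=$1
    for f in "$dir"/*.sources; do
        [ -e "$f" ] || continue
        awk -v file="${f##*/}" '
            tolower($1) == "enabled:" && tolower($2) == "no" { off = 1 }
            $1 == "Components:" { $1 = ""; comps = comps $0 }
            END { if (!off && comps != "") print file ":" comps }
        ' "$f"
    done
}

# Example (run on the host):
#   active_components /etc/apt/sources.list.d
```

Files renamed to .sources.disabled fall outside the *.sources glob, so they never show up in the output.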

Picking 6.17.13-6: one changelog line decided it

The proxmox-default-kernel metapackage points at the 7.0 line. Jumping to 7.0 was an option, but I didn’t want to validate ZFS, intel-microcode, and NIC driver compatibility on the same change window. Stayed on 6.17 and went to 6.17.13-6.

The decision came down to one snippet from the changelog:

proxmox-kernel-6.17 (6.17.13-5) trixie; urgency=medium
* Fix "copy.fail" Local Privilage Escalation / CVE-2026-31431:
An unprivileged local user can write 4 controlled bytes into the page cache
of any readable file on a Linux system, and use that to gain root.
-- Proxmox Support Team Thu, 30 Apr 2026 08:30:46 +0200
proxmox-kernel-6.17 (6.17.13-6) trixie; urgency=medium
* cherry-pick follow-up commits for copy.fail fixes

-5 got the fix, -6 added follow-up commits. Explicit. No guesswork.
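
The same lookup can be done mechanically on any Debian-style changelog. A small sketch, assuming stanza headers of the form "package (version) suite; urgency=..." as in the snippet above; cve_fixed_in is a hypothetical helper name and changelog.txt stands for wherever you saved the changelog text.

```shell
# cve_fixed_in CVE: read a Debian-style changelog on stdin and print the
# version of every stanza that mentions CVE.
cve_fixed_in() {
    awk -v cve="$1" '
        /^[^ \t]+ \([^)]+\)/ { version = $2; gsub(/[()]/, "", version) }
        index($0, cve) > 0   { print version }
    '
}

# Example, assuming the changelog text is saved as changelog.txt:
#   cve_fixed_in CVE-2026-31431 < changelog.txt
```

It only finds stanzas that name the CVE, so follow-up entries like the -6 one still need a human read.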

I held the metapackage so it wouldn’t drag 7.0 in, then installed the 6.17 line directly.

apt-mark hold proxmox-default-kernel
apt-get install -y proxmox-kernel-6.17
apt-get full-upgrade -y
proxmox-boot-tool refresh

On a GRUB-booted host, update-grub registers both the new and the old kernel as menu entries. The new one becomes the default; the old remains as a fallback you can pick from the boot menu if anything goes wrong.

After reboot, all 7 LXCs follow automatically (kernel-side only)

When you trigger a reboot from your own SSH session, you want the command to return cleanly before the system goes down. systemd-run with a five-second delay is the cleanest pattern I’ve found:

systemd-run --on-active=5sec --unit=manual-reboot.timer systemctl reboot

Then poll until SSH comes back. One bash loop is enough:

for i in $(seq 1 30); do
    if ssh -o ConnectTimeout=5 prod-host 'uname -r; uptime' 2>/dev/null; then
        break
    fi
    sleep 5
done

In my run it took about 30 seconds. New kernel 6.17.13-6-pve, uptime 0.
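
If the polling is going into a script, it helps to know when the host never came back. A minimal sketch of a generic wrapper that returns non-zero on timeout; retry is a hypothetical helper name, and prod-host is the same placeholder used in the loop above.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, sleeping DELAY
# seconds between tries. Returns 0 on the first success, 1 if every
# attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Example (run from your workstation):
#   retry 30 5 ssh -o ConnectTimeout=5 prod-host 'uname -r; uptime' \
#       || echo "host did not come back" >&2
```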

Once the host is back, all seven LXCs share the new kernel. pct exec <vmid> -- uname -r returns the host’s value. CVE-2026-31431 exposure is closed for every container at this point.
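
Checking every container by hand gets old quickly. A sketch of a loop over running containers, assuming pct list prints a header row followed by VMID and Status in the first two columns; running_ctids is a hypothetical helper name.

```shell
# running_ctids: read `pct list` output on stdin and print the VMID of
# every container whose status column says "running".
running_ctids() {
    awk 'NR > 1 && $2 == "running" { print $1 }'
}

# Example (run on the Proxmox host):
#   host_kernel=$(uname -r)
#   pct list | running_ctids | while read -r id; do
#       ct_kernel=$(pct exec "$id" -- uname -r < /dev/null)
#       [ "$ct_kernel" = "$host_kernel" ] || echo "CT $id reports $ct_kernel"
#   done
```

The < /dev/null keeps pct exec from consuming the list of IDs that the while loop is reading.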

That is not the whole story. Userspace packages are per-container. openssl, libc6, sudo, openssh-server — those are managed by each container’s apt, not the host’s. I checked them all anyway, and that’s where things got interesting.

Isolation policy needs a paired patch path

Two of the containers — call them ct-foo and ct-bar — were both Ubuntu 24.04 LTS. Their report looked like this:

ct-foo:
pkgs upgradable: total=0 security=0
unattended-upgrades: missing
/var/lib/apt/lists/*Release → 2024-05-07

upgradable=0 here is a lie. Their APT catalog hadn’t been updated since May 2024, almost two full years. They couldn’t tell me about new security updates because they’d never looked. unattended-upgrades wasn’t even installed.

The cause: their nameserver pointed only at an internal private DNS, which couldn’t resolve archive.ubuntu.com.

# inside the container
$ getent hosts archive.ubuntu.com
$ # no response — DNS fail
$ cat /etc/resolv.conf
nameserver 10.x.x.x # private DNS only

The isolation itself is a sensible policy — containers shouldn’t have unfettered internet egress. But isolation without a paired patch path (an internal mirror, scheduled bootstrap, anything) just means containers quietly rot. ct-foo rotted for almost two years.
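
One possible patch path is an internal caching proxy that the containers can reach by a name their private DNS resolves. The sketch below writes the apt drop-in for that. The proxy URL is entirely hypothetical, as are the write_apt_proxy name and the 01proxy file name; Acquire::http::Proxy is the standard apt option. With an HTTP proxy the container no longer needs to resolve archive.ubuntu.com itself.

```shell
# write_apt_proxy ROOT URL: write an apt proxy drop-in below ROOT
# (ROOT is "/" inside a container, or a mounted rootfs path).
write_apt_proxy() {
    root=$1
    url=$2
    mkdir -p "$root/etc/apt/apt.conf.d"
    printf 'Acquire::http::Proxy "%s";\n' "$url" \
        > "$root/etc/apt/apt.conf.d/01proxy"
}

# Example with a hypothetical proxy address:
#   write_apt_proxy / http://apt-proxy.internal:3142
```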

While you’re there: apt’s 0 upgraded line is also a trap.

0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

It’s tempting to read this as “fully patched.” It isn’t. It looks identical when apt update silently failed. A false negative.
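
To make the failure loud, the output of apt-get update can be scanned for the lines apt prints when a fetch fails. A rough sketch, assuming the usual "Err:", "E:" and "W: Failed to fetch" prefixes; apt_update_clean is a hypothetical helper name.

```shell
# apt_update_clean: read `apt-get update` output on stdin and succeed only
# when it contains no fetch-error lines.
apt_update_clean() {
    ! grep -Eq '^(Err:|E: |W: Failed to fetch)'
}

# Example (run inside the container):
#   apt-get update 2>&1 | apt_update_clean || echo "catalog refresh failed" >&2
```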

The check you actually want is three lines, not one:

# when did we last see a fresh catalog?
ls -la /var/lib/apt/lists/*Release | head -5
# does apt update actually succeed?
apt-get update 2>&1 | grep -E "Err:|Failed"
# does the mirror respond at all?
timeout 5 curl -sI http://archive.ubuntu.com/ubuntu/dists/noble/Release

If any of those three is broken, apt list --upgradable is lying to you.
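
The freshness check can be turned into a pass/fail gate. A sketch, assuming that the newest *Release file is the one that matters: the base suite's file stops changing after release, while the -updates and -security files keep moving. lists_fresh and the seven-day threshold are hypothetical choices.

```shell
# lists_fresh DIR DAYS: succeed when DIR contains at least one *Release
# file modified within the last DAYS days.
lists_fresh() {
    [ -n "$(find "$1" -maxdepth 1 -name '*Release' -mtime "-$2" 2>/dev/null | head -n 1)" ]
}

# Example (run inside the container):
#   lists_fresh /var/lib/apt/lists 7 || echo "apt catalog is stale" >&2
```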

Takeaways

  • One host kernel = every LXC. That’s the gift and the bill of OS-level containerization. The host being late means every container is late.
  • The modprobe block is a free lunch. A kernel upgrade requires a maintenance window. The mitigation closes the attack surface in under a minute. They’re not interchangeable; they complement each other.
  • Isolation policy needs a patch path. Cutting containers off from the public internet is fine. Forgetting to give them a way to receive updates anyway is how you end up with two-year-old userspace running production traffic.
  • apt-cache policy <pkg> is the shortest diagnostic in the bag. Candidate equal to installed, when a newer fixed build has shipped, means a repo problem, not a missing fix.

I’m putting this in the runbook so the next CVE in this shape — and there will be one — turns into a 30-minute job, not a half-day audit. This post is that runbook.

BleepingComputer coverage, oss-security advisory, theori-io PoC repository.

Source: https://typhoon.is-a.dev/en/posts/copy-fail-cve-2026-31431-proxmox-lxc-patch/
Author: Typhoon
Published at: 2026-05-02
License: CC BY-NC-SA 4.0