CVE-2026-31431 (“Copy Fail”) is a local privilege escalation in algif_aead. An unprivileged user can write four controlled bytes anywhere in the page cache and pivot that into root by corrupting a setuid binary. The mainline fix landed in 6.18.22 / 6.19.12 / 7.0, but every distro and hypervisor backports on its own schedule — so “I’m patched” is a per-environment question.
The short version
- LXCs share the host kernel. Patch the host once and all seven containers are covered.
- The modprobe /bin/false mitigation is a one-minute, no-reboot fix. Apply it before scheduling kernel work, not after.
- apt-cache policy proxmox-kernel-6.17 is the fastest way to tell if your APT repo is healthy. If the candidate equals the installed version on a kernel you already know is stale, the repo is broken, not the kernel.
The diagnosis narrows to three axes
1. Does uname -r clear the patch cutoff?
2. Are AF_ALG / algif_* loaded or, more importantly, autoload-able? (lsmod, modprobe.d)
3. Does APT actually see new kernels? (apt-cache policy)

The host was running 6.17.2-1-pve, built 2025-10-21. That’s six months before the public fix even existed, so this build could not contain it; only a newer package carrying the backport could close the hole.
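The first axis is a version comparison, and it is easy to get wrong by eye (6.17.2 sorts before 6.17.13). Here is a minimal sketch, assuming GNU `sort -V` is available. `version_ge` is a hypothetical helper name, and the default cutoff string is an assumption about how the first fixed 6.17 build names itself in `uname -r`, so substitute whatever your vendor documents. The comparison only means something within one kernel series.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A is >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed cutoff: the uname -r string of the first fixed build in this series.
CUTOFF="${CUTOFF:-6.17.13-5-pve}"

if version_ge "$(uname -r)" "$CUTOFF"; then
  echo "kernel $(uname -r) is at or past $CUTOFF"
else
  echo "kernel $(uname -r) predates $CUTOFF"
fi
```

A version string at or past the cutoff is only a hint. The changelog is what actually confirms the backport.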
apt-cache policy proxmox-kernel-6.17 came back with the candidate equal to the installed version. Translation: APT couldn’t see anything newer. Root cause: only pve-enterprise.sources was active, and without a subscription it 401s on enterprise.proxmox.com. The pve-no-subscription repo was missing.
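The candidate-versus-installed comparison can be scripted so it fits into a fleet check. This is a sketch that relies only on the `Installed:` and `Candidate:` lines that `apt-cache policy` prints. `policy_status` and its output labels are hypothetical names of mine, not anything APT provides.

```shell
#!/usr/bin/env bash
# policy_status: read `apt-cache policy <pkg>` output on stdin and classify it.
policy_status() {
  awk '
    $1 == "Installed:" { installed = $2 }
    $1 == "Candidate:" { candidate = $2 }
    END {
      if (installed == "" || candidate == "") { print "unknown"; exit }
      if (installed == candidate) print "no-newer-candidate " installed
      else print "upgrade-available " installed " -> " candidate
    }'
}

# Usage on a real host (package name taken from this post):
#   apt-cache policy proxmox-kernel-6.17 | policy_status
```

`no-newer-candidate` on a kernel you know is old is the signal to go look at the repo configuration rather than the kernel.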
Mitigation closes the attack surface in under a minute
I closed the attack surface before touching the kernel. AF_ALG modules are autoloaded on the first socket(AF_ALG, ...) call, so “not in lsmod” is not safety. You have to block the load itself.
cat > /etc/modprobe.d/disable-algif-cve-2026-31431.conf <<'EOF'
install af_alg /bin/false
install algif_aead /bin/false
install algif_skcipher /bin/false
install algif_hash /bin/false
install algif_rng /bin/false
EOF

# unload anything currently loaded
modprobe -r algif_aead algif_skcipher algif_hash algif_rng af_alg 2>/dev/null

# verify: load attempts must fail
modprobe algif_aead
# modprobe: ERROR: Error running install command '/bin/false' for module af_alg: retcode 1
# modprobe: ERROR: could not insert 'algif_aead': Invalid argument

The two error lines are the proof. The block lives in /etc/modprobe.d, so it survives reboot.
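To audit the block across several hosts without triggering a module load, you can check the config files directly. This is a rough sketch: `unblocked_modules` is a hypothetical helper, and it only looks at `*.conf` files in the one directory you pass in. It does not consult other modprobe.d locations and does not normalise `-` versus `_` in module names.

```shell
#!/usr/bin/env bash
# unblocked_modules DIR MODULE...: print each module that has no
# "install <module> /bin/false" line in any DIR/*.conf file.
unblocked_modules() {
  local dir mod
  dir="$1"
  shift
  for mod in "$@"; do
    if ! grep -Eqs "^[[:space:]]*install[[:space:]]+${mod}[[:space:]]+/bin/false([[:space:]]|\$)" "$dir"/*.conf; then
      echo "$mod"
    fi
  done
}

# Usage on a real host:
#   unblocked_modules /etc/modprobe.d af_alg algif_aead algif_skcipher algif_hash algif_rng
```

Empty output means every listed module has a matching block line. Anything printed still needs one.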
Side effects to think about: anything in userspace that reaches the kernel crypto API through AF_ALG sockets will break. LUKS / cryptsetup, fscrypt, and a handful of userspace crypto fallbacks can hit this. On a stock Proxmox host nothing broke. On a host doing cryptsetup luksOpen inside containers, validate before applying.
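One cheap way to validate before unloading is to look at module reference counts, which are the third field of each line in /proc/modules. This is a sketch with a hypothetical helper name, `af_alg_in_use`. The reading of the numbers is my assumption: a nonzero count on an algif_* module most likely means something holds an open AF_ALG socket of that type, while a nonzero count on af_alg alone may just reflect the algif_* modules that depend on it.

```shell
#!/usr/bin/env bash
# af_alg_in_use: read /proc/modules-format text on stdin and print
# "<module> <refcount>" for AF_ALG-related modules with a nonzero refcount.
af_alg_in_use() {
  awk '($1 == "af_alg" || $1 ~ /^algif_/) && $3 > 0 { print $1, $3 }'
}

# Usage on a real host:
#   af_alg_in_use < /proc/modules
```

No output means nothing is loaded or nothing is holding a reference, which is the quiet case I saw on the stock host.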
Fixing the APT repos
The host had pve-enterprise.sources active without a subscription, plus ceph.sources pointed at the enterprise channel even though Ceph wasn’t installed (pveceph status returned binary not installed: /usr/bin/ceph-mon). Both got disabled, pve-no-subscription added.
mv /etc/apt/sources.list.d/pve-enterprise.sources \
   /etc/apt/sources.list.d/pve-enterprise.sources.disabled
mv /etc/apt/sources.list.d/ceph.sources \
   /etc/apt/sources.list.d/ceph.sources.disabled

cat > /etc/apt/sources.list.d/pve-no-subscription.sources <<'EOF'
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF

apt update

After that, apt-cache policy proxmox-kernel-6.17 started reporting candidates like 6.17.13-6, exactly what I needed.
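To spot the same misconfiguration on other hosts, a quick scan for source files that still point at the subscription-only host helps. This is a sketch: `enterprise_sources` is a hypothetical helper. It skips commented-out lines but does not understand a deb822 `Enabled: no` field, so treat a hit as a prompt to look, not a verdict.

```shell
#!/usr/bin/env bash
# enterprise_sources DIR: list APT source files in DIR that reference
# enterprise.proxmox.com on a non-comment line. Files renamed to
# *.disabled are not matched by the globs below.
enterprise_sources() {
  local dir f
  dir="$1"
  for f in "$dir"/*.sources "$dir"/*.list; do
    [ -f "$f" ] || continue
    if grep -v '^[[:space:]]*#' "$f" | grep -q 'enterprise\.proxmox\.com'; then
      echo "$f"
    fi
  done
}

# Usage on a real host:
#   enterprise_sources /etc/apt/sources.list.d
```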
Picking 6.17.13-6 — one changelog line decided it
The proxmox-default-kernel metapackage points at the 7.0 line. Jumping to 7.0 was an option, but I didn’t want to validate ZFS, intel-microcode, and NIC driver compatibility on the same change window. Stayed on 6.17 and went to 6.17.13-6.
The decision came down to one snippet from the changelog:
proxmox-kernel-6.17 (6.17.13-5) trixie; urgency=medium
* Fix "copy.fail" Local Privilage Escalation / CVE-2026-31431: An unprivileged local user can write 4 controlled bytes into the page cache of any readable file on a Linux system, and use that to gain root.
-- Proxmox Support Team Thu, 30 Apr 2026 08:30:46 +0200
proxmox-kernel-6.17 (6.17.13-6) trixie; urgency=medium
* cherry-pick follow-up commits for copy.fail fixes

-5 got the fix, -6 added follow-up commits. Explicit. No guesswork.
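Finding which package version first mentions a CVE can be scripted against any Debian-style changelog. This is a sketch: `versions_mentioning` is a hypothetical helper, and it assumes you can get the changelog text onto stdin, whether from `apt-get changelog` or from a changelog file shipped with the package.

```shell
#!/usr/bin/env bash
# versions_mentioning TEXT: read a Debian-style changelog on stdin and print
# the version of every stanza whose body contains TEXT (case-insensitive,
# literal substring match, so "copy.fail" is not treated as a regex).
versions_mentioning() {
  awk -v pat="$1" '
    BEGIN { pat = tolower(pat) }
    # Stanza header: "<package> (<version>) <suite>; urgency=<level>"
    /^[^ ]+ \([^)]+\) / {
      version = $2
      gsub(/[()]/, "", version)
      seen = 0
      next
    }
    version != "" && !seen && index(tolower($0), pat) { print version; seen = 1 }
  '
}

# Usage, assuming the changelog is retrievable on your system:
#   apt-get changelog proxmox-kernel-6.17 | versions_mentioning CVE-2026-31431
```

Changelogs list the newest stanza first, so the last version printed is the earliest one that mentions the string.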
I held the metapackage so it wouldn’t drag 7.0 in, then installed the 6.17 line directly.
apt-mark hold proxmox-default-kernel
apt-get install -y proxmox-kernel-6.17
apt-get full-upgrade -y
proxmox-boot-tool refresh

update-grub registers both the new and the old kernel as menu entries. The new one becomes the default; the old remains as a fallback you can pick from GRUB if anything goes wrong.
After reboot, all 7 LXCs follow automatically — kernel-side only
When you trigger a reboot from your own SSH session, you want the command to return cleanly before the system goes down. systemd-run with a five-second delay is the cleanest pattern I’ve found:
systemd-run --on-active=5sec --unit=manual-reboot.timer systemctl reboot

Then poll until SSH comes back. One bash loop is enough:

for i in $(seq 1 30); do
  if ssh -o ConnectTimeout=5 prod-host 'uname -r; uptime' 2>/dev/null; then
    break
  fi
  sleep 5
done

In my run it took about 30 seconds. New kernel 6.17.13-6-pve, uptime 0.
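The loop above exits the same way whether SSH came back or all 30 attempts failed. If the poll feeds a script, a variant that reports the outcome through its exit status is handy. This is a sketch, with `wait_until` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY CMD...: run CMD until it succeeds, sleeping DELAY
# seconds between tries. Returns 0 on success, 1 if every attempt failed.
wait_until() {
  local attempts delay i
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage, mirroring the loop in this post:
#   wait_until 30 5 ssh -o ConnectTimeout=5 prod-host 'uname -r; uptime' \
#     || echo "host did not come back; check the console"
```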
Once the host is back, all seven LXCs share the new kernel. pct exec <vmid> -- uname -r returns the host’s value. CVE-2026-31431 exposure is closed for every container at this point.
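Checking every container by hand gets old quickly. Here is a sketch that pulls the running VMIDs out of `pct list` and asks each one for its kernel. `running_vmids` is a hypothetical helper, and it assumes the usual column layout with a header row, VMID in the first column and status in the second.

```shell
#!/usr/bin/env bash
# running_vmids: read `pct list` output on stdin and print the VMID of each
# container whose status column says "running" (header row skipped).
running_vmids() {
  awk 'NR > 1 && $2 == "running" { print $1 }'
}

# Only attempt the sweep where pct exists, i.e. on a Proxmox host.
if command -v pct >/dev/null 2>&1; then
  for id in $(pct list | running_vmids); do
    printf '%s: ' "$id"
    pct exec "$id" -- uname -r
  done
fi
```

Every line should show the host's kernel string. A stopped container is skipped here and picks up the new kernel the next time it starts.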
That is not the whole story. Userspace packages are per-container. openssl, libc6, sudo, openssh-server — those are managed by each container’s apt, not the host’s. I checked them all anyway, and that’s where things got interesting.
Isolation policy needs a paired patch path
Two of the containers — call them ct-foo and ct-bar — were both Ubuntu 24.04 LTS. Their report looked like this:
ct-foo:
  pkgs upgradable: total=0 security=0
  unattended-upgrades: missing
  /var/lib/apt/lists/*Release → 2024-05-07

upgradable=0 here is a lie. Their APT catalog hadn’t been updated since May 2024, almost two full years. They couldn’t tell me about new security updates because they’d never looked. unattended-upgrades wasn’t even installed.
The cause: their nameserver pointed only at an internal private DNS, which couldn’t resolve archive.ubuntu.com.
# inside the container
$ getent hosts archive.ubuntu.com
$ # no response: DNS fail
$ cat /etc/resolv.conf
nameserver 10.x.x.x # private DNS only

The isolation itself is a sensible policy: containers shouldn’t have unfettered internet egress. But isolation without a paired patch path (an internal mirror, scheduled bootstrap, anything) just means containers quietly rot. ct-foo rotted for almost two years.
While you’re there: apt’s 0 upgraded line is also a trap.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

It’s tempting to read this as “fully patched.” It isn’t. It looks identical when apt update silently failed. A false negative.
The check you actually want is three lines, not one:
# when did we last see a fresh catalog?
ls -la /var/lib/apt/lists/*Release | head -5

# does apt update actually succeed?
apt-get update 2>&1 | grep -E "Err:|Failed"

# does the mirror respond at all?
timeout 5 curl -sI http://archive.ubuntu.com/ubuntu/dists/noble/Release

If any of those three is broken, apt list --upgradable is lying to you.
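The first of those checks is the easiest to automate. This is a sketch that reports the age of the newest Release file and flags a stale catalog. `lists_age_days` is a hypothetical helper, the 7-day threshold is an arbitrary example, and `stat -c` assumes GNU coreutils. One caveat: as far as I know the file timestamps follow what the mirror published, not when you fetched, so this is most meaningful for suites that publish often, such as the updates and security pockets.

```shell
#!/usr/bin/env bash
# lists_age_days DIR: print the age in whole days of the newest *Release file
# in DIR; return 1 if there is none.
lists_age_days() {
  local dir newest now f m
  dir="$1"
  newest=0
  for f in "$dir"/*Release; do
    [ -f "$f" ] || continue
    m="$(stat -c %Y "$f")" || return 1
    if [ "$m" -gt "$newest" ]; then
      newest="$m"
    fi
  done
  [ "$newest" -gt 0 ] || return 1
  now="$(date +%s)"
  echo $(( (now - newest) / 86400 ))
}

# Example threshold only; pick one that matches your patch cadence.
MAX_AGE_DAYS="${MAX_AGE_DAYS:-7}"
if age="$(lists_age_days /var/lib/apt/lists)"; then
  if [ "$age" -gt "$MAX_AGE_DAYS" ]; then
    echo "STALE: apt lists are $age days old; upgradable counts are not trustworthy"
  else
    echo "ok: apt lists are $age days old"
  fi
else
  echo "UNKNOWN: no Release files found under /var/lib/apt/lists"
fi
```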
Takeaways
- One host kernel = every LXC. That’s the gift and the bill of OS-level (shared-kernel) containerization. The host being late means every container is late.
- The modprobe block is close to a free lunch, provided nothing on the host needs AF_ALG. A kernel upgrade requires a maintenance window. The mitigation closes the attack surface in under a minute. They’re not interchangeable; they complement each other.
- Isolation policy needs a patch path. Cutting containers off from the public internet is fine. Forgetting to give them a way to receive updates anyway is how you end up with two-year-old userspace running production traffic.
- apt-cache policy <pkg> is the shortest diagnostic in the bag. Candidate equal to installed means a repo problem, not a missing fix.
I’m putting this in the runbook so the next CVE in this shape — and there will be one — turns into a 30-minute job, not a half-day audit. This post is that runbook.
BleepingComputer coverage, oss-security advisory, theori-io PoC repository.