Patching a Proxmox host and its LXCs for Copy Fail (CVE-2026-31431)

CVE-2026-31431 (“Copy Fail”) is a local privilege escalation in algif_aead. An unprivileged user can write four controlled bytes into the page cache of any readable file and pivot that into root by corrupting a setuid binary. The mainline fix landed in 6.18.22 / 6.19.12 / 7.0, but every distro and hypervisor backports on its own schedule, so “I’m patched” is a per-environment question.

The short version

LXCs share the host kernel. Patch the host once and all seven containers are covered.
The modprobe /bin/false mitigation is a one-minute, no-reboot fix. Apply it before scheduling kernel work, not after.
apt-cache policy proxmox-kernel-6.17 is the fastest way to tell if your APT repo is healthy. If the candidate equals the installed version on a kernel you already know is stale, the repo is the problem, not the kernel.

The diagnosis narrows to three axes

1. Does uname -r clear the patch cutoff?
2. Are AF_ALG / algif_* loaded — or, more importantly, autoload-able? (lsmod, modprobe.d)
3. Does APT actually see new kernels? (apt-cache policy)

The host was running 6.17.2-1-pve, built 2025-10-21. That’s six months before the public fix even existed, so backport status was the only thing that could save it.

apt-cache policy proxmox-kernel-6.17 came back with the candidate equal to the installed version. Translation: APT couldn’t see anything newer. Root cause: only pve-enterprise.sources was active, and without a subscription it 401s on enterprise.proxmox.com. The pve-no-subscription repo was missing.
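
The candidate-versus-installed comparison is easy to script. Below is a minimal sketch, assuming the usual `Installed:` / `Candidate:` lines in `apt-cache policy` output. The helper name `policy_status` is my own, not an apt feature.

```shell
# policy_status: read `apt-cache policy <pkg>` output on stdin and report
# whether APT knows of anything newer than the installed version.
# (Hypothetical helper name; it only parses text and changes nothing.)
policy_status() {
  awk '
    $1 == "Installed:" { installed = $2 }
    $1 == "Candidate:" { candidate = $2 }
    END {
      if (installed == "" || candidate == "") {
        print "unknown"
      } else if (installed == candidate) {
        print "no-newer-candidate " installed
      } else {
        print "upgrade-available " installed " -> " candidate
      }
    }'
}

# Usage on a real host:
#   apt-cache policy proxmox-kernel-6.17 | policy_status
```

A `no-newer-candidate` result on a kernel you know is old is the cue to look at the repo configuration rather than at the kernel.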

Mitigation closes the attack surface in under a minute

I closed the attack surface before touching the kernel. AF_ALG modules are autoloaded on the first socket(AF_ALG, ...) call, so “not in lsmod” is not safety. You have to block the load itself.

cat > /etc/modprobe.d/disable-algif-cve-2026-31431.conf <<'EOF'
install af_alg /bin/false
install algif_aead /bin/false
install algif_skcipher /bin/false
install algif_hash /bin/false
install algif_rng /bin/false
EOF

# unload anything currently loaded
modprobe -r algif_aead algif_skcipher algif_hash algif_rng af_alg 2>/dev/null

# verify: load attempts must fail
modprobe algif_aead
# modprobe: ERROR: Error running install command '/bin/false' for module af_alg: retcode 1
# modprobe: ERROR: could not insert 'algif_aead': Invalid argument

The two error lines are the proof. Because the block lives in /etc/modprobe.d, it survives reboot.
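
For a repeatable check across hosts, here is a rough sketch that answers two questions: is any of the five modules loaded right now, and does the config file block every one of them? Both function names are hypothetical, and the file arguments exist only so the checks can be pointed at saved copies.

```shell
# Module names taken from the modprobe.d file above.
ALG_MODULES="af_alg algif_aead algif_skcipher algif_hash algif_rng"

# loaded_alg_modules [FILE]: print each listed module that appears in FILE
# (default /proc/modules), i.e. is loaded at this moment.
loaded_alg_modules() {
  for m in $ALG_MODULES; do
    grep -q "^$m " "${1:-/proc/modules}" 2>/dev/null && echo "$m"
  done
  return 0
}

# unblocked_alg_modules FILE: print each listed module that FILE does NOT
# block with an "install <module> /bin/false" line.
unblocked_alg_modules() {
  for m in $ALG_MODULES; do
    grep -Eq "^[[:space:]]*install[[:space:]]+$m[[:space:]]+/bin/false[[:space:]]*$" "$1" \
      || echo "$m"
  done
  return 0
}

# Usage: both commands should print nothing on a mitigated host.
#   loaded_alg_modules
#   unblocked_alg_modules /etc/modprobe.d/disable-algif-cve-2026-31431.conf
```

Empty output from both is the state to aim for. It complements the `modprobe algif_aead` test rather than replacing it.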

Side effects to think about: any userspace program that reaches the kernel crypto API through AF_ALG sockets will break. LUKS / cryptsetup, fscrypt, and a handful of userspace crypto fallbacks can hit this. On a stock Proxmox host I saw nothing break. On a host doing cryptsetup luksOpen inside containers, validate before applying.

Fixing the APT repos

The host had pve-enterprise.sources active without a subscription, plus ceph.sources pointed at the enterprise channel even though Ceph wasn’t installed (pveceph status returned binary not installed: /usr/bin/ceph-mon). Both got disabled, pve-no-subscription added.

mv /etc/apt/sources.list.d/pve-enterprise.sources \
   /etc/apt/sources.list.d/pve-enterprise.sources.disabled
mv /etc/apt/sources.list.d/ceph.sources \
   /etc/apt/sources.list.d/ceph.sources.disabled

cat > /etc/apt/sources.list.d/pve-no-subscription.sources <<'EOF'
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF

apt update

After that, apt-cache policy proxmox-kernel-6.17 started reporting candidates like 6.17.13-6 — exactly what I needed.
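
To spot the same misconfiguration on other hosts, a small sketch like this lists active source files that still point at the enterprise host. The function name is hypothetical. The match is deliberately naive: it would also flag a stanza marked `Enabled: no` or a commented-out line, so treat hits as something to inspect, not as a verdict.

```shell
# enterprise_sources [DIR]: list active APT source files in DIR that mention
# the enterprise host. apt only reads *.list and *.sources files, so anything
# renamed to *.disabled is skipped here as well.
enterprise_sources() {
  dir=${1:-/etc/apt/sources.list.d}
  for f in "$dir"/*.list "$dir"/*.sources; do
    [ -f "$f" ] || continue            # unmatched glob stays literal; skip it
    grep -l 'enterprise\.proxmox\.com' "$f"
  done
  return 0
}

# Usage: no output means no active file references the enterprise repo.
#   enterprise_sources
```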

Picking 6.17.13-6: one changelog line decided it

The proxmox-default-kernel metapackage points at the 7.0 line. Jumping to 7.0 was an option, but I didn’t want to validate ZFS, intel-microcode, and NIC driver compatibility on the same change window. Stayed on 6.17 and went to 6.17.13-6.

The decision came down to one snippet from the changelog:

proxmox-kernel-6.17 (6.17.13-5) trixie; urgency=medium

  * Fix "copy.fail" Local Privilage Escalation / CVE-2026-31431:
    An unprivileged local user can write 4 controlled bytes into the page cache
    of any readable file on a Linux system, and use that to gain root.

 -- Proxmox Support Team  Thu, 30 Apr 2026 08:30:46 +0200

proxmox-kernel-6.17 (6.17.13-6) trixie; urgency=medium

  * cherry-pick follow-up commits for copy.fail fixes

-5 got the fix, -6 added follow-up commits. Explicit. No guesswork.
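
With a cutoff in hand, the first axis ("does uname -r clear the patch cutoff?") can be turned into a one-line test. This is a sketch under two assumptions: GNU `sort -V` is available, and the `uname -r` string tracks the package revision (as it did here, where 6.17.13-6 booted as 6.17.13-6-pve). Both function names are made up for illustration.

```shell
# version_ge A B: succeed if version A sorts at or after version B.
# Relies on GNU sort -V for version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# kernel_has_fix RELEASE: RELEASE is a `uname -r` string such as 6.17.13-6-pve.
# The cutoff 6.17.13-5 comes from the changelog above and is only meaningful
# for the proxmox-kernel-6.17 series.
kernel_has_fix() {
  version_ge "${1%-pve}" "6.17.13-5"
}

# Usage:
#   kernel_has_fix "$(uname -r)" && echo patched || echo "still exposed"
```

Other kernel series need their own cutoff from their own changelog; the comparison does not carry across lines.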

I held the metapackage so it wouldn’t drag 7.0 in, then installed the 6.17 line directly.

apt-mark hold proxmox-default-kernel
apt-get install -y proxmox-kernel-6.17
apt-get full-upgrade -y
proxmox-boot-tool refresh

The kernel package's bootloader hooks (update-grub on GRUB systems) register both the new and the old kernel as boot entries. The new one becomes the default; the old one remains as a fallback you can pick from the boot menu if anything goes wrong.

After reboot, all 7 LXCs follow automatically (kernel-side only)

When you trigger a reboot from your own SSH session, you want the command to return cleanly before the system goes down. systemd-run with a five-second delay is the cleanest pattern I’ve found:

systemd-run --on-active=5sec --unit=manual-reboot.timer systemctl reboot

Then poll until SSH comes back. One bash loop is enough:

for i in $(seq 1 30); do
  if ssh -o ConnectTimeout=5 prod-host 'uname -r; uptime' 2>/dev/null; then
    break
  fi
  sleep 5
done

In my run it took about 30 seconds. New kernel 6.17.13-6-pve, uptime 0.
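
One caveat with polling for "SSH answers": a probe that lands inside the five-second window before the reboot fires will succeed against the old kernel. A slightly stricter variant, sketched below, only reports success once the reported release differs from the one recorded beforehand. The function name is hypothetical, and it assumes the reboot is expected to change `uname -r`, which was the case here.

```shell
# wait_for_kernel_change OLD TRIES DELAY CMD...
# Runs CMD (expected to print the remote `uname -r`) up to TRIES times,
# DELAY seconds apart. Prints the new release and succeeds only once the
# output is non-empty and differs from OLD; fails if that never happens.
wait_for_kernel_change() {
  old=$1; tries=$2; delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$tries" ]; do
    now=$("$@" 2>/dev/null) || now=""   # a failed probe counts as "not yet"
    if [ -n "$now" ] && [ "$now" != "$old" ]; then
      printf '%s\n' "$now"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (prod-host as in the loop above):
#   old=$(ssh prod-host uname -r)
#   ... trigger the reboot ...
#   wait_for_kernel_change "$old" 30 5 ssh -o ConnectTimeout=5 prod-host uname -r
```

If the host comes back on the old kernel, this times out instead of reporting success, which is the failure you want to hear about.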

Once the host is back, all seven LXCs share the new kernel. pct exec <vmid> -- uname -r returns the host’s value. CVE-2026-31431 exposure is closed for every container at this point.
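
Checking seven containers by hand gets old quickly. A minimal sketch for looping over them, assuming `pct list` prints a header row followed by VMID and Status columns (the helper name `running_vmids` is mine):

```shell
# running_vmids: read `pct list` output on stdin and print the VMID of every
# container whose Status column is "running". The header row is skipped.
running_vmids() {
  awk 'NR > 1 && $2 == "running" { print $1 }'
}

# Usage on the Proxmox host:
#   host=$(uname -r)
#   pct list | running_vmids | while read -r id; do
#     ct=$(pct exec "$id" -- uname -r </dev/null)   # keep pct off the loop's stdin
#     [ "$ct" = "$host" ] && echo "$id ok $ct" || echo "$id MISMATCH $ct"
#   done
```

Stopped containers are skipped on purpose: they pick up the host kernel whenever they next start.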

That is not the whole story. Userspace packages are per-container. openssl, libc6, sudo, openssh-server — those are managed by each container’s apt, not the host’s. I checked them all anyway, and that’s where things got interesting.

Isolation policy needs a paired patch path

Two of the containers — call them ct-foo and ct-bar — were both Ubuntu 24.04 LTS. Their report looked like this:

ct-foo:
  pkgs upgradable: total=0  security=0
  unattended-upgrades: missing
  /var/lib/apt/lists/*Release  → 2024-05-07

upgradable=0 here is a lie. Their APT catalog hadn’t been updated since May 2024, almost two full years. They couldn’t tell me about new security updates because they’d never looked. unattended-upgrades wasn’t even installed.

The cause: their nameserver pointed only at an internal private DNS, which couldn’t resolve archive.ubuntu.com.

# inside the container
$ getent hosts archive.ubuntu.com
$ # no response: DNS fail
$ cat /etc/resolv.conf
nameserver 10.x.x.x   # private DNS only

The isolation itself is a sensible policy — containers shouldn’t have unfettered internet egress. But isolation without a paired patch path (an internal mirror, scheduled bootstrap, anything) just means containers quietly rot. ct-foo rotted for almost two years.

While you’re there: apt’s 0 upgraded line is also a trap.

0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

It’s tempting to read this as “fully patched.” It isn’t. It looks identical when apt update silently failed. A false negative.

The check you actually want is three lines, not one:

# when did we last see a fresh catalog?
ls -la /var/lib/apt/lists/*Release | head -5

# does apt update actually succeed?
apt-get update 2>&1 | grep -E "Err:|Failed"

# does the mirror respond at all?
timeout 5 curl -sI http://archive.ubuntu.com/ubuntu/dists/noble/Release

If any of those three is broken, apt list --upgradable is lying to you.
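
The first of those checks is the easiest to automate. Here is a rough sketch that turns catalog age into a pass/fail, assuming GNU `find` and `date`. Both function names and the day threshold are my own choices, not apt settings. It looks at the newest `*Release` file, since some files may legitimately carry old timestamps even on a healthy system.

```shell
# newest_release_age_days [DIR]: print the age, in whole days, of the newest
# *Release file in DIR (default /var/lib/apt/lists). Fails if none exist.
newest_release_age_days() {
  newest=$(find "${1:-/var/lib/apt/lists}" -maxdepth 1 -name '*Release' \
             -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)
  [ -n "$newest" ] || return 1
  # %T@ prints seconds with a fractional part; strip it before the arithmetic.
  echo $(( ($(date +%s) - ${newest%.*}) / 86400 ))
}

# catalog_is_fresh MAX_DAYS [DIR]: succeed only if the newest catalog file is
# at most MAX_DAYS old. A missing catalog counts as not fresh.
catalog_is_fresh() {
  age=$(newest_release_age_days "$2") || return 1
  [ "$age" -le "$1" ]
}

# Usage inside a container:
#   catalog_is_fresh 7 || echo "apt catalog is stale; upgradable=0 means nothing"
```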

Takeaways

One host kernel = every LXC. That’s the gift and the bill of OS-level containerization, where containers share the host kernel instead of booting their own. The host being late means every container is late.
The modprobe block is a free lunch. A kernel upgrade requires a maintenance window. The mitigation closes the attack surface in under a minute. They’re not interchangeable; they complement each other.
Isolation policy needs a patch path. Cutting containers off from the public internet is fine. Forgetting to give them a way to receive updates anyway is how you end up with two-year-old userspace running production traffic.
apt-cache policy <pkg> is the shortest diagnostic in the bag. Candidate equal to installed on a kernel you know is stale means a repo problem, not a missing fix.

I’m putting this in the runbook so the next CVE in this shape — and there will be one — turns into a 30-minute job, not a half-day audit. This post is that runbook.

BleepingComputer coverage, oss-security advisory, theori-io PoC repository.