copy fail 2: electric boogaloo

2026-05-07 · linux · kernel · xfrm · esp · ipsec · msg_splice_pages · page-cache · copyfail · lpe · exploit-dev

i'd been living in the Copy Fail world for a week when Steffen Klassert pushed f4c50a4034 to netdev/net.git on 2026-05-05, with Cc: stable@vger.kernel.org. reported-by Hyunwoo Kim and Kuan-Ting Chen. one paragraph in:

that leaves an ESP-in-UDP packet made from shared pipe pages looking like an ordinary uncloned nonlinear skb. ESP input then takes the no-COW fast path for uncloned skbs without a frag_list and decrypts in place over data that is not owned privately by the skb.

Brad Spengler clocked it as copyfail-class before most of us had finished reading the commit message, and he was right. it's the same primitive shape as Copy Fail / CVE-2026-31431: kernel AEAD running in-place over splice()'d page-cache pages, in a different subsystem. Theori/Xint walked it through algif_aead. this one walks it through xfrm ESP-in-UDP receive.

the bug

MSG_SPLICE_PAGES attaches pages from a pipe straight onto an skb with no copy, so the skb's frags reference the pipe buffer's pages. the TCP path sets SKBFL_SHARED_FRAG on those skbs, and downstream consumers check that flag before they mutate frag bytes. the IPv4/IPv6 datagram append paths never set it. a UDP skb built with MSG_SPLICE_PAGES therefore looks, to any downstream consumer, like an ordinary uncloned nonlinear skb whose frags it is free to mutate.

the consumer in question is esp_input():

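// net/ipv4/esp4.c, esp_input() (pre-fix excerpt)
} else if (!skb_has_frag_list(skb)) {
    /* inside the !skb_cloned(skb) branch: uncloned, nonlinear, no frag_list */
    nfrags = skb_shinfo(skb)->nr_frags;
    nfrags++;               /* one extra scatterlist entry for the linear head */

    goto skip_cow;          /* skips skb_cow_data(): no private copy is made */
}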

goto skip_cow jumps past the skb_cow_data() call. what's left runs the AEAD decrypt in place over the existing scatterlist. the frags in that scatterlist are pipe pages we still hold open in userspace, which means they are page-cache pages of whatever file we spliced from.

the kernel writes the decrypt output into those pages. they're still referenced by our pipe, and they're still the page cache for the file. so the kernel has just written attacker-influenced bytes into the page cache of any file we can open for reading and splice() from.

the fix carries four Fixes: tags: two from 2017 (the esp no-COW fast path, v4 and v6) and two from 2023 (MSG_SPLICE_PAGES support in UDP and UDP6). any mainline kernel carrying all four commits is in scope.

same shape as copy fail

the original Copy Fail (CVE-2026-31431, Theori/Xint, April 2026) lived in algif_aead: a 2017 in-place optimization let splice()'d page-cache pages get chained into the AEAD destination scatterlist, and authencesn's tag write at dst[assoclen + cryptlen] walked into those pages. a controlled 4-byte write into the page cache, used to edit a setuid binary, for root.

this is the same beat. AEAD, in place, over splice'd page-cache pages, in another subsystem. but here it's the receive side of an ESP-in-UDP NAT-T socket, where the AEAD operation is the entire payload decrypt. the write isn't capped at four bytes. it's capped at the ESP payload length, and we pick that.

both bugs share a module-load posture, and spender flagged it. per oracle.github.io/kconfigs, INET_ESP, INET6_ESP, XFRM_INTERFACE, and XFRM_USER all build as =m on every non-Android distro. autoload via request_module from the userns netlink path works, because call_usermodehelper runs modprobe in init context. the standard Copy Fail mitigation works here too: drop an install <mod> /bin/true line into /etc/modprobe.d/ for each module, blocklist them, and the vulnerable code never loads. distros that build the relevant configs =y get nothing from the blocklist, and for xfrm that's Android and nothing else.
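if you want to confirm the mitigation is actually in place, here's a small defensive sketch (not from the repo; every helper name is hypothetical). it reads one modprobe.d-style file and reports whether a module has an install <mod> /bin/true (or /bin/false) line. the module names to check, esp4, esp6, xfrm_user and xfrm_interface, are my assumption about what INET_ESP, INET6_ESP, XFRM_USER and XFRM_INTERFACE build as. real systems merge several config directories, so modprobe --showconfig, which dumps the merged effective config, is the better authority.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* defensive sketch, not from the repo; helper names are hypothetical.
 * modprobe treats '-' and '_' in module names as equivalent. */
bool modname_eq(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        char ca = (*a == '-') ? '_' : *a;
        char cb = (*b == '-') ? '_' : *b;
        if (ca != cb)
            return false;
    }
    return *a == *b; /* both strings must end together */
}

/* true if one config line reads "install <mod> /bin/true" (or /bin/false).
 * commented-out lines fail the "install" keyword check. */
bool line_neuters_module(const char *line, const char *mod)
{
    char kw[32], name[128], cmd[128];
    if (sscanf(line, " %31s %127s %127s", kw, name, cmd) != 3)
        return false;
    if (strcmp(kw, "install") != 0 || !modname_eq(name, mod))
        return false;
    return strcmp(cmd, "/bin/true") == 0 || strcmp(cmd, "/bin/false") == 0;
}

/* true if the single modprobe.d-style file at path neuters mod.
 * real systems merge several directories; this reads one file only. */
bool file_neuters_module(const char *path, const char *mod)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;
    char line[512];
    bool found = false;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = line_neuters_module(line, mod);
    fclose(f);
    return found;
}
```

call file_neuters_module() once per module name against whichever file you dropped the install lines into. a plain blacklist line doesn't count here, because it only stops alias-based autoload.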

the primitive in one diagram

attacker (uid=1001, no privs):

  splice(/etc/passwd, off=N, len=1) -> pipe
     pipe buf = page-cache page of /etc/passwd, ref-counted

  splice(pipe -> udp_sock to 127.0.0.1:4500)
     kernel sets MSG_SPLICE_PAGES
     ip_append_data attaches frag = same page-cache page
     pre-fix: SKBFL_SHARED_FRAG NOT set on this skb

  loopback xmit -> udp_rcv -> udp_encap_rcv (UDP_ENCAP_ESPINUDP)
                                -> xfrm_input
                                -> esp_input
                                -> no-COW fast path
                                -> AES-GCM decrypt IN PLACE

  -> /etc/passwd page-cache page now reads attacker plaintext

the receive socket lives in our own netns. we install the matching xfrm SA ourselves with CAP_NET_ADMIN-in-userns, and we choose the SA key. the loopback path keeps page identity the whole way through: loopback_xmit only does skb_orphan(); skb_orphan_frags_rx() in __netif_receive_skb only fires on SKBFL_ZEROCOPY_FRAG, not SKBFL_SHARED_FRAG; and xfrm_input only does skb_cow_head(). the frags pass through untouched.

arbitrary plaintext? not quite

the plaintext written into the page-cache page is ciphertext_byte XOR keystream(K, IV), where ciphertext_byte is whatever the file already holds at that offset. we own the SA key, so the keystream is fully attacker-determined, but not attacker-chosen: solving "find K such that AES-CTR outputs specific bytes" means inverting AES, which we can't.

for one-byte targets at chosen offsets none of that matters. fix K, sweep IV, read off keystream byte zero, stop when it XORs the original byte to the value you want. averages ~256 trials per byte, ~30ms per fire including the splice and the round trip.

multi-byte targeted writes scale 2^(8N) in the IV search: one byte costs 2^8, two bytes 2^16, four bytes 2^32 (~seconds), eight bytes out of reach. the LPE chain doesn't need long contiguous writes though. it needs a hundred independent single-byte writes fired in a loop, ~22s wall on a stock Ubuntu 26.04 box.
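the same scaling as a worked equation. this uses a modeling assumption, not a measurement: each IV trial behaves like a fresh draw of independent, uniform keystream bytes. under that assumption, the number of trials T_N until all N target bytes match is geometric with success probability 2^-8N.

```latex
% modeling assumption: each IV trial gives N independent, uniform keystream bytes
p_N = 2^{-8N}, \qquad \mathbb{E}[T_N] = \frac{1}{p_N} = 2^{8N}

% N = 1: 2^{8} = 256        N = 2: 2^{16} = 65\,536
% N = 4: 2^{32} \approx 4.3 \times 10^{9}        N = 8: 2^{64} \approx 1.8 \times 10^{19}

% 99 independent one-byte writes versus one contiguous 99-byte write:
99 \cdot \mathbb{E}[T_1] = 99 \cdot 2^{8} = 25\,344 \qquad \text{vs.} \qquad \mathbb{E}[T_{99}] = 2^{792}
```

the cost of many independent single-byte searches grows linearly with their count, while one contiguous search grows exponentially with its length. that is why the chain is built from single-byte writes.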

chain to root

the page-cache write is one byte at a time. we use it to overwrite a victim nologin line in /etc/passwd with a length-matched, valid 7-field entry: name sick, empty password field, UID and GID both 0, shell /bin/bash. the line we target on stock Ubuntu 26.04 is this one:

gnome-remote-desktop:x:980:980:GNOME Remote Desktop:/var/lib/gnome-remote-desktop:/usr/sbin/nologin

99 bytes. the replacement is sick::0:0:<77 X's>:/:/bin/bash, also 99 bytes, valid as a passwd(5) entry. fire 99 single-byte writes in a loop, each through its own ESP packet, each picking up a fresh IV that XORs the current ciphertext byte to the target byte. ~22 seconds end to end on a stock kernel.

then su - sick. pam_unix.so with nullok, which is the default in /etc/pam.d/common-auth on nearly every mainstream distro, treats an empty stored password field as "accept empty input." su reads the password line off stdin, gets EOF or empty, PAM returns success, setuid(0). the resulting shell is uid=0. no sudo, no SUID helper, no real password.
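the su step only works because of nullok, so it's worth being able to spot it. here's a defensive sketch (not from the repo; the function names are hypothetical) that flags a single PAM config line loading pam_unix.so with nullok. it assumes whitespace-separated tokens and treats '#' as the start of a comment. both are simplifications of the real pam.d grammar.

```c
#include <stdbool.h>
#include <string.h>

/* defensive sketch, not from the repo; names are hypothetical.
 * true if tok names pam_unix.so, bare or as a full module path. */
static bool is_pam_unix(const char *tok)
{
    const char *mod = "pam_unix.so";
    size_t tl = strlen(tok), ml = strlen(mod);
    if (tl < ml || strcmp(tok + tl - ml, mod) != 0)
        return false;
    return tl == ml || tok[tl - ml - 1] == '/';
}

/* true if one PAM config line loads pam_unix.so with the nullok option.
 * assumption: whitespace-separated tokens, '#' starts a comment. */
bool pam_line_has_nullok(const char *line)
{
    char buf[512];
    size_t len = strcspn(line, "#\n");      /* cut at comment or newline */
    if (len >= sizeof buf)
        len = sizeof buf - 1;
    memcpy(buf, line, len);
    buf[len] = '\0';

    bool unix_mod = false, nullok = false;
    for (char *tok = strtok(buf, " \t"); tok != NULL; tok = strtok(NULL, " \t")) {
        if (is_pam_unix(tok))
            unix_mod = true;
        else if (strcmp(tok, "nullok") == 0)
            nullok = true;
    }
    return unix_mod && nullok;
}
```

feed it each line of the auth stack (e.g. /etc/pam.d/common-auth). it uses strtok(), so it isn't meant for concurrent callers.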

the sick line is persistent. it stays in /etc/passwd after the exploit exits, so re-running drops straight back into root with no work. the repo also ships a --clean mode that runs the same primitive in reverse, restoring the original gnome-remote-desktop line and removing the backdoor. byte-flip in, byte-flip out.
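because the sick line persists, a defender can look for its shape directly. this is a defensive sketch, not part of the repo. the heuristic and every function name are hypothetical. it flags passwd(5) lines with an empty password field, or with UID 0 on any name other than root. it doesn't consult /etc/shadow, and it assumes lines shorter than 4096 bytes.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* defensive sketch, not from the repo; heuristic and names are hypothetical.
 * copy ':'-separated field idx (0-based) of line into out.
 * returns false if the field is missing or doesn't fit. */
bool passwd_field(const char *line, int idx, char *out, size_t outsz)
{
    const char *p = line;
    for (int i = 0; i < idx; i++) {
        p = strchr(p, ':');
        if (p == NULL)
            return false;
        p++;
    }
    size_t len = strcspn(p, ":\n");
    if (len >= outsz)
        return false;
    memcpy(out, p, len);
    out[len] = '\0';
    return true;
}

/* flag an empty password field, or UID 0 on a name other than root. */
bool passwd_line_suspicious(const char *line)
{
    char name[256], pw[256], uid[32];
    if (!passwd_field(line, 0, name, sizeof name) ||
        !passwd_field(line, 1, pw, sizeof pw) ||
        !passwd_field(line, 2, uid, sizeof uid))
        return false; /* malformed line: not this check's concern */
    if (pw[0] == '\0')
        return true;
    return strcmp(uid, "0") == 0 && strcmp(name, "root") != 0;
}

/* scan a passwd-format file, print each suspicious line, and return
 * the count, or -1 if the file can't be opened. */
int audit_passwd_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char line[4096];
    int hits = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (passwd_line_suspicious(line)) {
            printf("suspicious: %s", line);
            hits++;
        }
    }
    fclose(f);
    return hits;
}
```

audit_passwd_file("/etc/passwd") would report the sick::0:0:... line from this chain, and it stops reporting it once --clean has restored the original entry.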

total wall time, cold start, no preloaded modules, no sudo, no privileged group: ~22s for the 99 writes plus a negligible PAM step. the kernel autoloads esp4, xfrm_user, and xfrm_algo on the first ip xfrm state add from inside the userns.

what mitigations don't apply

this is a page-cache write, not a slab UAF. the usual stack of kernel hardening against LPE, mostly slab- and heap-shaped, sits off to one side of it:

CONFIG_RANDOM_KMALLOC_CACHES
CONFIG_INIT_ON_ALLOC_DEFAULT_ON
CONFIG_SLAB_FREELIST_RANDOM
CONFIG_HARDENED_USERCOPY
CONFIG_SLAB_VIRTUAL
CONFIG_SLAB_BUCKETS
KASLR

not one of them touches a kernel write into a legitimate page-cache page the kernel believes it owns. kernel.apparmor_restrict_unprivileged_userns=1 falls the same way it does for every other userns-dependent exploit on Resolute in 2026: aa-rootns's crun-then-chrome profile re-execs into a clean userns with full ambient capabilities. that bypass is documented separately and shipped in the repo.

the one mitigation that does apply is the Copy Fail one from earlier (install <mod> /bin/true in modprobe.d). distros that don't blocklist the xfrm modules, which is all of them, ship the bug exploitable by default.

repo

github.com/0xdeadbeefnetwork/Copy_Fail2-Electric_Boogaloo, four files. copyfail2.c is the page-cache write primitive (single-byte, takes target file + offset + desired byte, brute-forces IV against a fixed K, splices the wire frame, fires the bug). aa-rootns.c is the userns harness. run.sh ties it together: pick the longest nologin line in /etc/passwd, compute the per-byte flip set to rewrite it as sick::0:0:<pad>:/:/bin/bash, fire the flips through copyfail2, stash the original line at /var/tmp/.cf2.state so we can revert, then exec su - sick. ./run.sh --clean reads the state file, computes the reverse flip set, fires it, and removes the backdoor.

no sudo. no PAM dance. no precondition past the kernel module autoload, which any unprivileged user gets for free off the userns netlink path.

credits

the bug, the fix, and the framing all belong to other people. this post and the repo are exploitation work on top of their findings.

Hyunwoo Kim (imv4bel) and Kuan-Ting Chen: reported, tested, authored the upstream fix.
Steffen Klassert: IPsec maintainer, signed off and posted the fix to netdev/net.git.
Brad Spengler (@spendergrsec / grsecurity): called this copyfail-class before anyone else read the commit, and corrected the framing on INET_ESP/INET6_ESP module status.
Theori / Xint: original Copy Fail (CVE-2026-31431) discovery and write-up. the conceptual vocabulary is theirs.

. _SiCk · afflicted.sh