copy fail 2: electric boogaloo
i'd been living in the Copy Fail world for a week when Steffen Klassert pushed
f4c50a4034 to netdev/net.git on 2026-05-05, with
Cc: stable@vger.kernel.org. reported-by Hyunwoo Kim and Kuan-Ting
Chen. one paragraph in:
that leaves an ESP-in-UDP packet made from shared pipe pages looking like an ordinary uncloned nonlinear skb. ESP input then takes the no-COW fast path for uncloned skbs without a frag_list and decrypts in place over data that is not owned privately by the skb.
Brad Spengler clocked it as copyfail-class before most of us had finished the
commit message, and he was right. it's the same primitive shape as
Copy Fail / CVE-2026-31431: kernel AEAD
running in-place over splice()'d page-cache pages, in a different
subsystem. Theori/Xint walked it through algif_aead. this one
walks it through xfrm ESP-in-UDP receive.
the bug
MSG_SPLICE_PAGES attaches pages from a pipe straight onto an skb
with no copy, so the skb's frags reference the pipe buffer's pages. the TCP
path sets SKBFL_SHARED_FRAG on those skbs, and downstream
consumers check that flag before they mutate frag bytes. the IPv4/IPv6
datagram append paths never set it. a UDP skb built with
MSG_SPLICE_PAGES therefore looked, to a downstream consumer, like
an ordinary uncloned nonlinear skb whose frags it could mutate freely.
the consumer in question is esp_input():
// net/ipv4/esp4.c (pre-fix)
} else if (!skb_has_frag_list(skb)) {
    nfrags = skb_shinfo(skb)->nr_frags;
    nfrags++;
    goto skip_cow;
}
skip_cow jumps past the skb_cow_data() call. what's
left runs the AEAD decrypt in place over the existing scatterlist. the frags
in that scatterlist are pipe pages we still hold open in userspace, which means
they are page-cache pages of whatever file we spliced from.
the kernel writes the decrypt output into those pages. they're still mapped
into our pipe and they're still the page cache for the file. the kernel just
wrote attacker-influenced bytes into the page cache of any readable file we can
splice().
the Fixes: chain spans 2017 (esp no-COW fast path for both v4 and
v6) and 2023 (UDP/UDP6 MSG_SPLICE_PAGES support). any mainline
kernel carrying all four of those commits is in scope.
same shape as copy fail
the original Copy Fail (CVE-2026-31431, Theori/Xint, April 2026) lived in
algif_aead: a 2017 in-place optimization let
splice()'d page-cache pages get chained into the AEAD destination
scatterlist, and authencesn's tag write at
dst[assoclen + cryptlen] walked into those pages. a controlled
4-byte write into the page cache, used to edit a setuid binary, for root.
this is the same beat. AEAD, in place, over splice'd page-cache pages, in another subsystem. but here it's the receive side of an ESP-in-UDP NAT-T socket, where the AEAD operation is the entire payload decrypt. the write isn't capped at four bytes. it's capped at the ESP payload length, and we pick that.
both bugs share a module-load posture, and spender flagged it. per
oracle.github.io/kconfigs,
INET_ESP, INET6_ESP, XFRM_INTERFACE, and
xfrm_user all build as =m on every non-Android distro.
autoload via request_module from the userns netlink path works,
because call_usermodehelper runs in init context. the standard
Copy Fail mitigation works here too: drop
install <mod> /bin/true in /etc/modprobe.d/ to
blocklist the modules, and the vulnerable code never loads. distros shipping the relevant
configs =y bypass the blocklist, and on xfrm that's Android and
nothing else.
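a minimal audit sketch for that mitigation, assuming the module names are esp4, esp6, xfrm_user, and xfrm_interface (the first and third are named in this post, the other two are my mapping from INET6_ESP and XFRM_INTERFACE) and that the config text comes from the files under /etc/modprobe.d/. the function names are hypothetical. it only reports which modules have no install override.

```python
import re

# Assumed module names; adjust for the kernel being audited.
MODULES = ("esp4", "esp6", "xfrm_user", "xfrm_interface")

def disabled_modules(conf_text):
    """Return modules whose `install` line points at /bin/true or /bin/false."""
    disabled = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        m = re.match(r"install\s+(\S+)\s+(/bin/true|/bin/false)\s*$", line)
        if m:
            # modprobe treats '-' and '_' in module names as equivalent
            disabled.add(m.group(1).replace("-", "_"))
    return disabled

def missing(conf_text, wanted=MODULES):
    """Return the wanted modules that have no install override, in order."""
    have = disabled_modules(conf_text)
    return [name for name in wanted if name not in have]
```

feed it the concatenated contents of the *.conf files. an empty list from missing() means every module in the assumed set is overridden. it says nothing about kernels that build these options =y.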
the primitive in one diagram
attacker (uid=1001, no privs):
  splice(/etc/passwd, off=N, len=1) -> pipe
    pipe buf = page-cache page of /etc/passwd, ref-counted
  splice(pipe -> udp_sock to 127.0.0.1:4500)
    kernel sets MSG_SPLICE_PAGES
    ip_append_data attaches frag = same page-cache page
    pre-fix: SKBFL_SHARED_FRAG NOT set on this skb
  loopback xmit -> udp_rcv -> udp_encap_rcv (UDP_ENCAP_ESPINUDP)
    -> xfrm_input
    -> esp_input
    -> no-COW fast path
    -> AES-GCM decrypt IN PLACE
    -> /etc/passwd page-cache page now reads attacker plaintext
the receive socket lives in our own netns. we install the matching xfrm SA
ourselves with CAP_NET_ADMIN-in-userns, and we choose the SA key.
the loopback path keeps page identity the whole way through.
loopback_xmit only does skb_orphan(),
skb_orphan_frags_rx in __netif_receive_skb only fires
on SKBFL_ZEROCOPY_FRAG and not SKBFL_SHARED_FRAG, and
xfrm_input only does skb_cow_head. the frags pass
through untouched.
arbitrary plaintext? not quite
the plaintext written into the page-cache page is
ciphertext_byte XOR keystream(K, IV). we own the SA key, so the
keystream is fully attacker-determined. but solving "find K such that AES-CTR
outputs specific bytes" means inverting AES, which we can't.
for one-byte targets at chosen offsets none of that matters. fix K, sweep IV, read off keystream byte zero, stop when it XORs the original byte to the value you want. averages ~256 trials per byte, ~30ms per fire including the splice and the round trip.
multi-byte targeted writes scale 2^(8N) in the IV search: one byte costs 2^8, two bytes 2^16, four bytes 2^32 (~seconds), eight bytes out of reach. the LPE chain doesn't need long contiguous writes though. it needs a hundred independent single-byte writes fired in a loop, ~22s wall on a stock Ubuntu 26.04 box.
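the scaling is just the mean of a geometric distribution. a small sketch of the arithmetic, assuming keystream bytes look uniformly random across IVs. the function names and the trials-per-second rate are hypothetical, not measured.

```python
def expected_trials(n_bytes):
    """Mean trials until n_bytes of keystream all match: 1 / 2**(-8 * n_bytes)."""
    return 256 ** n_bytes

def expected_seconds(n_bytes, trials_per_second):
    """Expected search time at an assumed (hypothetical) trial rate."""
    return expected_trials(n_bytes) / trials_per_second
```

expected_trials(1) is 256 and expected_trials(4) is 2^32, which matches the figures above. every extra byte multiplies the cost by 256.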
chain to root
the page-cache write is one byte at a time. we use it to overwrite a victim
nologin line in /etc/passwd with a length-matched,
valid 7-field entry: name sick, empty password field, UID and GID
both 0, shell /bin/bash. the line we target on stock
Ubuntu 26.04 is this one:
gnome-remote-desktop:x:980:980:GNOME Remote Desktop:/var/lib/gnome-remote-desktop:/usr/sbin/nologin
99 bytes. the replacement is
sick::0:0:<76 X's>:/:/bin/bash, also 99 bytes, valid as a
passwd(5) entry. fire 99 single-byte writes in a loop, each through
its own ESP packet, each picking up a fresh IV that XORs the current ciphertext
byte to the target byte. ~22 seconds end to end on a stock kernel.
then su - sick. pam_unix.so with nullok,
which is the default in /etc/pam.d/common-auth on nearly every
mainstream distro, treats an empty stored password field as "accept empty
input." su reads the password line off stdin, gets EOF
or empty, PAM returns success, setuid(0). the resulting shell is
uid=0. no sudo, no SUID helper, no real password.
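from the defender's side, the end state is easy to spot. a minimal sketch, assuming the passwd text is read through the normal file path (so it reflects whatever the page cache currently holds). the function name is hypothetical. it flags any 7-field entry with an empty password field, and any UID 0 entry not named root.

```python
def suspicious_entries(passwd_text):
    """Return account names with an empty password field or a non-root UID 0."""
    hits = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) != 7:
            continue  # not a well-formed passwd(5) entry
        name, password, uid = fields[0], fields[1], fields[2]
        if password == "" or (uid == "0" and name != "root"):
            hits.append(name)
    return hits
```

run it over the contents of /etc/passwd. on a clean box it should come back empty, and an entry like the one described above shows up by name.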
the sick line is persistent. it stays in /etc/passwd
after the exploit exits, so re-running drops straight back into root with no
work. the repo also ships a --clean mode that runs the same
primitive in reverse, restoring the original
gnome-remote-desktop line and removing the backdoor. byte-flip in,
byte-flip out.
total wall time, cold start, no preloaded modules, no sudo, no privileged
group: ~22s for the 99 writes plus negligible PAM. the kernel autoloads
esp4, xfrm_user, and xfrm_algo on the
first ip xfrm state add from inside the userns.
what mitigations don't apply
this is a page-cache write, not a slab UAF. the entire stack of slab- and heap-shaped userspace-LPE defenses sits to one side of it:
- CONFIG_RANDOM_KMALLOC_CACHES
- CONFIG_INIT_ON_ALLOC_DEFAULT_ON
- CONFIG_SLAB_FREELIST_RANDOM
- CONFIG_HARDENED_USERCOPY
- CONFIG_SLAB_VIRTUAL
- SLAB_BUCKETS
- KASLR
not one of them touches a kernel write into a legitimate page-cache page the
kernel believes it owns.
kernel.apparmor_restrict_unprivileged_userns=1 goes down the same
way every other userns-dependent exploit on Resolute goes down in 2026:
aa-rootns's crun-then-chrome profile
re-execs into a clean userns with full Ambient caps. that bypass is documented
separately and shipped in the repo.
the Copy Fail mitigation
(install <mod> /bin/true in modprobe.d) applies
here just as it did above. the distros that don't blocklist the xfrm modules,
which is everyone, ship the bug exploitable by default.
repo
github.com/0xdeadbeefnetwork/Copy_Fail2-Electric_Boogaloo,
four files. copyfail2.c is the page-cache write primitive
(single-byte, takes target file + offset + desired byte, brute-forces IV
against a fixed K, splices the wire frame, fires the bug).
aa-rootns.c is the userns harness. run.sh ties it
together: pick the longest nologin line in
/etc/passwd, compute the per-byte flip set to rewrite it as
sick::0:0:<pad>:/:/bin/bash, fire the flips through
copyfail2, stash the original line at
/var/tmp/.cf2.state so we can revert, then
exec su - sick. ./run.sh --clean reads the state file,
computes the reverse flip set, fires it, and removes the backdoor.
no sudo. no PAM dance. no precondition past the kernel module autoload, which any unprivileged user gets for free off the userns netlink path.
credits
the bug, the fix, and the framing all belong to other people. this post and the repo are exploitation work on top of their findings.
- Hyunwoo Kim (imv4bel) and Kuan-Ting Chen: reported, tested, authored the upstream fix.
- Steffen Klassert: IPsec maintainer, signed off and posted the fix to netdev/net.git.
- Brad Spengler (@spendergrsec / grsecurity): called this copyfail-class before anyone else read the commit, and corrected the framing on INET_ESP/INET6_ESP module status.
- Theori / Xint: original Copy Fail (CVE-2026-31431) discovery and write-up. the conceptual vocabulary is theirs.

_SiCk · afflicted.sh