Localo's Team Europe CTF Crackme

Localo, one of the coaches of the German ECSC team, created a reversing challenge for the ICC Team Europe qualifiers this year. This writeup explains how I solved it.

If you would like to try this challenge yourself, you can download the challenge here.

Initial Analysis
Understanding the Pattern
Writing an Emulator
Analyzing the Behavior
Recovering the Flag
Conclusion

Initial Analysis

As always, we run the program locally to learn what it does. We see a large banner and a prompt to enter the flag:

If we enter at least 24 characters, the program tells us that the flag is incorrect. Otherwise, nothing happens.

When we open the program in IDA, we see that it consists of one big function. Although the program is huge, there are clear patterns that allow us to split it up into parts. At the start, we see that the stack is filled with hardcoded bytes:

Once the stack has been filled, it is passed to sys_write:

It looks like the first part of the program is reponsible for printing the banner. Because I wanted to be sure that I did not miss anything, I dumped the stack memory with GDB right before sys_write is called. This confirmed that only the banner was written to the stack, and nothing else.

Next, the program calls sys_read 13 times, reading 2 bytes each and storing the result into memory:

After that, the program contains a large number blocks that follow a consistent pattern:

We will look at this pattern in more detail later.

Finally, the program ends with the same pattern that is used to print the banner, except this time it either prints that the flag is correct or that it is not correct, depending on some condition. Our goal is to figure out which input makes the program print that the flag is correct.

Understanding the Pattern

It seems that all the logic of the program is implemented using the pattern that we found in the previous section. Let's look at it in more detail.

First, some observations are:

The rax register points to a large chunk of writable memory, which contains our input near the start and a few constants in the middle.
The rbx register points to the address of the first pattern.

With this in mind, we see that the pattern performs the following operations:

Two values are read from the memory that is pointed to by rax.
The values are subtracted from each other and the result is stored back in memory.
If the first value is lower than the second, the program continues at the next pattern (jb).
Otherwise, the program performs a jump relative to rbx, which either jumps to the next pattern, or to a different pattern.

At the end, whether the input is considered correct depends on the content of a specific memory cell.

I knew that simple patterns like this can be used to implement complicated logic. It reminded me of the movfuscator, which compiles a program using only mov instructions. Unfortunately, since the pattern has more than 70.000 repetitions, it would be very hard to figure out the logic that is implemented through static analysis. Instead, I decided to look for other ways.

Writing an Emulator

Although it is possible to debug compiled code with GDB, which also has support for scripting, I decided to emulate the logic with Python instead. First, I wrote a script that parses every pattern that is present in the program, and writes them into a JSON file in a more readable format. In this script, I also included some assertions to verify that my assumptions about the pattern are correct:

import json
import struct

with open("chal", "rb") as f:
    data = f.read()

def u32(data, offset):
    return struct.unpack_from("<I", data, offset)[0]

program = []
for offset in range(0x1B099, 0x2543F4, 0x21):
    chunk = data[offset:offset+0x21]
    assert chunk[:2] == b"\x8b\x88"
    assert chunk[6:8] == b"\x2b\x88"
    assert chunk[12:14] == b"\x89\x88"
    assert chunk[18:20] == b"\x72\x0d"
    assert chunk[20:23] == b"\x49\x89\xda"
    assert chunk[23:26] == b"\x49\x81\xc2"
    assert chunk[30:33] == b"\x41\xff\xe2"

    src1 = u32(chunk, 2)
    src2 = u32(chunk, 8)
    dst = u32(chunk, 14)
    target = u32(chunk, 26) // 0x21

    assert src1 % 4 == 0
    assert src2 % 4 == 0
    assert dst % 4 == 0

    program.append((src1 // 4, src2 // 4, dst // 4, target))

with open("program.json", "w") as f:
    json.dump(program, f)

After running this script, program.json contains an array of (src1, src2, dst, target) instructions, where src1 and src2 contain the index of the memory cells that are subtracted from each other, dst contains the index of the memory cell where the result is stored, and target contains the index of the instruction where execution continues when src1 is not lower than src2.

Next, I wrote a script that reads program.json and executes it on a given input string:

import json
import struct

INPUT = b"aaaaaaaaaaaaaaaaaaaaaaaaaa"

# Read program
with open("program.json") as f:
    program = json.load(f)

# Read initial content of memory segment
with open("chal", "rb") as f:
    memory = f.read()[0x2C5000:0x2C6000]

# Unpack memory segment into 32-bit cells
memory = list(struct.unpack("<1024I", memory))

# Write our input into the correct memory cells
for i in range(13):
    memory[16 + i] = struct.unpack_from("<H", INPUT, i * 2)[0]

ip = 0 # instruction pointer
while ip < len(program):
    src1, src2, dst, target = program[ip]
    v1 = memory[src1]
    v2 = memory[src2]
    value = (v1 - v2) & 0xFFFFFFFF
    memory[dst] = value

    if v1 < v2:
        ip += 1
    else:
        ip = target

if memory[248] == 0 and memory[249] == 0:
    print("Input is correct")
else:
    print("Input is incorrect")

Using GDB, I also verified that the real program and my emulator have the same memory content after execution, to make sure that I implemented the emulator correctly.

Analyzing the Behavior

Using the emulator that we have written, it becomes much easier to analyze the behavior of the program, and we can also look for side channels. For example, we can easily count the number of instructions that are executed for different inputs. If we are lucky, our measurements show a pattern that allows us to guess the correct input. Unfortunately, counting the number of instructions that are executed did not give us any clues.

A different strategy got me closer to the solution though. I first executed the program with only null bytes as input and saved the content of all memory cells to a file:

import json
import struct

INPUT = struct.pack("<H", 0) * 13

... # run emulator

with open("memory.json", "w") as f:
    json.dump(memory, f)

Then, I edited the input, ran the program again, and observed which memory cells had changed:

import json
import struct

INPUT = struct.pack("<H", 1) * 13

... # run emulator

with open("memory.json") as f:
    baseline = json.load(memory, f)

for i in range(len(memory)):
  if baseline[i] != memory[i]:
    print(i, hex(baseline[i]), hex(memory[i]))

The above script gave the following output:

16 0x0 0x1
17 0x0 0x1
18 0x0 0x1
19 0x0 0x1
20 0x0 0x1
21 0x0 0x1
22 0x0 0x1
23 0x0 0x1
24 0x0 0x1
25 0x0 0x1
26 0x0 0x1
27 0x0 0x1
28 0x0 0x1
249 0x7d21 0x7d20
250 0x53ed 0x53ec
513 0x9619 0x9618
514 0x5339 0x5338
515 0xe1e 0xe1f
516 0xbee7 0xbee6
517 0xd923 0xd922
518 0xdf28 0xdf29
519 0x7654 0x7655
520 0x220f 0x220e
521 0xf739 0xf738
522 0x7c6f 0x7c6e
523 0xb658 0xb659
524 0x3cae 0x3caf
525 0x7f69 0x7f68

At this point, it is also important to remember that the memory cells initially contain some constants in the middle:

By playing around with various input values, I noticed two things:

Memory cells 513 - 525 each contain the XOR of one of these constants and two of our input bytes.
Memory cell 249 contains the XOR of 0x7D21 and the last two input bytes.

When the last two input bytes are [0x21, 0x7d], then memory cell 249 becomes 0. This is !} in ASCII, so a good candidate for the last two characters of the flag.

Recovering the other flag characters was a bit more difficult. I knew that the flag starts with TEC{ and likely ends with !}, so I fed these into the emulator to see what happens:

INPUT = b"TEC{" + bytes(20) + b"!}"

The output was the following:

16 0x0 0x4554
17 0x0 0x7b43
28 0x0 0x7d21
249 0x7d21 0x0
250 0x53ed 0x2ecc
253 0x1 0x0
513 0x9619 0xd34d
514 0x5339 0x287a
525 0x7f69 0x248

I knew that memory cells 513 - 525 contain the XOR of a constant and two input bytes. I was hoping that, if we provide the correct flag characters, the result would be another one of these constants, which would allow us to recover the flag. Unfortunately, this was not the case.

Then, I took a wild guess: what if the result is the XOR of two other constants? This turned out to be correct.

Let's take a look at the first two flag characters. We know that these are T and E (or 0x4554 in ASCII) and are XORed with 0x9619 in memory cell 513. The result is 0xD34D. By trying all combinations, I found two constants that XOR together to 0xD34D:

constants = [
    0x6B3F,
    0x642D,
    0xCE66,
    0x9619,
    ...
    0x6E7B,
    0x6ACF,
    0x2C84,
    0x2ECC
]

for i in range(len(constants) - 1):
    for j in range(i + 1, len(constants)):
        if constants[i] ^ constants[j] == 0xD34D:
            print(i, j, hex(constants[i]), hex(constants[j]))

The output was the following:

53 54 0x23ec 0xf0a1

Notice that these constants are right next to each other in the list. The same was true for the next two flag characters. This will help us later on.

Most importantly, we now know that every two flag characters are the XOR of three constants, one of which is known from memory cells 513 - 525.

Recovering the Flag

We have a list of constants. We know that every pair of two flag characters is the XOR of three of these constants. One of these is known, and the other two are next to each other in the list. This is enough knowledge to recover the flag.

For every pair of flag characters, we take the known constant and XOR it with any two constants that are next to each other in the list. If the result contains only ASCII characters, we have a candidate for the flag:

import struct

constants = [
    ...
]

known = [
    0x9619,
    0x5339,
    0xE1E,
    0xBEE7,
    0xD923,
    0xDF28,
    0x7654,
    0x220F,
    0xF739,
    0x7C6F,
    0xB658,
    0x3CAE,
    0x7F69
]

for k in known:
    for i in range(len(constants) - 1):
        value = k ^ constants[i] ^ constants[i + 1]
        chars = struct.pack("<H", value)
        if 0x20 <= chars[0] <= 0x7F and 0x20 <= chars[1] <= 0x7F:
            print(chars.decode())
    print("\n----\n")

For every two flag characters, we get about 10 - 20 candidates, but most of these can easily be eliminated with common sense. By putting the pieces together, we are able to recover the flag: TEC{pr0_reverse_engin33r!}.

When we enter this into the challenge program, the program displays the Team Europe logo and confirms that we entered the correct flag.

Conclusion

In this challenge, it was very hard to figure out the program with static analysis. Instead, I looked at differences in memory after execution for different inputs, and used that to guess the behavior of the program. For dynamic analysis, it really helps to set up an environment that allows for easy scripting. Perhaps the most important point is that is isn't always necessary to understand the program, as long as you understand enough of it to recover the flag.