Radare2 (r2) is an open-source, Unix-like reverse engineering framework with a large and diverse group of contributors. As a result, it has many commonly overlooked features, such as r2 signatures.
In this blog post, you will learn how r2 signatures, or “zignatures”, can save time when reversing statically compiled, stripped binaries.
New to r2?
If you love code golf, command lines, and vim, then you will enjoy r2. It has a similar learning curve: getting started is tough, but worth it. For those who like clicking things, or would rather get a running start, check out r2’s official GUI, Cutter. Either way, you should be able to follow along with the post.
Statically compiled, stripped executables
An executable needs to know the name of a function in order to look it up in an external library. This means a decent disassembler can parse an executable import table to easily discover all the needed external functions. This quickly gives a reverse engineer context as to what an executable does and where it does it.
Statically compiling an executable builds the external libraries into the executable, which means those library functions don’t need to be looked up because the executable knows exactly where they are. When the executable is also stripped, the function names are removed and the reverse engineer has to put in a lot more work to get some context into what a program is doing.
Imagine we are tasked with reversing a statically compiled and stripped ELF file. Let's start by trying to find some libc function calls.
A quick warmup: what function is being called here?
    lea rdi, str.usage:__s__options___file ; 0x47f00b ; "usage: %s [options] <file>\n"
    mov eax, 0
    call fcn.00408b50
The `lea` instruction is preparing the first argument to a function call. The `rdi` register will then point to a format string. Since the `fcn.00408b50` function accepts the format string as the first argument, it is probably printf. We don’t need to confirm; we can quickly label it `sym.printf_prbly` with the command `afn sym.printf_prbly @fcn.00408b50`.
Now we can move on with reversing. We also know that if this particular function shows up again, we will immediately have some context into the code surrounding it.
That one was easy. How about this next one: can you tell what function is being called?
    mov rax, qword [var_8h]
    mov rdi, rax
    call fcn.0040f840
    mov eax, 0
    leave
    ret
A single stack variable (var_8h) is being passed to the function fcn.0040f840. Could this be `malloc`? The return value in `rax` is clobbered right after the function call, so no. What libc function could this be? Is it even libc? We need more information.
Opening the `fcn.0040f840` function in the visual graph mode (`VV` command) is a bit overwhelming. There are 36 basic blocks cascading downward with 55 connecting edges (function info via the `afi` command). It’s a mess and will take some time to figure out all the logic. Time spent reversing this function may be wasted if it is a false path.
Reversing is all about using tricks to avoid doing more work than necessary. So, instead of diving into the assembly, let's use signatures from r2.
First, we need a signature database file. We can quickly create such a file using the rasign2 utility.
    $ rasign2 -o /tmp/libc_zigs.sdb /lib/x86_64-linux-gnu/libc.so.6
    [x] Analyze all flags starting with sym. and entry0 (aa)
    generated zignatures: 1843
Note: I know, I’m cheating here. I’m already aware of which libc version is being used because I compiled the binary myself. When it comes to learning, it is best to work in a known environment. We will see why shortly. I address how to do this without cheating in a later section.
Great, now we have a `/tmp/libc_zigs.sdb` file with our signatures in it. Let's just load them up (`zo`) and try to match the current function (`z.`).
    [0x0040f840]> zo /tmp/libc_zigs.sdb
    [0x0040f840]> z.
    [+] searching 0x0040f840 - 0x0040f940
    [+] searching function metrics
    hits: 0
No matches!?! With the exact same libc binary file!?!?! This must be a bug!!! Maybe, or maybe something else is going on.
Relax. We are reverse engineers; we can figure this out and learn something as a result.
Let's name the unknown function “sym.unknown” and make a signature for it. From there we can do some manual comparisons and figure out what’s going on.
Sending the output of `z*` into r2’s internal grep (`~`) lets us see only the signatures for the `sym.unkown` function. Each signature has a type. You can see the types with the `za??` command.
By default, r2 tries to match all signature types but you can change that with flags:
    [0x0040f840]> e zign.
    zign.autoload = false
    zign.bytes = true
    zign.diff.bthresh = 1.0
    zign.diff.gthresh = 1.0
    zign.graph = true
    zign.hash = true
    zign.maxsz = 500
    zign.mincc = 10
    zign.minsz = 16
    zign.offset = true
    zign.prefix = sign
    zign.refs = true
    zign.types = true
    [0x0040f840]> e zign.graph = false # don’t match on graphs
Apparently, none of the zignatures were an exact match. Maybe one was close, though.
We will try and do matching ourselves. Take a closer look at the `g` format signature above. This is the “graph metrics” signature. Here it is again:
    [0x0040f840]> z* ~sym.unkown g
    za sym.unkown g cc=21 nbbs=36 edges=55 ebbs=1 bbsum=592
See the number 36 for nbbs? That is the number of basic blocks we have. The edges value is the number of edges connecting the blocks. Are you wondering how many libc functions match just those two pieces of information? We have the libc zignatures already loaded, so we can print all of them and grep (`~`) for “nbbs=36 edges=55”.
    [0x0040f840]> z* ~nbbs=36 edges=55
    za sym.unkown g cc=21 nbbs=36 edges=55 ebbs=1 bbsum=592
    za sym.fclose g cc=21 nbbs=36 edges=55 ebbs=1 bbsum=583
How about that! We only have two matches; one is our unknown function, and the other is fclose. The fclose function does take in one parameter and its return value is often ignored. If we want, we can further verify by manually comparing the unknown function to fclose in our libc.so file.
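The same filtering can be done outside of r2 once the `z*` output is saved as text. The following is a minimal Python sketch, not part of r2 itself. It assumes graph zignature lines look exactly like the two lines shown above (`za <name> g key=value ...`); other r2 versions may print them differently. The helper names `parse_graph_zignatures` and `same_shape` are made up for this example.

```python
# Sample `z*` output lines copied from the r2 session in this post.
ZIGN_DUMP = """\
za sym.unkown g cc=21 nbbs=36 edges=55 ebbs=1 bbsum=592
za sym.fclose g cc=21 nbbs=36 edges=55 ebbs=1 bbsum=583
"""


def parse_graph_zignatures(dump):
    """Return {function name: {metric: int}} for every graph ('g') line."""
    result = {}
    for line in dump.splitlines():
        fields = line.split()
        # Assumed shape: za <name> g key=value key=value ...
        if len(fields) < 4 or fields[0] != "za" or fields[2] != "g":
            continue
        metrics = {}
        for field in fields[3:]:
            key, _, value = field.partition("=")
            if value.isdigit():
                metrics[key] = int(value)
        result[fields[1]] = metrics
    return result


def same_shape(zigs, target, keys=("nbbs", "edges")):
    """Names of other functions whose chosen metrics equal those of `target`."""
    wanted = [zigs[target][k] for k in keys]
    return sorted(
        name
        for name, metrics in zigs.items()
        if name != target and [metrics.get(k) for k in keys] == wanted
    )


zigs = parse_graph_zignatures(ZIGN_DUMP)
print(same_shape(zigs, "sym.unkown"))  # ['sym.fclose']
```

Matching on only `nbbs` and `edges` is deliberate here: `bbsum` differs between the two lines (592 versus 583), so requiring every metric to be equal would have thrown away the fclose candidate.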
Finding the correct libc version
So, I cheated in this example. Since I compiled the target binary, I had the same version of libc at hand. This improved the ability to match the unknown function.
What can be done if you don’t know which libc version was used?
- Obtain a nice list of libc candidates. This is not too hard. Linux has repos where things are stored in order. Check out https://github.com/niklasb/libc-database.
- Find the closest match to our binary. We can create a zignature for a known libc function in our static binary, such as the printf we found earlier. From there, we can search our libc database for the closest match. You should keep track of what metrics match best.
- Make zignatures for that libc database and use the same metrics to search for unknown functions.
The above is probably best handled with r2pipe. Whenever you find yourself running the same commands over and over again and wanting to quickly automate them, r2pipe should be your first thought.
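The steps above could be sketched roughly as follows. This is an illustration under assumptions, not a finished tool: `graph_metrics` only needs an object with a `cmd()` method, which is what a handle returned by `r2pipe.open()` provides, and it assumes the `z*` line format shown earlier in this post. The `FakeR2` class, the `libc-A` and `libc-B` labels, the `libc-B` numbers, and the sum-of-differences distance are all made up for the example so it can run without radare2 installed.

```python
def graph_metrics(r2, name):
    """Return the graph metrics of zignature `name`, or None if not found.

    `r2` is any object with a cmd(str) -> str method, for example a handle
    from r2pipe.open(). The `z*` line format is an assumption based on the
    output shown in this post.
    """
    for line in r2.cmd("z*").splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[:3] == ["za", name, "g"]:
            pairs = (field.partition("=") for field in fields[3:])
            return {key: int(value) for key, _, value in pairs if value.isdigit()}
    return None


def distance(a, b, keys=("cc", "nbbs", "edges", "ebbs", "bbsum")):
    """Hypothetical closeness score: sum of absolute metric differences."""
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)


def rank_candidates(target, candidates):
    """Return candidate labels sorted from closest to farthest."""
    return sorted(candidates, key=lambda label: distance(target, candidates[label]))


class FakeR2:
    """Stand-in for an r2pipe handle so the sketch runs without radare2."""

    def __init__(self, dump):
        self._dump = dump

    def cmd(self, command):
        return self._dump if command == "z*" else ""


# Metrics of the function in the static binary (taken from this post).
target = graph_metrics(
    FakeR2("za sym.unkown g cc=21 nbbs=36 edges=55 ebbs=1 bbsum=592\n"),
    "sym.unkown",
)

candidates = {
    # Values from this post's libc.
    "libc-A": graph_metrics(
        FakeR2("za sym.fclose g cc=21 nbbs=36 edges=55 ebbs=1 bbsum=583\n"),
        "sym.fclose",
    ),
    # Invented values standing in for some other libc build.
    "libc-B": graph_metrics(
        FakeR2("za sym.fclose g cc=23 nbbs=40 edges=61 ebbs=1 bbsum=640\n"),
        "sym.fclose",
    ),
}

print(rank_candidates(target, candidates))  # ['libc-A', 'libc-B']
```

With a real setup, each `FakeR2` would be replaced by an r2pipe handle that has the relevant zignatures loaded, and the known function would be something like the printf identified earlier rather than fclose.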
Why did r2 fail to match?
Since this test was with the very same libc version, I was surprised that r2 failed to find a match. I figured static compiling would copy over the fclose bytes and then fix relocations.
It turns out a few instructions got mixed up.
You can quickly find where things differ by comparing the “bytes pattern” signatures with each other. I wrote a quick Python script that does a bitwise AND of each signature’s byte pattern with its mask, and then a bitwise XOR of the two masked results. This showed there were several differences. The first difference is 84 bytes in, so we can skip there in r2 with `s+84`. I opened the two files in separate terminals and compared them.
Here are the instructions:
    0x0040f893 4889d9 mov rcx, rbx
    0x0040f896 4829d0 sub rax, rdx

    0x00073cd3 4829d0 sub rax, rdx
    0x00073cd6 4889d9 mov rcx, rbx
Computers are jerks. For some reason those two instructions were flipped in the statically compiled executable. Switching these instructions does not break the algorithm. The state of the CPU will be the same regardless of which instruction comes first.
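The masked comparison described earlier can be sketched in a few lines of Python. This is a reconstruction of the idea, not the original script: it masks each byte pattern with its own mask, XORs the results, and reports the first non-zero offset. The sample data is just the six instruction bytes from the two listings above, with an all-ones mask assumed for illustration; a real zignature mask would zero out bytes that are expected to vary.

```python
def masked_diff(bytes_a, mask_a, bytes_b, mask_b):
    """XOR two byte patterns after ANDing each with its own mask.

    A zero byte in the result means the masked bytes agree at that offset.
    The comparison stops at the shortest of the four inputs.
    """
    return bytes(
        (a & ma) ^ (b & mb)
        for a, ma, b, mb in zip(bytes_a, mask_a, bytes_b, mask_b)
    )


def first_difference(diff):
    """Offset of the first non-zero byte in `diff`, or None if all match."""
    for offset, value in enumerate(diff):
        if value:
            return offset
    return None


# Instruction bytes from the two listings in this post.
static_bytes = bytes.fromhex("4889d94829d0")  # mov rcx, rbx ; sub rax, rdx
libc_bytes = bytes.fromhex("4829d04889d9")    # sub rax, rdx ; mov rcx, rbx
full_mask = bytes([0xFF] * 6)                 # assumed mask: compare every bit

diff = masked_diff(static_bytes, full_mask, libc_bytes, full_mask)
print(diff.hex())              # 00a00900a009
print(first_difference(diff))  # 1
```

The first differing byte is at offset 1 of this snippet because both instructions start with the same `48` prefix byte. The snippet itself starts 83 bytes into the function (0x0040f893 minus 0x0040f840), which lines up with the first difference being 84 bytes in.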
It’s these little things that break the signature matching and make parsing algorithms with computers so difficult. It’s these little things that make people hate reverse engineering and think their tool does not work. But it’s also these little things that provide an opportunity to learn more and gain a greater appreciation for the complexity of computers and algorithms.
Zignatures contain basic metadata about the function. Some of the information is general, such as the number of basic blocks; other signatures are more specific, like the “bytes pattern.” Understanding this information can help you to quickly search known libraries for potential matches to an unknown function and save you a lot of reversing time.
It’s good to be aware, however, that matches will not always be perfect; computers are much more complicated than that. The data provided by Radare2 allows you to quickly perform matches in whatever way you wish. If you come up with a better matching algorithm, consider adding it to r2. It will run faster as a built-in function. Sharing also means it will be maintained and kept in r2 with regression tests.
Hopefully this blog post helped you learn a bit more about Radare2 and how r2 zignatures can be leveraged in unmasking common functions.