Comparison of the LLVM IR generated by three binary-to-llvm translators

17th september 2019
David Korczynski,
Security Research & Security Engineering

Introduction

In recent years there has been a significant increase in the interest into the LLVM project from security practitioners in both industry and academia. The reason for this is largely that the LLVM project offers a convenient way to develop novel program analysis tools and techniques allowing researchers to rapidly prototype new systems. This has resulted in a a significant amount of work being contributed to the project and attracted more and more engineers, resulting in a compound-like growth. There are many popular security-oriented projects that rely heavily on LLVM with some of the popular ones being KLEE, AddressSanitizer, LibFuzzer, S2E and PANDA. In addition to this, LLVM has a large body of analysis capabilities integrated into the compiler framework itself, which can perform a broad series of optimisations, security analyses and so on.

The analysis capabilities that comes with the LLVM framework has also attracted practitioners from the binary analysis domain. Although LLVM is a compiler framework and thus largley focused on forward-engineering, the project and its surrounding tools offer such a rich set of analysis capabilities that it is attractive to use them in a reverse engineering context. To this end, several tools are being developed that translate binary code into the LLVM intermediate representation (IR), sometimes called binary lifters or binary raisers. However, despite these tools offering the same overall goal, namely translation of binary executables to LLVM modules, they each have very different properties and maturity-levels. As such, it can be difficult to assess which tool is most likely the best solution to a given problem. In this blogpost we share some brief insights into the code produced by some of these lifters through an empirical comparison between the LLVM code created by three different translators when matched with the same binary code samples.

We have picked three popular binary-to-llvm translators, namely mcsema by Trail of Bits, mctoll (acronym for machine code to LLVM I believe) by Microsoft and retdec by Avast. The goal of this blogpost is to focus purely on the generated LLVM IR, rather than creating an overall assessment of each project on their strengths and weaknesses. Perhaps most importantly, we are not going to investigate their overall maturity and how each of them work against a large and diverse sample database as this deserves a study on its own.

The procedure we will use to assess the projects are simply to translate a given binary with each of the three projects, and then inspect the LLVM code produced by each of them. Specifically, the steps we deploy when assessing the code are the following:

In total we have created six small C code samples that we will use for our study. We will go into details with two of these samples and then leave the results from the other samples as a reference point for those who are interested.

Setup

All of the source that we have used in this study is open source. Specifically, we use the latest version of mctoll which can be retrieved from here. For Retdec we used the released binary for version 3.3 retrieved from here. The McSema version that we will be using is the one from Lukas Korencik's Master's thesis found here which relies on the DynInst framework for disassembling a given binary. As such, all of the software that we use is completely open source and, in particular, no proprietary disassembler is needed to repeat our experiments.

Result overview

LLVM IR size

The following two tables display the lines of code of the LLVM IR from compiling the source code directly and lifting the binary with mctoll, retdec and mcsema, respectively, as well as the size in bytes of the compiled bitcode file (the binary version of the LLVM IR). There are two tables, one for the the non-optimised LLVM code and one where we have applied LLVM optimisations on the given LLVM code using opt -O3.

Non-optimised LLVM

Test LoC LLVM LoC mctoll LoC retdec LoC mcsema .bc size LLVM .bc size mctoll .bc size retdec .bc size mcsema
1 41 35 206 1212 2264 1140 3424 39140
2 49 49 202 1199 2316 1204 3348 39108
3 52 49 216 1265 2388 1276 3480 39812
4 77 87 253 1320 3428 1544 4088 41260
5 116 149 291 1466 3344 1764 4276 41868
6 253 386 413 1778 4612 3144 5908 43924

Optimised LLVM

Test LoC LLVM LoC mctoll LoC retdec LoC mcsema .bc size LLVM .bc size mctoll .bc size retdec .bc size mcsema
1 27 19 137 1179 2116 1052 3444 40892
2 28 20 139 1160 2160 1100 3348 40808
3 31 22 142 1230 2256 1112 3480 41584
4 62 78 203 1280 3292 1600 4088 43172
5 115 94 220 1398 3864 1668 4488 43716
6 178 164 314 1728 4716 2364 6056 46428

Function count in produced LLVM The following table lists the number of functions defined in the LLVM modules of the non-optimizes version:

Test func count LLVM func count mctoll func count retdec func count mcsema
1 2 2 12 25
2 2 2 12 25
3 3 3 13 27
4 2 2 14 27
5 2 2 13 26
6 2 2 14 27

Qualitative discussion

In order to give a better intuition behind the code we will discuss two test samples in more depth.

Test1
First, we will go through test1 where the primary function we lift is the following function:

This code is compiled into the following LLVM code by clang and no optimisations:

and is compiled to the following code with clang and heavy optimisations (O3):

The function when compiled with GCC looks as follows:

(gdb) disass add
Dump of assembler code for function add:
   0x00000000000005fa <+0>:     push   rbp
   0x00000000000005fb <+1>:     mov    rbp,rsp
   0x00000000000005fe <+4>:     mov    DWORD PTR [rbp-0x4],edi
   0x0000000000000601 <+7>:     mov    DWORD PTR [rbp-0x8],esi
   0x0000000000000604 <+10>:    mov    edx,DWORD PTR [rbp-0x4]
   0x0000000000000607 <+13>:    mov    eax,DWORD PTR [rbp-0x8]
   0x000000000000060a <+16>:    add    eax,edx
   0x000000000000060c <+18>:    pop    rbp
   0x000000000000060d <+19>:    ret    
End of assembler dump.
As such, the non-optimised version of the LLVM is incredibly similar to the compiled code.

The following boxes show the non-optimised code when lifted by the three lifters:
mctoll:

retdec: and mcsema:
and the optimised versions:
mctoll: retdec: and mcsema:

The code produced for the add function by mctoll and retdec is quite similar in both non-optimised and optimised versions. Furthermore, mctoll produces equivalent code to the compiled LLVM function when aggressive optimisations are applied. Both mctoll and retdec retain the original number of parameters but retdec changes the return value to be an i64 instead of the original i32. Mcsema's code is quite different to the two other lifters and one of the key aspects that separate mcsema from the two others is its use of the %struct.State struct. This is a struct that contains the various artefacts of the architecture of the lifted binary, e.g. the registers (RSP, RBP, RAX, ... for amd64) and the computations perfomed by the instructions raised by mcsema are then based on this struct. In some sense, the lifted code can be considered a form of "virtual machine" since there is a level of indirection through the %struct.State struct. This is also why several getelementptr instructions occur at the beginning of the function since these grab pointers to various "registers" that will be used in the code. However, it makes the code quite difficult to read in comparison to the LLVM produced by the other two lifters and also significantly larger in terms of instructions. Finally, the function signature of the functions lifted by Mcsema are all the same which makes it hard to interpret the signature of a given function since it must be interpreted from the code itself.


Test3

Second, we will go through a few details from test3. One of the reasons we select this test case is because it shows the difficulty of raising to LLVM since we find a minor issue in one of the lifters (mctoll).
The C code for the main function in test3 is the following:

which compiles to the following non-optimised LLVM:

The non-optimised LLVM code produced by the lifters look as follows:
mctoll:

retdec: and mcsema:

All three lifters produce the same basic block composition and, again, the code by mctoll and retdec look more similar to each other compared to mcsema. The code produced by retdec is more condensed and is quite nice to read in terms of understanding the code.
The code generated by mctoll contains the following sequence at the end of the main function:

Specifically, the main function returns void and the return values from each of the functions are not used. As such, when we apply optimisations to the code we get the following:

which is naturally wrong.
The section All results below lists all of the results from the samples that we used in our study. We will not go into the details of all these samples but rather leave the code for inspection to those who are interested. In the section below we will give a few remarks on our overall impression of the generated LLVM IR.

Some conclusive remarks

Translating binary code to LLVM will surely become more attractive as LLVM continues to grow and the analysis capabilities become more powerful. We have given a brief look into the LLVM code produced by three different binary-to-llvm translators in order to give a better intuition for the expected output. The output of mctoll seems to be closer to the LLVM produced from forward compilation, in particular based on the number of functions in each LLVM module. Both retdec and mcsema seem closer to specific reverse engineering tools in that each of them tries to lift everything from the binary, e.g. compiler-generated functions too. This is the reason each of these lifters produce many more functions in the lifted output in comparison to mctoll. However, despite the similarity in terms of goal between retdec and mcsema, the specific code created by mctoll and retdec seems more comparable, whereas the code produced by mcsema performs a kind of indirect execution by way of a struct that holds the architecture-specific data of the given binary, e.g. the general purpose registers.
In this blogpost our focus was purely on the LLVM IR produced by the lifters and this on rather small samples of code. There are naturally many more aspects to each the lifters where they will differ, e.g. how well they work with real-world applications, accuracy and correctness of the lifted code, support for various targets, ability to recompile the lifted code and so on. These are interesting questions and we leave this for future work.

All results

In the following section we list each of the six test samples we exercised with and the code produced by each of the three lifters. The only thing we have removed is some metadata produced by mcsema from each of the files, but other than that files remain exact.

Test1

Source code:

Compiled LLVM:
Non-optimised:

Optimised:

Raised by mctoll
Non-optimised:

Optimised:

Raised by retdec
Non-optimised:

Optimised:

Raised by mcsema
Non-optimised:

Optimised:



Test2

Source code:

Compiled LLVM:
Non-optimised:

Optimised:

Raised by mctoll
Non-optimised:

Optimised:

Raised by retdec
Non-optimised:

Optimised:

Raised by mcsema
Non-optimised:

Optimised:



Test3

Source code:

Compiled LLVM:
Non-optimised:

Optimised:

Raised by mctoll
Non-optimised:

Optimised:

Raised by retdec
Non-optimised:

Optimised:

Raised by mcsema
Non-optimised:

Optimised:



Test4

Source code:

Compiled LLVM:
Non-optimised:

Optimised:

Raised by mctoll
Non-optimised:

Optimised:

Raised by retdec
Non-optimised:

Optimised:

Raised by mcsema
Non-optimised:

Optimised:



Test5

Source code:

Compiled LLVM:
Non-optimised:

Optimised:

Raised by mctoll
Non-optimised:

Optimised:

Raised by retdec
Non-optimised:

Optimised:

Raised by mcsema
Non-optimised:

Optimised:



Test6

Source code:

Compiled LLVM:
Non-optimised:

Optimised:

Raised by mctoll
Non-optimised:

Optimised:

Raised by retdec
Non-optimised:

Optimised:

Raised by mcsema
Non-optimised:

Optimised: