The Stack (practice)

(This is part of my series on program vulnerability exploitation.)

We will now actually look at the stack of the program presented earlier, which I reproduce here for convenience:

#include <stdio.h>

int add(int a, int b)
{
    return a+b;
}

int mult(int a, int b)
{
    int prod = 0;
    for (int i=0; i<a; i++) {
        prod = add(prod, b);
    }
    return prod;
}

int main(void)
{
    printf("%d\n", mult(2, 3));
}

There are essentially two ways to see what the program does in detail. One is to read its assembly code, using a tool like objdump; the other is to watch it run in a debugger like gdb. In practice I use one or the other, or both, depending on the situation. If you just need a quick look, an objdump listing is usually sufficient, but sometimes you need to watch the program in action to understand it properly, and then you use gdb. Here I will demonstrate both approaches.

1. Static analysis (objdump)

We first compile our program into an executable, and then use objdump to obtain its assembly code (this is called a disassembly):

$ gcc -m32 -O0 -std=c99 -pedantic -Wall -Wextra -o test test.c
$ objdump -M intel -d test

A word on the command flags I use. First, I compile in C99 mode with -std=c99, because I program in C99. ^^ If you program in ANSI C or another dialect, you could use that instead, but my programs might not work as is. I use the -m32 flag to compile 32-bit x86 code, since I am running on an x86-64 64-bit system (depending on your operating system, you might need to install additional software for this to work). I use -O0 (this is “oh zero”) to disable compiler optimisations, which often make the assembly difficult if not impossible to understand. Finally, -pedantic -Wall -Wextra make the compiler warn you about possible mistakes. (The compiler is always right, so you should always compile with those flags, and remedy any warning that appears.)

Finally, I use the -M intel flag of objdump to make it print the assembly code in Intel syntax (and not AT&T syntax, which is the default). You can of course use AT&T syntax if you prefer it, but I will always use Intel syntax. Also, if you are trying to run this on Mac OS X, it will work but objdump is not provided by default; it is part of the GNU binutils package, which you can obtain from here.

objdump prints a lot of stuff. Don’t worry, you don’t need to understand all, or even most, of it. For convenience, I pipe into nl to number the lines, like this:

$ objdump -M intel -d test | nl

and so all the chunks I will paste will have the line numbers prepended.

We will look at the code of add(), since it’s simpler than mult(). It looks like this:

   109  080483e4 <add>:
   110   80483e4:       55                      push   ebp
   111   80483e5:       89 e5                   mov    ebp,esp
   112   80483e7:       8b 45 0c                mov    eax,DWORD PTR [ebp+0xc]
   113   80483ea:       8b 55 08                mov    edx,DWORD PTR [ebp+0x8]
   114   80483ed:       01 d0                   add    eax,edx
   115   80483ef:       5d                      pop    ebp
   116   80483f0:       c3                      ret    

We are introduced to two friends which we will meet often: the frame pointer and the stack pointer, respectively called ebp and esp. They are CPU registers (that is, memory spaces internal to the processor), and they store respectively the addresses of the start (bottom) and end (top) of the stack frame of the current function. Since a function can’t predict exactly where in memory its stack frame will be, it only manipulates addresses relative to either of them. So, the stack frame of add(), including the stack and frame pointers, looks like this:

|                |
+================+
|                | <- esp
+----------------+     frame of add()
|                | <- ebp
+================+
| return address |
+----------------+
| a              |     frame of mult()
+----------------+
| b              |
+----------------+
|                |

(The astute reader will have noticed that here, esp and ebp are actually the same. This is purely coincidental: since the function add() does not need to store anything, no space needs to be allocated to it and its stack frame is the bare minimum. In general they will of course be different.)

The instructions at lines 110-111 are typical of the start of a function, and will almost always occur. The called function simply saves in its stack frame the value of ebp (the frame pointer of the caller function) and replaces it with its own frame pointer.

Lines 112-114 illustrate the relative addressing scheme that functions use to express addresses, relative to ebp or esp. The add() function expects two parameters, and it knows that the parameter-passing conventions dictate that mult() must have stored them in the second and third “slots” of 4 bytes under the top of its own stack frame. Another convention is that the stack grows towards lower addresses, so addresses increase as we go from the top of the stack towards the bottom. To grab the first argument, add() proceeds like this: it first reads the address stored in the register ebp, which is the beginning of its stack frame. In other words, it is the address of the bottom “slot” of 4 bytes in its stack frame (the one holding the saved ebp). 4 bytes after that, there is the return address. Finally, 4 bytes after the return address, there is the first argument a. So add() adds 8 to the address stored in ebp, which gives the address of a, and loads the value found there into another CPU register called edx (line 113). In the same way, it loads the second argument b, found at ebp+0xc, into eax (line 112), and then instructs the processor to add the values in those two registers (line 114). The sum ends up in eax, which is where the caller expects to find the return value.
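One way to picture the offsets used on lines 112-113 is as a C struct laid over the memory that starts at the address held in ebp. This is only a sketch: the names frame_view and add_from_frame are made up for illustration, and the compiled code does not define any such struct, it just uses the raw offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the memory starting at the address held in ebp
   while add() runs (32-bit x86, as in the disassembly above). */
struct frame_view {
    uint32_t saved_ebp;      /* [ebp+0x0]: frame pointer of the caller */
    uint32_t return_address; /* [ebp+0x4]: where ret will jump         */
    int32_t  a;              /* [ebp+0x8]: first argument              */
    int32_t  b;              /* [ebp+0xc]: second argument             */
};

/* Mirrors lines 112-114: load b into eax, a into edx, then add them. */
int32_t add_from_frame(const struct frame_view *ebp)
{
    assert(ebp != NULL);
    int32_t eax = ebp->b; /* mov eax,DWORD PTR [ebp+0xc] */
    int32_t edx = ebp->a; /* mov edx,DWORD PTR [ebp+0x8] */
    return eax + edx;     /* add eax,edx */
}
```

With this layout, offsetof(struct frame_view, a) is 8 and offsetof(struct frame_view, b) is 0xc, which are exactly the displacements that appear in the two mov instructions.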

Finally, on line 115 the function restores the old value of ebp, and on line 116 it returns execution to the caller function.
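To check your understanding of lines 110-116 as a whole, it can help to replay them on a toy machine written in C. The sketch below is my own simplification, not how a real CPU or emulator is built: struct machine, slot(), push(), pop() and run_add() are all hypothetical names, and memory is reduced to a small window of 4-byte words.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 16 };

/* Toy 32-bit machine: a few registers plus a window of stack memory.
   words[i] holds the 4-byte value stored at address base + 4*i. */
struct machine {
    uint32_t eax, edx, ebp, esp, eip;
    uint32_t base;
    uint32_t words[STACK_WORDS];
};

/* Translate a simulated address into a pointer into the window. */
static uint32_t *slot(struct machine *m, uint32_t addr)
{
    assert(addr >= m->base && (addr - m->base) % 4 == 0);
    assert((addr - m->base) / 4 < STACK_WORDS);
    return &m->words[(addr - m->base) / 4];
}

static void push(struct machine *m, uint32_t value)
{
    m->esp -= 4;              /* the stack grows towards lower addresses */
    *slot(m, m->esp) = value;
}

static uint32_t pop(struct machine *m)
{
    uint32_t value = *slot(m, m->esp);
    m->esp += 4;
    return value;
}

/* Replays lines 110-116, one statement per instruction. */
void run_add(struct machine *m)
{
    push(m, m->ebp);                 /* push ebp                    */
    m->ebp = m->esp;                 /* mov  ebp,esp                */
    m->eax = *slot(m, m->ebp + 0xc); /* mov  eax,DWORD PTR [ebp+0xc] */
    m->edx = *slot(m, m->ebp + 0x8); /* mov  edx,DWORD PTR [ebp+0x8] */
    m->eax += m->edx;                /* add  eax,edx                */
    m->ebp = pop(m);                 /* pop  ebp                    */
    m->eip = pop(m);                 /* ret                         */
}
```

If the machine starts with esp pointing at a slot that holds a return address, followed by the two argument slots, then after run_add() eax holds the sum, ebp has its old value back, eip holds the return address, and esp points just past the slot where the return address was.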

2. Runtime analysis (gdb)

gdb is a big and sometimes intimidating tool. Needless to say, we will only use a tiny fraction of what it can do. A good tutorial to the basics of gdb is here. We will go step-by-step in the execution of the add() function and see what happens.

You can use the disass command to see the assembly code of a function, in much the same way as you do with objdump. Since I also want to see what happens during the call to add() I will put a breakpoint just before it happens (so we will still be in mult()):

$ gdb ./test 
Reading symbols from /home/.../test...(no debugging symbols found)...done.
(gdb) disass mult
Dump of assembler code for function mult:
[...]
   0x08048421 <+48>:    call   0x80483e4 <add>
   0x08048426 <+53>:    mov    DWORD PTR [ebp-0x8],eax
[...]
End of assembler dump.
(gdb) break *0x08048421
Breakpoint 1 at 0x8048421

I now start the program. It will break and we can examine the stack frame of mult() just before the call to add(). We first get the values of ebp and esp:

(gdb) start
Temporary breakpoint 2 at 0x804843d
Starting program: /home/.../test 

Temporary breakpoint 2, 0x0804843d in main ()
(gdb) cont
Continuing.

Breakpoint 1, 0x08048421 in mult ()
(gdb) print $ebp
$1 = (void *) 0xffffd598
(gdb) print $esp
$2 = (void *) 0xffffd580

ebp and esp are 0x18 = 24 bytes apart, so, counting the 4-byte slot that ebp itself points to, the stack frame consists of 28 bytes, and we can examine it:

(gdb) x/8wx $esp
0xffffd580:     0x00000000      0x00000003      0x08049ff4      0x08048491
0xffffd590:     0x00000000      0x00000000      0xffffd5b8      0x08048457

We run the x (examine) command, and tell it to print 8 words (i.e., chunks of 4 bytes) in hexadecimal, starting at $esp. The address stored in ebp (i.e., the beginning of the stack frame) corresponds to the third word on the second line, and after that is the return address. Indeed we can see that 0x08048457 is the address of the instruction following the call to mult() in the code of main():

(gdb) disass main
Dump of assembler code for function main:
[...]
   0x08048452 <+24>:    call   0x80483f1 <mult>
   0x08048457 <+29>:    mov    edx,0x8048540
[...] 
End of assembler dump.
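The arithmetic used to read the dump above (the size of the frame, and which word of the output corresponds to a given address) can be written down as a small C sketch. The helper names frame_bytes and dump_position are hypothetical, and the code assumes what the output above shows: 4-byte words, printed 4 per line.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes from esp up to and including the 4-byte slot ebp points to. */
uint32_t frame_bytes(uint32_t ebp, uint32_t esp)
{
    assert(ebp >= esp);
    return ebp - esp + 4;
}

/* Position of the word stored at addr in a dump that starts at start
   and prints 4 words per line. Both results are counted from 1. */
void dump_position(uint32_t start, uint32_t addr,
                   unsigned *line, unsigned *word)
{
    assert(line && word);
    assert(addr >= start && (addr - start) % 4 == 0);
    uint32_t index = (addr - start) / 4;
    *line = index / 4 + 1;
    *word = index % 4 + 1;
}
```

With the values from the session, frame_bytes(0xffffd598, 0xffffd580) gives 28, and dump_position(0xffffd580, 0xffffd598, ...) gives line 2, word 3, which is where 0xffffd5b8 (the saved ebp of main()) appears. The return address sits one word later, at line 2, word 4.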

At the top of the stack frame, we can see the two arguments to be passed to add(): prod is still 0 since this is the first time we call the function, and b is 3. The return address has not yet been added to the top of the stack frame, which is why it does not show here. We stepi and the function call occurs:

(gdb) stepi
0x080483e4 in add ()
(gdb) disass add
Dump of assembler code for function add:
=> 0x080483e4 <+0>:     push   ebp
   0x080483e5 <+1>:     mov    ebp,esp
   0x080483e7 <+3>:     mov    eax,DWORD PTR [ebp+0xc]
   0x080483ea <+6>:     mov    edx,DWORD PTR [ebp+0x8]
   0x080483ed <+9>:     add    eax,edx
   0x080483ef <+11>:    pop    ebp
   0x080483f0 <+12>:    ret    
End of assembler dump.
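What the call instruction just did can be modelled in a few lines of C. This is a sketch under assumptions: struct cpu and do_call are made-up names, and I assume the call at 0x08048421 uses the usual 5-byte encoding (opcode E8 plus a 4-byte displacement), which matches the 5-byte gap between 0x08048421 and 0x08048426 in the listing of mult().

```c
#include <assert.h>
#include <stdint.h>

/* Minimal register state for this illustration. */
struct cpu {
    uint32_t esp;
    uint32_t eip;
};

enum { CALL_REL32_LEN = 5 }; /* opcode E8 + 4-byte displacement (assumed) */

/* Models a near call: push the address of the next instruction, then
   jump to the target. stack_slot stands for the 4 bytes of memory
   located just below the old top of the stack. */
void do_call(struct cpu *c, uint32_t target, uint32_t *stack_slot)
{
    assert(c && stack_slot);
    uint32_t return_address = c->eip + CALL_REL32_LEN;
    c->esp -= 4;                  /* make room on top of the stack */
    *stack_slot = return_address; /* store the return address there */
    c->eip = target;              /* continue execution in the callee */
}
```

Starting from esp = 0xffffd580 and eip = 0x08048421 with target 0x080483e4, this leaves esp at 0xffffd57c, stores 0x08048426 in the new top slot and sets eip to the first instruction of add().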

We are still at the very beginning of add(), and the frame and stack pointers have not yet been updated, so we can still look at the stack frame of mult(). We see that the return address is now there:

(gdb) x/8wx $esp
0xffffd57c:     0x08048426      0x00000000      0x00000003      0x08049ff4
0xffffd58c:     0x08048491      0x00000000      0x00000000      0xffffd5b8

We now let add() do its thing with the frame and stack pointers. First, the old value of ebp is pushed on the top of the stack (notice how the value of esp decreases every time something is pushed on the stack, so that it will always hold the address of the top of the stack):

(gdb) print $ebp
$3 = (void *) 0xffffd598
(gdb) stepi
0x080483e5 in add ()
(gdb) x/8wx $esp
0xffffd578:     0xffffd598      0x08048426      0x00000000      0x00000003
0xffffd588:     0x08049ff4      0x08048491      0x00000000      0x00000000

At this point, the beginning of the stack frame of add() and the top of the stack coincide, so add() takes the value of esp as its own ebp:

(gdb) print $esp
$4 = (void *) 0xffffd578
(gdb) print $ebp
$5 = (void *) 0xffffd598
(gdb) stepi
0x080483e7 in add ()
(gdb) print $ebp
$6 = (void *) 0xffffd578

And now add() can start its task. First it grabs the arguments:

(gdb) print $eax
$7 = 0
(gdb) stepi
0x080483ea in add ()
(gdb) print $eax
$8 = 3
(gdb) print $edx
$9 = -10780
(gdb) stepi
0x080483ed in add ()
(gdb) print $edx
$10 = 0

… and then adds them:

(gdb) print $eax
$11 = 3
(gdb) stepi
0x080483ef in add ()
(gdb) print $eax
$12 = 3

eax did not change since we added 0 to it. ;) Finally the function restores the old value of ebp:

(gdb) print $ebp
$13 = (void *) 0xffffd578
(gdb) stepi
0x080483f0 in add ()
(gdb) print $ebp
$14 = (void *) 0xffffd598

… and returns, jumping to the return address which is now at the top of the stack:

(gdb) x/1wx $esp
0xffffd57c:     0x08048426
(gdb) stepi
0x08048426 in mult ()

All of this was admittedly a lot of work, but by now you should have some familiarity with how things work. Don’t worry if you don’t fully understand it yet; it will become clearer as we work with it.

2 thoughts on “The Stack (practice)”

  1. mark

    Hi, I want to ask you about how the stack works, because I have a case that does not seem normal. I have a piece of code like this:
    1.#include
    2.#include
    3.#include
    4.main()
    5.{
    6.a(1234);
    7.}
    8.int a(int b)
    9.{
    10.int c;
    11.c=b;
    12.return c;
    13.}

    I want to see what address ESP points to, and what value is at that address, before and after calling function `a`. So I use gdb and set breakpoints at line 6 and line 11.

    Breakpoint 1, main () at xxx.c:6
    6 a(1234);
    (gdb) i r esp
    esp 0xbfffefc0 0xbfffefc0
    (gdb) x/16x $esp
    0xbfffefc0: 0xb7fbb3c4 0xb7fff000 0x0804842b 0xb7fbb000
    0xbfffefd0: 0x08048420 0x00000000 0x00000000 0xb7e2aa83
    0xbfffefe0: 0x00000001 0xbffff074 0xbffff07c 0xb7feccea
    0xbfffeff0: 0x00000001 0xbffff074 0xbffff014 0x0804a010
    (gdb) continue
    Continuing.

    Breakpoint 2, a (b=1234) at xxx.c:12
    12 c=b;
    (gdb) i r esp
    esp 0xbfffefa8 0xbfffefa8
    (gdb) x/16x $esp
    0xbfffefa8: 0x0804a000 0x08048472 0x00000001 0xbffff074
    0xbfffefb8: 0xbfffefd8 0x08048402 0x000004d2 0xb7fff000
    0xbfffefc8: 0x0804842b 0xb7fbb000 0x08048420 0x00000000
    0xbfffefd8: 0x00000000 0xb7e2aa83 0x00000001 0xbffff074

    I think that at the first breakpoint esp points to 0xbfffefc0 and that address contains the value 0xb7fbb3c4. So once we are in function `a` (where esp must be at some lower address), the address 0xbfffefc0 should still contain 0xb7fbb3c4, and the argument value 0x000004d2 (1234 in decimal) should be stored at some lower address (lower than the previous top of the stack, 0xbfffefc0).

    What am I missing here? Please point it out, thanks.
