The Stack (theory)

(This is part of my series on program vulnerability exploitation.)

Most of the techniques we will see here manipulate the program stack in some way or other, so it is imperative to have a good understanding of it, and we will describe it in some length. Again, the book by Bryant and O’Hallaron is an excellent reference for these topics.

1. What is the stack?

A common feature of high-level languages such as C is that they allow the programmer to define functions, such as

int factorial(int n)
{
    int res = 1;
    for (int i=2; i<=n; i++) {
        res *= i;
    }
    return res;
}

which are bits of code that may be used several times during the execution of the program. For several reasons, it is important that each function have some space for itself where it will store its own data (such as the variables i and res here), which will be allocated when the function is called, and freed when it returns. In many computer platforms, and especially in the x86/Linux combination which interests us here, this scheme is implemented using a stack (generally referred to as the stack, or sometimes the call stack or the execution stack).

Recall that a stack is a data structure which operates in a last-in-first-out manner: data enters the stack from the top (we say it is pushed on the stack) and exits it also from the top (we say it is popped from the stack). One can imagine a stack of plates: you put plates on the stack on the top, and take them from the top also, which means that when you take a plate, it will always be the one which was put most recently (hence, last-in-first-out). This makes sense for our purpose: when a function is called, we give it a chunk of memory space, called its stack frame, to store its data on, and push it on our stack. Then when the function returns, its stack frame is popped from the stack and the stack is back to how it was before the function was called. If, during the execution of the function, a new function is called, the frame for that function will be pushed (when the new function is called) and popped (when it returns) before our original function returns, and we can continue like that as long as we have memory available.

Consider this program:

#include <stdio.h>

int add(int a, int b)
{
    return a+b;
}

int mult(int a, int b)
{
    int prod = 0;
    for (int i=0; i<a; i++) {
        prod = add(prod, b);
    }
    return prod;
}

int main(void)
{
    printf("%d\n", mult(2, 3));
}

During a call to the function add(), the stack will look like this:

|                 |
+-----------------+
| frame of add()  |
+-----------------+
| frame of mult() |
+-----------------+
| frame of main() |
+-----------------+
|                 |

We now turn to what's inside a stack frame.

2. Stack frame structure

A function normally doesn't get to do whatever it wants with its stack frame, or at least not with all of it. There are numerous conventions in place so that the function plays nicely with both the function who called it, and the functions it itself will call. This is, of course, so that functions written by different programmers/compilers will be able to interact nicely. Mostly, those conventions have to do with information that functions pass to each other.

2.1. Parameters

The most obvious ones are the parameters of a function call, passed by the caller to the callee. Typically, the caller will simply store the parameters in a specified location at the top of its own stack frame, and the callee will grab them from there (thus the callee will read, but normally not write, on the stack frame of the caller). In our example, when mult() calls add(), it looks like this (lines of equal signs delimit stack frames, and lines of hyphens delimit where variables are stored, typically one line will be 4 bytes):

|                | frame of add()
+================+
|                |
+----------------+
| prod           | frame of mult()
+----------------+
| b              |
+----------------+
|                |
|                |
|                |
+================+
|                | frame of main()

So mult() will store the arguments to be passed to add() near the top of its stack frame. Reading them in the order in which they are written in the code, they are stored from top to bottom. There is also a space of 4 bytes at the very top of the stack frame, it's the most important one and we turn to it now.

2.2. The return address

The code of each function is stored contiguously in memory. So, when a function call happens, execution will jump to the location in memory where the code of the called function is located. A corollary is that, when execution of the called function has finished and it returns, execution must jump back to where it was before the function was called, so that execution of the caller function may resume. Thus, during the function-call process, the caller function will also store the address of the instruction immediately following the function-call instruction in the 4-byte space that was left blank in the previous picture. Then, when the called function terminates, it will grab the address, and execution will jump to it and the caller function will continue.

This means that if we can modify this value with an address of our choice, then we can divert execution of the program to any location we want in memory. If we also happen to find a way to put some code of our creation in memory, then by diverting execution to the address where our code is located we will be able to have it executed, and it will do whatever we have written it for.

This is the basic idea which is at the heart of everything that will follow.

2.3. Everything else

Besides the conventions covered in 2.1 and 2.2, the function is pretty much free to do whatever it wants with its stack frame. In particular, there is no rule governing the way in which the function stores its internal variables and data.

In the following post, we will get our hands dirty and look at how exactly the stack frames of each function look like in the previous program.

Leave a Reply

Your email address will not be published. Required fields are marked *