Factoring integers—Dixon’s algorithm

As announced in the previous post, we will now present Dixon's algorithm, which is a straightforward implementation of the ideas discussed so far. As we will see, it is still not very fast, but it is already significantly better than trial division.

More linear algebra

Before implementing the algorithm, however, we need a quick word about the last part: given (at least) $d+1$ vectors of $\mathbf{F}_2^d$, how exactly do we find a linear dependence? We create a $(d+1)\times d$ matrix with our vectors as rows, which, using the example of the previous post, gives
\[ M = \begin{pmatrix}
1 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1
\end{pmatrix}. \]
This is a $5\times 4$ matrix, which induces by left multiplication ($f: \mathbf{x}\mapsto \mathbf{x}M$) a linear transformation from $\mathbf{F}_2^5$ to $\mathbf{F}_2^4$. Now it is easy to see that each vector of the kernel of $f$ corresponds to a set of rows which add up to the zero vector: for example, the vector $(1, 0, 1, 0, 0) \in \ker f$ selects rows 1 and 3, which are equal and hence sum to zero, and this is exactly the set of rows we used earlier.

Thus in the end we just need to obtain the vectors of the kernel of $f$ in order to find our congruences. Fortunately, we need go no further: matrices in Sage have a method for that, and since linear algebra is not our topic here, we will happily use it without caring about how it works. (And actually, we will also freely use is_prime_power(), since primality testing, which is a completely different problem from factoring, is not our topic either.)
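
For instance, with the matrix $M$ above (iterating over the kernel enumerates all of its vectors, among them $(1, 0, 1, 0, 0)$):

# Build the example matrix over GF(2).
M = matrix(GF(2), [[1, 1, 0, 1],
                   [0, 0, 1, 0],
                   [1, 1, 0, 1],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1]])
# kernel() returns the left kernel, i.e. the kernel of x |-> x*M.
K = M.kernel()
# A vector space over GF(2) is finite, so we can simply iterate over it;
# each non-zero vector corresponds to a set of rows summing to zero.
for v in K:
    print(v)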

Algorithm

This can be implemented as follows. We first need a trial factoring function to weed out small factors, because our algorithm does not like them very much. It is pretty straightforward:

# Trial divide for prime factors in primes.
# Returns the factors found and the unfactored part.
def trial(n, primes):
    factors = []
    for p in primes:
        e = 0
        while n % p == 0:
            e = e + 1
            n = n // p
        if e > 0:
            factors.append((p, e))
            if n == 1:
                break
    return (Factorization(factors), n)
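
For instance, a quick sanity check:

# 4848 = 2^4 * 3 * 101; only 2 and 3 appear in the list of primes,
# so 101 is returned as the unfactored part.
F, rest = trial(2^4 * 3 * 101, [2, 3, 5, 7])
print(F)      # 2^4 * 3
print(rest)   # 101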

We also need a function to recognise smooth numbers:

# If n is smooth over fbase (i.e., all its prime factors lie in fbase),
# returns its exponent vector over GF(2): entry 0 records the sign of n,
# and entry i+1 the exponent of fbase[i] modulo 2.
# Otherwise, returns None.
def is_smooth(n, fbase):
    # 0 is never useful and would loop forever below, so reject it.
    if n == 0:
        return None
    lfbase = len(fbase)
    v = vector(GF(2), lfbase + 1)
    for i in range(lfbase):
        while n % fbase[i] == 0:
            v[i+1] = v[i+1] + 1
            n = n // fbase[i]
    if n < 0:
        v[0] = 1
        n = -n
    if n == 1:
        return v
    else:
        return None
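
For instance, with the factor base $[2, 3, 5, 7]$:

fbase = [2, 3, 5, 7]
# 45 = 3^2 * 5: smooth; the exponent of 3 vanishes modulo 2.
print(is_smooth(45, fbase))    # (0, 0, 0, 1, 0)
# -12 = -1 * 2^2 * 3: smooth; the sign shows up in position 0.
print(is_smooth(-12, fbase))   # (1, 0, 1, 0, 0)
# 22 = 2 * 11: not smooth, since 11 is not in the factor base.
print(is_smooth(22, fbase))    # None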

The main part of the algorithm:

# Returns a non-trivial factor of n (assumed composite and not a prime power).
def dixonfact(n, primes):
    b = ceil(exp((1/2)*sqrt(log(n)*log(log(n)))))
    fbase = [p for p in primes if p < b]
    d = len(fbase) + 1
    Tx = []
    TQx = []
    vectors = []
    nvalues = 0

    # Search for enough suitable vectors: a few more than d of them,
    # so that the kernel computed below is guaranteed to be non-trivial.
    x = ceil(sqrt(n))
    v = is_smooth(x^2 - n, fbase)
    if v is not None:
        Tx.append(x)
        TQx.append(x^2 - n)
        vectors.append(v)
        nvalues = nvalues + 1
    k = 1
    while nvalues < d+5:
        v = is_smooth((x+k)^2 - n, fbase)
        if v is not None:
            Tx.append(x+k)
            TQx.append((x+k)^2 - n)
            vectors.append(v)
            nvalues = nvalues + 1
        v = is_smooth((x-k)^2 - n, fbase)
        if v is not None:
            Tx.append(x-k)
            TQx.append((x-k)^2 - n)
            vectors.append(v)
            nvalues = nvalues + 1
        k = k+1

    # Combine relations: each kernel vector yields a congruence of
    # squares x^2 = y^2 (mod n); look for one giving a non-trivial factor.
    M = matrix(vectors)
    for v in M.kernel():
        if v == 0:
            continue
        x = 1
        y2 = 1
        for i in range(nvalues):
            # Lift the GF(2) exponents to integers 0/1 before multiplying.
            x = x * Tx[i]^ZZ(v[i])
            y2 = y2 * TQx[i]^ZZ(v[i])
        y = sqrt(y2)
        if (x+y) % n != 0 and (x-y) % n != 0:
            return gcd(x+y, n)

And finally, the main factoring function, which puts everything together. We also treat separately the case where $n$ is a prime power (there is a simple and efficient algorithm dedicated to such numbers, but for simplicity we just fall back on Sage's factor() instead of implementing it):

# Full factorisation.
# If init, we generate the list of primes and do
# trial division with them.
def dixon(n, primes=None, init=True):
    # Special cases
    if is_prime_power(n):
        return factor(n)
    if n == 1:
        return Factorization([])

    if init:
        b = ceil(exp((1/2)*sqrt(log(n)*log(log(n)))))
        primes = prime_range(b)
        F, n = trial(n, primes)
        return Factorization(list(F) + list(dixon(n, primes, False)))

    # We know that n is not a prime power and not 1.
    d = dixonfact(n, primes)
    return Factorization(list(dixon(d, primes, False))
                         + list(dixon(n//d, primes, False)))

In order to test that it works correctly, we can use a test function like this:

# Test on n random integers from 1 to b (inclusive).
def testdixon(n, b):
    for i in range(n):
        t = cputime()
        N = 1 + ZZ.random_element(b)
        print("[%s/%s] Factoring %s = %s..." % (i+1, n, N, factor(N)))
        myfactor = dixon(N)
        if list(myfactor) != list(factor(N)):
            print("Failed on %s" % N)
            return False
        print("Factored in %s seconds." % cputime(t))
    print("All passed.")
    return True

but it is most meaningful to test it on a semiprime (i.e., a product of two distinct primes), since this is normally the most difficult case. For example:

sage: n = 21337797057980567893
sage: t = cputime()
sage: factor(n)
2975772091 * 7170507823
sage: cputime(t)
0.029450000000000198
sage: t = cputime()
sage: dixon(n)
2975772091 * 7170507823
sage: cputime(t)
78.717072
sage: t = cputime()
sage: trial(n, prime_range(ceil(sqrt(n))))
(2975772091, 7170507823)
sage: cputime(t)
298.628159

What now?

As said at the beginning of this post, our algorithm is significantly faster than trial division, but still significantly slower than Sage's factor(), which means there is room for improvement. It is not difficult to convince ourselves that most of the time is spent in the is_smooth() function, looking for smooth values of $x^2 - n$, so this is where we will concentrate our efforts from now on.
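
As a rough check of that claim (just a sketch; the exact figures will of course depend on the machine), we can time how long it takes merely to test a batch of candidate values $x^2 - n$ for smoothness, which is essentially all that dixonfact() does during its search:

# Time the smoothness tests alone for a batch of candidates x^2 - n,
# using the same bound and factor base that dixonfact() would use.
n = 21337797057980567893
b = ceil(exp((1/2)*sqrt(log(n)*log(log(n)))))
fbase = prime_range(b)
x = ceil(sqrt(n))
t = cputime()
hits = sum(1 for k in range(10000)
           if is_smooth((x + k)^2 - n, fbase) is not None)
print("%s smooth values out of 10000 candidates in %s seconds"
      % (hits, cputime(t)))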
