Factoring

The only known method of breaking the RSA cryptosystem involves factoring the public modulus. To prevent this type of attack, one must be aware of the current state of the art in factoring large numbers, so as to avoid choosing moduli for which a fast factoring algorithm exists. We will now examine some of these factoring techniques. Throughout this section, unless otherwise stated, $n$ denotes a "large" odd integer that we wish to factor (even integers, it goes without saying, have the obvious factor 2).

Brute Force (Trial Division)

Testing each of the primes up to the square root of $n$ for divisibility will certainly produce a factor if $n$ is not prime. The problem with this technique is that it is very slow: for large $n$ with no small prime factors, it may not find a factor in any reasonable time. Our day-to-day experience with this naive approach is misleading, since the integers we ordinarily need to factor are nowhere near the size of those used in a cryptosystem.
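A minimal Python sketch of trial division (the function name `trial_division` is our own). As a simplification it tests 2 and then every odd $d \le \sqrt{n}$ rather than only primes; the first divisor found is necessarily prime, so the answer is the same.

```python
from math import isqrt
from typing import Optional

def trial_division(n: int) -> Optional[int]:
    """Return the smallest prime factor of n (n > 1), or None if n is prime."""
    if n % 2 == 0:
        return 2 if n > 2 else None
    # Only candidates up to sqrt(n) are needed: a composite n has a factor there.
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return d  # the smallest divisor > 1 is always prime
    return None       # no divisor up to sqrt(n), so n is prime

print(trial_division(152398989))  # 3
print(trial_division(1009))       # None (1009 is prime)
```

The loop runs up to $\sqrt{n}$ times in the worst case, which is exactly why the method is hopeless for an RSA modulus whose two prime factors are both large.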

Fermat Factoring

Theorem: Any odd integer n > 1 can be written as the difference of two integer squares.

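Pf: Write $n = m_1 m_2$ with $m_1 \le m_2$ (in the worst case, $m_1 = 1$ will work). Since $n$ is odd, both $m_1$ and $m_2$ are odd. Let $a = \tfrac{1}{2}(m_1 + m_2)$ and $b = \tfrac{1}{2}(m_2 - m_1)$; these are integers because the sum and the difference of two odd numbers are even. Then $m_1 = a - b$ and $m_2 = a + b$, so $n = m_1 m_2 = (a - b)(a + b) = a^2 - b^2$.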

Now suppose we wish to find a factor of the odd integer $n > 1$. Examine, in turn, the numbers $n, n + 1^2, n + 2^2, n + 3^2, \ldots$ until you find a square (one is guaranteed to exist by the theorem), say $n + b^2 = a^2$. Then $n = a^2 - b^2 = (a + b)(a - b)$, and factors of $n$ have been located.

Example: Find a factor of $n = 152398989$. Looking for a square in the sequence 152398989, 152398990, 152398993, 152398998, 152399005, 152399014, 152399025, 152399038, ..., we find the square roots 12344.998541..., 12344.99858..., 12344.99870..., 12344.99890..., 12344.99918..., 12344.99955... and, for the seventh term $n + 6^2$, exactly 12345. Thus $n = 12345^2 - 6^2 = (12351)(12339)$.

The method can be sped up a bit by observing that the last digit of a square must be 0, 1, 4, 5, 6 or 9. However, taking square roots to determine whether a number is a square is a slow operation, so this naive approach is not very fast. A better algorithm is to examine the sequence $(\lceil\sqrt{n}\rceil + i)^2 - n$, for $i = 0, 1, 2, \ldots$, for a square, where $\lceil\sqrt{n}\rceil$ is the square root of $n$ rounded up to the nearest integer (ceiling function). This algorithm does not take as many square roots, and those it does take are of smaller numbers. In the above example, using this algorithm, the factorization would have been found in the first step, since $\lceil\sqrt{n}\rceil = 12345$ and $12345^2 - n = 36 = 6^2$ (but, to be honest, trial division would have found a different factorization in one step as well, since $3 \mid n$).
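The improved search can be sketched in Python as follows (the name `fermat_factor` is ours). It starts at $a = \lceil\sqrt{n}\rceil$ and tests whether $a^2 - n$ is a perfect square using the exact integer square root `math.isqrt` (Python 3.8+), which avoids floating-point rounding on large $n$.

```python
from math import isqrt

def fermat_factor(n: int) -> tuple[int, int]:
    """Fermat factoring of an odd n > 1: return (a - b, a + b) with n = a^2 - b^2."""
    if n % 2 == 0 or n < 3:
        raise ValueError("n must be an odd integer > 1")
    a = isqrt(n)
    if a * a < n:
        a += 1                 # a = ceil(sqrt(n))
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:        # a^2 - n is a perfect square b^2
            return a - b, a + b
        a += 1                 # terminates by a = (n + 1) / 2 at the latest (split 1 * n)

print(fermat_factor(152398989))  # (12339, 12351), found on the first step
```

The loop always terminates, because the theorem above guarantees the trivial representation with $a = (n+1)/2$, $b = (n-1)/2$; a returned pair whose smaller entry is 1 means no nontrivial factorization was found.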

Factors that are nearly equal will be found quickly by this procedure, so in the RSA application one must make sure that the two primes are not too close together.
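A quick estimate shows how close is "too close". For any factorization $n = m_1 m_2$ with $m_1 < m_2$, the search starting at $\lceil\sqrt{n}\rceil$ reaches $a = (m_1 + m_2)/2$ (if it has not stopped earlier) after roughly $a - \sqrt{n}$ steps, and

$$
\frac{m_1 + m_2}{2} - \sqrt{m_1 m_2} \;=\; \frac{(\sqrt{m_2} - \sqrt{m_1})^2}{2} \;=\; \frac{(m_2 - m_1)^2}{2(\sqrt{m_2} + \sqrt{m_1})^2} \;\le\; \frac{(m_2 - m_1)^2}{8\sqrt{n}},
$$

using $(\sqrt{m_2} + \sqrt{m_1})^2 = m_1 + m_2 + 2\sqrt{n} \ge 4\sqrt{n}$. In particular, if $(m_2 - m_1)^2 < 8\sqrt{n}$, the factorization appears at the very first step; in the example above, $(12351 - 12339)^2 = 144$ is far below $8\sqrt{n} \approx 98760$.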

p-1 Factoring Algorithm

This algorithm is due to Pollard and dates from 1974. Choose a "bound" $B$. Compute $a \equiv 2^{B!} \pmod{n}$. If $d = \gcd(a - 1, n)$ satisfies $1 < d < n$, then $d$ is a factor of $n$. To see why this works, consider a prime $p$ which divides $n$ ($p$ is odd, since $n$ is). If $p - 1$ divides $B!$ (which will be the case if, for instance, $p - 1$ has only small prime divisors), then $B! = (p - 1)k$ for some integer $k$. Now, since $a \equiv 2^{B!} \pmod{n}$ and $p \mid n$, we also have $a \equiv 2^{B!} \pmod{p}$. By Fermat's theorem $2^{p-1} \equiv 1 \pmod{p}$, so $a \equiv 2^{B!} = (2^{p-1})^k \equiv 1 \pmod{p}$. Therefore $p \mid (a - 1)$ and $p \mid n$, so $p \mid \gcd(a - 1, n) = d$. Hence $1 < d$, and if $d < n$, $d$ is a proper divisor of $n$.
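A minimal Python sketch of the method as just described, with base 2 (the name `pollard_p_minus_1` is ours). Since $B!$ itself is enormous, the sketch never forms it: it raises to the powers $2, 3, \ldots, B$ one at a time, so after step $j$ the value held is $2^{j!} \bmod n$. The test modulus $1028171 = 1009 \cdot 1019$ is a small hypothetical example constructed here, with $1008 = 2^4 \cdot 3^2 \cdot 7$ smooth and $1018 = 2 \cdot 509$ not.

```python
from math import gcd
from typing import Optional

def pollard_p_minus_1(n: int, B: int) -> Optional[int]:
    """Pollard's p-1 method: return d = gcd(2^(B!) - 1, n) if 1 < d < n, else None."""
    a = 2                      # a = 2^(1!) mod n
    for j in range(2, B + 1):
        a = pow(a, j, n)       # now a = 2^(j!) mod n
    d = gcd(a - 1, n)
    # d == 1: B too small; d == n: B! killed the order of 2 modulo every prime of n
    return d if 1 < d < n else None

print(pollard_p_minus_1(1028171, 10))   # 1009, since 1008 divides 10!
print(pollard_p_minus_1(1028171, 509))  # None: now 1018 divides 509! too, so d = n
```

The same function can be run on the example below, with $n = 15770708441$ and $B = 180$.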

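Example: Let $n = 15770708441$ and choose $B = 180$. Then $a = 11620221425$, and we compute $d = 135979$. We get the factorization $15770708441 = (135979)(115979)$. The factorization worked because $d - 1 = 135978 = 2 \cdot 3 \cdot 131 \cdot 173$ has only small prime factors, whereas $115979 - 1 = 115978 = 2 \cdot 103 \cdot 563$ has the larger factor 563. Any $B$ with $173 \le B < 563$ would have worked for this $n$; for $B \ge 563$, $B!$ is divisible by both 135978 and 115978, so $a \equiv 1 \pmod{n}$ and $d = n$.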

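The choice of $B$ is crucial in this algorithm. If $B$ is small, the algorithm runs quickly, but the chance of success is small. If $B$ is large, the algorithm is much more likely to find a factor, but the running time becomes prohibitive (comparable to trial division). Moreover, if $B$ is so large that $q - 1 \mid B!$ for every prime $q \mid n$, then $a \equiv 1 \pmod{n}$ and the method returns the useless $d = n$.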

In the RSA application, one must ensure that the primes p and q have the property that p-1 and q-1 have at least one large prime factor to avoid an attack by this method. We shall see a generalization of this method later when we consider elliptic curves.

Factor Base Algorithms

Most of the best modern factoring algorithms are based on a generalization of the idea behind Fermat factorization. Namely, suppose we can find a congruence of the form $t^2 \equiv s^2 \pmod{n}$ with $t \not\equiv \pm s \pmod{n}$. Then $n \mid t^2 - s^2 = (t + s)(t - s)$ while $n$ divides neither $t + s$ nor $t - s$, so $n$ must have a non-trivial common factor with each of $t + s$ and $t - s$. Thus $a = \gcd(t + s, n)$ is a proper factor of $n$, and $b = n/a$ is the complementary factor.

Example: Suppose we want to factor $n = 4633$. If we notice that $118^2 \equiv 25 = 5^2 \pmod{4633}$, then $a = \gcd(118 + 5, 4633) = \gcd(123, 4633) = 41$, and we have $4633 = (41)(113)$.
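The gcd step can be sketched in a few lines of Python (the name `split_from_squares` is ours). It checks that the congruence really holds, then returns the pair $(a, n/a)$ only when the gcd is a proper factor.

```python
from math import gcd
from typing import Optional

def split_from_squares(t: int, s: int, n: int) -> Optional[tuple[int, int]]:
    """Given t^2 = s^2 (mod n), try to split n using a = gcd(t + s, n)."""
    if (t * t - s * s) % n != 0:
        raise ValueError("t^2 and s^2 are not congruent mod n")
    a = gcd(t + s, n)
    # a == n happens e.g. when t = -s (mod n); then the congruence is useless
    return (a, n // a) if 1 < a < n else None

print(split_from_squares(118, 5, 4633))  # (41, 113)
```

A failed attempt simply returns `None`; the caller then has to look for a different congruence, which is the situation discussed below.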

The factoring problem is then reduced to finding a congruence of this type. To manufacture such a congruence, we use the concept of a factor base. A factor base is simply a set of small primes which is not too big. If $B$ is a factor base, then a number all of whose prime factors lie in $B$ is said to be $B$-smooth. To find a congruence of the form $t^2 \equiv s^2 \pmod{n}$, we first find several numbers $b_i$ such that $b_i^2$ reduced mod $n$ is $B$-smooth for a fixed factor base $B$. Since $|B|$ is small, there will be many repeated primes in the factorizations of these numbers. The next task is to find a subset of the $b_i$ such that every prime appears to an even power in the product of the reduced values $b_i^2 \bmod n$. That product is then a perfect square $s^2$, while the product of the $b_i^2$ themselves is $t^2$ with $t = \prod b_i$, so $t^2 \equiv s^2 \pmod{n}$.

Example: Factor n = 2043221 using the factor base B = {2,3,5,7,11}. We find, by means to be discussed below, the following B-smooth squares:

$1439^2 \bmod 2043221 = 27500 = 2^2 \cdot 5^4 \cdot 11$
$2878^2 \bmod 2043221 = 110000 = 2^4 \cdot 5^4 \cdot 11$
$3197^2 \bmod 2043221 = 4704 = 2^5 \cdot 3 \cdot 7^2$
$3199^2 \bmod 2043221 = 17496 = 2^3 \cdot 3^7$
$3253^2 \bmod 2043221 = 365904 = 2^4 \cdot 3^3 \cdot 7 \cdot 11^2$
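Checking whether a reduced square is $B$-smooth, and recording its exponents, can be sketched in Python by plain trial division over the factor base (the name `exponent_vector` is ours; practical implementations find smooth values by sieving instead).

```python
from typing import Optional

def exponent_vector(m: int, factor_base: list[int]) -> Optional[list[int]]:
    """Exponents of each factor-base prime in m if m is smooth over it, else None."""
    if m <= 0:
        return None
    exps = []
    for p in factor_base:
        e = 0
        while m % p == 0:      # divide out p as often as possible
            m //= p
            e += 1
        exps.append(e)
    return exps if m == 1 else None   # leftover > 1 means a prime outside the base

n, base = 2043221, [2, 3, 5, 7, 11]
for b in (1439, 3197, 3198):
    print(b, b * b % n, exponent_vector(b * b % n, base))
# 1439 27500 [2, 0, 4, 0, 1]
# 3197 4704 [5, 1, 0, 2, 0]
# 3198 11099 None   (11099 = 11 * 1009 is not smooth)
```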

Consider the 3rd and 4th numbers; we see that $[(3197)(3199)]^2 \equiv 2^8 \cdot 3^8 \cdot 7^2 \pmod{2043221}$. Thus $t = (3197)(3199) \bmod 2043221 = 11098$ and $s = 2^4 \cdot 3^4 \cdot 7 \bmod 2043221 = 9072$. Now $\gcd(t + s, n) = \gcd(11098 + 9072, 2043221) = 2017$, and we have $2043221 = (2017)(1013)$.

This example also illustrates what can go wrong with the procedure. Had we taken the first two numbers, we would have obtained $[(1439)(2878)]^2 \equiv 2^6 \cdot 5^8 \cdot 11^2 \pmod{n}$, so $t = (1439)(2878) \bmod n = 55000$ and $s = 2^3 \cdot 5^4 \cdot 11 \bmod n = 55000$, i.e. $t = s$, and this does not lead to a factorization.

In the example we found the appropriate subset of the $b_i$ to multiply by inspection, but we can do this systematically, and at the same time answer the question of how many $b_i$ we need to find. We form a 0-1 matrix with one row for each of the $B$-smooth squares and $|B|$ columns, one for each prime in the factor base $B$. In each row, the entry in the $j$th column is 1 if the $j$th prime of $B$ appears to an odd power and 0 otherwise. For the last example this matrix would look like:

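$$
\begin{array}{c|ccccc}
b_i & 2 & 3 & 5 & 7 & 11 \\ \hline
1439 & 0 & 0 & 0 & 0 & 1 \\
2878 & 0 & 0 & 0 & 0 & 1 \\
3197 & 1 & 1 & 0 & 0 & 0 \\
3199 & 1 & 1 & 0 & 0 & 0 \\
3253 & 0 & 1 & 0 & 1 & 0
\end{array}
$$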
We are seeking sets of rows whose sum mod 2 is the zero row; i.e., we are looking for a set of linearly dependent rows of this matrix, regarded as a matrix over GF(2). If this can't be done by inspection, we can always use the linear algebra technique of row reduction to find such sets. If there are at least $|B| + 1$ rows, we are guaranteed to find at least one set of linearly dependent rows, since a space of dimension $|B|$ cannot contain more than $|B|$ independent vectors. This means that we can always find a congruence of the form $t^2 \equiv s^2 \pmod{n}$; however, there is no guarantee that $t \not\equiv \pm s \pmod{n}$. When this happens, we can either use another set of linearly dependent rows (often requiring new $B$-smooth squares to be found) or change the factor base.
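The whole procedure (parity rows, row reduction over GF(2), then the gcd) can be sketched in Python as below. All names are ours, and the row reduction is one simple choice among many: each parity row is stored as an integer bitmask, and alongside it we track a second bitmask recording which original rows were XORed together, so that a row reducing to zero directly names a dependent set.

```python
from math import gcd, prod
from typing import Optional

def exponent_vector(m, factor_base):
    """Exponents of the factor-base primes in m, or None if m is not smooth."""
    if m <= 0:
        return None
    exps = []
    for p in factor_base:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        exps.append(e)
    return exps if m == 1 else None

def dependencies(parity_rows):
    """Row reduction over GF(2); yields bitmasks of rows that sum to zero mod 2."""
    pivots = {}  # leading bit -> (reduced row, mask of original rows combined)
    for i, row in enumerate(parity_rows):
        combo = 1 << i
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = (row, combo)
                break
            prow, pcombo = pivots[lead]
            row ^= prow          # adding rows mod 2 is XOR
            combo ^= pcombo
        else:
            yield combo          # row reduced to zero: a linear dependency

def factor_with_base(n, bs, factor_base) -> Optional[int]:
    vecs = [exponent_vector(b * b % n, factor_base) for b in bs]
    if any(v is None for v in vecs):
        raise ValueError("every b_i^2 mod n must be smooth over the factor base")
    parity = [sum((e & 1) << j for j, e in enumerate(v)) for v in vecs]
    for combo in dependencies(parity):
        chosen = [i for i in range(len(bs)) if (combo >> i) & 1]
        t = prod(bs[i] for i in chosen) % n
        total = [sum(vecs[i][j] for i in chosen) for j in range(len(factor_base))]
        s = prod(pow(p, e // 2, n) for p, e in zip(factor_base, total)) % n
        d = gcd(t + s, n)
        if 1 < d < n:
            return d
    return None  # every dependency gave a useless congruence: find more b_i

print(factor_with_base(2043221, [1439, 2878, 3197, 3199, 3253], [2, 3, 5, 7, 11]))  # 2017
```

On the example, the first dependency found is rows 1 and 2 (the $t = s$ failure described above), and the second is rows 3 and 4, which yields 2017.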

It should now be clear why we were a little vague in the definition of a factor base. If the factor base is small, we will need only a few $B$-smooth squares to get a linear dependency; however, with a small factor base $B$-smooth squares are rare, so finding them will be hard. On the other hand, with a large factor base there are many more $B$-smooth squares, so they are easier to find, but we then need to find many more of them. A good algorithm based on these considerations is therefore one whose factor base is not too big and which has an efficient way of finding $B$-smooth squares.

One could try selecting the $b_i$ at random; this is effective if $n$ is not too large, but not for large $n$. A more effective procedure is to select the $b_i$ to be integers near the square root of $kn$ for different choices of $k$. The squares of these $b_i$ will be near $kn$, so when reduced mod $n$ they should be small, and thus more likely to be made up of only small primes. Another procedure, due to Pomerance, is to start with a large interval of integers $x$ around the square root of $n$ and then, for each prime $p$ in the factor base, use the quadratic relationship $x^2 \equiv n \pmod{p}$ to locate, without trial division, exactly those $x$ for which $p$ divides $x^2 - n$ (they lie in at most two residue classes mod $p$). The integers whose values $x^2 - n$ accumulate enough such divisors have a high probability of being $B$-smooth. This method is known as the Quadratic Sieve. A more recent algorithm, known as the Number Field Sieve, finds the $B$-smooth squares by means of computations in rings of algebraic integers.
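The simpler "near $\sqrt{kn}$" idea (not the quadratic sieve) can be sketched in Python as follows; the names and the parameters `ks` and `width` are ours. As a simplification it only looks just above $\sqrt{kn}$, where $b^2 \bmod n = b^2 - kn$ is small and positive. The $b_i$ of the example above fit this pattern: 1439 lies just above $\sqrt{n} \approx 1429.4$, 2878 just above $\sqrt{4n} \approx 2858.8$, and 3197, 3199, 3253 just above $\sqrt{5n} \approx 3196.3$.

```python
from math import isqrt

def is_smooth(m: int, factor_base: list[int]) -> bool:
    """True if m > 0 factors completely over factor_base."""
    if m <= 0:
        return False
    for p in factor_base:
        while m % p == 0:
            m //= p
    return m == 1

def smooth_candidates(n: int, factor_base: list[int], ks: list[int], width: int) -> list[int]:
    """Collect b just above sqrt(k*n), k in ks, whose b^2 mod n is smooth."""
    found = []
    for k in ks:
        r = isqrt(k * n)                      # r <= sqrt(k*n) < r + 1
        for b in range(r + 1, r + 1 + width):
            # here b^2 mod n equals b^2 - k*n as long as that is below n
            if is_smooth(b * b % n, factor_base):
                found.append(b)
    return found

print(smooth_candidates(2043221, [2, 3, 5, 7, 11], ks=[5], width=5))  # [3197, 3199]
```

In that window, 3198, 3200 and 3201 are rejected because $b^2 - 5n$ equals $11 \cdot 1009$, $3^4 \cdot 5 \cdot 59$ and $2^3 \cdot 7 \cdot 541$ respectively.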

For many years the quadratic sieve was the most successful algorithm for factoring RSA moduli. In April 1994, a 129-digit number known as RSA-129 was factored by Atkins, Graff, Lenstra and Leyland using the quadratic sieve. The numbers RSA-100, RSA-110, ..., RSA-500 were a list of RSA moduli publicized on the Internet by RSA Labs as "challenge" numbers for factoring algorithms. Each RSA-d was a d-digit number that is the product of two primes of approximately the same length. The numbers RSA-100, RSA-110, RSA-120, RSA-129, RSA-130, RSA-140, RSA-155 and RSA-160 have all been factored (the last of these on April 1, 2003). In 2001, RSA Labs renamed and reissued the "challenge" numbers and assigned specific monetary rewards for factoring them. The new list (available at RSA Labs) uses the number of bits in the binary representation in the name, starting at RSA-576 (worth $10K) and going up to RSA-2048 ($200K).

The number field sieve seems to have great potential, since its asymptotic running time is lower than that of the other known algorithms. It is still in the developmental stages, but many researchers feel that it might prove to be faster for numbers having more than about 125-130 digits. In 1990, the number field sieve was used by Lenstra, Lenstra, Manasse and Pollard to factor $2^{512} + 1$. On December 3, 2003, the factoring of RSA-576 (174 digits) was announced by a group at the German Federal Agency for Information Technology Security (BSI). They used the number field sieve to obtain the two 87-digit prime factors. The smallest unfactored challenge number is now RSA-640, worth $20K.