GMP logo GMP Development Tidbits
or
GMP(tm) Technology(tm) Preview(tm)

GMP team projects

Here the GMP developers present some new code that will eventually, in one form or another, go into a GMP release. Please use with caution, as this code might not always be well-tested. If it doesn't compile, we probably forgot to include some needed pieces; please let us know if that happens.

None of the interfaces or conventions used by these functions can be safely assumed to be present in any future GMP release.

You are encouraged to study this code and help us improve it. Such improvements can be speedups, or code simplifications.

Note: This code is not simple drop-in replacements for any current code, and Makefiles will not magically know how to build this code, nor will current GMP functions magically know how to call these new functions.

NB: Most of the code here is released under the GPL, not the LGPL. This means that it cannot be used at all in non-free programs. When the code appears in a GMP release, it will be under the usual GMP license, the LGPL.

If you put any of the code here in a library, please make sure to change the prefix so that the final GMP versions will not collide with your versions. Also, please make sure to inform your users that your library contain code that its authors considers insufficiently tested for a broad release.


New mpn_gcd and mpn_gcdext

This is Niels Möller's sub-quadratic GCD code, implementing both plain gcd and extended gcd. It is believed to be quite stable at this point.

Comparison of gcd routines. The top row denotes number of bits in the input arguments:

Times in µs 100 1000 10000 100,000 1,000,000 10,000,000 100,000,000 1,000,000,000
mpn_gcd (new) 0.672 18.6 279 8,370 251,000 5,940,000 112,000,000 1,890,000,000
mpn_gcdext (new) 1.26 18.6 341 11,100 389,000 10,200,000 202,000,000 3,390,000,000
mpn_gcd (4.2) 0.656 18.8 292 12,800 1,930,000 360,000,000    
mpn_gcdext (4.2) 2.70 39.0 556 24,400 3,220,000 767,000,000    

Download files Last changed
mpn/generic/gcd.c 2005-06-13
mpn/generic/gcdext.c 2005-06-13
mpn/generic/hgcd.c 2005-03-31
mpn/generic/hgcd2.c 2004-01-25
mpn/generic/qstack.c 2003-12-19
gmp-impl.h-gcd-newlines 2005-11-17


New Toom multiplication code for unbalanced operands

This code greatly improves multiplication performance for when the two operands have different sizes, also known as unbalanced operands.

The code is believed to be stable and should be straightforward to deploy.

Download files Last changed What changed?
mpn/generic/mul.c 2006-12-29
mpn/generic/mul_toom22.c 2007-01-11 test code bug fix
mpn/generic/mul_toom32.c 2007-01-11 test code
mpn/generic/mul_toom42.c 2007-01-13 interpolate call
mpn/generic/toom_interpolate_5pts.c 2007-01-13 new

We are working on more Toom functions, see below. In addition, mpn_mul_toomM3 for M = 4, 6 are planned, using 6 and 8 evaluation points, respectively. mpn_mul_toom33 is like the present GMP 4.2 mpn_mul_toom3 except that the former allows the operands to be slightly unbalanced.

We believe that even more Toom functions might be needed for optimal performance, perhaps going as high as mpn_mul_toomM7. This will largely depend on ongoing work on FFT multiplication; the faster that will become, the less room for higher-order Toom. If we need to go to order-7, there will be a forbidding number of functions, in order to support unbalanced operands as well as we support balanced operands.

The following functions are more bleeding edge, and will be integrated with mul.c at some point. The code is under active development, and is therefore not unlikely to have bugs.

Download files Last changed What changed?
mpn/generic/mul_toom33.c 2007-01-14 new
mpn/generic/toom_interpolate_5pts.c 2007-01-13 new
mpn/generic/mul_toom44.c 2007-01-11 new
mpn/generic/mul_toom53.c 2007-01-14 optimized
mpn/generic/mul_toom62.c 2007-01-11 new
mpn/generic/toom_interpolate_7pts.c 2007-01-11 new

This new tool is useful for for optimizing the evaluation sequences of Toom functions. It exhaustively searches for possible sequences, given a goal function. It is similar to the GNU superoptimizer, except that this program optimizes over a a simple algebraic structure and handles m-ary -> n-ary projections (the GNU superoptimizer optimizes over an assembly language subset, and handles m-ary -> 1-ary projections). This is very immature code, but it seems to work.

Download files Last changed What changed?
superopt-va.c 2007-01-16/2 speedup+UI improvement
goal-va.def 2007-01-16 bug fixes


New division code

This is new code for truncating division and Hensel division at the mpn level. There is also new exact division code, exact meaning that it is known that N mod D = 0. The new code has complexity O(M(n)) for all operations, while the old mpz code is O(M(n)log(n). More precisely, this code takes time 2M(n) for large n.

This code was written by Niels Möller and Torbjörn Granlund.

Notation: nn, dn, qn are the limb count of N, D, and Q, respectively. M(n) is the time use for multiplication of two n-bit numbers.

For small operands, simple O(n²) "schoolbook" code is used, for medium operands a divide-and-conquer algorithm is used, and for large operands a variant of Barrett's algorithm is used. The files names contain "sb", "dc", and "mu" for the respective algorithm.

We make use of that large operand multiplication in GMP uses FFT which computes multiplication mod 2^m+1, for some value of m. In the inversion Fermat iteration code, we know that the low part is ...00001. In the quotient development code, we know that the low or upper part cancels with the partial remainder. We can therefore choose m smaller than our desired products, and let the upper products "wrap" into the low part of the product.

Our algorithm is similar to Barrett's algorithm, except that we compute an inverse which is typically smaller than D; the inverse has sometimes just 1/2 or 1/3 as many digits as D.

Further work in these areas is desirable:

The interface of this code might change multiple times before becoming part of GMP, and even then the interface will be undocumented/internal. The code is under active development, and is therefore not unlikely to have bugs.

Download files Last changed What changed?
new.h 2006-11-07
 
mpn/generic/sb_div_qr.c 2007-01-03
mpn/generic/sb_divappr_q.c 2007-01-03
mpn/generic/sb_div_q.c 2007-01-03
mpn/generic/dc_div_qr.c 2007-01-03
mpn/generic/dc_divappr_q.c 2007-01-07 new
mpn/generic/dc_div_q.c 2007-01-07 new
mpn/generic/invert.c 2006-12-17 file retracted by author
mpn/generic/mu_div_qr.c 2006-11-07
mpn/generic/mu_divappr_q.c 2006-11-10
mpn/generic/mu_div_q.c 2007-01-07 improved
mpn/generic/sb_bdiv_qr.c 2006-11-07
mpn/generic/sb_bdiv_q.c 2006-11-07
mpn/generic/dc_bdiv_qr.c 2007-01-03
mpn/generic/dc_bdiv_q.c 2006-12-31
mpn/generic/binvert.c 2006-11-07
mpn/generic/mu_bdiv_q.c 2006-11-07
mpn/generic/divexact.c 2006-11-07
 
test-bdivisions.c 2006-11-07
test-binvert.c 2006-11-07
test-divexact.c 2006-11-07
test-divisions.c 2006-11-07
test-invert.c 2006-11-07

The following tables compares the speed of division functions. The first table uses dividends that are precisely twice as large as the divisors; the second table uses dividends that are 11 times as large as the divisors. Times are in microseconds; operand sizes are in bits.

dividend
divisor
200
100
2000
1000
20000
10000
200,000
100,000
2,000,000
1,000,000
20,000,000
10,000,000
200,000,000
100,000,000
2,000,000,000
1,000,000,000
mpn_divexact (new) 0.058 0.531 16.4 997 33,100 503,000 8,310,000 103,000,000
mpz_divexact (4.2) 0.083 0.64 28.9 2,140 67,200 1,410,000 25,900,000 413,000,000
mpn_tdiv_qr (4.2) 0.106 1.19 52.4 2,140 67,200 1,410,000 25,900,000 413,000,000
dividend
divisor
1100
100
11000
1000
110000
10000
1,100,000
100,000
11,000,000
1,000,000
110,000,000
10,000,000
1,100,000,000
100,000,000
11,000,000,000
1,000,000,000
mpn_divexact (new) 0.247 6.66 452 20,000 316,000 4,850,000 64,000,000  
mpz_divexact (4.2) 0.281 7.72 517 21,200 658,000 14,100,000 259,000,000  
mpn_tdiv_qr (4.2) 0.351 10.2 559 21,200 658,000 14,100,000 259,000,000  
These results were obtained on an AMD Opteron running at 2.2 GHz, using 64-bit code.

dividend
divisor
200
100
2000
1000
20000
10000
200,000
100,000
2,000,000
1,000,000
20,000,000
10,000,000
200,000,000
100,000,000
2,000,000,000
1,000,000,000
mpn_sb_div_qr 0.0597 0.983 56.5 5,250 871,000      
mpn_dc_div_qr     50.4 2,120 67,800 1,430,000 25,700,000  
mpn_mu_div_qr     70.7 2,440 48,100 736,000 10,700,000 133,000,000
mpn_tdiv_qr (4.2) 0.0905 1.19 51.5 2,120 67,200 1,400,000 25,900,000 413,000,000
dividend
divisor
1100
100
11000
1000
110000
10000
1,100,000
100,000
11,000,000
1,000,000
110,000,000
10,000,000
1,100,000,000
100,000,000
11,000,000,000
1,000,000,000
mpn_sb_div_qr 0.371 9.14 561 53,000        
mpn_dc_div_qr     510 21,100 674,000 14,200,000 258,000,000  
mpn_mu_div_qr     625 17,300 299,000 4,400,000 64,500,000  
mpn_tdiv_qr (4.2) 0.380 10.4 517 21,200 668,000 14,200,000 259,000,000  


New mpn_set_str and mpn_get_str

Both mpn_set_str and mpn_get_str are being rewritten, mainly for improving the memory economy and the performance, but also for making performance and memory use more linear in operand size. The new code is 20% to 30% faster than the old code for typical input, and it uses less than half the amount of memory.

The code in present GMP releases computes a series of powers of the conversion base B,

        1   2   4        (2^k)
       B , B , B , ...  B
and then multiplies (mpn_set_str) or divides (mpn_get_str) by these powers.

The new code computes a better tuned series, where the topmost power is in the order of the square root of the converted number.

The update of 2007-10-28 implements a trimmed table of powers, no longer with a large number of zero bits at the low end, greatly improving speed for even bases (such as 10).

When the division code (see above) is completed, mpn_get_str will become asymptotically faster. We will also at some point improve small operands code by rewriting the base case code to convert the numbers into fractions, and use mpn_mul_1 for developing digits.

Download files Last changed What changed?
mpn/generic/set_str.c 2007-10-28 optimization, alloc bug fix
mpn/generic/get_str.c 2007-10-28 optimization

In order to use the download files, you need to put these lines in gmp-impl.h:

/* Definitions for mpn_set_str and mpn_get_str */
struct powers
{
  mp_ptr p;			/* actual power value */
  mp_size_t n;			/* # of limbs at p */
  mp_size_t shift;		/* weight of lowest limb, in limb base B */
  size_t digits_in_base;	/* number of corresponding digits */
  int base;
};
typedef struct powers powers_t;
#define mpn_dc_set_str_powtab_alloc(n) ((n) + GMP_LIMB_BITS)
#define mpn_dc_set_str_itch(n) (n)
#define mpn_dc_get_str_powtab_alloc(n) ((n) + 2 * GMP_LIMB_BITS)
#define mpn_dc_get_str_itch(n) ((n) + GMP_LIMB_BITS)

#ifndef GET_STR_DC_THRESHOLD
#define GET_STR_DC_THRESHOLD             18
#endif

#ifndef GET_STR_PRECOMPUTE_THRESHOLD
#define GET_STR_PRECOMPUTE_THRESHOLD     35
#endif

#ifndef SET_STR_DC_THRESHOLD
#define SET_STR_DC_THRESHOLD            750
#endif

#ifndef SET_STR_PRECOMPUTE_THRESHOLD
#define SET_STR_PRECOMPUTE_THRESHOLD   2000
#endif

New mpn/generic/mul_fft.c

This is the current development version of the file mpn/generic/mul_fft.c. The most important improvement here is the choice of the k parameter. With the right tables in mpn/MACHINE/gmp-mparam.h, this can smooth out the staircase effect. There are also other changes, that slightly improves locality, and other changes that decreases some overhead.

Download files Last changed What changed?
mpn/generic/mul_fft.c 2007-09-27 speedup


External GMP projects

Here we'll link to some developments by external groups or people. These developments have not been tested by the GMP team, therefore we cannot make any assessments about their quality.


Please send comments about this page to gmp-discuss@swox.com
Copyright 2005, 2006, 2007 Free Software Foundation
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.