“Finite Field Arithmetic.” Chapter 15: Greatest Common Divisor.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch15_gcd.kv.vpatch.

You should end up with the same directory structure as previously.

As of Chapter 15, the versions of FFACalc and FFA are 254 and 254, respectively.

Now compile ffacalc:

cd ffacalc

But do not run it quite yet.

First, the mail bag!

Reader diana_coman has given me to know that she has read and signed Chapters 7 and 8:

Thank you, reader diana_coman!

Now, let’s eat the meat of this Chapter.

We’ll start with two very minor extensions of FFACalc:

Op Description # Ins # Outs Notes
R* Multiply top and second item and place only the junior half of the product on the stack. 2 1 The “Low-Multiply” from Ch. 14B.
MS Square the second item modulo the top item and place the result on the stack. 2 1 Conventional modular squaring.

R* (i.e. Right-Multiply) is simply the “Low Mul” helper routine from the previous chapter, now made available directly from FFACalc. It is arithmetically-equivalent to an ordinary * multiplication followed by dropping the senior half of the product, but gives a twofold saving of CPU time by avoiding the calculation of the senior half to begin with.

MS (i.e. Modular-Square) is directly analogous to M* (Modular Multiply) as seen in Chapter 13.

Both of these new FFACalc operators simply expose previously-discussed routines, and therefore do not merit further discussion.

Two typos in the comments of Chapter 14B were found, on lines 258 and 261 of:


      -- (1) Ns := X >> Jm
      FZ_Quiet_ShiftRight(N => X, ShiftedN => Xs, Count => Bar.J);
      -- (2) Z  := X * Bm
      FZ_Multiply_Unbuffered(X => Bar.B, Y => Xs, XY => Z);

They have been corrected:


      -- (1) Xs := X >> Jm
      FZ_Quiet_ShiftRight(N => X, ShiftedN => Xs, Count => Bar.J);
      -- (2) Z  := Xs * Bm
      FZ_Multiply_Unbuffered(X => Bar.B, Y => Xs, XY => Z);

Finally, let’s proceed to the main subject of Chapter 15: Greatest Common Divisor. We have a new FFACalc operator:

Op Description # Ins # Outs Notes
G Find the Greatest Common Divisor of the top and second item, and place on the stack. 2 1 GCD(0,0) is conventionally defined as 0.

This is the Greatest Common Divisor operator, as originally defined by Euclid. Several constructive uses for it will become apparent in subsequent chapters.

There did not, to my knowledge, previously exist in the public literature a constant-spacetime algorithm for GCD. However, it is not difficult to produce one. As a starting point, we will use Stein’s Algorithm, also known as binary GCD (refer to Vol. 2 of D. Knuth’s AOP, where a detailed analysis is found.)

It should be noted that there are several other classic algorithms for GCD (e.g. Lehmer’s method). Unfortunately, none of them are suitable for a constant-time implementation, as every single one of them relies on periodically examining some number of bits in the working registers and performing a variant set of operations (i.e. branching) depending on their value. This leaks, via the timing side-channel, information about the numbers being operated on — and is therefore absolutely prohibited in FFA.

Additionally, all known GCD algorithms run in quadratic time on their worst-case input. And per the definition of constant-time, all FFA algorithms must always run in the worst-case time. Hence, the simplest practical algorithm is best, as it will have the smallest constant factor in its run-time (as well as being the easiest to Fit-in-Head !) Hence, we will be sticking with Stein’s Binary GCD as the basis for our method.

Let’s begin by looking at a commonly-studied recursive variant of this algorithm:

Algorithm 1: Stein’s Recursive GCD.

For positive integers A and B:

  1. If B > A:
    return Stein(B, A)
  2. If B = 0:
    return A
  3. If A and B are both even:
    return 2 × Stein(A / 2, B / 2)
  4. If only A is even:
    return Stein(A / 2, B)
  5. If only B is even:
    return Stein(A, B / 2)
  6. Else:
    return Stein((A - B) / 2, B)

Chapter 15 Exercise #1:

Prove that Algorithm 1 computes GCD(A, B) in a finite number of steps.

As it stands, this algorithm is not suitable for constant-time implementation. However, it illustrates the equivalences that we will use to construct one which is.

Let’s illuminate how Stein’s Algorithm works:

Step 1 forces the relationship A ≥ B to hold at the start of each recursive call.

Step 2 terminates the recursion when B is found to equal zero, and there is nothing further to do but to return the value of A — which will be equal to the sought GCD.

Step 3 determines whether the current values of A and B have a common factor of two (i.e. both are even numbers), and accumulates this common factor for later re-introduction into the computed GCD.

Steps 4 and 5 eliminate a possible non-common factor of two in the current value of either A or B.

Finally, Step 6 makes use of the elementary fact that a difference between two odd numbers (A and B are both known to be odd at this point) is always even, in order to remove a known non-common (with B) factor of two from the quantity A - B, and then proceeds to the next level of the recursion (similarly to the well-known “kindergarten” variant of Euclid’s original GCD — where only such subtractions take place.)

The re-introduction of the shared power-of-two factor accumulated in Step 3 happens as a result of the unwinding of the recursion.

It is important to note that Steps 3, 4, and 5 are the reason why Stein’s Algorithm converges in quadratic (i.e. O(N2), where N is the total number of bits being operated on) time, rather than the O(max(A, B)) convergence time of “Euclid’s subtractive GCD”. These steps shave a single bit of length from A, B, or both whenever the respective quantities are even (i.e. have a zero for their junior-most bit.)

Recall the unsurprising Lemma 3 of Ch. 14A-Bis: if a small number is being subtracted from a much-larger one, it will take very many subtractions to shorten the quantity being subtracted from by even a single bit — you are liable to die of old age waiting for Euclid’s subtractive GCD to converge if the difference between the inputs is large.

The eventual result is that one of the quantities will equal zero, at which point Step 2 terminates the recursion. The zero always ends up in B, while the sought GCD ends up in A.

Now, let’s transform Algorithm 1 into a similar but iterative one.

Observe that a constant-time algorithm must always execute exactly the same — from an algebraic point of view — computations, regardless of the particular inputs. Therefore, the subtractive step must be performed during every iteration of the loop, and likewise the divisions-by-two must also take place during every iteration. Fortunately, the following equivalence holds for all integers A, B, and not merely when |A - B| is odd:

GCD(A, B) = GCD(|A - B|, Min(A, B))

Therefore we can dispense with the division-by-two in Step 6, and likewise with the “shortcuts” in Steps 1-5, and get the following modified algorithm:

Algorithm 2: Iterative Quasi-Stein GCD.

For positive integers A and B:

  1. Twos := 0
  2. Iterate until B = 0:
  3.    Ae, Be := IsEven(A), IsEven(B)
  4.    A      := A >> Ae
  5.    B      := B >> Be
  6.    Twos   := Twos + (Ae AND Be)
  7.    Bnext   := Min(A, B)
  8.    Anext   := |A - B|
  9.    A, B   := Anext, Bnext
  10. A := A × 2Twos
  11. A contains the GCD.

Algorithm 2 is not yet constant-time, but the missing ingredient should at this point be quite apparent to the astute reader.

Observe that once B = 0, the value of A at step 10 will not be affected by any number of “redundant” iterations of the loop.

This fact is the key to deriving a constant-time, i.e. always-worst-case version of Algorithm 2. Let’s write it out in a form directly suitable for implementation in FFA:

Algorithm 3: Constant-Time Iterative GCD.

For FZ integers A and B:

  1. Twos := 0
  2. Iterate Bitness(A) + Bitness(B) times:
  3.    Ae   := 1 - (A AND 1)
  4.    Be   := 1 - (B AND 1)
  5.    A    := A >> Ae
  6.    B    := B >> Be
  7.    Twos := Twos + (Ae AND Be)
  8.    D    := |A - B|, C ← Borrow
  9.    B    := {C=0 → B, C=1 → A}
  10.    A    := D
  11. A := A << Twos
  12. A contains the GCD.

Chapter 15 Exercise #2:

Prove that the number of iterations given in Algorithm 3 always suffices, and that the “redundant” iterations have no effect on the final output.

Chapter 15 Exercise #3:

What values of A and B, for a given FZ_Bitness, actually demand the maximum given number of iterations in order to produce the correct GCD?

And now, we will write an Ada program to perform Algorithm 3:


   -- Find Greatest Common Divisor (GCD) of X and Y.
   -- Note that by convention, GCD(0, 0) = 0.
   procedure FZ_Greatest_Common_Divisor(X      : in  FZ;
                                        Y      : in  FZ;
                                        Result : out FZ) is
      -- Widths of X, Y, and Result are equal
      subtype Width is Word_Index range X'Range;
      -- Working buffers for GCD computation, initially equal to the inputs
      A      : FZ(Width) := X; -- GCD will appear in A in the end
      B      : FZ(Width) := Y;
      -- Evenness (negation of lowest bit) of A and B respectively
      Ae, Be : WBool;
      -- Common power-of-2 factor
      Twos   : Word := 0;
      -- |A - B|
      D      : FZ(Width);
      -- This flag is set iff A < B
      A_lt_B : WBool;
      -- For convergence, requires number of shots equal to 2 * FZ_Bitness:
      for i in 1 .. 2 * FZ_Bitness(X) loop
         -- If A is even, A := A >> 1; otherwise A := A
         Ae := 1 - FZ_OddP(A);
         FZ_ShiftRight(A, A, WBit_Index(Ae));
         -- If B is even, B := B >> 1; otherwise B := B
         Be := 1 - FZ_OddP(B);
         FZ_ShiftRight(B, B, WBit_Index(Be));
         -- If both A and B were even, increment the common power-of-two
         Twos := Twos + (Ae and Be);
         -- D := |A - B|
         FZ_Sub_Abs(X => A, Y => B, Difference => D, Underflow => A_lt_B);
         -- B' := min(A, B)
         FZ_Mux(X => B, Y => A, Result => B, Sel => A_lt_B);
         -- A' := |A - B|
         A := D;
      end loop;
      -- Reintroduce the common power-of-2 factor stored in 'Twos'
      FZ_Quiet_ShiftLeft(N => A, ShiftedN => A, Count => Indices(Twos));
      -- Output final result
      Result := A;
   end FZ_Greatest_Common_Divisor;

Take note that we have defined GCD(0, 0) = 0. This is technically an abuse of mathematical rigour — the GCD of two zeroes is not a uniquely-determined value. However (unlike division by zero) there are no known down-sides to permitting a computation of GCD(0, 0).

I have found that this choice was made in every currently-extant arithmetron, including popular computer algebra systems (e.g. Octave and Wolfram’s.) (Reader: if you know of one which signals an error when given GCD(0, 0), please let me know which, and the author’s reasoning, if it is known.)

Now we will want to test the new GCD routine.

Let’s create a test FFACalc tape for GCD by taking a famous pair of braindamaged RSA moduli from the Phuctor zoo — the stars of the SecLab incident:


Now, run the tape as follows:

$ cat seclab.tape | ./bin/ffa_calc 4096 2

… and if successful, the output will be:


A serious reader will also want to test any iron being considered for use with FFA, as previously described in Chapter 14B.

The following set of canonical GCD litmus tapes is available for download:

Verify the signature and unpack the archive. Inside, you will find five FFACalc tapes:

  • 10k_null_gcd_8192bit.tape
    Ten thousand 8192-bit GCD tests with only null inputs.
  • 10k_small_gcd_8192bit.1.tape
    Ten thousand 8192-bit GCD tests with randomly-generated inputs having a usually-small common factor.
  • 10k_small_gcd_8192bit.2.tape
    Similar to the above.
  • 10k_large_gcd_8192bit.1.tape
    Ten thousand 8192-bit GCD tests with randomly-generated inputs having a usually-large common factor.
  • 10k_large_gcd_8192bit.2.tape
    Similar to the above.

… and a simple shell script, gcd_litmus.sh.

Place all of these items into your ffa/ffacalc directory and execute the litmus script.

After less than half an hour (on reasonable iron), you will have a set of timer outputs (and no error outputs, unless your machine is catastrophically broken.)

Just as previously, with the tapes of Chapter 14B, if you discover that there is any substantial and persistent pattern of difference in the runtimes of the tapes, you have defective iron!

Please leave a comment here if you turn up a machine which fails this litmus! It is, for instance, conceivable that some piece of rubbish masquerading as a CPU performs a shift-by-zero in a different period of time than a shift-by-one; and I was recently quite certain that I had found one such CPU — but it turned out to be a false alarm.

And that is all, as far as GCD is concerned, for now.

~To be continued!~

This entry was written by Stanislav , posted on Thursday January 10 2019 , filed under Ada, Bitcoin, Cold Air, Computation, Cryptography, FFA, Friends, Mathematics, ShouldersGiants, SoftwareArchaeology, SoftwareSucks . Bookmark the permalink . Post a comment below or leave a trackback: Trackback URL.

One Response to ““Finite Field Arithmetic.” Chapter 15: Greatest Common Divisor.”

  • semz says:

    Take note that we have defined GCD(0, 0) = 0. This is technically an abuse of mathematical rigour — the GCD of two zeroes is not a uniquely-determined value.

    In more general settings, you can define the GCD via the partial order a < b \iff a divides b. This is unique up to multiplication with "units" like -1, agrees with the old definition wherever applicable, and satisfies GCD(0,0) = 0. Admittedly this is a post-hoc rationalization on my end, but it goes to show that it's not that much of an abuse mathematically. Very nice series by the way.

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">

Please prove that you are human:

62 xor 62 = ?

What is the serial baud rate of the FG device ?

// Script to allow anchoring of user-selected content on html pages. // Original idea deployed by http://archive.today // Packaged for WordPress on http://trilema.com/2015/that-spiffy-selection-thing/