“Finite Field Arithmetic.” Chapter 14B: Barrett’s Modular Reduction. (Part 2 of 2.)

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch14_barrett.kv.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc

But do not run it quite yet.

First, the mail bag!

Reader diana_coman recently observed that the OF_in parameter taken by certain procedures in FZ_Shift is not checked for validity, and — if abused by being given an oversized (i.e. wider than the given shiftness) value — could result in a garbage output.

Her observation is entirely correct. And ideally, the range of OF_in would be constrained via a precondition. Unfortunately, Ada does not permit the use of preconditions in combination with inlining; and FZ_Shift routines are invoked in several costly inner loops, and absolutely must be subject to inlining. Therefore, it is impractical to actually verify the bit-width of OF_in on every invocation. It is, however, the case that these procedures are defined strictly for internal use in FFA, and hence do not constitute a danger to the operator. After giving the matter some thought, I took diana_coman’s suggestion and added comments to warn the reader of the potential rake he could step on if he were to insist on making direct use of FFA’s internal shift routines.

I will take this opportunity to remind the reader that FFA is designed to be “safe if used as prescribed”: if it is invoked via the provided external interface, the promised semantics are guaranteed to apply. The only prohibited operations are ones which over- or under-run the FFACalc stack, demand a division by zero, or attempt to violate other FFACalc rules. (These will bring the program to an orderly stop, and warn the operator.) All other actions will produce arithmetically-correct outputs for the given inputs. However, it is impractical on extant iron to make this guarantee for each of the internal components taken separately!

This is why we want sane iron, with inexpensive bounds-checking instructions! But we do not have it yet. Hence, the reader who wishes to make use of FFA internals for some custom purpose of his own, must proceed with extreme caution.

Reader diana_coman also observed that Get_Argument in FFACalc’s command line handler:

   procedure Get_Argument(Number : in  Natural;
                          Result : out String);

… can be turned into the stricter type:

   procedure Get_Argument(Number : in  Natural;
                          Result : out CmdLineArg);

I have included the change in this Chapter.

Thank you for these nitpicks, diana_coman! And for reading and signing Chapters 3, 4, 5, and 6.

Reader mircea_popescu observed that Chapter 13’s FZ_Measure can be slightly simplified, where:

Index := W_Mux(Index + 1, Index, W_ZeroP(W));

… can be safely turned into the equivalent:

Index := Index + W_NZeroP(W);

I have included the change in this Chapter. Thank you, reader mircea_popescu!
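The equivalence of the two forms can be checked mechanically. What follows is only a minimal sketch on a single 64-bit word: NZeroP, ZeroP and Mux are hypothetical stand-ins written for this illustration, and are not FFA's own W_NZeroP, W_ZeroP and W_Mux (whose selector convention is assumed here to be "Sel = 0 gives the first argument, Sel = 1 gives the second").

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Mux_Vs_Add is
   subtype Word is Unsigned_64;

   -- Hypothetical stand-in: 1 if W /= 0, otherwise 0, without branching.
   function NZeroP (W : Word) return Word is
   begin
      -- For nonzero W, either W or its two's complement has the top bit set.
      return Shift_Right (W or (0 - W), 63);
   end NZeroP;

   -- Hypothetical stand-in: 1 if W = 0, otherwise 0.
   function ZeroP (W : Word) return Word is
   begin
      return 1 - NZeroP (W);
   end ZeroP;

   -- Hypothetical stand-in: returns A when Sel = 0, and B when Sel = 1.
   function Mux (A, B, Sel : Word) return Word is
   begin
      return A xor ((0 - Sel) and (A xor B));
   end Mux;

   Samples : constant array (1 .. 5) of Word :=
     (0, 1, 2, 16#8000_0000_0000_0000#, Word'Last);
   Index_A : Word := 0;  -- updated with the mux form
   Index_B : Word := 0;  -- updated with the addition form
begin
   for I in Samples'Range loop
      Index_A := Mux (Index_A + 1, Index_A, ZeroP (Samples (I)));
      Index_B := Index_B + NZeroP (Samples (I));
      if Index_A /= Index_B then
         raise Program_Error with "forms disagree";
      end if;
   end loop;
   Ada.Text_IO.Put_Line
     ("Both forms agree; final Index =" & Word'Image (Index_B));
end Mux_Vs_Add;
```

Both updates leave Index unchanged for a zero word and increment it otherwise; the addition form simply drops the mux.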

Now, let’s eat the meat of this Chapter.

We’ll start with a very minor extension of FFACalc. A Version command has been introduced:

| Op | Description | # Ins | # Outs | Notes |
|----|-------------|-------|--------|-------|
| V | Put the FFACalc and FFA version numbers on the stack. | 0 | 2 | Kelvin Versioning is in use. |

The implementation of this command is quite straightforward:


               -- Put the FFACalc Program Version on the stack,
               -- followed by FFA Program Version.
            when 'V' =>
               -- Make room on the stack for the two outputs:
               Push;
               Push;
               -- FFACalc Version:
               FFA_FZ_Clear(Stack(SP - 1));
               FFA_FZ_Set_Head(Stack(SP - 1), Word(FFACalc_K_Version));
               -- FFA Version:
               FFA_FZ_Clear(Stack(SP));
               FFA_FZ_Set_Head(Stack(SP), Word(FFA_K_Version));


package Version is
   pragma Pure;
   -- Current 'deg. Kelvin' Version of FFACalc --
   FFACalc_K_Version : constant Natural := 255;
end Version;


   -- ...
   --- Current 'deg. Kelvin' Version of FFA
   FFA_K_Version : constant Natural := 255;
   -- ...

The effect: FFACalc and FFA now have independent “Degrees Kelvin” versions — i.e. they are to decrement by one upon every published revision to each respective program. Observe that this constitutes a promise to the reader: no more than 255 changes to either FFACalc or FFA are to be published after this Chapter. In the quite unlikely event where a change is found to be required after a Kelvin version reaches zero degrees, it is expected that the program is to be renamed, Vtronically-reground, and some very pointed questions posed to the maintainer!

Now, let’s proceed to the originally-planned subject of Chapter 14B: the Ada implementation of Barrett’s Modular Reduction.

[Image: high voltage]

Don’t even think about proceeding further into this Chapter if you have not fully read and understood the two previous chapters, 14A and 14A-Bis.

Stop now and go back, study! Lest you become a danger to yourself and others.

We will now discuss the Ada implementation of the Algorithm 2 given in Chapter 14A. Please print the Algorithm and the physical bounds proof and refer to these while reading this Chapter.

Let’s start with the relatively-obvious:


package FZ_Barr is
   pragma Pure;
   -- Precomputed data for Barrett's Modular Reduction
   type Barretoid(ZXMLength       : Indices;
                  BarretoidLength : Indices) is
      record
         ZXM            : FZ(1 .. ZXMLength);       -- Zero-Extended Modulus
         J              : FZBit_Index;              -- Jm
         B              : FZ(1 .. BarretoidLength); -- The Barrettoid itself
         ZSlide         : FZBit_Index;              -- Amount to slide Z
         Degenerate     : WBool;                    -- Is it degenerate case?
      end record;
   -- Prepare the precomputed Barrettoid corresponding to a given Modulus
   procedure FZ_Make_Barrettoid(Modulus    : in  FZ;
                                Result     : out Barretoid)
     with Pre => Result.B'Length = 2 * Modulus'Length and
     Result.ZXM'Length = Modulus'Length + 1;
   -- Reduce X using the given precomputed Barrettoid.
   procedure FZ_Barrett_Reduce(X          : in     FZ;
                               Bar        : in     Barretoid;
                               XReduced   : in out FZ);
   pragma Inline_Always(FZ_Barrett_Reduce);
end FZ_Barr;

In every instance of the pre-computed Barrettoid data structure, we will keep everything which is required for Barrett’s Modular Reduction by a given modulus. In particular, we will retain ZXM: a zero-extended (for convenient use in steps 6 and 8) copy of the modulus itself; the parameter $J_M$ (field J); the Barrettoid proper, $B_M$ (field B); ZSlide, the number of bits we must right-shift Z by to compute $Z_S$; and, finally, the degeneracy indicator, i.e. $D_M$ (field Degenerate).

Unsurprisingly, a Barrettoid is computed from a given modulus with the procedure FZ_Make_Barrettoid. All Barrettoids — like other FFA data — will exist as stack-allocations. (Heapism in any form whatsoever is forever banned in FFA.) And the only use of a Barrettoid is to compute Barrett’s Modular Reduction, using FZ_Barrett_Reduce. We will review both procedures in detail, below.

Here is how we create a Barrettoid:


   -- Prepare the precomputed Barrettoid corresponding to a given Modulus
   procedure FZ_Make_Barrettoid(Modulus    : in  FZ;
                                Result     : out Barretoid) is
      -- Length of Modulus and Remainder
      Lm : constant Indices := Modulus'Length;
      -- Remainder register, starts as zero
      Remainder : FZ(1 .. Lm) := (others => 0);
      -- Length of Quotient, with an extra Word for top bit (if Degenerate)
      Lq : constant Indices := (2 * Lm) + 1;
      -- Valid indices into Quotient, using the above
      subtype Quotient_Index is Word_Index range 1 .. Lq;
      -- The Quotient we need, i.e. 2^(2 * ModulusBitness) / Modulus
      Quotient : FZ(Quotient_Index);
      -- Permissible 'cuts' for the Slice operation
      subtype Divisor_Cuts is Word_Index range 2 .. Lm;
      -- Current bit of Pseudo-Dividend; high bit is 1, all others 0
      Pb  : WBool := 1;
      -- Performs Restoring Division on a given segment
      procedure Slice(Index : Quotient_Index;
                      Cut   : Divisor_Cuts;
                      Bits  : Positive) is
            -- Borrow, from comparator
            C   : WBool;
            -- Left-Shift Overflow
            LsO : WBool;
            -- Current cut of Remainder register
            Rs  : FZ renames Remainder(1 .. Cut);
            -- Current cut of Divisor
            Ds  : FZ renames   Modulus(1 .. Cut);
            -- Current Word of Quotient being made, starting from the highest
            W   : Word := 0;
            -- Current bit of Quotient (inverted)
            nQb : WBool;
      begin
            -- For each bit in the current Pseudo-Dividend Word:
            for b in 1 .. Bits loop
               -- Advance Rs, shifting in the current Pseudo-Dividend bit:
               FZ_ShiftLeft_O_I(N        => Rs,
                                ShiftedN => Rs,
                                Count    => 1,
                                OF_In    => Pb, -- Current Pseudo-Dividend bit
                                Overflow => LsO);
               -- Subtract Divisor-Cut from R-Cut; Underflow goes into C:
               FZ_Sub(X => Rs, Y => Ds, Difference => Rs, Underflow => C);
               -- Negation of current Quotient bit
               nQb := C and W_Not(LsO);
               -- If C=1, the subtraction underflowed, and we must undo it:
               FZ_Add_Gated(X => Rs, Y => Ds, Sum => Rs,
                            Gate => nQb);
               -- Save the bit of Quotient that we have found:
               W := Shift_Left(W, 1) or (1 - nQb); -- i.e. inverse of nQb
            end loop;
            -- We made a complete Word of the Quotient; save it:
            Quotient(Quotient'Last + 1 - Index) := W; -- Indexed from end
      end Slice;
      pragma Inline_Always(Slice);
      -- Measure of the Modulus
      ModulusMeasure : constant FZBit_Index := FZ_Measure(Modulus);
   begin
      -- First, process the high Word of the Pseudo-Dividend:
      Slice(1, 2, 1); -- ... it has just one bit: a 1 at the bottom position
      -- Once we ate the top 1 bit of Pseudo-Dividend, rest of its bits are 0
      Pb := 0;
      -- Process the Modulus-sized segment below the top Word:
      for i in 2 .. Lm - 1 loop
         Slice(i, i + 1, Bitness); -- stay ahead by a word to handle carry
      end loop;
      -- Process remaining Words:
      for i in Lm .. Lq loop
         Slice(i, Lm, Bitness);
      end loop;
      -- Record the Quotient (i.e. the Barrettoid proper, Bm)
      Result.B                    := Quotient(Result.B'Range);
      -- Record whether we have the Degenerate Case (1 iff Modulus = 1)
      Result.Degenerate           := W_NZeroP(Quotient(Lq));
      -- Record a copy of the Modulus, extended with zero Word:
      Result.ZXM(Modulus'Range)   := Modulus;
      Result.ZXM(Result.ZXM'Last) := 0;
      -- Record the parameter Jm:
      Result.J                    := ModulusMeasure - 1;
      -- Wm - Jm
      Result.ZSlide :=
        FZBit_Index(Bitness * Modulus'Length) - ModulusMeasure + 1;
   end FZ_Make_Barrettoid;

The process may seem complicated, but it is merely a specialized form of FZ_Mod. The differences are that we are interested in the quotient, rather than the remainder, and that we also wish to compute certain additional parameters corresponding to the given modulus.

Recall that a Barrettoid $B_M$ for a given modulus $M$ was defined as the quantity $\lfloor 2^{k} / M \rfloor$ (per the comment on Quotient in the listing, $k$ = 2 × ModulusBitness). In FZ_Make_Barrettoid, we compute it via Knuth’s division. Afterwards we record: the modulus itself; the quotient; whether the given modulus corresponds to the degenerate case $M = 1$; the parameter $J_M$; and the parameter ZSlide. After this, the contents of the Barrettoid can be used to perform modular reduction modulo $M$ in constant time.
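To see the restoring division in miniature, here is a sketch that computes the same floor(2^k / M) quotient one bit at a time, on a single 64-bit word. The toy width (moduli of at most 16 bits, hence k = 32) and the name Make_Barrettoid are assumptions made for this illustration only. Unlike FFA's Slice, the remainder register here is wider than the modulus, so the shift-overflow bit (LsO) never comes into play.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Barrettoid_Sketch is
   subtype Word is Unsigned_64;

   -- Assumed toy parameters: moduli of at most 16 bits, so k = 32.
   K : constant := 32;

   -- Floor(2^K / M) by restoring division, one quotient bit per step.
   -- Assumes 1 <= M < 2^16.
   function Make_Barrettoid (M : Word) return Word is
      Remainder : Word := 0;
      Quotient  : Word := 0;
      Pb        : Word := 1;  -- current pseudo-dividend bit: top bit is 1
      Borrow    : Word;
   begin
      for Step in 0 .. K loop           -- 2^K has K + 1 bits
         -- Shift the current pseudo-dividend bit into the remainder:
         Remainder := Shift_Left (Remainder, 1) or Pb;
         Pb := 0;                       -- every later bit of 2^K is zero
         -- Trial subtraction; the top bit of the difference is the borrow:
         Remainder := Remainder - M;
         Borrow    := Shift_Right (Remainder, 63);
         -- Restore the remainder if the subtraction underflowed:
         Remainder := Remainder + (M and (0 - Borrow));
         -- The quotient bit is the inverse of the borrow:
         Quotient  := Shift_Left (Quotient, 1) or (1 - Borrow);
      end loop;
      return Quotient;
   end Make_Barrettoid;

   Moduli : constant array (1 .. 5) of Word := (1, 2, 3, 40_503, 65_535);
begin
   for I in Moduli'Range loop
      -- Cross-check against the machine's own division:
      if Make_Barrettoid (Moduli (I)) /= (2 ** K) / Moduli (I) then
         raise Program_Error with "restoring division disagrees";
      end if;
      Ada.Text_IO.Put_Line
        (Word'Image (Moduli (I)) & " ->"
         & Word'Image (Make_Barrettoid (Moduli (I))));
   end loop;
end Barrettoid_Sketch;
```

Note that for M = 1 the quotient is 2^k itself, which needs one bit more than k bits; this is the degenerate case that FZ_Make_Barrettoid detects via the extra top Word of its Quotient.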

And now, let’s show exactly how:


   -- Reduce X using the given precomputed Barrettoid.
   procedure FZ_Barrett_Reduce(X          : in     FZ;
                               Bar        : in     Barretoid;
                               XReduced   : in out FZ) is
      -- Wordness of X, the quantity being reduced
      Xl      : constant Indices := X'Length;
      -- Wordness of XReduced (result), one-half of Xl, and same as of Modulus
      Ml      : constant Indices := XReduced'Length; -- i.e. # of Words in Wm.
      -- The Modulus we will reduce X by
      Modulus : FZ renames Bar.ZXM(1 .. Ml);
      -- Shifted X
      Xs      : FZ(X'Range);
      -- Z := Xs * Bm (has twice the length of X)
      Z       : FZ(1 .. 2 * Xl);
      -- Upper 3Wm-bit segment of Z that gets shifted to form Zs
      ZHi     : FZ renames   Z(Ml       + 1  ..  Z'Last       );
      -- Middle 2Wm-bit segment of Z, that gets multiplied by M to form Q
      Zs      : FZ renames   Z(Z'First  + Ml ..  Z'Last  - Ml );
      -- Sub-terms of the Zs * M multiplication:
      ZsLo    : FZ renames  Zs(Zs'First      .. Zs'Last  - Ml );
      ZsHi    : FZ renames  Zs(Zs'First + Ml .. Zs'Last       );
      ZsHiM   : FZ(1 .. Ml);
      -- Q := Modulus * Zs, i.e. floor(X / M)*M + E*M
      Q       : FZ(1 .. Xl);
      QHi     : FZ renames   Q(Q'First  + Ml ..  Q'Last       );
      -- R is made one Word longer than Modulus (see proof re: why)
      Rl      : constant Indices := Ml + 1;
      -- Reduction estimate, overshot by 0, 1, or 2 multiples of Modulus
      R       : FZ(1 .. Rl);
      -- Scratch for the outputs of the gated subtractions
      S       : FZ(1 .. Rl);
      -- Borrow from the gated subtractions
      C       : WBool;
      -- Barring cosmic ray, no underflow can take place in (4) and (5)
      NoCarry : WZeroOrDie := 0;
   begin
      -- Result is initially zero (and will stay zero if Modulus = 1)
      FZ_Clear(XReduced);
      -- (1) Xs := X >> Jm
      FZ_Quiet_ShiftRight(N => X, ShiftedN => Xs, Count => Bar.J);
      -- (2) Z  := Xs * Bm
      FZ_Multiply_Unbuffered(X => Bar.B, Y => Xs, XY => Z);
      -- (3) Zs := Z >> 2Wm - Jm (already thrown lower Wm, so only Wm - Jm now)
      FZ_Quiet_ShiftRight(N => ZHi, ShiftedN => ZHi, Count => Bar.ZSlide);
      -- (4) Q  := Zs * M (this is split into three operations, see below)
      -- ... first, Q := ZsLo * M,
      FZ_Multiply_Unbuffered(ZsLo, Modulus, Q);
      -- ... then, compute ZsHiM := ZsHi * M,
      FZ_Low_Multiply_Unbuffered(ZsHi, Modulus, ZsHiM);
      -- ... finally, add ZsHiM to upper half of Q.
      FZ_Add_D(X => QHi, Y => ZsHiM, Overflow => NoCarry);
      -- (5) R  := X - Q (we only need Rl-sized segments of X and Q here)
      FZ_Sub(X => X(1 .. Rl), Y => Q(1 .. Rl),
             Difference => R, Underflow => NoCarry);
      -- (6) S1 := R - M, C1 := Borrow (1st gated subtraction of Modulus)
      FZ_Sub(X => R, Y => Bar.ZXM, Difference => S, Underflow => C);
      -- (7) R := {C1=0 -> S1, C1=1 -> R}
      FZ_Mux(X => S, Y => R, Result => R, Sel => C);
      -- (8) S2 := R - M, C2 := Borrow (2nd gated subtraction of Modulus)
      FZ_Sub(X => R, Y => Bar.ZXM, Difference => S, Underflow => C);
      -- (9) R := {C2=0 -> S2, C2=1 -> R}
      FZ_Mux(X => S, Y => R, Result => R, Sel => C);
      -- (10) RFinal := {DM=0 -> R, DM=1 -> 0} (handle the degenerate case)
      FZ_Mux(X => R(1 .. Ml), Y => XReduced, Result => XReduced,
             Sel => Bar.Degenerate); -- If Modulus = 1, then XReduced is 0.
   end FZ_Barrett_Reduce;

Notice anything unfamiliar? The astute reader will observe that the above is an exact implementation of the process described in Chapter 14A-Bis; the only new subcomponent is the FZ_Low_Multiply_Unbuffered routine used in Step 4. So let’s learn what it’s made of.
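For readers who would like to watch the ten steps run on small numbers, here is a minimal sketch on a single 64-bit word. The toy widths (Wm = 16, X of at most 32 bits) and the names Measure, Mux and Barrett_Reduce are assumptions made for this illustration. The Barrettoid is obtained with the machine's division rather than precomputed, and nothing here claims to be constant-time.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Barrett_Sketch is
   subtype Word is Unsigned_64;

   -- Assumed toy parameters: Wm = 16-bit moduli, X of up to 2 * Wm bits.
   Wm : constant := 16;

   -- Number of significant bits in M (a stand-in for FZ_Measure).
   function Measure (M : Word) return Natural is
      N : Natural := 0;
      T : Word    := M;
   begin
      while T /= 0 loop
         T := Shift_Right (T, 1);
         N := N + 1;
      end loop;
      return N;
   end Measure;

   -- Returns A when Sel = 0, and B when Sel = 1.
   function Mux (A, B, Sel : Word) return Word is
   begin
      return A xor ((0 - Sel) and (A xor B));
   end Mux;

   -- Assumes 1 <= M < 2^Wm and X < 2^(2 * Wm).
   function Barrett_Reduce (X, M : Word) return Word is
      J  : constant Natural := Measure (M) - 1;             -- Jm
      Bm : constant Word    := (2 ** (2 * Wm)) / M;         -- Barrettoid
      Dm : constant Word    := Shift_Right (Bm, 2 * Wm);    -- 1 iff M = 1
      Xs : constant Word    := Shift_Right (X, J);          -- (1)
      Z  : constant Word    := Xs * Bm;                     -- (2)
      Zs : constant Word    := Shift_Right (Z, 2 * Wm - J); -- (3)
      Q  : constant Word    := Zs * M;                      -- (4)
      R  : Word             := X - Q;                       -- (5)
      S, C : Word;
   begin
      for Pass in 1 .. 2 loop             -- (6) .. (9)
         S := R - M;
         C := Shift_Right (S, 63);        -- borrow: 1 iff R < M
         R := Mux (S, R, C);              -- keep S only if no borrow
      end loop;
      return Mux (R, 0, Dm);              -- (10)
   end Barrett_Reduce;

   Moduli : constant array (1 .. 6) of Word :=
     (1, 2, 3, 32_768, 40_503, 65_535);
   X : Word;
begin
   for I in Moduli'Range loop
      X := 0;
      while X < 2 ** (2 * Wm) loop
         if Barrett_Reduce (X, Moduli (I)) /= X mod Moduli (I) then
            raise Program_Error with "Barrett sketch disagrees with mod";
         end if;
         X := X + 104_729;                -- stride through the range
      end loop;
      -- Also check the largest permitted X:
      X := 2 ** (2 * Wm) - 1;
      if Barrett_Reduce (X, Moduli (I)) /= X mod Moduli (I) then
         raise Program_Error with "Barrett sketch disagrees with mod";
      end if;
   end loop;
   Ada.Text_IO.Put_Line ("Barrett sketch agrees with mod on all samples");
end Barrett_Sketch;
```

The two unconditional passes of subtract-and-select correspond to steps 6 through 9: the estimate R is assumed to overshoot by at most two multiples of the modulus, so two passes always suffice.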

But first, review the elementary multiplication equivalence from Chapter 10:

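$$X = X_{Lo} + 2^{w} X_{Hi}, \qquad Y = Y_{Lo} + 2^{w} Y_{Hi}$$

$$XY = X_{Lo}Y_{Lo} + 2^{w}\left( X_{Lo}Y_{Hi} + X_{Hi}Y_{Lo} \right) + 2^{2w}\, X_{Hi}Y_{Hi}$$

where $w$ is the bit width of each half of $X$ and $Y$.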

Suppose, however, that we were only interested in calculating the bottom half of XY. We can then write the following schematic instead:

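$$XY_{Lo} = \left( X_{Lo}Y_{Lo} + 2^{w}\left( (X_{Lo}Y_{Hi})_{Lo} + (X_{Hi}Y_{Lo})_{Lo} \right) \right) \bmod 2^{2w}$$

where $w$ is the bit width of each half of $X$ and $Y$, and $(\cdot)_{Lo}$ denotes the bottom $w$ bits of a term.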

Observe that this method is exactly analogous to the mechanism used in Chapter 9’s Mul_Word, where we find the lower half of a Word × Word multiplication by:


      -- ........
      -- XL * YL
      LL : constant Word := Mul_HalfWord_Iron(XL, YL);
      -- XL * YH
      LH : constant Word := Mul_HalfWord_Iron(XL, YH);
      -- XH * YL
      HL : constant Word := Mul_HalfWord_Iron(XH, YL);
      -- ........
      -- Get the bottom half of the Product:
      XY_LW := LL + Shift_Left(LH + HL, HalfBitness);
      -- ........

So let’s now see how this works for FZ rather than Word:


-- "Low Multiplication" computes only the bottom half of the product XY.
-- Presently, it is used solely in Barrett's Modular Reduction.
package body FZ_LoMul is
   -- Low-Only Comba's multiplier. (CAUTION: UNBUFFERED)
   procedure FZ_Low_Mul_Comba(X     : in  FZ;
                              Y     : in  FZ;
                              XY    : out FZ) is
      -- Words in each multiplicand (and also in the half-product)
      L : constant Word_Index := X'Length;
      -- 3-word Accumulator
      A2, A1, A0 : Word := 0;
   begin
      -- Compute the lower half of the Product, which is all we want:
      for N in 0 .. L - 1 loop
         -- Compute the Nth (indexed from zero) column of the Half-Product
         declare
            -- The outputs of a Word multiplication
            Lo, Hi : Word;
            -- Carry for the Accumulator addition
            C      : WBool;
            -- Sum for Accumulator addition
            Sum    : Word;
         begin
            -- For lower half of XY, will go from 0 to N
            -- For upper half of XY, will go from N - L + 1 to L - 1
            for j in 0 .. N loop
               -- Hi:Lo := j-th Word of X  *  (N - j)-th Word of Y
               Mul_Word(X(X'First + j),
                        Y(Y'First - j + N),
                        Lo, Hi);
               -- Now add Hi:Lo into the Accumulator:
               -- A0 += Lo; C := Carry
               Sum := A0 + Lo;
               C   := W_Carry(A0, Lo, Sum);
               A0  := Sum;
               -- A1 += Hi + C; C := Carry
               Sum := A1 + Hi + C;
               C   := W_Carry(A1, Hi, Sum);
               A1  := Sum;
               -- A2 += C
               A2  := A2 + C;
            end loop;
            -- We now have the Nth (indexed from zero) word of XY
            XY(XY'First + N) := A0;
            -- Right-Shift the Accumulator by one Word width:
            A0 := A1;
            A1 := A2;
            A2 := 0;
         end;
      end loop;
   end FZ_Low_Mul_Comba;
   -- Low-Only Multiplier. (CAUTION: UNBUFFERED)
   procedure Low_Mul(X  : in  FZ;
                     Y  : in  FZ;
                     XY : out FZ) is
      -- L is the wordness of a multiplicand. Guaranteed to be a power of two.
      L : constant Word_Count := X'Length;
      -- K is HALF of the length of a multiplicand.
      K : constant Word_Index := L / 2;
      -- A 'KSeg' is the same length as HALF of a multiplicand.
      subtype KSeg is FZ(1 .. K);
      -- The two K-sized variables of the half-product equation:
      LH, HL : KSeg;
      -- Bottom and Top K-sized halves of the multiplicand X.
      XLo        : KSeg  renames    X(  X'First       ..   X'Last - K );
      XHi        : KSeg  renames    X(  X'First + K   ..   X'Last     );
      -- Bottom and Top K-sized halves of the multiplicand Y.
      YLo        : KSeg  renames    Y(  Y'First       ..   Y'Last - K );
      YHi        : KSeg  renames    Y(  Y'First + K   ..   Y'Last     );
      -- Top K-sized half of the half-product XY.
      XYHi       : KSeg  renames   XY( XY'First + K   ..  XY'Last     );
      -- Carry from individual term additions.
      C          : WBool;
      pragma Unreferenced(C);
   begin
      -- Recurse to FULL-width multiplication: XY := XLo * YLo
      FZ_Multiply_Unbuffered(XLo, YLo, XY);
      -- Recurse to HALF-width multiplication: LH := XLo * YHi
      FZ_Low_Multiply_Unbuffered(XLo, YHi, LH);
      -- Recurse to HALF-width multiplication: HL := XHi * YLo
      FZ_Low_Multiply_Unbuffered(XHi, YLo, HL);
      -- XY += 2^(K * Bitness) * LH
      FZ_Add_D(X => XYHi, Y => LH, Overflow => C);
      -- XY += 2^(K * Bitness) * HL
      FZ_Add_D(X => XYHi, Y => HL, Overflow => C);
   end Low_Mul;
   -- CAUTION: Inlining prohibited for Low_Mul !
   -- Low-Only Multiplier. (CAUTION: UNBUFFERED)
   procedure FZ_Low_Multiply_Unbuffered(X     : in  FZ;
                                        Y     : in  FZ;
                                        XY    : out FZ) is
      -- The length of either multiplicand
      L : constant Word_Count := X'Length;
   begin
      if L <= Low_Mul_Thresh then
         -- Base case:
         FZ_Low_Mul_Comba(X, Y, XY);
      else
         -- Recursive case:
         Low_Mul(X, Y, XY);
      end if;
   end FZ_Low_Multiply_Unbuffered;
end FZ_LoMul;

FZ_Low_Mul_Comba, of course, is merely a cut-down Comba from Chapter 9; while the recursion is analogous to the one in Chapter 10's Karatsuba and Chapter 12's Square Karatsuba.
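One level of the Low_Mul recursion can be tried out on a single machine word. The following is only a sketch under assumed toy sizes (a 64-bit operand split into two 32-bit halves); Low_Half_Product is a hypothetical name introduced for this illustration, and the wrapping 64-bit product of the machine serves as the reference.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Low_Mul_Sketch is
   subtype Word is Unsigned_64;

   Half : constant := 32;                   -- bits in each half
   Mask : constant Word := 2 ** Half - 1;   -- selects the low half

   -- Low 64 bits of X * Y, built from half-width pieces only.
   function Low_Half_Product (X, Y : Word) return Word is
      XLo : constant Word := X and Mask;
      XHi : constant Word := Shift_Right (X, Half);
      YLo : constant Word := Y and Mask;
      YHi : constant Word := Shift_Right (Y, Half);
      -- Full product of the low halves (always fits in 64 bits):
      LL  : constant Word := XLo * YLo;
      -- Only the low halves of the cross terms can reach the result:
      LH  : constant Word := (XLo * YHi) and Mask;
      HL  : constant Word := (XHi * YLo) and Mask;
   begin
      -- Add the cross terms into the upper half; carries out of it vanish.
      -- The XHi * YHi term lies wholly above the result and is never made.
      return LL + Shift_Left (LH + HL, Half);
   end Low_Half_Product;

   type Pair is record
      X, Y : Word;
   end record;

   Samples : constant array (1 .. 4) of Pair :=
     ((0, 0),
      (Word'Last, Word'Last),
      (16#0123_4567_89AB_CDEF#, 16#FEDC_BA98_7654_3210#),
      (16#FFFF_FFFF#, 16#1_0000_0001#));
begin
   for I in Samples'Range loop
      -- Modular "*" on Unsigned_64 wraps, i.e. yields the low 64 bits:
      if Low_Half_Product (Samples (I).X, Samples (I).Y)
        /= Samples (I).X * Samples (I).Y
      then
         raise Program_Error with "low multiplication disagrees";
      end if;
   end loop;
   Ada.Text_IO.Put_Line ("Low_Half_Product matches the wrapped product");
end Low_Mul_Sketch;
```

The structure parallels Low_Mul: one full-width multiplication of the low halves, two half-width (low-only) multiplications for the cross terms, and two additions into the upper half of the result.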

Now, let's see where Barrett's Reduction is put to use. You will recall the conclusion of Chapter 12B, where we discussed the fact that the use of Knuth's division for modular reduction is quite expensive, and constitutes the bulk of the cost of modular exponentiation as presented in Chapters 6 through 13. And now, at last, we can write a fast modular reducer -- one that uses Barrett's method instead of Knuth's division:


   -- (Barrettronic) Modular Exponent: Result := Base^Exponent mod Modulus
   procedure FZ_Mod_Exp(Base     : in  FZ;
                        Exponent : in  FZ;
                        Modulus  : in  FZ;
                        Result   : out FZ) is
      -- Double-width scratch buffer for the modular operations
      D   : FZ(1 .. Base'Length * 2);
      -- Working register for the squaring; initially is copy of Base
      B   : FZ(Base'Range) := Base;
      -- Register for the Mux operation
      T   : FZ(Result'Range);
      -- Buffer register for the Result
      R   : FZ(Result'Range);
      -- Space for Barrettoid
      Bar : Barretoid(ZXMLength       => Modulus'Length + 1,
                      BarretoidLength => 2 * B'Length);
   begin
      -- First, pre-compute the Barrettoid for the given Modulus:
      FZ_Make_Barrettoid(Modulus => Modulus, Result => Bar);
      -- Result := 1
      WBool_To_FZ(1, R);
      -- For each Word of the Exponent:
      for i in Exponent'Range loop
         declare
            -- The current Word of the Exponent
            Wi : Word := Exponent(i);
         begin
            -- For each bit of Wi:
            for j in 1 .. Bitness loop
               -- T := Result * B mod Modulus
               FZ_Multiply_Unbuffered(X => R, Y => B, XY => D);
               FZ_Barrett_Reduce(X => D, Bar => Bar, XReduced => T);
               -- Sel is the current bit of Exponent;
               --    When Sel=0 -> Result := Result;
               --    When Sel=1 -> Result := T
               FZ_Mux(X => R, Y => T, Result => R, Sel => Wi and 1);
               -- Advance to the next bit of Wi (i.e. next bit of Exponent)
               Wi := Shift_Right(Wi, 1);
               -- B := B^2 mod Modulus
               FZ_Square_Unbuffered(X => B, XX => D);
               FZ_Barrett_Reduce(X => D, Bar => Bar, XReduced => B);
            end loop;
         end;
      end loop;
      -- Output the Result:
      Result := R;
   end FZ_Mod_Exp;

Quite straightforward: we precompute the Barrettoid, and use it for all of the necessary modular multiplications and squarings modulo the given modulus.
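The shape of the loop (always multiply, then select the outcome with a mux) can be tried on single words. This is a minimal sketch, assuming a modulus below 2^32 so that no product overflows 64 bits; it swaps in the machine's mod operator where FZ_Mod_Exp uses FZ_Barrett_Reduce, and Mux and Mod_Exp are names written for this illustration only.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Mod_Exp_Sketch is
   subtype Word is Unsigned_64;

   -- Returns A when Sel = 0, and B when Sel = 1.
   function Mux (A, B, Sel : Word) return Word is
   begin
      return A xor ((0 - Sel) and (A xor B));
   end Mux;

   -- Base^Exponent mod Modulus.
   -- Assumes 0 < Modulus < 2^32, so that no product overflows 64 bits.
   function Mod_Exp (Base, Exponent, Modulus : Word) return Word is
      B : Word := Base mod Modulus;    -- squaring register
      R : Word := 1 mod Modulus;       -- result register
      E : Word := Exponent;            -- remaining bits of the exponent
      T : Word;
   begin
      -- Every bit of the exponent is processed, whatever its value:
      for Bit in 1 .. Word'Size loop
         T := (R * B) mod Modulus;     -- always computed
         R := Mux (R, T, E and 1);     -- kept only if the current bit is 1
         E := Shift_Right (E, 1);      -- advance to the next bit
         B := (B * B) mod Modulus;     -- B := B^2 mod Modulus
      end loop;
      return R;
   end Mod_Exp;
begin
   if Mod_Exp (4, 13, 497) /= 445
     or else Mod_Exp (2, 10, 1000) /= 24
     or else Mod_Exp (3, 0, 7) /= 1
     or else Mod_Exp (5, 3, 1) /= 0
     or else Mod_Exp (3, 65_536, 65_537) /= 1   -- Fermat, 65537 is prime
   then
      raise Program_Error with "modular exponentiation sketch is wrong";
   end if;
   Ada.Text_IO.Put_Line
     ("4^13 mod 497 =" & Word'Image (Mod_Exp (4, 13, 497)));
end Mod_Exp_Sketch;
```

The multiplication and the squaring run on every iteration regardless of the exponent bit; only the mux decides whether the product is kept. In FZ_Mod_Exp each of the two mod operations above is a call to FZ_Barrett_Reduce with the one precomputed Barrettoid.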

The reader may be curious regarding how to properly test this program. And so I will invite him to download a complete package of test tapes, generated using an RNG:

Each modular exponentiation test tape was mechanically-produced on Chapter 13 FFA, and contains a series of modular exponentiations, each followed by an equality comparison with the expected result. E.g., a 2048-bit test:

={(do nothing if ok)}{[SAD ]}_

One invokes the tapes as follows, e.g. for the 1024-bit 10,000 shot tape:

$ time cat 10k_shots_1024bit_ffa_unif_rnd.tape | ./bin/ffa_calc 1024 32

... and if successful (i.e. all outputs are correct) it will emit only the output of the unix "time" command; e.g. on my test iron:

 real    7m24.751s
 user    7m24.081s
 sys     0m0.290s

Now, the reader has probably read Dijkstra and recalls that "testing can reveal the presence of bugs, but never their absence." So why bother? The answer is, it is necessary to test your iron.

The test tapes in the signed TAR come in two variants, slid (where there are randomly-sized stretches of leading zeroes in the arguments to modular exponentiation) and uniform -- where there are not. You can use the test tapes as a litmus of whether your iron provides a constant-time iron multiplier and a constant-time barrel shifter. If you find that the "slid" tapes reliably execute faster on your machine than the "unif" tapes of the same respective FFA width, you have sad iron and must enable the workarounds (i.e. Mul_HalfWord_Soft and/or HaveBarrelShifter := False.)

Now let's find out what we actually achieve by using Barrett's Reduction:

[Figure: Ch14 modexp timing (logarithmic plot)]

Or, for those who prefer the raw numbers to the logarithmic plot,

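Cost of one modular exponentiation operation (sec):

| FFA Bitness | Ch.13 (Conventional Knuth Mod. Red.) | Ch.14 (Barrettronic Mod. Red.) |
|-------------|--------------------------------------|--------------------------------|
| 1024 | 0.395 | 0.043 |
| 2048 | 2.895 | 0.276 |
| 4096 | 21.895 | 1.703 |
| 8192 | 169.394 | 10.400 |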

It would appear that "the game was worth the candles" -- we now have (AFAIK: the first and only presently-published...) fully constant-time Barrettron. And it is one that (with reasonable effort on the reader's part) fits-in-head.

In the next chapter, 15, we will begin to assemble the necessary ingredients for the generation of cryptographic primes. Please stay tuned!

~To be continued!~

This entry was written by Stanislav, posted on Thursday December 27 2018, filed under Ada, Bitcoin, Cold Air, Computation, Cryptography, FFA, Friends, Mathematics, ShouldersGiants, SoftwareArchaeology, SoftwareSucks.
