The Return of Phuctor!

I have the pleasure of informing my readers that…

Phuctor is back!

It — exactly as it was, but with a few minor fix-ups for browsing speed — now lives on a very spiffy 32-core Opteron at Pizarro, the ISP.

The WWW UI is already up; the factoring proper will resume later tonight.

“Finite Field Arithmetic.” Chapter 10: Introducing Karatsuba’s Multiplication.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch10_karatsuba.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


First things first:

Manul Inspection.

  • The mail bag from Chapter 9 will be dealt with in Chapter 11.
  • This Chapter may require a cup of coffee.
  • This Chapter is dedicated to my most dedicated readers, such that I did not even know that I had, who publicly raged when it did not appear on schedule.


Recall the algebraic identities in Mul_Word of Chapter 9, where a single Word × Word multiplication is turned into four HalfWord × HalfWord ones. Let’s consider the general case.

Suppose that b is a positive integer, and we would like to multiply a 2b-bit-wide X by a 2b-bit-wide Y. We can express the multiplicands algebraically as follows:


X = XLo + 2bXHi
Y = YLo + 2bYHi

…where NLo consists of any 2b-wide integer N’s least-significant b binary digits, while NHi consists of its b most-significant binary digits. The multiplication itself can then be written as:


XY = (XLo + 2bXHi)(YLo + 2bYHi)
.. = XLoYLo + 2bXLoYHi + 2bXHiYLo + 2bXHi2bYHi
.. = XLoYLo + 2b(XLoYHi + XHiYLo) + 22bXHiYHi

…or pictured “physically”, drawn to scale (suppose that the most junior bit position in a given register is at the left edge of the rectangle representing said register; the most senior — at the right edge) :

XLo XHi
× YLo XHi
=
XLoYLo
+ XLoYHi
+ XHiYLo
+ XHiYHi
= XY

Performing this “schoolbook” 2b × 2b multiplication costs four b × b-wide multiplications (multiplications by powers of two are performed using inexpensive position shifting and do not figure towards this count) and three 2b-wide additions. This cost, as we learned in previous chapters, is proportional to the square of b. But in fact it is possible to do better — using a subquadratic multiplication algorithm.

We will make use of the oldest, best-known, and simplest such algorithm: the classic one derived from A. A. Karatsuba’s (1937 – 2008) “Multiplication of many-digital numbers by automatic computers”, Dokl. Akad. Nauk SSSR, 145:2 (1962), 293–294.

After working through this Chapter, the reader will possess a fits-in-head, constant-spacetime and provably-correct computerized implementation of Karatsuba’s algorithm.

First, let’s specify that our multiplicands X and Y are FZ-integers, and each of them occupies exactly L Words.

Recall that in Chapter 4, we have mandated that no FZ must come into existence with a bitwise length that is not a positive power of two. Therefore, at all times, a FZ will break cleanly into two halves of equal bitnesses, as many times as desired until halves each consisting of a FZ one Word in length are formed.

And so, at every recursive (see further below) invocation of our procedure, we are able to begin by finding a positive integer k (where L = 2k) i.e. the length of a multiplicand cut in half.

Thereby we can take b = MachineBitness × k, where k = L / 2.

Let’s also define, for brevity, the following recurring terms:


LL = XLoYLo
HH = XHiYHi
MM = XLoYHi + XHiYLo

And trivially obtain the equation:


XY = LL + 2bMM + 22bHH

Karatsuba’s discovery is the fact that it is possible to compute the MM middle term using only one b × b multiplication in place of two. In the most common implementations of the algorithm, this appears as the equivalence:


MM = XLoYHi + XHiYLo
.. = (XLo + XHi)(YLo + YHi) - HH - LL

This algebraic identity replaces an expensive operation (multiplication) with three cheap (addition/subtraction) arithmetical operations. However, the possibility of generating a carry during addition when computing the terms of the binomial makes the mechanism slightly slower and uglier than it strictly has to be. So we will instead use another variant of the algorithm, sometimes called subtractive Karatsuba. It makes use of the following trivial equivalence:


MM = LL + HH - (XLo - XHi)(YLo - YHi)

In FFA’s realization, this will take the following — at first, seemingly gnarlier — form:


Dx = |XLo - XHi|
Dy = |YLo - YHi|
DD = Dx × Dy
MM = LL + HH - (-1CX)(-1CY)DD

…where CX is the sign (in the convention where it equals 1 if XLo is less than XHi, and 0 otherwise) of XLo – XHi and CY is the sign (similarly) of YLo – YHi.

And it is not difficult to show that, if we take


DDSub = CX XNOR CY

then:


MM = LL + HH + (-1DDSub)DD

Or, for kindergarten:

Or, for the thick, for the impatient,
CX CY DDSub (-1DDSub)DD MM =
0 0 1 -DD LL + HH - DD
0 1 0 DD LL + HH + DD
1 0 0 DD LL + HH + DD
1 1 1 -DD LL + HH - DD

…i.e. the DD term ends up subtracted if (XLo – XHi)(YLo – YHi) is positive; otherwise it gets added. Which is as it should be. And this gives us the formula:


XY = LL + 2b(LL + HH + (-1DDSub)DD) + 22bHH

And now let’s draw a possible shape for this mechanism “electrically”. First, let’s do the obvious Right Thing and begin by stuffing LL and HH into the lower and upper, respectively, halves of the register in which we are computing the sum. And after that, we will add the “middle” terms, at the requisite offset (i.e. at the b-th binary place) :

LL HH
+ LL 0
+ HH 0
+ (-1DDSub)DD 0
= XY

The above would work as a physical realization, but our 2b-sized additions sadly became 3b-sized, because of the need to ripple out the carries (recall that the addition of any two s-sized integers is physically s+1-sized, as there is the possibility of overflow.) But can we do anything about this, or do we have to live with it?

Turns out: there is a pill. One economical way to make the additions stay 2b-sized, is to accumulate the carries. We will put them in a Word-sized register called TC (i.e. tail carry). After the three terms of the original equation have been added into the result XY, we will make the result correct by rippling out (at the cost of walking over a b-sized space) the accumulated carry in TC into the tail (i.e. most senior quarter segment) of the result XY.

The new scheme looks like this:

LL HH TC := 0
+ LL TC += Carry
+ HH TC += Carry
+ (-1DDSub)DD TC += Carry – DDSub
+ TC
= XY

The reason why, after the third addition, we must…


TC += Carry - DDSub

…is not uninteresting, and I am quite tempted to leave it as an exercise. And in fact I think I will:


Chapter 10, Exercise 1: Why is it necessary to subtract DDSub from TC ?


Observe that we can simplify the program we are about to write, by proving the lemma:


TC >= 0

The fact that TC is greater than or equal to zero by the time it gets rippled, trivially follows from the fact that:


MM = XLoYHi + XHiYLo

There is no way for the value of above expression to become negative, given that no subtraction happens in it, and that both of the multiplicands in both of the terms, are positive! From which it necessarily follows that, given as…


MM = LL + HH + (-1DDSub)DD

…it must then at all times be true that:


LL + HH >= (-1DDSub)DD

And therefore it will never become necessary to propagate a “borrow” when rippling TC into the tail, and we can use a strictly “additive” formulation in the pertinent code.

But what is the maximum possible value of TC ? It so happens that the answer is: two. And we can show… but wait. Why not leave this as an exercise ! So :


Chapter 10, Exercise 2: Can the value of TC ever become greater than 2 ? Give an example of a Karatsubaization where it does; or if it cannot, prove that it cannot.


And now let’s put all of this together into a mechanism:

FZ_Mul.adb:

   procedure Mul_Karatsuba(X  : in  FZ;
                           Y  : in  FZ;
                           XY : out FZ) is
 
      -- L is the wordness of a multiplicand. Guaranteed to be a power of two.
      L : constant Word_Count := X'Length;
 
      -- An 'LSeg' is the same length as either multiplicand.
      subtype LSeg is FZ(1 .. L);
 
      -- K is HALF of the length of a multiplicand.
      K : constant Word_Index := L / 2;
 
      -- A 'KSeg' is the same length as HALF of a multiplicand.
      subtype KSeg is FZ(1 .. K);
 
      -- The three L-sized variables of the product equation, i.e.:
      -- XY = LL + 2^(K*Bitness)(LL + HH + (-1^DD_Sub)*DD) + 2^(2*K*Bitness)HH
      LL, DD, HH : LSeg;
 
      -- K-sized terms of Dx * Dy = DD
      Dx, Dy     : KSeg;  -- Dx = abs(XLo - XHi) , Dy = abs(YLo - YHi)
 
      -- Subtraction borrows, signs of (XL - XH) and (YL - YH),
      Cx, Cy     : WBool; -- so that we can calculate (-1^DD_Sub)
 
      -- Bottom and Top K-sized halves of the multiplicand X.
      XLo        : KSeg  renames    X(  X'First       ..   X'Last - K );
      XHi        : KSeg  renames    X(  X'First + K   ..   X'Last     );
 
      -- Bottom and Top K-sized halves of the multiplicand Y.
      YLo        : KSeg  renames    Y(  Y'First       ..   Y'Last - K );
      YHi        : KSeg  renames    Y(  Y'First + K   ..   Y'Last     );
 
      -- L-sized middle segment of the product XY (+/- K from the midpoint).
      XYMid      : LSeg  renames   XY( XY'First + K   ..  XY'Last - K );
 
      -- Bottom and Top L-sized halves of the product XY.
      XYLo       : LSeg  renames   XY( XY'First       ..  XY'Last - L );
      XYHi       : LSeg  renames   XY( XY'First + L   ..  XY'Last     );
 
      -- Topmost K-sized quarter segment of the product XY, or 'tail'
      XYHiHi     : KSeg  renames XYHi( XYHi'First + K .. XYHi'Last    );
 
      -- Whether the DD term is being subtracted.
      DD_Sub     : WBool;
 
      -- Carry from individual term additions.
      C          : WBool;
 
      -- Tail-Carry accumulator, for the final ripple
      TC         : Word;
 
   begin
 
      -- Recurse: LL := XL * YL
      FZ_Multiply(XLo, YLo, LL);
 
      -- Recurse: HH := XH * YH
      FZ_Multiply(XHi, YHi, HH);
 
      -- Dx := |XL - XH| , Cx := Borrow (i.e. 1 iff XL < XH)
      FZ_Sub_Abs(X => XLo, Y => XHi, Difference => Dx, Underflow => Cx);
 
      -- Dy := |YL - YH| , Cy := Borrow (i.e. 1 iff YL < YH)
      FZ_Sub_Abs(X => YLo, Y => YHi, Difference => Dy, Underflow => Cy);
 
      -- Recurse: DD := Dx * Dy
      FZ_Multiply(Dx, Dy, DD);
 
      -- Whether (XL - XH)(YL - YH) is positive, and so DD must be subtracted:
      DD_Sub := 1 - (Cx xor Cy);
 
      -- XY := LL + 2^(2 * K * Bitness) * HH
      XYLo := LL;
      XYHi := HH;
 
      -- XY += 2^(K * Bitness) * HH, but carry goes in Tail Carry accum.
      FZ_Add_D(X => XYMid, Y => HH, Overflow => TC);
 
      -- XY += 2^(K * Bitness) * LL, ...
      FZ_Add_D(X => XYMid, Y => LL, Overflow => C);
 
      -- ... but the carry goes into the Tail Carry accumulator.
      TC := TC + C;
 
      -- XY += 2^(K * Bitness) * (-1^DD_Sub) * DD
      FZ_Not_Cond_D(N => DD, Cond => DD_Sub); -- invert DD if 2s-complementing
      FZ_Add_D(OF_In    => DD_Sub,            -- ... and then must increment
               X        => XYMid,
               Y        => DD,
               Overflow => C); -- carry will go in Tail Carry accumulator
 
      -- Compute the final Tail Carry for the ripple
      TC := TC + C - DD_Sub;
 
      -- Barring a cosmic ray, 0 < = TC <= 2 .
      pragma Assert(TC <= 2);
 
      -- Ripple the Tail Carry into the tail.
      FZ_Add_D_W(X => XYHiHi, W => TC, Overflow => C);
 
      -- Barring a cosmic ray, the tail ripple will NOT overflow.
      pragma Assert(C = 0);
 
   end Mul_Karatsuba;

But… Does this routine work? How fast does it run? What do the new procedures like FZ_Not_Cond_D do? Where and when do we recurse ? Why did Comba’s Multiplier have to change at all !? And, most importantly, is the proof “proofy” enough ?? Find out in Chapter 11!


~To be continued!~

Dicţionar Român-Englez.

This post exists to give a permanent home and linkable reference point for certain materials. Specifically, a — to my knowledge, currently the only one found anywhere on the entire Netgrep-able plain-text English-Romanian dictionary.

The original source was an ancient piece of MS-Windows nagware. The perpetrator of this item (I hesitate to dignify his actions with the word “author”) took the — very clearly pre-1989, public domain — material, and embedded it in a crypted EXE “viewer” turd. Why he did this, can only be guessed at, but a certain observation by Mircea Popescu may shed some light on this particular type of psychopathology:


So : you have found somewhere something exceptional. A treasure trove. Looky, you were just walking down the road and found a chestful of wonder. What to do ? Well, confronted with such an occurence, some people proceed in the following manner : they run off to town, gather the folk and explain : “looky, I’ve found this chestful of wonder, so and so and back and forth, let’s go pick it up, stick it in a museum, so it may be inspected, admired, discussed, preserved for future generations so they may also gain by the said wonders”. Other… let’s call them people by virtue of bipedalism, proceed in a different manner : they cloink a coupla with the sledgehammer so as to break down the find into shards the size they can fit in a pocket, after which they stick it on their oxcart. Nevermind that those things aren’t made to go with oxcarts, nor do they help the oxcart in any way, the peon sits proudly on his pile of dung. And if anyone asks where he found the things, for they don’t really go with his general appearance, he answers just as proudly : he raised them since they were but pistols. (This is an ancient joke, saying that a gypsy is picked up for stealing a rifle. Before the judge, he sees a farmer brought under charges of having stolen a cow. “Why’ve you stolen the cow ?” asks the judge. “I’ve not stolen it your Honor, I raised it myself since it were a calf”. He’s eventually released, and when the gypsy’s turn comes, “Why’ve you stolen the rifle ?” asks the judge. “I’ve not stolen it your honor, I raised it myself since it were a pistol”.)

“What happens when you add a drop of sewage to a bottle of fine wine ?”

And so, with a big, fat, Fuck You to the oxcart rider, all of it (decrypted, deduplicated, and minus some obvious rubbish) is out in the open, permanently:

File Description SHA256
ro_eng_ascii.txt Romanian < -> English Dictionary. b10ac91292746466433b2cdb4ff67268b556463fe144e38cd20bde4057c0d5e3
ro_eng_tech.txt Romanian < -> English “Technical” Dictionary. 82f44453fe55847d5ee0e1b80697cf8a1e6fa088bf2d823ab79bb7b2a8ce8fc3
ro_eng_expr.txt Romanian < -> English “Expressions” Dictionary. c085054bf9f51d17d8d7c6a610b496b5086647391114c07cfb28a264fcb4dee0

The first of these items (the general-purpose dictionary) is also available as a V-Genesis:

ro_eng_genesis.vpatch

ro_eng_genesis.vpatch.asciilifeform.sig

The diacritics are conspicuous in their absence.
I suspect that the sledgehammer-wielder had removed them, for whatever reason of his own.

You can also flip any of these Romanian -> English dictionaries into an English -> Romanian one, e.g.:

paste -d'`' < (cut -f2 -d'`' ro_eng_ascii.txt) <(cut -f1 -d'`' ro_eng_ascii.txt) | sort -n > eng_ro_ascii.txt

In all three files, the backtick (`) is used to separate each line into the two respective languages; parsing and searching is quite trivial.

Edit: files now mirrored here.

“Finite Field Arithmetic.” Chapter 9: “Exodus from Egypt” with Comba’s Algorithm.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch9_exodus.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


The “Egyptian” algorithm for multiplication (in its simplest Chapter 5 version, and likewise in the refined form given in Chapter 7) has a time cost that is proportional to the square of the operand bitness — i.e. it runs in O(N^2), or quadratic time. In Chapter 10, we will meet a substantially faster — subquadratic — algorithm for multiplication. However, for reasons which will become apparent to the reader, we would first like to obtain the fastest quadratic-time multiplier that we possibly can (without compromising any of the FFA design principles laid out in Chapter 1).

The ancient Egyptians came up with an easily-taught and reasonably efficient-method for multiplication. However, they ran it “on” human scribes, rather than automatic machinery. And it turns out that the latter, especially in the form of recently-made CPUs, is quite sensitive to the particular time-and-space “shape” of an algorithm. But we will expand on this point later.

Mathematician and astronomer Paul G. Comba (1926 — 2017) devised an elegant multiplication algorithm which superficially resembles the familiar “grade school” long-multiplication — but with the important difference that it does not require the generation of intermediate per-row temporary results. Instead, it generates each column — or digit — (or, in FFA parlance, Word) of the product, sequentially, from lowest to highest, using only a handful of words of memory for temporary storage. Let’s implement Comba’s algorithm:

FZ_Mul.ads:

with FZ_Type; use FZ_Type;
 
package FZ_Mul is
 
   pragma Pure;
 
   -- Comba's multiplier.
   procedure FZ_Mul_Comba(X     : in  FZ;
                          Y     : in  FZ;
                          XY_Lo : out FZ;
                          XY_Hi : out FZ);
   pragma Precondition(X'Length = Y'Length and
                         XY_Lo'Length = XY_Hi'Length and
                         XY_Lo'Length = ((X'Length + Y'Length) / 2));
 
end FZ_Mul;

FZ_Mul.adb:

with Words;    use Words;
with Word_Ops; use Word_Ops;
with W_Mul;    use W_Mul;
 
 
package body FZ_Mul is
 
   -- Comba's multiplier.
   procedure FZ_Mul_Comba(X     : in  FZ;
                          Y     : in  FZ;
                          XY_Lo : out FZ;
                          XY_Hi : out FZ) is
 
      -- Words in each multiplicand
      L : constant Word_Index := X'Length;
 
      -- Length of Product, i.e. double the length of either multiplicand
      LP : constant Word_Index := 2 * L;
 
      -- 3-word Accumulator
      A2, A1, A0 : Word := 0;
 
      -- Register holding Product; indexed from zero
      XY : FZ(0 .. LP - 1);
 
      -- Type for referring to a column of XY
      subtype ColXY is Word_Index range XY'Range;
 
      -- Compute the Nth (indexed from zero) column of the Product
      procedure Col(N : in ColXY; U : in ColXY; V : in ColXY) is
 
         -- The outputs of a Word multiplication
         Lo, Hi : Word;
 
         -- Carry for the Accumulator addition
         C      : WBool;
 
         -- Sum for Accumulator addition
         Sum    : Word;
 
      begin
 
         -- For lower half of XY, will go from 0 to N
         -- For upper half of XY, will go from N - L + 1 to L - 1
         for j in U .. V loop
 
            -- Hi:Lo := j-th Word of X  *  (N - j)-th Word of Y
            Mul_Word(X(X'First + j),
                     Y(Y'First - j + N),
                     Lo, Hi);
 
            -- Now add Hi:Lo into the Accumulator:
 
            -- A0 += Lo; C := Carry
            Sum := A0 + Lo;
            C   := W_Carry(A0, Lo, Sum);
            A0  := Sum;
 
            -- A1 += Hi + C; C := Carry
            Sum := A1 + Hi + C;
            C   := W_Carry(A1, Hi, Sum);
            A1  := Sum;
 
            -- A2 += A2 + C
            A2  := A2 + C;
 
         end loop;
 
         -- We now have the Nth (indexed from zero) word of XY
         XY(N) := A0;
 
         -- Right-Shift the Accumulator by one Word width:
         A0 := A1;
         A1 := A2;
         A2 := 0;
 
      end Col;
      pragma Inline_Always(Col);
 
   begin
 
      -- Compute the lower half of the Product:
      for i in 0 .. L - 1 loop
 
         Col(i, 0, i);
 
      end loop;
 
      -- Compute the upper half (sans last Word) of the Product:
      for i in L .. LP - 2 loop
 
         Col(i, i - L + 1, L - 1);
 
      end loop;
 
      -- The very last Word of the Product:
      XY(XY'Last) := A0;
 
      -- Output the Product's lower and upper FZs:
      XY_Lo := XY(0 .. L - 1);
      XY_Hi := XY(L .. XY'Last);
 
   end FZ_Mul_Comba;
   pragma Inline_Always(FZ_Mul_Comba);
 
end FZ_Mul;

I will omit the detailed analysis of why Comba’s Multiplication works, and leave the proof as an exercise for the reader. However, do observe that this routine demands a Mul_Word procedure. Which the reader will correctly guess, lives in the unit W_Mul:

W_Mul.ads:

with Words; use Words;
 
package W_Mul is
 
   pragma Pure;
 
   -- The bitness of a Half-Word
   HalfBitness : constant Positive := Bitness / 2;
   subtype HalfWord is Word range 0 .. 2**HalfBitness;
 
   -- The number of bytes in a Half-Word
   HalfByteness : constant Positive := Byteness / 2;
 
   -- Multiply half-words X and Y, producing a Word-sized product
   function Mul_HalfWord(X : in HalfWord; Y : in HalfWord) return Word;
 
   -- Get the bottom half of a Word
   function BottomHW(W : in Word) return HalfWord;
 
   -- Get the top half of a Word
   function TopHW(W : in Word) return HalfWord;
 
   -- Carry out X*Y mult, return lower word XY_LW and upper word XY_HW.
   procedure Mul_Word(X : in Word; Y : in Word; XY_LW : out Word; XY_HW : out Word);
 
end W_Mul;

W_Mul.adb (”Iron” Variant):

with W_Shifts; use W_Shifts;
 
 
package body W_Mul is
 
   function Mul_HalfWord(X : in HalfWord;
                         Y : in HalfWord) return Word is
   begin
      return X * Y;
   end Mul_HalfWord;
   pragma Inline_Always(Mul_HalfWord);
 
 
   -- Get the bottom half of a Word
   function BottomHW(W : in Word) return HalfWord is
   begin
      return W and (2**HalfBitness - 1);
   end BottomHW;
   pragma Inline_Always(BottomHW);
 
 
   -- Get the top half of a Word
   function TopHW(W : in Word) return HalfWord is
   begin
      return Shift_Right(W, HalfBitness);
   end TopHW;
   pragma Inline_Always(TopHW);
 
 
   -- Carry out X*Y mult, return lower word XY_LW and upper word XY_HW.
   procedure Mul_Word(X       : in  Word;
                      Y       : in  Word;
                      XY_LW   : out Word;
                      XY_HW   : out Word) is
 
      -- Bottom half of multiplicand X
      XL : constant HalfWord := BottomHW(X);
 
      -- Top half of multiplicand X
      XH : constant HalfWord := TopHW(X);
 
      -- Bottom half of multiplicand Y
      YL : constant HalfWord := BottomHW(Y);
 
      -- Top half of multiplicand Y
      YH : constant HalfWord := TopHW(Y);
 
      -- XL * YL
      LL : constant Word := Mul_HalfWord(XL, YL);
 
      -- XL * YH
      LH : constant Word := Mul_HalfWord(XL, YH);
 
      -- XH * YL
      HL : constant Word := Mul_HalfWord(XH, YL);
 
      -- XH * YH
      HH : constant Word := Mul_HalfWord(XH, YH);
 
      -- Carry
      CL : constant Word := TopHW(TopHW(LL) + BottomHW(LH) + BottomHW(HL));
 
   begin
 
      -- Get the bottom half of the Product:
      XY_LW := LL + Shift_Left(LH + HL, HalfBitness);
 
      -- Get the top half of the Product:
      XY_HW := HH + TopHW(HL) + TopHW(LH) + CL;
 
   end Mul_Word;
   pragma Inline_Always(Mul_Word);
 
end W_Mul;

Mul_Word, it turns out, is merely a Word*Word -> double-Word “schoolbook” multiplier. And: it is abundantly obvious that it could be replaced with a single IMUL instruction on x86/x86-64. But we won’t do this in standard FFA — recall the design principles from Chapter 1! We will not be “marrying” any particular iron.

A very sharp reader may have already noticed the “problematic” element in the above Mul_Word routine. Namely, Mul_HalfWord as given here creates a danger of timing sidechannel leakage on certain machine architectures!

Ostorozhno s vilami!

CPUs widely known to suffer from this “helpful” “optimization” include x86-compatible VIA, old Intels (386, 486), and most of the commonly-met PowerPC and ARM devices. Others archs, as well as obscure implementations of well-known archs, may also belong on this list. (Please write in if you happen to know of one such that I have not listed here.)

If FFA were to continue with the above variant (let’s call it iron) of Mul_HalfWord, every machine on which it is put to use would have to be first probed for non-constanttime iron multiplication: every newly-purchased CPU is arguably “guilty until proven innocent”. But it turns out that there is a simple and relatively-inexpensive cure for this headache:

W_Mul.adb (”Soft” Variant, non-unrolled):

   -- Multiply half-words X and Y, producing a Word-sized product
   function Mul_HalfWord(X : in HalfWord; Y : in HalfWord) return Word is
 
      -- X-Slide
      XS : Word := X;
 
      -- Y-Slide
      YS : Word := Y;
 
      -- Gate Mask
      GM : Word;
 
      -- The Product
      XY : Word := 0;
 
   begin
 
      -- For each bit of the Y-Slide:
      for b in 1 .. HalfBitness loop
 
         -- Compute the gate mask
         GM := 0 - (YS and 1);
 
         -- Perform the gated addition
         XY := XY + (XS and GM);
 
         -- Crank the next Y-slide bit into position
         YS := Shift_Right(YS, 1);
 
         -- Advance the X-slide by 1 bit
         XS := Shift_Left(XS, 1);
 
      end loop;
 
      -- Return the Product
      return XY;
 
   end Mul_HalfWord;
   pragma Inline_Always(Mul_HalfWord);

The above Mul_HalfWord replaces the “iron” multiplication with a miniature version of the Chapter 5 “Egyptian” algorithm. Because there is no multiwordism and no carry calculation, this safe (bitwise logical operations and shifts by one bit are constant-time on all CPUs known to me) alternative to the “leaky” MUL, when implemented in the partially unrolled form as follows — comes at a cost of a less than 6 percent (see further below) increase in required CPU time(see comments):

W_Mul.adb (”Soft” Variant, unrolled):

   -- Multiply half-words X and Y, producing a Word-sized product
   function Mul_HalfWord(X : in HalfWord; Y : in HalfWord) return Word is
 
      -- X-Slide
      XS : Word := X;
 
      -- Y-Slide
      YS : Word := Y;
 
      -- Gate Mask
      GM : Word;
 
      -- The Product
      XY : Word := 0;
 
      -- Performed for each bit of HalfWord's bitness:
      procedure Bit is
      begin
 
         -- Compute the gate mask
         GM := 0 - (YS and 1);
 
         -- Perform the gated addition
         XY := XY + (XS and GM);
 
         -- Crank the next Y-slide bit into position
         YS := Shift_Right(YS, 1);
 
         -- Advance the X-slide by 1 bit
         XS := Shift_Left(XS, 1);
 
      end Bit;
      pragma Inline_Always(Bit);
 
   begin
 
      -- For each bit of the Y-Slide (unrolled) :
      for b in 1 .. HalfByteness loop
 
         Bit; Bit; Bit; Bit; Bit; Bit; Bit; Bit;
 
      end loop;
 
      -- Return the Product
      return XY;
 
   end Mul_HalfWord;
   pragma Inline_Always(Mul_HalfWord);

And now, on the same machine as in previous Chapters, timed and plotted logarithmically:

Cost of Ch.9 'X' Operation, vs FFA Bitness.

Or, for those who prefer the raw numbers,

FFA Bitness Ch.6 Egyptian ‘X’ (sec) Ch.7 Egyptian ‘X’ (sec) Ch.9 “Iron” Comba ‘X’ (sec) Ch.9 “Soft” Comba ‘X’ (sec)
256 0.072 0.040 0.028 0.029
512 0.505 0.240 0.163 0.169
1024 3.685 1.672 1.125 1.177
2048 27.975 12.024 7.975 8.387
4096 217.966 90.439 59.356 62.725
8192 1720.127 699.979 457.103 483.993

Despite this small increase in runtime cost, standard FFA will from this point on use only the “soft” variant of Mul_HalfWord, so as to maintain the guarantee of constant-time operation on all known CPU architectures. Individual readers, can, of course, at their own risk — replace the entirety of Mul_Word — or even FZ_Mul_Comba — with an inline-assembly routine known to work safely on a particular CPU. But this type of optimization falls outside the scope of FFA as per the design principles introduced in Chapter 1.

~To be continued!~


Answer to Homework Problem 1 from Chapter 8:

Let’s suppose that you wish to test multiplication by generating a Python program via FFACalc. Your tape could look like this:

tape.txt:

??``*[print 0x]##[ == 0x]#[ * 0x]#

Which would produce a valid Python program that can be fed straight into a Python interpreter, e.g. :

./bin/ffa_calc 1048576 32 /dev/urandom < tape.txt | tr -d '\n' > multest.py

… will pseudorandomly generate two numbers, each being one megabit in size, and produce a statement regarding their (two-megabit-wide, at most) product, that can be verified by the Python interpreter, in turn producing an output after around half a minute on the test box used thus far.

Now let’s feed this output into Python:

$ python < multest.py

Prints:

True

... after less than one fifth of a second, on the same machine.

The bulk of the remaining material in this series will concern methods of bridging the speed gap between our routines and traditional "bignum" mechanisms (such as those used in the Python interpreter) to the extent possible without compromising any of the design principles of FFA.


Homework:

1. Prove that Comba's Algorithm works. And in particular, that the three-word Accumulator space given, will always suffice.

2. Given that both "Egyptian" and Comba's multiplication run in O(N^2), why does Comba's method win speedwise? Does it continue to win for all possible operand bitnesses, or only over a particular range?

3. Write an inline assembly variant of FZ_Mul_Comba for your CPU architecture. Demonstrate how you determined that the resulting program, when operating on the exponentiation test tape from Chapter 6, still runs in constant time (i.e. its CPU time cost is unaffected by any aspect of the input other than the choice of FFA bitness) on your particular machine.


~To be continued!~

“Finite Field Arithmetic.” Chapter 8: Interlude: Randomism.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch8_randomism.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


The “mail bag” from Chapter 7… is empty.

Well, not entirely empty: reader Apeloyee observed that FZ_Mod_Exp would produce the wrong answer in a hypothetical scenario where its output Result is permitted to overwrite the location containing the Modulus input operand. A revised variant is therefore as follows:

FZ_ModEx.adb:

   -- Modular Exponent: Result := Base^Exponent mod Modulus
   procedure FZ_Mod_Exp(Base     : in  FZ;
                        Exponent : in  FZ;
                        Modulus  : in  FZ;
                        Result   : out FZ) is
 
      -- Working register for the squaring; initially is copy of Base
      B : FZ(Base'Range)     := Base;
 
      -- Copy of Exponent, for cycling through its bits
      E : FZ(Exponent'Range) := Exponent;
 
      -- Register for the Mux operation
      T : FZ(Result'Range);
 
      -- Buffer register for the Result
      R : FZ(Result'Range);
 
   begin
      -- Result := 1
      WBool_To_FZ(1, R);
 
      -- For each bit of R width:
      for i in 1 .. FZ_Bitness(R) loop
 
         -- T := Result * B mod Modulus
         FZ_Mod_Mul(X => R, Y => B, Modulus => Modulus, Product => T);
 
         -- Sel is the current low bit of E;
         --    When Sel=0 -> Result := Result;
         --    When Sel=1 -> Result := T
         FZ_Mux(X => R, Y => T, Result => R, Sel => FZ_OddP(E));
 
         -- Advance to the next bit of E
         FZ_ShiftRight(E, E, 1);
 
         -- B := B*B mod Modulus
         FZ_Mod_Mul(X => B, Y => B, Modulus => Modulus, Product => B);
 
      end loop;
 
      -- Output the Result:
      Result := R;
 
   end FZ_Mod_Exp;
   pragma Inline_Always(FZ_Mod_Exp);

The ultimate aim of FFA is not merely correctness, but obvious correctness. In that vein, any reasonably inexpensive (from complexity and speed POVs) “cushion” which increases rodent resistance — i.e. cuts down the phase space of possible abuses — is of interest.

As always, I would like to thank my eagle-eyed readers for their sweat. And anyone who observes a similar thing, I encourage to write in, and receive credit in the next Chapter.


In this Chapter, we will be taking a break from the intricacies of Modular Exponentiation. And in fact from FFA per se. (We will return to the subject in Chapter 9.) Instead, we will be introducing an important knob previously missing from FFACalc: random number generation :

FFA_RNG.ads:

with Ada.Sequential_IO;
 
with Words;   use Words;
with FZ_Type; use FZ_Type;
 
 
package FFA_RNG is
 
   Default_RNG_Path : constant String := "/dev/random";
 
   -- For reading from RNGs:
   package Word_IO is new Ada.Sequential_IO(Element_Type => Word);
 
   -- Represents an RNG Device:
   type RNG_Device is record
      F : Word_IO.File_Type;
   end record;
 
   -- Prepare an RNG for use; at given path, or will use default
   procedure Init_RNG(RNG           : out RNG_Device;
                      RNG_Unix_Path : in  String := Default_RNG_Path);
 
   -- Fill a FZ from RNG
   procedure FZ_Random(RNG : in RNG_Device;
                       N   : out FZ);
 
end FFA_RNG;

Do you have an auditable hardware True Random Number Generator? If so, you will be able to use it with FFACalc. (Note: if your RNG is on, e.g., a serial port, the port must be already initialized prior to using FFACalc.)

Alternatively, it is possible to specify an ordinary file as an “RNG”, for deterministic testing.

The mechanism itself is not complicated:

FFA_RNG.adb:

with OS;       use OS;
 
with FZ_Type;  use FZ_Type;
 
 
package body FFA_RNG is
 
   -- Prepare an RNG for use; at given path, or will use default
   procedure Init_RNG(RNG           : out RNG_Device;
                      RNG_Unix_Path : in  String := Default_RNG_Path) is
   begin
      begin
         -- Open the RNG at the offered path:
         Word_IO.Open(File => RNG.F,
                      Mode => Word_IO.In_File,
                      Name => RNG_Unix_Path);
      exception
         when others =>
            Eggog("Could not open RNG at : " & RNG_Unix_Path & "!");
      end;
   end Init_RNG;
 
 
   -- Fill a FZ from RNG
   procedure FZ_Random(RNG : in RNG_Device;
                       N   : out FZ) is
   begin
      begin
         -- Fill the destination FZ from this RNG:
         for i in N'Range loop
            Word_IO.Read(RNG.F, N(i));
         end loop;
      exception
         when others =>
            Eggog("Could not read from RNG!");
      end;
   end FZ_Random;
 
end FFA_RNG;

Observe that we make no attempt to “massage” the RNG device’s output in any way whatsoever. If your RNG is correctly designed, it is unnecessary; if incorrectlyfutile.

The RNG initialization will look like this:

ffa_calc.adb:

-- For RNG:
with FFA_RNG;  use FFA_RNG;
 
 
procedure FFA_Calc is
 
   Width  : Positive;   -- Desired FFA Width
   Height : Positive;   -- Desired Height of Stack
   RNG    : RNG_Device; -- The active RNG device.
 
begin
   if Arg_Count < 3 or Arg_Count > 4 then
      Eggog("Usage: ./ffa_calc WIDTH HEIGHT [/dev/rng]");
   end if;
 
   declare
      Arg1 : CmdLineArg;
      Arg2 : CmdLineArg;
   begin
      -- Get commandline args:
      Get_Argument(1, Arg1); -- First arg
      Get_Argument(2, Arg2); -- Second arg
 
      if Arg_Count = 4 then
         -- RNG was specified:
         declare
            Arg3 : CmdLineArg;
         begin
            Get_Argument(3, Arg3); -- Third arg (optional)
 
            -- Ada.Sequential_IO chokes on paths with trailing whitespace!
            -- So we have to give it a trimmed path. But we can't use
            -- Ada.Strings.Fixed.Trim, because it suffers from
            -- SecondaryStackism-syphilis. Instead we are stuck doing this:
            Init_RNG(RNG, Arg3(Arg3'First .. Len_Arg(3)));
         end;
      else
         -- RNG was NOT specified:
         Init_RNG(RNG); -- Use the machine default then
      end if;
 
      -- .......snipped.......

To satisfy the Unix insanity of Ada.Sequential_IO, it was necessary to make the Len_Arg function accessible.

FFACalc is invoked quite like before; but now it can take an optional third argument, consisting of the RNG device path. If this optional argument is not given, the default RNG (presently /dev/random. And pointedly not /dev/urandom, which only even exists in Linux because rats gotta rat around.)

And now for the new op itself:

ffa_calc.adb:

               -- Push a FZ of RNGolade onto the stack
            when '?' =>
               Push;
               FZ_Clear(Stack(SP));
               FZ_Random(RNG, Stack(SP));

Now let’s try it:

echo "?#" | ./bin/ffa_calc 4096 32

And one possible result:

A331FBE217BC77C7C6AD90C1BABB18C9C95B5A420650AE30876102ECF3FB47667BC5C3
38857C058ED15554724F71321D950A41CB95E86B7894B73BC61BDBAE8B9E0F24EF400D
B71742B96A2D46F64D24C57304423D13F5C3EBE7DC62D1A72A33B424F8C3920905BCD6
D1AAB6868D2350D82F73A4583C695B9D46D1D4CFCC46E8487AFBE55DAA53865084D261
B32880A337AFD5A567E1CBA475B3215F18A625E0343031F153F46070F15A159DCBE628
4FC6E624DE42AE31CE086502419777DDC9133BC07D21930D3B87DE124F8E5B282629C3
95DC240033C8E0EF1790F95FBC4C21E8346ECF335B9F4D0DC9A26D5B05218A03C742DA
2D03667A16FB01BBCC2B4CF29169D0CE02E750265BBA4F568C5EE35103440A54B9D6E6
098490DF7E587BD220D01EA396524DAC92812E417BD79248AD3D4480E3AD9CA76DF2B2
405D79F689830C4101D52AEC26311324D529661F286E90DCBB973B371114102971BFF6
FA2EA2460D6733D2C893AFAC0A81FFA0B39C3F67DE04CF9E56AE82B9C659BC8A8470DE
EFE5908C16CC1168F1F0AD00DBF6AEFD08B402B9E742817475DDFF273E99641360B62A
3CCB5CC9343BC936CCECAF4FA926B4BF69D96D286C8308448A17B0FF4342AECD0F84CC
5CD43D7DF5D8A343FABEE3CDD4126B3E7DB94C35ECD111824CA288051C269CBEDBAB2F
368627DDD2D4E8CAFD1E3E00001F88FF4E9E5E5DD94C

Homework:

1. Write an automatic generator of arithmetical statements, e.g. multiplications, which are valid programs in your favourite scripting language, and will print “True” supposing that FFACalc had performed the computation correctly, and “False” otherwise.

2. Observe what you get when invoking the RNG when a file is used as the entropy source. Do the contents of the resulting FZ depend on the machine “endianness” ? What happens if there are insufficiently many bytes in the file to supply the demand of your FFACalc tape?


~To be continued!~

“Rodenticide” with the Seiko DPU414 Tape Printer.

The Seiko DPU414 is a small (110mm paper tape, ~90mm printable width) thermal printer, similar to the type used with cash registers everywhere, but slightly wider. It is quite unremarkable apart from the fact that no thermal printer of similar or greater width appears to be widely available; and also for the price (typically around 20-30 USD, new-surplus — these were apparently made in huge numbers; if a vendor is asking for substantially more, go elsewhere.)


Seiko 414

The OEM crate contains a roll of tape, but — oddly enough — no power supply. Evidently these were expected to be used inside industrial equipment which would supply the necessary 6 volts. (Seiko claims “6.5-7.0V, 2A” but a 6V dongle worked. Keep in mind that the polarity is reversed from the typical: the positive terminal is the outer surface of the cylinder.)

There are a number of uses for a small tape printer, but I will mention two rodenticidal ones in particular:

  • Non-editable and non-erasable system logs.

    This is self-explanatory. The enemy cannot edit or delete the log of a machine which sends its log in real time to a printer. (Granted, he can rob your house, or burn it down, but this falls outside of the scope of “computer” security, and is a problem all valuable physical objects, including your own skin, have in common.)

    Observe that the enemy does not need to know that a log printer is in use. Let him wonder. But naturally you will want to keep the output brief enough to read and compare with the contents of electronic logs, with some regularity.

  • One-Time Pads.

    AKA Vernam cipher. This should need no explanation; OTP is part of an ordinarily educated computer user’s essential toolkit.

    You can plug it straight into your TRNG, if you know how to configure the latter to output a serial stream at 19200 (or below) baud. The Seiko can be configured to print hex dumps natively. (RTFM)

    Observe that you can print duplicate “pads” without ever storing a pad on a computer for any length of time, by simply connecting two or more tape printers using a fan-out cable (AFAIK no such thing is sold anywhere, but you can trivially make such a cable yourself in a couple of minutes) to your RNG.

The Seiko is particularly handy in that it does not use USB (works fine, however, with a USB-to-RS232 dongle, if you must) and does not require drivers or specific system support of any kind whatsoever : once the baud rate is configured, it will eat standard RS-232 without complaint until it runs out of tape. (The printer also comes with a “nostalgic” Centronics “parallel port” connector, but I have not tried to use this.)

This printer can be configured to emit 80 columns of text (Alternatively, 40, with larger characters.) AFAIK it is the only thermal printer on the market which can print 80 columns. If anyone knows otherwise, I would like to hear about it.

Now, for the headaches. The Seiko 414 comes with factory defaults in EEPROM that make it virtually unusable with recent iron: it wants input from the Centronics port. And to reconfigure it, you must use “software DIP switches” — Seiko’s way of saving a dime in part cost by making you waste a ~metre of tape:


Seiko 414 config routine


Seiko 414 config routine close-up

Fortunately, this only has to be done once.

If one believes the manual, the Seiko 414 also knows how to print graphics (monochrome, 640 pixels wide, times… well, “infinity”-high…) and can therefore print “QR” codes (for, e.g., Bitcoinism) and similar applications. But I have not tried this.

And this is where I note that I have no particular love for Seiko, and will ask any among my readers who might know of a similar-but-cheaper printer; or of one that can print on a wider thermal tape; or even of an entirely different but somehow superior tool for the same job — to please write in.

“Finite Field Arithmetic.” Chapter 7: “Turbo Egyptians.”

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch7_turbo_egyptians.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


No one submitted a solution to the “trivial optimizations” problem posed in Chapter 5, but reader Diana Coman did show symptoms of knowing the magic ingredient which happens to be the subject of this Chapter.

The attentive reader may already have begun to suspect that the “Egyptian” multiplication algorithm (FZ_Mul_Egyptian from Ch. 5) is not the final word on the subject of integer multiplication in FFA. But before we can explore subquadratic multiplication (and — much later — clever methods for speeding up modular reduction) it is necessary to set up the arena for a “fair fight”, by making reasonably well-optimized variants of both of the quadratic-runtime “Egyptian” algos, FZ_Mul_Egyptian and FZ_Mod.

Previously, the primary design objectives of the FFA algorithms were correctness and constant-spacetime operation; followed, secondarily, by simplicity (in the fits-in-head sense.) In this chapter, we add another objective: speed.

At no point will correctness or constant-spacetime (branch-free and offsetting-by-secrets-free) operation be sacrificed under any pretext whatsoever. However, in order to obtain, e.g., an RSAtron that is practically usable on commonplace machines, it will be necessary to sacrifice a certain amount of mechanical simplicity. This Chapter, along with the bulk of the remaining material, will be devoted to this painful — but not uninteresting! — task.

Let’s begin with a new, “turbo” FZ_Mul_Egyptian:

FZ_Mul.adb:

   -- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y.
   procedure FZ_Mul_Egyptian(X     : in  FZ;
                             Y     : in  FZ;
                             XY_Lo : out FZ;
                             XY_Hi : out FZ) is
 
      L : constant Indices := X'Length;
 
      -- Register holding running product
      XY : FZ(1 .. X'Length + Y'Length);
 
      -- X-Slide
      XS : FZ(1 .. X'Length + Y'Length);
 
   begin
      -- Product register begins empty
      FZ_Clear(XY);
 
      -- X-Slide initially equals X:
      XS(1            .. X'Length) := X;
      XS(X'Length + 1 .. XS'Last)  := (others => 0);
 
      -- For each word of Y:
      for i in Y'Range loop
 
         declare
            -- Current word of Y
            W : Word  := Y(i);
 
            -- Current cut of XY and XS. Stay ahead by a word to handle carry.
            Cut : constant Indices := L + i;
 
            XYc : FZ renames XY(1 .. Cut);
            XSc : FZ renames XS(1 .. Cut);
 
         begin
            for b in 1 .. Bitness loop
 
               -- If current Y bit is 1, X-Slide Cut is added into XY Cut
               FZ_Add_Gated(X    => XYc, Y => XSc, Sum => XYc,
                            Gate => W and 1);
 
               -- Crank the next bit of Y into the bottom position of W
               W := Shift_Right(W, 1);
 
               -- X-Slide := X-Slide * 2
               FZ_ShiftLeft(XSc, XSc, 1);
 
            end loop;
         end;
 
      end loop;
 
      -- Write out the Product's lower and upper FZs:
      XY_Lo := XY(1                .. XY_Lo'Length);
      XY_Hi := XY(XY_Lo'Length + 1 .. XY'Last);
 
   end FZ_Mul_Egyptian;
   pragma Inline_Always(FZ_Mul_Egyptian);

Observe that the Y-Slide from Chapter 5’s multiplier is gone. Instead, we now walk through the bits of the multiplicand Y without having to shift the entire thing FZ_Bitness(Y) times: each word of Y is loaded into W, starting with the first; and the “egyptology” is then performed Bitness times, once for each bit of W.

The other optimization is the introduction of the cut concept. Observe that an addition of two FZ integers of identical bitness, can produce a result with an intrinsic bitness that is larger than that of the greater parent’s by a maximum of one bit. The consequence of this is that the Chapter 5 FZ_Mul_Egyptian wastes roughly half of its CPU time shifting and adding words that never, at the particular times they are touched, depart from zero.

Thus, the first iteration of the loop is carried out on a cut of length X’Length + 1; the second, X’Length + 2; and forth; the last iteration is the only one which is performed on the entire XY product-accumulator and the entire XS X-Slide. We “run ahead” of the segment of XY which has been touched, by one word, so as to have a place to which to ripple the carry.

At the expense of a certain amount of “obviousness”, we win a 2x gain in multiplication speed.


And now we will apply exactly the same optimizations to the modulus routine:

FZ_Divis.adb:

   -- Modulus. Permits the asymmetric Dividend and Divisor in FZ_Mod_Exp.
   procedure FZ_Mod(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Remainder : out FZ) is
 
      -- Length of Divisor and Remainder; < = Dividend'Length
      L : constant Indices := Divisor'Length;
 
      -- Remainder register, starts as zero
      R : FZ(1 .. L) := (others => 0);
 
      -- Indices into the words of Dividend
      subtype Dividend_Index is Word_Index range Dividend'Range;
 
      -- Permissible 'cuts' for the Slice operation
      subtype Divisor_Cuts   is Word_Index range 2 .. Divisor'Length;
 
      -- Performs Restoring Division on a given segment of Dividend:Divisor
      procedure Slice(Index : Dividend_Index;
                      Cut   : Divisor_Cuts) is
      begin
 
         declare
 
            -- Borrow, from comparator
            C   : WBool;
 
            -- Left-Shift Overflow
            LsO : WBool;
 
            -- Current cut of Remainder register
            Rs  : FZ renames R(1 .. Cut);
 
            -- Current cut of Divisor
            Ds  : FZ renames Divisor(1 .. Cut);
 
            -- Current word of Dividend, starting from the highest
            W   : Word  := Dividend(Dividend'Last + 1 - Index);
 
         begin
 
            -- For each bit in the current Dividend word:
            for b in 1 .. Bitness loop
 
               -- Send top bit of current Dividend word to the bottom of W
               W := Rotate_Left(W, 1);
 
               -- Advance Rs, shifting in the current Dividend bit
               FZ_ShiftLeft_O_I(N => Rs, ShiftedN => Rs, Count => 1,
                                OF_In => W and 1,
                                Overflow => LsO);
 
               -- Subtract Divisor-Cut from R-Cut; Underflow goes into C
               FZ_Sub(X => Rs, Y => Ds, Difference => Rs, Underflow => C);
 
               -- If C=1, subtraction underflowed, and we must undo it:
               FZ_Add_Gated(X => Rs, Y => Ds, Sum => Rs,
                            Gate => C and W_Not(LsO));
 
            end loop;
 
         end;
 
      end Slice;
 
   begin
 
      -- Process bottom half of dividend:
      for i in 1 .. L - 1 loop
 
         Slice(i, i + 1); -- stay ahead by a word to handle carry
 
      end loop;
 
      -- Process top half of dividend
      for i in L .. Dividend'Length loop
 
         Slice(i, L);
 
      end loop;
 
      -- Output the Remainder.
      Remainder := R;
 
   end FZ_Mod;
   pragma Inline_Always(FZ_Mod);

In addition to the two optimizations analogous to those we had applied to FZ_Mul_Egyptian, we also get the ability to apply FZ_Mod to a Dividend which exceeds the bitness of the Divisor. This will allow us to abolish the zero-padding of Modulus in Chapter 6’s FZ_Mod_Mul.

Correspondingly, we carefully alter the preconditions of FZ_Mod:

FZ_Divis.ads:

   -- Modulus. Permits the asymmetric Dividend and Divisor in FZ_Mod_Exp.
   procedure FZ_Mod(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Remainder : out FZ);
   pragma Precondition(Dividend'Length >= Divisor'Length and
                         Divisor'Length = Remainder'Length);

The only place in FFA where the asymmetric invocation of FZ_Mod will be permitted, is FZ_Mod_Mul, the modular multiplier procedure, which will now look like this:

FZ_ModEx.adb:

   -- Modular Multiply: Product := X*Y mod Modulus
   procedure FZ_Mod_Mul(X        : in  FZ;
                        Y        : in  FZ;
                        Modulus  : in  FZ;
                        Product  : out FZ) is
 
      -- The wordness of all three operands is equal:
      L     : constant Indices := X'Length;
 
      -- Double-width register for multiplication and modulus operations
      XY    : FZ(1 .. L * 2);
 
      -- To refer to the lower and upper halves of the working register:
      XY_Lo : FZ renames XY(1     .. L);
      XY_Hi : FZ renames XY(L + 1 .. XY'Last);
 
   begin
 
      -- XY_Lo:XY_Hi := X * Y
      FZ_Mul_Egyptian(X, Y, XY_Lo, XY_Hi);
 
      -- Product := XY mod M
      FZ_Mod(XY, Modulus, Product);
 
   end FZ_Mod_Mul;
   pragma Inline_Always(FZ_Mod_Mul);

And now, on the same machine as in Ch.6 , timed and plotted logarithmically:


Cost of Ch.7 'X' Operation, vs FFA Bitness.

Or, for those who prefer the raw numbers,

FFA Bitness Ch.6 ‘X’ (sec) Ch.7 ‘X’ (sec)
256 0.072 0.040
512 0.505 0.240
1024 3.685 1.672
2048 27.975 12.024
4096 217.966 90.439
8192 1720.127 699.979

And finally, let’s turn this into a FFACalc tape, and verify the signature of the Chapter 7 material, using itself, just like we did in the previous chapter:

( the seal of ffa_ch7_turbo_egyptians.vpatch : )
.
88A0984C5438EFD811BA8505362EEA572B4057C52662DA8854DF66613C40332EB8F4E3
80932DBF9106F3536643BD2FF69DA90EA0402B4FDC21A9245C3EBCB10C86C1847C9CC9
9C2ECE4245735013D0E59EE905479445B625D76EECC1BDEEBF489DC512715324995D13
EAB27D286EFA86BF8D69C3D93CB65E2F4BA9B805AA392B29D9F6635E9C7D81C0CC4EAA
A2DA0F618D83F7DAB4854F24B3A900BFDF35E58E87745A6A8D11868C860334DD80A4A4
D0DCAC4E5EC34E9C16BBD174659008145F1E09556009557277A7DFE74FAA3018995746
8CB25E0B10B5AE8D114E4E89E4B940B1A21A5BB7EFDD75AB59A1D82AC08DBCC9A7B10F
5B708B2BC2453AB9BC6C80
 
( my public rsa exponent : )
.
10001
 
( my public rsa modulus : )
.
CDD49A674BAF76D3B73E25BC6DF66EF3ABEDDCA461D3CCB6416793E3437C7806562694
73C2212D5FD5EED17AA067FEC001D8E76EC901EDEDF960304F891BD3CAD7F9A335D1A2
EC37EABEFF3FBE6D3C726DC68E599EBFE5456EF19813398CD7D548D746A30AA47D4293
968BFBAFCBF65A90DFFC87816FEE2A01E1DC699F4DDABB84965514C0D909D54FDA7062
A2037B50B771C153D5429BA4BA335EAB840F9551E9CD9DF8BB4A6DC3ED1318FF3969F7
B99D9FB90CAB968813F8AD4F9A069C9639A74D70A659C69C29692567CE863B88E191CC
9535B91B417D0AF14BE09C78B53AF9C5F494BCF2C60349FFA93C81E817AC682F0055A6
07BB56D6A281C1A04CEFE1 
 
( now modularly-exponentiate it : )
X
 
( ... and print the output .)
#

… and play the tape! :

$ time  cat ch7_rsa_tape.txt | ./bin/ffa_calc 2048 32

… and unsurprisingly:

0001FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF003051
300D060960864801650304020305000440FCFB42B5342D226261DF36158BB4EA5482F4
B45B8D4259F3873D8B77CD7572D55FBADCBCC1267BECE169F330154F7A9156C60A07FA
8C902BEE43FB27FC33B248
 
real    0m12.018s 
user    0m12.004s 
sys     0m0.002s

Guess what? The correct answer. (Use the patched GPG from Ch. 6, and compare.)


Homework:

1. Prove that the new FZ_Mod produces, for all valid inputs, the equivalent answer to the one in the previous chapter.

2. Formulate a hypothesis regarding why the observed speedup of ‘X’ was greater than two-fold.


~To be continued!~

“Finite Field Arithmetic.” Chapter 6: “Geological” RSA.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch6_simplest_rsa.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


The “mail bag” from Chapter 5… is empty. Congratulations, however, to the successful solvers of the Chapter 4 puzzle!


Now, errata! In the FFACalc extensions of Chapter 5, we have:

ffa_calc.adb:

               -- Multiply, give bottom and top halves
            when '*' =>
               Want(2);
               MustNotZero(Stack(SP));
               -- Ch5: slow and simple 'Egyptological' method:
               FZ_Mul_Egyptian(X     => Stack(SP - 1),
                               Y     => Stack(SP),
                               XY_Lo => Stack(SP - 1),
                               XY_Hi => Stack(SP));

After removal of the obvious bug, this becomes:

ffa_calc.adb:

               -- Multiply, give bottom and top halves
            when '*' =>
               Want(2);
               -- Ch5: slow and simple 'Egyptological' method:
               FZ_Mul_Egyptian(X     => Stack(SP - 1),
                               Y     => Stack(SP),
                               XY_Lo => Stack(SP - 1),
                               XY_Hi => Stack(SP));

In this Chapter, we will add a modular multiplication routine to FFA. This in turn will immediately let us write a modular exponentiation routine, which is the fundamental operation of the RSA system of asymmetric cryptography.

Concretely:

FZ_ModEx.ads:

with FZ_Type; use FZ_Type;
 
 
package FZ_ModEx is
 
   pragma Pure;
 
   -- Modular Multiply: Product := X*Y mod Modulus
   procedure FZ_Mod_Mul(X        : in  FZ;
                        Y        : in  FZ;
                        Modulus  : in  FZ;
                        Product  : out FZ);
   pragma Precondition(X'Length = Y'Length and
                         Modulus'Length = X'Length and
                         Product'Length = Modulus'Length);
 
   -- Modular Exponent: Result := Base^Exponent mod Modulus
   procedure FZ_Mod_Exp(Base     : in  FZ;
                        Exponent : in  FZ;
                        Modulus  : in  FZ;
                        Result   : out FZ);
   pragma Precondition(Base'Length = Exponent'Length and
                         Base'Length = Result'Length and
                         Base'Length = Modulus'Length);
 
end FZ_ModEx;

Because of the extremely simplistic and wholly-unoptimized way in which we implemented Multiplication and Division in Chapter 5, this will be a “geological-time” RSAtron, rather than a battlefield-ready one (though I can conceive of a few speed-insensitive applications where even an algorithm such as this one could suffice.) And at the end of this Chapter, we will answer the question of “exactly how geological” ?

The bulk of the material in the remaining Chapters in this series will concern itself with demonstrably-correct and safe (i.e. constant-spacetimeness-preserving) methods for speeding up Multiplication and Division — and consequently Modular Exponentiation — so as to transform this “geological” RSA into a generally-applicable one.

And so, without further delay, let’s define the new modular multiplication and modular exponentiation operators in the FFACalc system:

ffa_calc.adb:

               -- Modular Multiplication
            when 'M' =>
               Want(3);
               MustNotZero(Stack(SP));
               FZ_Mod_Mul(X       => Stack(SP - 2),
                          Y       => Stack(SP - 1),
                          Modulus => Stack(SP),
                          Product => Stack(SP - 2));
               Drop;
               Drop;
 
               -- Modular Exponentiation
            when 'X' =>
               Want(3);
               MustNotZero(Stack(SP));
               FZ_Mod_Exp(Base     => Stack(SP - 2),
                          Exponent => Stack(SP - 1),
                          Modulus  => Stack(SP),
                          Result   => Stack(SP - 2));
               Drop;
               Drop;

And now let’s implement the new unit, FZ_ModEx:

FZ_ModEx.adb:

with FZ_Basic; use FZ_Basic;
with FZ_Pred;  use FZ_Pred;
with FZ_Shift; use FZ_Shift;
with FZ_Mul;   use FZ_Mul;
with FZ_Divis; use FZ_Divis;
 
package body FZ_ModEx is

We will begin with the modular multiplication algorithm. It is deliberately made in the most straightforward way I could think of, making use only of “schoolboy” methods:

   -- Modular Multiply: Product := X*Y mod Modulus
   procedure FZ_Mod_Mul(X        : in  FZ;
                        Y        : in  FZ;
                        Modulus  : in  FZ;
                        Product  : out FZ) is
 
      -- The wordness of all three operands is equal:
      L     : constant Indices := X'Length;
 
      -- Double-width register for multiplication and modulus operations
      XY    : FZ(1 .. L * 2);
 
      -- To refer to the lower and upper halves of the working register:
      XY_Lo : FZ renames XY(1     .. L);
      XY_Hi : FZ renames XY(L + 1 .. XY'Last);
 
      -- A zero-padded double-wide copy of Modulus, to satisfy Ch.5's FZ_Mod
      M     : FZ(XY'Range);
 
   begin
      -- Place the Modulus in a double-width M, as FZ_Mod currently demands
      M(Modulus'Range)   := Modulus;
      M(L + 1 .. M'Last) := (others => 0);
 
      -- XY_Lo:XY_Hi := X * Y
      FZ_Mul_Egyptian(X, Y, XY_Lo, XY_Hi);
 
      -- XY := XY mod M
      FZ_Mod(XY, M, XY);
 
      -- The bottom half of XY is our modular product; top half is always 0
      Product := XY_Lo;
   end FZ_Mod_Mul;
   pragma Inline_Always(FZ_Mod_Mul);

This FZ_Mod_Mul multiplies X and Y using the barbaric FZ_Mul_Egyptian algorithm from Chapter 5; then subsequently divides XY — their product — by Modulus, and returns the remainder — using the FZ_Mod from Chapter 5.

Some of the wastefulness of the above FZ_Mod_Mul will immediately stand out to the reader. In subsequent Chapters, we will methodically strip it away. For now, we are concerned strictly with obtaining a minimal and correct baseline against which to work.

Now we can define a modular exponentiation algorithm:

   -- Modular Exponent: Result := Base^Exponent mod Modulus
   procedure FZ_Mod_Exp(Base     : in  FZ;
                        Exponent : in  FZ;
                        Modulus  : in  FZ;
                        Result   : out FZ) is
 
      -- Working register for the squaring
      B : FZ(Base'Range)     := Base;
 
      -- Register for cycling through the bits of E
      E : FZ(Exponent'Range) := Exponent;
 
      -- Register for the Mux operation
      T : FZ(Result'Range);
 
   begin
      -- Result := 1
      WBool_To_FZ(1, Result);
 
      -- For each bit of Result width:
      for i in 1 .. FZ_Bitness(Result) loop
 
         -- T := Result * B mod Modulus
         FZ_Mod_Mul(X => Result, Y => B, Modulus => Modulus,
                    Product => T);
 
         -- Sel is the current low bit of E;
         --    When Sel=0 -> Result := Result;
         --    When Sel=1 -> Result := T
         FZ_Mux(X => Result, Y => T, Result => Result,
                Sel => FZ_OddP(E));
 
         -- Advance to the next bit of E
         FZ_ShiftRight(E, E, 1);
 
         -- B := B*B mod Modulus
         FZ_Mod_Mul(X => B, Y => B, Modulus => Modulus,
                    Product => B);
 
      end loop;
 
   end FZ_Mod_Exp;
   pragma Inline_Always(FZ_Mod_Exp);

This is the traditional right-to-left binary method, as seen in e.g. Vol. 2 of D. E. Knuth’s AOCP. The only substantial departure from the schoolbook algorithm is the elimination of branching on secret bits. The multiplication step is always carried out, and FZ_Mux is used to either keep, or throw away the answer.

Specifically: Result is initially equal to 1. B initially equals the Base; E — the Exponent. Now we iterate for a fixed number of cycles, equal to the bitness of Result (which is also equal to the bitness of all three operands.)

In each iteration:

… We multiply Result with the current B, modulo the Modulus, and assign the result to temporary register T.

… And now, if the current bottom bit (determined via FZ_OddP) of E is a zero, we end up discarding (in constant-time) T — the output of the modular multiplication in the previous step, via FZ_Mux (See Chapter 1). But if this bottom bit is a one, then T is not discarded, and becomes the new value of Result.

… Now E, where we had placed the Exponent at the beginning, is cranked right-wise by one bit (i.e. divided by 2). The next highest bit will now be found at the bottom position, for use in the next iteration.

… Finally, the register B is squared modulo the Modulus.

After the final iteration, Result will contain Base raised to the Exponent power modulo the Modulus.

And that’s all for the new unit:

 
end FZ_ModEx;

Now let’s remember that, as in every other FFA operation, the CPU time consumed by our new modular multiplication and modular exponentiation algorithms is independent of the Hamming weights of the operands. Concretely, e.g., 1 to the 1st power mod 1:

time echo -E ".~.~.~.1.1.1X" | ./bin/ffa_calc 1024 32

… will take — and for any given bitness — the same, within the margin of error of your Unix time utility, amount of time to compute, as maxint to the maxint-th power modulo maxint:

time echo -E ".1.1.1.~.~.~X" | ./bin/ffa_calc 1024 32

Verify this to your satisfaction.

And now, for some empirical data. On a certain 3 GHz Opteron, timed and plotted logarithmically:


Cost of Ch.6 'X' Operation, vs FFA Bitness.
(Click here to see the raw numbers)

Perhaps not quite as “geological” as the reader expected! We can actually fire this primitive RSAtron in anger! Let’s verify the RSA seal of ffa_ch6_simplest_rsa.vpatch, the Chapter 6 code itself, using itself:

First, take the seal:

$ pgpdump -i ffa_ch6_simplest_rsa.vpatch.asciilifeform.sig

… and dump it to a human-readable form:

Old: Signature Packet(tag 2)(284 bytes)
	Ver 4 - new
	Sig type - Signature of a binary document(0x00).
	Pub alg - RSA Encrypt or Sign(pub 1)
	Hash alg - SHA512(hash 10)
	Hashed Sub: signature creation time(sub 2)(4 bytes)
		Time - Sat Jan 6 11:47:07 EST 2018
	Sub: issuer key ID(sub 16)(8 bytes)
		Key ID - 0xB98228A001ABFFC7
	Hash left 2 bytes - 62 55
	RSA m^d mod n(2047 bits) - 5b 6a 8a 0a cf 4f 4d b3 f8 2e ac 2d
20 25 5e 4d f3 e4 b7 c7 99 60 32 10 76 6f 26 ef 87 c8 98 0e 73 75 79
ec 08 e6 50 5a 51 d1 96 54 c2 6d 80 6b af 1b 62 f9 c0 32 e0 b1 3d 02
af 99 f7 31 3b fc fd 68 da 46 83 6e ca 52 9d 73 60 94 85 50 f9 82 c6
47 6c 05 4a 97 fd 01 63 5a b4 4b fb db e2 a9 0b e0 6f 79 84 ac 85 34
c3 86 13 74 7f 34 0c 18 17 6e 6d 5f 0c 10 24 6a 2f ce 3a 66 8e ac b6
16 5c 20 52 49 7c a2 ee 48 3f 4f d8 d0 6a 99 11 bd 97 e9 b6 72 05 21
d8 72 bd 08 ff 8d a1 1a 1b 8d b1 47 f2 52 e4 e6 9a e6 20 1d 3b 37 4b
17 1d f4 45 ef 2b f5 09 d4 68 fd 57 ce b5 84 03 49 b1 4c 6e 2a aa 19
4d 95 31 d2 38 b8 5b 8f 0d d3 52 d1 e5 96 71 53 9b 42 98 49 e5 d9 65
e4 38 bf 9e ff c3 38 df 9a ad f3 04 c4 13 0d 5a 05 e0 06 ed 85 5f 37
a0 62 42 28 09 7e f9 2f 6e 78 ca e0 cb 97
		-> PKCS-1

Now, obtain my public modulus:

$ pgpdump -i ~/.wot/asciilifeform.asc
Old: Public Key Packet(tag 6)(269 bytes)
	Ver 4 - new
	Public key creation time - Thu Dec 20 12:49:24 EST 2012
	Pub alg - RSA Encrypt or Sign(pub 1)
	RSA n(2048 bits) - cd d4 9a 67 4b af 76 d3 b7 3e 25 bc 6d f6
6e f3 ab ed dc a4 61 d3 cc b6 41 67 93 e3 43 7c 78 06 56 26 94 73 c2
21 2d 5f d5 ee d1 7a a0 67 fe c0 01 d8 e7 6e c9 01 ed ed f9 60 30 4f
89 1b d3 ca d7 f9 a3 35 d1 a2 ec 37 ea be ff 3f be 6d 3c 72 6d c6 8e
59 9e bf e5 45 6e f1 98 13 39 8c d7 d5 48 d7 46 a3 0a a4 7d 42 93 96
8b fb af cb f6 5a 90 df fc 87 81 6f ee 2a 01 e1 dc 69 9f 4d da bb 84
96 55 14 c0 d9 09 d5 4f da 70 62 a2 03 7b 50 b7 71 c1 53 d5 42 9b a4
ba 33 5e ab 84 0f 95 51 e9 cd 9d f8 bb 4a 6d c3 ed 13 18 ff 39 69 f7
b9 9d 9f b9 0c ab 96 88 13 f8 ad 4f 9a 06 9c 96 39 a7 4d 70 a6 59 c6
9c 29 69 25 67 ce 86 3b 88 e1 91 cc 95 35 b9 1b 41 7d 0a f1 4b e0 9c
78 b5 3a f9 c5 f4 94 bc f2 c6 03 49 ff a9 3c 81 e8 17 ac 68 2f 00 55
a6 07 bb 56 d6 a2 81 c1 a0 4c ef e1 
	RSA e(17 bits) - 01 00 01 
 
...~~~crapola snipped~~~...

But let us now “thank” the malignant idiocy of W. Koch (see also)… Because we will have to patch GPG in order to learn the RFC4880 turd that it uses as “the signature” to compare against when verifying a seal:

diff -uNr a/gnupg-1.4.10/cipher/rsa.c b/gnupg-1.4.10/cipher/rsa.c
--- a/gnupg-1.4.10/cipher/rsa.c	2008-12-11 11:40:06.000000000 -0500
+++ b/gnupg-1.4.10/cipher/rsa.c	2018-01-06 15:15:55.000000000 -0500
@@ -444,6 +444,16 @@
     result = mpi_alloc ( mpi_nlimb_hint_from_nbits (160) );
     public( result, data[0], &pk );
     rc = mpi_cmp( result, hash )? G10ERR_BAD_SIGN:0;
+
+    /* Enable dumping of raw sig verification operands: */
+    if( DBG_CIPHER ) {
+        log_mpidump("  data  = ", data[0] );
+        log_mpidump("  pk.n  = ", pk.n );
+        log_mpidump("  pk.e  = ", pk.e );
+        log_mpidump("  hash  = ", hash );
+        log_mpidump("  result= ", result );
+    }
+
     mpi_free(result);
 
     return rc;

So we built it; and now let’s run the seal verification with this debuggism:

$ ~/gpg-patched/gnupg-1.4.10/g10/gpg --debug-all --verify ch6.vpatch.sig ch6.vpatch

And obtain:

(From this point on, lines were wrapped and reindented for clarity:)

...~~~crapola snipped~~~...
 
gpg:   
data  =
5B6A8A0ACF4F4DB3F82EAC2D20255E4DF3E4B7C799603210766F26EF87C8980E737579
EC08E6505A51D19654C26D806BAF1B62F9C032E0B13D02AF99F7313BFCFD68DA46836E
CA529D7360948550F982C6476C054A97FD01635AB44BFBDBE2A90BE06F7984AC8534C3
8613747F340C18176E6D5F0C10246A2FCE3A668EACB6165C2052497CA2EE483F4FD8D0
6A9911BD97E9B6720521D872BD08FF8DA11A1B8DB147F252E4E69AE6201D3B374B171D
F445EF2BF509D468FD57CEB5840349B14C6E2AAA194D9531D238B85B8F0DD352D1E596
71539B429849E5D965E438BF9EFFC338DF9AADF304C4130D5A05E006ED855F37A06242
28097EF92F6E78CAE0CB97
 
gpg:   pk.n  =
 
CDD49A674BAF76D3B73E25BC6DF66EF3ABEDDCA461D3CCB6416793E3437C7806562694
73C2212D5FD5EED17AA067FEC001D8E76EC901EDEDF960304F891BD3CAD7F9A335D1A2
EC37EABEFF3FBE6D3C726DC68E599EBFE5456EF19813398CD7D548D746A30AA47D4293
968BFBAFCBF65A90DFFC87816FEE2A01E1DC699F4DDABB84965514C0D909D54FDA7062
A2037B50B771C153D5429BA4BA335EAB840F9551E9CD9DF8BB4A6DC3ED1318FF3969F7
B99D9FB90CAB968813F8AD4F9A069C9639A74D70A659C69C29692567CE863B88E191CC
9535B91B417D0AF14BE09C78B53AF9C5F494BCF2C60349FFA93C81E817AC682F0055A6
07BB56D6A281C1A04CEFE1 
 
gpg:   pk.e  = 10001 
 
gpg:
hash  =
1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF003051300
D0609608648016503040203050004406255509399A3AF322C486C770C5F7F6E05E18FC
3E2219A03CA56C7501426A597187468B2F71B4A198C807171B73D0E7DBC3EEF6EA6AFF
693DE58E18FF84395BE 
 
gpg:
result=
1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF003051300
D0609608648016503040203050004406255509399A3AF322C486C770C5F7F6E05E18FC
3E2219A03CA56C7501426A597187468B2F71B4A198C807171B73D0E7DBC3EEF6EA6AFF
693DE58E18FF84395BE
 
...~~~crapola snipped~~~...

And finally, let’s turn this into a FFACalc tape:

( the seal of ffa_ch6_simplest_rsa.vpatch.asciilifeform.sig : )
.
5B6A8A0ACF4F4DB3F82EAC2D20255E4DF3E4B7C799603210766F26EF87C8980E737579
EC08E6505A51D19654C26D806BAF1B62F9C032E0B13D02AF99F7313BFCFD68DA46836E
CA529D7360948550F982C6476C054A97FD01635AB44BFBDBE2A90BE06F7984AC8534C3
8613747F340C18176E6D5F0C10246A2FCE3A668EACB6165C2052497CA2EE483F4FD8D0
6A9911BD97E9B6720521D872BD08FF8DA11A1B8DB147F252E4E69AE6201D3B374B171D
F445EF2BF509D468FD57CEB5840349B14C6E2AAA194D9531D238B85B8F0DD352D1E596
71539B429849E5D965E438BF9EFFC338DF9AADF304C4130D5A05E006ED855F37A06242
28097EF92F6E78CAE0CB97
 
( my public rsa exponent : )
.
10001
 
( my public rsa modulus : )
.
CDD49A674BAF76D3B73E25BC6DF66EF3ABEDDCA461D3CCB6416793E3437C7806562694
73C2212D5FD5EED17AA067FEC001D8E76EC901EDEDF960304F891BD3CAD7F9A335D1A2
EC37EABEFF3FBE6D3C726DC68E599EBFE5456EF19813398CD7D548D746A30AA47D4293
968BFBAFCBF65A90DFFC87816FEE2A01E1DC699F4DDABB84965514C0D909D54FDA7062
A2037B50B771C153D5429BA4BA335EAB840F9551E9CD9DF8BB4A6DC3ED1318FF3969F7
B99D9FB90CAB968813F8AD4F9A069C9639A74D70A659C69C29692567CE863B88E191CC
9535B91B417D0AF14BE09C78B53AF9C5F494BCF2C60349FFA93C81E817AC682F0055A6
07BB56D6A281C1A04CEFE1 
 
( now modularly-exponentiate it : )
X
 
( ... and print the output .)
#

… and play the tape! :

$ time  cat ch6_rsa_tape.txt | ./bin/ffa_calc 2048 32

… and unsurprisingly:

 
0001FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF003051
300D0609608648016503040203050004406255509399A3AF322C486C770C5F7F6E05E1
8FC3E2219A03CA56C7501426A597187468B2F71B4A198C807171B73D0E7DBC3EEF6EA6
AFF693DE58E18FF84395BE
 
 
real    0m27.956s 
user    0m27.927s 
sys     0m0.003s

Guess what? The correct answer. Took an entire half-minute, however, to complete on this (same as where the log plot was made) machine.

In the next Chapter, we will investigate some elementary improvements in efficiency!

~To be continued!~

“Finite Field Arithmetic.” Chapter 5: “Egyptological” Multiplication and Division.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.



Эй, сюда плывите, рыбки,
Сом Считает без ошибки!
Сом И делит,
Сом И множит,
Сом Сложить и вычесть Может!

Борис Заходер. СОМОМНЕНИЕ


You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch5_egypt.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


First, let’s go through the “mail bag” from Chapter 4:

Reader Apeloyee observed that FZ_Valid_Bitness_P can be made slightly shorter and clearer. I have taken the more conservative of his two suggestions.

The rewritten procedure now looks like this:

   -- Determine if a proposed FFA Bitness is valid.
   function FZ_Valid_Bitness_P(B : in Positive) return Boolean is
      Result   : Boolean := False;
      T        : Natural := B;
      PopCount : Natural := 0;
   begin
      -- Supposing we meet the minimal bitness:
      if B >= FZ_Minimal_Bitness then
         while T > 0 loop
            PopCount := PopCount + T mod 2;
            T := T / 2;
         end loop;
 
         -- Is B a power of 2?
         if PopCount = 1 then
            Result := True;
         end if;
      end if;
 
      return Result;
   end FZ_Valid_Bitness_P;

I would like to thank again the eagle-eyed Apeloyee for his attentive and skeptical reading.


And now for the solution to the Chapter 4 Puzzle:

ch4_official_solution.txt

Readers who have written solutions of their own to this puzzle, are invited to post them as comments to this post (please try to avoid spoiling it for those who have not yet reached Chapter 5.)


In Chapter 4, we wrote a small and very limited RPN calculator, FFACalc, capable only of addition, subtraction, basic I/O, and a few stack motion operations. In Chapter 5, we will give it the ability to multiply and divide.

The division and multiplication routines in FFA will extend across more than one chapter in this article series, on account of the wide spectrum of possible tradeoffs between simplicity and speed in the known approaches to both problems. We will begin at the extreme of simplicity at all cost. And, as the reader may already have guessed from the title of this Chapter, we will begin with division and multiplication algorithms that were known to the ancient Egyptians.

But first, a few small helper-function additions to LibFFA.

A convenient WBool (See Chapter 1) inverter:

w_pred.ads:

   -- Return WBool-complement of N.
   function W_Not(N : in WBool) return WBool;

w_pred.adb:

   -- Return WBool-complement of N.
   function W_Not(N : in WBool) return WBool is
   begin
      return 1 xor N;
   end W_Not;
   pragma Inline_Always(W_Not);

A convenient function for determining the bitness of an FZ integer:

fz_type.ads:

   -- A count of Bits, anywhere (cannot be 0):
   subtype Bit_Count is Positive;

fz_basic.ads:

   -- Determine the Bitness of N
   function FZ_Bitness(N : in FZ) return Bit_Count;

fz_basic.adb:

   -- Determine the Bitness of N
   function FZ_Bitness(N : in FZ) return Bit_Count is
   begin
      return N'Length * Words.Bitness;
   end FZ_Bitness;
   pragma Inline_Always(FZ_Bitness);

And finally, a mechanism for constant-time conditional addition:

fz_arith.ads:

   -- Gate = 1: Sum := X + Y; Overflow := Carry
   -- Gate = 0: Sum := X;     Overflow := 0
   procedure FZ_Add_Gated_O(X          : in  FZ;
                            Y          : in  FZ;
                            Gate       : in  WBool;
                            Sum        : out FZ;
                            Overflow   : out WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Sum'Length);

fz_arith.adb:

   procedure FZ_Add_Gated_O(X          : in  FZ;
                            Y          : in  FZ;
                            Gate       : in  WBool;
                            Sum        : out FZ;
                            Overflow   : out WBool) is
      Carry : WBool := 0;
      Mask  : constant Word := 0 - Gate;
   begin
      for i in 0 .. Word_Index(X'Length - 1) loop
         declare
            A : constant Word := X(X'First + i);
            B : constant Word := Y(Y'First + i) and Mask;
            S : constant Word := A + B + Carry;
         begin
            Sum(Sum'First + i) := S;
            Carry  := W_Carry(A, B, S);
         end;
      end loop;
      Overflow := Carry;
   end FZ_Add_Gated_O;
   pragma Inline_Always(FZ_Add_Gated_O);

This is something like a hybrid between FZ_Add (See Chapter 1) and FZ_Mux. If Gate is equal to 1, Mask is set to “maxint” of our Word, i.e. 11111….1; consequently the quantity B is unmolested, and the routine behaves exactly like FZ_Add. However, if Gate is equal to 0, Mask is then also equal to 0 and B is forced to also equal 0 during each iteration of the loop; the resulting addition is a constant-time “no-op”.

Work out the behaviour of this algorithm on a sheet of paper, as you have done for the material in earlier chapters.


And at last, we are ready for “Egyptological” division:

fz_divis.ads:

with FZ_Type; use FZ_Type;
 
 
package FZ_Divis is
 
   pragma Pure;
 
   -- Dividend is divided by Divisor, producing Quotient and Remainder.
   -- WARNING: NO div0 test here! Caller must test.
   procedure FZ_IDiv(Dividend  : in  FZ;
                     Divisor   : in  FZ;
                     Quotient  : out FZ;
                     Remainder : out FZ);
   pragma Precondition(Dividend'Length = Divisor'Length and
                         Quotient'Length = Remainder'Length and
                         Dividend'Length = Quotient'Length);
 
   -- Exactly same thing as IDiv, but keep only the Quotient
   procedure FZ_Div(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Quotient  : out FZ);
   pragma Precondition(Dividend'Length = Divisor'Length and
                         Dividend'Length = Quotient'Length);
 
   -- Exactly same thing as IDiv, but keep only the Remainder
   procedure FZ_Mod(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Remainder : out FZ);
   pragma Precondition(Dividend'Length = Divisor'Length and
                         Dividend'Length = Remainder'Length);
 
end FZ_Divis;

This algorithm is quite exactly analogous to the “long division” everyone learns in grade school. The “Egyptian” variant is also known as “restoring division” and is discussed as such in, e.g., Vol. 2 of D. E. Knuth’s “Art of Programming”. Observe that, just like every other FFA operation, this procedure does not branch on the operand bits. This is essential in any routine intended for use in cryptographic arithmetic.

The mechanics are quite simple: the Dividend is shifted into a register R “head-first”, and the Divisor is repeatedly subtracted from R. If the subtraction results in an underflow, this is taken to signify that the Divisor “did not go in”, and the subtraction is then undone, in constant-time (via the conditional-adder introduced earlier in this Chapter.) In the latter case, a 0 is revealed as the current bit of the Quotient — whose bits are found backwards, from last (highest) to first (lowest); in the opposite case, a 1 is written. The Remainder obtained from the division is found in R at the end of the procedure, which requires a number of iterations equal to the bitness of the Dividend:

fz_divis.adb:

with Words;    use Words;
with W_Pred;   use W_Pred;
with FZ_Basic; use FZ_Basic;
with FZ_Arith; use FZ_Arith;
with FZ_BitOp; use FZ_BitOp;
with FZ_Shift; use FZ_Shift;
 
 
package body FZ_Divis is
 
   -- Dividend is divided by Divisor, producing Quotient and Remainder.
   -- WARNING: NO div0 test here! Caller must test.
   procedure FZ_IDiv(Dividend  : in  FZ;
                     Divisor   : in  FZ;
                     Quotient  : out FZ;
                     Remainder : out FZ) is
 
      -- The working register
      QR : FZ(1 .. Dividend'Length + Divisor'Length);
 
      -- Bottom seg of Z will end up containing the Quotient; top - remainder
      Q  : FZ renames QR(1                   .. Dividend'Length); -- Quotient
      R  : FZ renames QR(Dividend'Length + 1 .. QR'Last);         -- Remainder
 
      C : WBool := 0; -- Borrow, from comparator
   begin
      Q := Dividend; -- Q begins with the Dividend
      FZ_Clear(R);   -- R begins empty
 
      -- For each bit of Dividend:
      for i in 1 .. FZ_Bitness(Dividend) loop
 
         -- Advance QR by 1 bit:
         FZ_ShiftLeft(QR, QR, 1);
 
         -- Subtract Divisor from R; Underflow goes into C
         FZ_Sub(X => R, Y => Divisor, Difference => R, Underflow => C);
 
         -- If C=1, subtraction underflowed, and then Divisor gets added back:
         FZ_Add_Gated(X => R, Y => Divisor, Gate => C, Sum => R);
 
         -- Current result-bit is equal to Not-C, i.e. 1 if Divisor 'went in'
         FZ_Or_W(Q, W_Not(C));
 
      end loop;
 
      Quotient  := Q; -- Output the Quotient.
      Remainder := R; -- Output the Remainder.
 
   end FZ_IDiv;
   pragma Inline_Always(FZ_IDiv);

The above is, to my knowledge, the simplest practical integer division algorithm (simple repeated-subtraction is not practical for large numbers and will not be discussed.) There are several substantial improvements that can be made to it, and we will visit them in later chapters. However, it is worth noting that Division is the most expensive of the familiar arithmetical operations, and — unlike multiplication — no order-of-magnitude speedup for it is known.

Now we will define remainderless division, and modulus operations. These can be optimized to an extent, but their simplest possible formulation is the trivial one below, written in terms of FZ_IDiv:

   -- Exactly same thing as IDiv, but keep only the Quotient
   procedure FZ_Div(Dividend  : in  FZ;
                    Divisor   : in  FZ;
                    Quotient  : out FZ) is
      Remainder : FZ(Divisor'Range);
      pragma Unreferenced(Remainder);
   begin
      FZ_IDiv(Dividend, Divisor, Quotient, Remainder);
   end FZ_Div;
   pragma Inline_Always(FZ_Div);
 
 
   -- Exactly same thing as IDiv, but keep only the Remainder
   procedure FZ_Mod(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Remainder : out FZ) is
      Quotient : FZ(Dividend'Range);
      pragma Unreferenced(Quotient);
   begin
      FZ_IDiv(Dividend, Divisor, Quotient, Remainder);
   end FZ_Mod;
   pragma Inline_Always(FZ_Mod);
 
end FZ_Divis;

And, now: “Egyptological” multiplication.

fz_mul.ads:

with FZ_Type; use FZ_Type;
 
 
package FZ_Mul is
 
   pragma Pure;
 
   -- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y.
   procedure FZ_Mul_Egyptian(X     : in  FZ;
                             Y     : in  FZ;
                             XY_Lo : out FZ;
                             XY_Hi : out FZ);
   pragma Precondition(X'Length = Y'Length and
                         XY_Lo'Length = XY_Hi'Length and
                         XY_Lo'Length = ((X'Length + Y'Length) / 2));
 
end FZ_Mul;

Observe that it is “symmetric” with the Division, i.e. the exact same process “in reverse gear”:

fz_mul.adb:

with Words;    use Words;
with FZ_Basic; use FZ_Basic;
with FZ_Pred;  use FZ_Pred;
with FZ_Arith; use FZ_Arith;
with FZ_Shift; use FZ_Shift;
 
 
package body FZ_Mul is
 
   -- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y.
   procedure FZ_Mul_Egyptian(X     : in  FZ;
                             Y     : in  FZ;
                             XY_Lo : out FZ;
                             XY_Hi : out FZ) is
 
      -- Register holding running product
      XY : FZ(1 .. X'Length + Y'Length);
 
      -- X-Slide
      XS : FZ(1 .. X'Length + Y'Length);
 
      -- Y-Slide
      YS : FZ(Y'Range) := Y;
 
   begin
      -- Product register begins empty
      FZ_Clear(XY);
 
      -- X-Slide initially equals X:
      XS(1            .. X'Length) := X;
      XS(X'Length + 1 .. XS'Last)  := (others => 0);
 
      -- For each bit of Y:
      for i in 1 .. FZ_Bitness(Y) loop
 
         -- If lowest bit of Y-Slide is 1, X-Slide is added into XY
         FZ_Add_Gated(X    => XY, Y => XS, Sum => XY,
                      Gate => FZ_OddP(YS));
 
         -- X-Slide := X-Slide * 2
         FZ_ShiftLeft(XS, XS, 1);
 
         -- Y-Slide := Y-Slide / 2
         FZ_ShiftRight(YS, YS, 1);
 
      end loop;
 
      -- Write out the Product's lower and upper FZs:
      XY_Lo := XY(1                .. XY_Lo'Length);
      XY_Hi := XY(XY_Lo'Length + 1 .. XY'Last);
 
   end FZ_Mul_Egyptian;
   pragma Inline_Always(FZ_Mul_Egyptian);
 
end FZ_Mul;

We begin with a working register XY, initially zero, wide enough to hold the product; a “X-Slide” XS, which initially equals multiplicand X; and a “Y-Slide”, YS, which initially equals multiplicand Y. XS slides “upwards”, and so is given room so as to be able to fit the uppermost possible bit of the product. YS slides “downward” and is therefore Y-sized.

The procedure requires a number of iterations equal to the bitness of Y (which, the attentive reader will observe, in current FFA will always equal the bitness of X — but why not specify the general case?) In each iteration, XS is conditionally added in constant time to the running product XY. The condition is determined by the lowest bit of current YS. At the end of each iteration, XS slides “up”, i.e. is multiplied by two, while YS slides “down”, i.e. is divided by two.

After the required number of iterations, XY will equal the product of the multiplicands X and Y. The two halves of this product, FZ integers equal in their bitness to that of the multiplicands, are finally written to XY_Lo and XY_Hi — the lower and upper halves respectively.

Observe that this multiplier runs in O(N^2) — FZ_Add being an O(N) operation, N being the current FZ bitness. This “bearded” algorithm is impractically slow for battlefield use, in e.g. RSA. However it is unbeatable in simplicity. The reader should be able to fully understand why it cannot fail, and why the result will be emitted in constant time (i.e. the required work is wholly independent of the Hamming weights of the multiplicands) and occupy exactly the allotted space.

In subsequent chapters, we will explore several other multiplication algorithms, which make tradeoffs in simplicity (but however pointedly without compromising constant-time operation!) and gain considerable improvements in speed.


Now, let’s add all of the new operations to FFACalc:

First, a helper function that prevents division by zero:

ffa_calc.adb:

      -- Ensure that a divisor is not zero
      procedure MustNotZero(D : in FZ) is
      begin
         if FZ_ZeroP(D) = 1 then
            E("Division by Zero!");
         end if;
      end MustNotZero;

The Division-with-Remainder, Integer Division, Modulus, and Multiplication operations:

First, ‘\’: expects a Dividend, and on top of it, a Divisor; and leaves the stack containing in their place a Quotient, and on top of it, a Remainder:

ffa_calc.adb:

               -- Divide and give Quotient and Remainder
            when '\' =>
               Want(2);
               MustNotZero(Stack(SP));
               FZ_IDiv(Dividend  => Stack(SP - 1),
                       Divisor   => Stack(SP),
                       Quotient  => Stack(SP - 1),
                       Remainder => Stack(SP));

Then, ‘/’: exactly the same thing as the above, but leaves only the Quotient on the stack:

               -- Divide and give Quotient only
            when '/' =>
               Want(2);
               MustNotZero(Stack(SP));
               FZ_Div(Dividend  => Stack(SP - 1),
                      Divisor   => Stack(SP),
                      Quotient  => Stack(SP - 1));
               Drop;

And, ‘%’: this is the familiar modulus operation; it performs division but leaves only the remainder on the stack:

               -- Divide and give Remainder only
            when '%' =>
               Want(2);
               MustNotZero(Stack(SP));
               FZ_Mod(Dividend  => Stack(SP - 1),
                      Divisor   => Stack(SP),
                      Remainder => Stack(SP - 1));
               Drop;

Finally, multiplication, using the very slow Egyptian method. Observe that multiplication of two integers results in a product having as its bitness the sum of their bitnesses. Therefore FFACalc multiplication places two FZ integers on the stack, the lower half of the result, followed by the upper half. If the latter is unwanted, it must be dropped using the Drop (”_”) operation (See Chapter 4.) The Overflow flag is in all cases unaffected, as there is room for the entire product — its lower and upper halves occupy the same places on the stack which formerly held the multiplicands.

               -- Multiply, give bottom and top halves
            when '*' =>
               Want(2);
               MustNotZero(Stack(SP));
               -- Ch5: slow and simple 'Egyptological' method:
               FZ_Mul_Egyptian(X     => Stack(SP - 1),
                               Y     => Stack(SP),
                               XY_Lo => Stack(SP - 1),
                               XY_Hi => Stack(SP));

Now let’s build the improved FFACalc:

cd ffacalc
gprbuild

And try out the new operations:

$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\##" | ./bin/ffa_calc 256 32
 
000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F
000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0/#" | ./bin/ffa_calc 256 32
 
000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0%#" | ./bin/ffa_calc 256 32
 
000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0\#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0/#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0%#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\.A1398B1FBC8872F908C44763C5427EBD0F={[R OK]}{[R SAD]}_[; ].101D96F0815F0853E1D8D883B93F2D9={[Q OK]}{[Q SAD]}_[; ]" | ./bin/ffa_calc 256 32
 
R OK; Q OK;
 
$ echo -n -E ".FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF * ##  " | ./bin/ffa_calc 256 32 
 
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE
0000000000000000000000000000000000000000000000000000000000000001

Homework Problems, in order of difficulty:

  • Write a FFACalc tape which obtains two numbers from the top of the stack, multiplies them, then alternatingly divides the obtained product by each of the original multiplicands, and thereby verifying that both the multiplier and the divider work as expected. The message “OK” should be printed in the event of successful test of all four new FFACalc functions, and “Sad” in case of the opposite. “Sabotage” and rebuild FFACalc to produce the necessary erroneous arithmetic for this test.
  • Find the elementary optimization that cuts the constant factor in the run-times of both FZ_Mul_Egyptian and FZ_IDiv by a third. Hint: one of the O(N) suboperations in both of them, can be replaced with a O(1) one! But which?
  • Show how to cut the required CPU time of the improved, from the previous problem, algorithm, by another quarter — at the expense of a small amount of complexity, but without fundamentally departing from either of the two “Egyptological” algorithms.
  • Prove that the provided Division and Multiplication operations, work.

~To be continued!~

“Finite Field Arithmetic.” Chapter 4: Interlude: FFACalc.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ch4_ffacalc.vpatch.

Just like before, you will end up with two directories, libffa and ffademo.
However you will also see a new one, ffacalc.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


As the title of this chapter suggests, it will not introduce fundamentally new FFA material. Instead, you will meet FFACalc — a program which makes practical use of the routines presented in Chapters 1, 2, and 3. Henceforth every Chapter in this series will build on FFACalc, rather than continuing to expand the rather-uninteresting ffademo. When we reach the final Chapter, the reader will notice that… but let’s not spoil it!

For now, FFACalc is exactly what the name implies: a FFAtronic RPN calculator, capable strictly of addition, subtraction, basic bitwise ops, numeric comparison, a small number of simple stack manipulations (a la Forth), and some elementary I/O.

But first, a few small helper-function additions to libFFA.

Calculators need to accept numeric input, and it is to be processed one hexadecimal digit at a time. Therefore a nibble subtype of Word is called for:

words.ads:

   -- Word, restricted to Nibble range.
   subtype Nibble is Word range 0 .. 16#F#;

Predicate operators produce strictly WBool (see Chapter 1) outputs. FFACalc will operate strictly in FZ integers. Therefore a conversion is required:

fz_basic.ads:

   -- Set given FZ to a given truth value
   procedure WBool_To_FZ(V : in WBool; N : out FZ);

fz_basic.adb:

   -- Set given FZ to a given truth value
   procedure WBool_To_FZ(V : in WBool; N : out FZ) is
   begin
      FZ_Clear(N);
      FZ_Set_Head(N, V);
   end WBool_To_FZ;
   pragma Inline_Always(WBool_To_FZ);

Sometimes, we will need to go in the other direction, and produce a WBool from an FZ, based on the Boolean meaning of its value (i.e. whether it is a nonzero.) This will look like this:

w_pred.ads:

   -- Return 1 if N is unequal to 0; otherwise return 0.
   function W_NZeroP(N : in Word) return WBool;

w_pred.adb:

   -- Return 1 if N is unequal to 0; otherwise return 0.
   function W_NZeroP(N : in Word) return WBool is
   begin
      return 1 xor W_ZeroP(N);
   end W_NZeroP;
   pragma Inline_Always(W_NZeroP);

Now since we are introducing a program where the user is able to control the width of an instantiated FFAtron, we will need a validity predicate. FFA “bitness” is not an arbitrary positive number, but must be an integer multiple of all historical machine word sizes, and additionally a power of two (the reason for the latter restriction will become apparent in the Sub-Quadratic Multiplication Chapter.) And so:

fz_lim.ads:

package FZ_Lim is
 
   pragma Pure;
 
   FZ_Minimal_Bitness : constant Positive := 256;
 
   FZ_Validity_Rule_Doc : constant String
     := "Must be greater than or equal to 256, and a power of 2.";
 
   -- Determine if a proposed FFA Bitness is valid.
   function FZ_Valid_Bitness_P(B : in Positive) return Boolean;
 
end FZ_Lim;

fz_lim.adb:

package body FZ_Lim is
 
   -- Determine if a proposed FFA Bitness is valid.
   function FZ_Valid_Bitness_P(B : in Positive) return Boolean is
      Result   : Boolean := False;
      T        : Natural := B;
      PopCount : Natural := 0;
   begin
      -- Supposing we meet the minimal bitness:
      if B >= FZ_Minimal_Bitness then
         while T > 0 loop
            if T mod 2 = 1 then
               PopCount := PopCount + 1;
            end if;
            T := T / 2;
         end loop;
 
         -- Is B a power of 2?
         if PopCount = 1 then
            Result := True;
         end if;
      end if;
 
      return Result;
   end FZ_Valid_Bitness_P;
 
end FZ_Lim;

In a power of 2, the “popcount” (number of 1s) is necessarily 1. The mechanics of this routine should be apparent to the alert reader, nothing more will be said of it.

And that’s all for LibFFA, for the time being. Now returning to FFACalc: we will shed the old ffademo’s dependence on Ada’s standard I/O library, in favour of a more compact approach:

os.ads:

with Interfaces;   use Interfaces;
with Interfaces.C; use Interfaces.C;
 
 
package OS is
 
   -- Receive a character from the TTY, and True if success (False if EOF)
   function Read_Char(C : out Character) return Boolean;
 
   -- Send a character to the TTY.
   procedure Write_Char(C : in Character);
 
   -- Send a Newline to the TTY.
   procedure Write_Newline;
 
   -- Exit with an error condition report.
   procedure Eggog(M : String);
 
   procedure Quit(Return_Code : Integer);
   pragma Import
     (Convention    => C,
      Entity        => Quit,
      External_Name => "exit");
 
private
 
   -- POSIX stdio:   
   EOF : constant int := -1;
 
   function GetChar return int;
   pragma Import(C, getchar);
 
   function PutChar(item: int) return int;
   pragma Import(C, putchar);
 
   -- GNATistic
   procedure To_Stderr(C : Character);
   pragma Import(Ada, To_Stderr, "__gnat_to_stderr_char");
 
   Sadness_Code : constant Integer := -1;
 
end OS;

os.adb:

package body OS is
 
   -- Receive a character from the TTY, and True if success (False if EOF)
   function Read_Char(C : out Character) return Boolean is
      i : int;
      Result : Boolean := False;
   begin
      i := GetChar;
      if i /= EOF then
         C := Character'Val(i);
         Result := True;
      end if;
      return Result;
   end Read_Char;
 
 
   -- Send a character to the TTY.
   procedure Write_Char(C : in Character) is
      R : int;
      pragma Unreferenced(R);
   begin
      R := PutChar(int(Character'Pos(C)));
   end Write_Char;
 
 
   -- Send a Newline to the TTY.
   procedure Write_Newline is
   begin
      Write_Char(Character'Val(16#A#));
   end Write_Newline;
 
 
   -- Exit with an error condition report.
   procedure Eggog(M : String) is
   begin
      for i in 1 .. M'Length loop
         To_Stderr(M(I));
      end loop;
 
      -- Emit LF
      To_Stderr(Character'Val(16#A#));
 
      -- Exit
      Quit(Sadness_Code);
   end;
 
end OS;

Now for a bit of surprise. Did you know that it is impossible to make use of the standard Ada.Command_Line functionality in a program where the

pragma Restrictions(No_Secondary_Stack);

… restriction is in force?

The reason for this becomes apparent in a careful reading of the Standard, where we find the following turd:

function Argument (Number : in Positive) return String;

Indeed, a completely unnecessary invocation of the secondary stack ! Why did the authors of the Standard do this ? I have no idea, but we will have to correct their mistake! Unfortunately it is quite impossible to do this without invoking some GNATisms. And so, we must:

cmdline.ads:

with System;
 
package CmdLine is
 
   -- IMHO this is reasonable.
   CmdLineArg_Length : constant Positive := 256;
 
   subtype CmdLineArg is String(1 .. CmdLineArg_Length);
 
   function Initialized return Boolean;
 
   function Arg_Count return Natural;
   pragma Import(C, Arg_Count, "__gnat_arg_count");
 
   procedure Get_Argument(Number : in  Natural;
                          Result : out String);
 
private
 
   procedure Fill_Arg (A : System.Address; Arg_Num : Integer);
   pragma Import(C, Fill_Arg, "__gnat_fill_arg");
 
   function Len_Arg (Arg_Num : Integer) return Integer;
   pragma Import(C, Len_Arg, "__gnat_len_arg");
 
end CmdLine;

cmdline.adb:

with System; use System;
 
package body CmdLine is
 
   -- Test if GNAT's cmdline mechanism is available
   function Initialized return Boolean is
      gnat_argv : System.Address;
      pragma Import (C, gnat_argv, "gnat_argv");
 
   begin
      return gnat_argv /= System.Null_Address;
   end Initialized;
 
 
   -- Fill the provided string with the text of Number-th cmdline arg
   procedure Get_Argument(Number : in  Natural;
                          Result : out String) is
   begin
      if Number >= Arg_Count or (not Initialized) then
         raise Constraint_Error;
      end if;
 
      declare
         L   : constant Integer := Len_Arg(Number);
         Arg : aliased String(1 .. L);
      begin
         -- Will it fit into the available space?
         if L > Result'Length then
            raise Constraint_Error;
         end if;
 
         -- Get this arg string from where GNAT stowed it
         Fill_Arg(Arg'Address, Number);
 
         -- Copy it to Result:
         Result := (others => ' ');
         Result(Arg'Range) := Arg;
      end;
   end Get_Argument;
 
end CmdLine;

How this is invoked, will soon become quite apparent. Let’s at last proceed to FFACalc !

We make use here of nearly everything we have seen in Chapters 1-3:

ffa_calc.adb:

-- Basics
with OS;          use OS;
with CmdLine;     use CmdLine;
 
-- FFA
with FZ_Lim;   use FZ_Lim;
with Words;    use Words;
with W_Pred;   use W_Pred;
with FZ_Type;  use FZ_Type;
with FZ_Basic; use FZ_Basic;
with FZ_Arith; use FZ_Arith;
with FZ_Cmp;   use FZ_Cmp;
with FZ_Pred;  use FZ_Pred;
with FZ_BitOp; use FZ_BitOp;
with FZ_Shift; use FZ_Shift;
 
-- For Output
with FFA_IO;   use FFA_IO;
procedure FFA_Calc is
 
   Width  : Positive; -- Desired FFA Width
   Height : Positive; -- Desired Height of Stack
 
begin
   if Arg_Count /= 3 then
      Eggog("Usage: ./ffa_calc WIDTH HEIGHT");
   end if;
 
   declare
      Arg1 : CmdLineArg;
      Arg2 : CmdLineArg;
   begin
      -- Get commandline args:
      Get_Argument(1, Arg1); -- First arg
      Get_Argument(2, Arg2); -- Second arg
 
      -- Parse into Positives:
      Width  := Positive'Value(Arg1);
      Height := Positive'Value(Arg2);
   exception
      when others =>
         Eggog("Invalid arguments!");
   end;
 
   -- Test if proposed Width is permissible:
   if not FZ_Valid_Bitness_P(Width) then
      Eggog("Invalid Width: " & FZ_Validity_Rule_Doc);
   end if;

Above, we see how our replacement for Ada’s standard command-line argument reader works. Instead of demanding the secondary stack, we make use of pre-allocated strings, into which each argument is copied. The reader is invited to try and overflow these: the resulting death is a clean one.

Now for the calculator…

   -- The Calculator itself:
   declare
 
      -- The number of Words required to make a FZ of the given Bitness.
      Wordness : Indices := Indices(Width / Bitness);
 
      --------------------------------------------------------
      -- State --
      --------------------------------------------------------
      -- The Stack:
      subtype Stack_Positions is Natural range 0 .. Height;
      type Stacks is array(Stack_Positions range <>) of FZ(1 .. Wordness);
      Stack      : Stacks(Stack_Positions'Range);
 
      -- Stack Pointer:
      SP         : Stack_Positions := Stack_Positions'First;
 
      -- Carry/Borrow Flag:
      Flag       : WBool   := 0;
 
      -- Odometer:
      Pos        : Natural := 0;
 
      -- The current levels of the three types of nestedness:
      QuoteLevel : Natural := 0;
      CommLevel  : Natural := 0;
      CondLevel  : Natural := 0;
      --------------------------------------------------------

Observe that the FORTH-like stack is allocated on the stack (your machine’s, that is), and its height is determined by the HEIGHT parameter given in the second command line argument. The width of the FZ integers comprising the elements of this stack, is in turn given by WIDTH, the first command line argument.

Now for some elementary stack-manipulation routines:

      -- Clear the stack and set SP to bottom.
      procedure Zap is
      begin
         -- Clear the stack
         for i in Stack'Range loop
            FZ_Clear(Stack(i));
         end loop;
         -- Set SP to bottom
         SP   := Stack_Positions'First;
         -- Clear Overflow flag
         Flag := 0;
      end Zap;
 
 
      -- Report a fatal error condition at the current symbol
      procedure E(S : in String) is
      begin
         Eggog("Pos:" & Natural'Image(Pos) & ": " & S);
      end E;
 
 
      -- Move SP up
      procedure Push is
      begin
         if SP = Stack_Positions'Last then
            E("Stack Overflow!");
         else
            SP := SP + 1;
         end if;
      end Push;
 
 
      -- Discard the top of the stack
      procedure Drop is
      begin
         FZ_Clear(Stack(SP));
         SP := SP - 1;
      end Drop;
 
 
      -- Check if stack has the necessary N items
      procedure Want(N : in Positive) is
      begin
         if SP < N then
            E("Stack Underflow!");
         end if;
      end Want;

Here we make use of the FZ_ShiftLeft operation we implemented in Chapter 3:

      -- Slide a new hex digit into the FZ on top of stack
      procedure Ins_Hex_Digit(N : in out FZ;
                              D : in Nibble) is
         Overflow : Word := 0;
      begin
         -- Make room in this FZ for one additional hex digit
         FZ_ShiftLeft_O(N        => N,
                        ShiftedN => N,
                        Count    => 4,
                        Overflow => Overflow);
 
         -- Constants which exceed the Width are forbidden:
         if W_NZeroP(Overflow) = 1 then
            E("Constant Exceeds Bitness!");
         end if;
 
         -- Set the new digit
         FZ_Or_W(N, D);
      end;

And now for the "opcodes" comprising our stack machine:

      -- Execute a Normal Op
      procedure Op_Normal(C : in Character) is
 
         -- Over/underflow output from certain ops
         F : Word;
 
      begin
 
         case C is
 
            --------------
            -- Stickies --
            --------------
            -- Enter Commented
            when '(' =>
               CommLevel := 1;
 
               -- Exit Commented (but we aren't in it!)
            when ')' =>
               E("Mismatched close-comment parenthesis !");
 
               -- Enter Quoted
            when '[' =>
               QuoteLevel := 1;
 
               -- Exit Quoted (but we aren't in it!)
            when ']' =>
               E("Mismatched close-quote bracket !");
 
               -- Enter a ~taken~ Conditional branch:
            when '{' =>
               Want(1);
               if FZ_ZeroP(Stack(SP)) = 1 then
                  CondLevel := 1;
               end if;
               Drop;
 
               -- Exit from a ~non-taken~ Conditional branch:
               -- ... we push a 0, to suppress the 'else' clause
            when '}' =>
               Push;
               WBool_To_FZ(0, Stack(SP));
 
               ----------------
               -- Immediates --
               ----------------
 
               -- These operate on the FZ ~currently~ at top of the stack;
               -- and this means that the stack may NOT be empty.
 
            when '0' .. '9' =>
               Want(1);
               Ins_Hex_Digit(Stack(SP),
                             Character'Pos(C) - Character'Pos('0'));
 
            when 'A' .. 'F' =>
               Want(1);
               Ins_Hex_Digit(Stack(SP),
                             10 + Character'Pos(C) - Character'Pos('A'));
 
            when 'a' .. 'f' =>
               Want(1);
               Ins_Hex_Digit(Stack(SP),
                             10 + Character'Pos(C) - Character'Pos('a'));
 
               ------------------
               -- Stack Motion --
               ------------------
 
               -- Push a 0 onto the stack
            when '.' =>
               Push;
               FZ_Clear(Stack(SP));
 
               -- Dup
            when '″' =>
               Want(1);
               Push;
               Stack(SP) := Stack(SP - 1);
 
               -- Drop
            when '_' =>
               Want(1);
               Drop;
 
               -- Swap
            when ''' =>
               Want(2);
               FZ_Swap(Stack(SP), Stack(SP - 1));
 
               -- Over
            when '`' =>
               Want(2);
               Push;
               Stack(SP) := Stack(SP - 2);
 
               ----------------
               -- Predicates --
               ----------------
 
               -- Equality
            when '=' =>
               Want(2);
               WBool_To_FZ(FZ_Eqp(X => Stack(SP),
                                  Y => Stack(SP - 1)),
                           Stack(SP - 1));
               Drop;
 
               -- Less-Than
            when '< ' =>
               Want(2);
               WBool_To_FZ(FZ_LessThanP(X => Stack(SP - 1),
                                        Y => Stack(SP)),
                           Stack(SP - 1));
               Drop;
 
               -- Greater-Than
            when '>' =>
               Want(2);
               WBool_To_FZ(FZ_GreaterThanP(X => Stack(SP - 1),
                                           Y => Stack(SP)),
                           Stack(SP - 1));
               Drop;
 
               ----------------
               -- Arithmetic --
               ----------------
 
               -- Subtract
            when '-' =>
               Want(2);
               FZ_Sub(X          => Stack(SP - 1),
                      Y          => Stack(SP),
                      Difference => Stack(SP - 1),
                      Underflow  => F);
               Flag := W_NZeroP(F);
               Drop;
 
               -- Add
            when '+' =>
               Want(2);
               FZ_Add(X        => Stack(SP - 1),
                      Y        => Stack(SP),
                      Sum      => Stack(SP - 1),
                      Overflow => F);
               Flag := W_NZeroP(F);
               Drop;
 
               -----------------
               -- Bitwise Ops --
               -----------------
 
               -- Bitwise-And
            when '&' =>
               Want(2);
               FZ_And(X      => Stack(SP - 1),
                      Y      => Stack(SP),
                      Result => Stack(SP - 1));
               Drop;
 
               -- Bitwise-Or
            when '|' =>
               Want(2);
               FZ_Or(X      => Stack(SP - 1),
                     Y      => Stack(SP),
                     Result => Stack(SP - 1));
               Drop;
 
               -- Bitwise-Xor
            when '^' =>
               Want(2);
               FZ_Xor(X      => Stack(SP - 1),
                      Y      => Stack(SP),
                      Result => Stack(SP - 1));
               Drop;
 
               -- Bitwise-Not (1s-Complement)
            when '~' =>
               Want(1);
               FZ_Not(Stack(SP), Stack(SP));
 
               -----------
               -- Other --
               -----------
 
               -- mUx
            when 'U' =>
               Want(3);
               FZ_Mux(X      => Stack(SP - 2),
                      Y      => Stack(SP - 1),
                      Result => Stack(SP - 2),
                      Sel    => FZ_NZeroP(Stack(SP)));
               Drop;
               Drop;
 
               -- Put the Overflow flag on the stack
            when 'O' =>
               Push;
               WBool_To_FZ(Flag, Stack(SP));
 
               -- Print the FZ on the top of the stack
            when '#' =>
               Want(1);
               Dump(Stack(SP));
               Drop;
 
               -- Zap (reset)
            when 'Z' =>
               Zap;
 
               -- Quit with Stack Trace
            when 'Q' =>
               for I in reverse Stack'First + 1 .. SP loop
                  Dump(Stack(I));
               end loop;
               Quit(0);
 
               ----------
               -- NOPs --
               ----------
 
               -- Ops we have not yet spoken of -- do nothing
            when others =>
               null;
 
         end case;
 
      end Op_Normal;
 
 
      -- Process a Symbol
      procedure Op(C : in Character) is
      begin
         -- First, see whether we are in a state of nestedness:
 
         -- ... in a Comment block:
         if CommLevel > 0 then
            case C is
               when ')' =>  -- Drop a nesting level:
                  CommLevel := CommLevel - 1;
               when '(' =>  -- Add a nesting level:
                  CommLevel := CommLevel + 1;
               when others =>
                  null; -- Other symbols have no effect at all
            end case;
 
            -- ... in a Quote block:
         elsif QuoteLevel > 0 then
            case C is
               when ']' =>   -- Drop a nesting level:
                  QuoteLevel := QuoteLevel - 1;
               when '[' =>   -- Add a nesting level:
                  QuoteLevel := QuoteLevel + 1;
               when others =>
                  null; -- Other symbols have no effect on the level
            end case;
 
            -- If we aren't the mode-exiting ']', print current symbol:
            if QuoteLevel > 0 then
               Write_Char(C);
            end if;
 
            --- ... in a ~taken~ Conditional branch:
         elsif CondLevel > 0 then
            case C is
               when '}' =>   -- Drop a nesting level:
                  CondLevel := CondLevel - 1;
 
                  -- If we exited the Conditional as a result,
                  -- we push a 1 to trigger the possible 'else' clause:
                  if CondLevel = 0 then
                     Push;
                     WBool_To_FZ(1, Stack(SP));
                  end if;
 
               when '{' =>   -- Add a nesting level:
                  CondLevel := CondLevel + 1;
               when others =>
                  null; -- Other symbols have no effect on the level
            end case;
         else
            -- This is a Normal Op, so proceed with the normal rules.
            Op_Normal(C);
         end if;
 
      end Op;
 
 
      -- Current Character
      C : Character;
 
   begin
      -- Reset the Calculator      
      Zap;
      -- Process characters until EOF:
      loop
         if Read_Char(C) then
            -- Execute Op:
            Op(C);
            -- Advance Odometer
            Pos := Pos + 1;
         else
            Zap;
            Quit(0); -- if EOF, we're done
         end if;
      end loop;
   end;
 
end FFA_Calc;

But rather than describing in detail the operation of FFACalc, I will invite the reader to build it, and solve the following puzzle:


Write a FFACalc tape that will take seven numbers, presumed to be on the top of the stack, and return the largest.

Your answer should work with any legal WIDTH, and any stack HEIGHT large enough to hold the working set.

For instance, suppose file numbers.txt were to contain:

.9.1.7.5.1.1.0

Then the following example invocation:

cd ffacalc
gprbuild
cat numbers.txt youranswer.txt |  ./bin/ffa_calc 256 16

... should produce the output:

0000000000000000000000000000000000000000000000000000000000000009

... and similarly for any other seven numbers.

A solution will be posted in the next Chapter.


~To be continued!~