“Finite Field Arithmetic.” Chapter 5: “Egyptological” Multiplication and Division.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.



Эй, сюда плывите, рыбки,
Сом Считает без ошибки!
Сом И делит,
Сом И множит,
Сом Сложить и вычесть Может!

Борис Заходер. СОМОМНЕНИЕ


You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch5_egypt.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


First, let’s go through the “mail bag” from Chapter 4:

Reader Apeloyee observed that FZ_Valid_Bitness_P can be made slightly shorter and clearer. I have taken the more conservative of his two suggestions.

The rewritten procedure now looks like this:

   -- Determine if a proposed FFA Bitness is valid.
   function FZ_Valid_Bitness_P(B : in Positive) return Boolean is
      Result   : Boolean := False;
      T        : Natural := B;
      PopCount : Natural := 0;
   begin
      -- Supposing we meet the minimal bitness:
      if B >= FZ_Minimal_Bitness then
         while T > 0 loop
            PopCount := PopCount + T mod 2;
            T := T / 2;
         end loop;
 
         -- Is B a power of 2?
         if PopCount = 1 then
            Result := True;
         end if;
      end if;
 
      return Result;
   end FZ_Valid_Bitness_P;

I would like to thank again the eagle-eyed Apeloyee for his attentive and skeptical reading.


And now for the solution to the Chapter 4 Puzzle:

ch4_official_solution.txt

Readers who have written solutions of their own to this puzzle, are invited to post them as comments to this post (please try to avoid spoiling it for those who have not yet reached Chapter 5.)


In Chapter 4, we wrote a small and very limited RPN calculator, FFACalc, capable only of addition, subtraction, basic I/O, and a few stack motion operations. In Chapter 5, we will give it the ability to multiply and divide.

The division and multiplication routines in FFA will extend across more than one chapter in this article series, on account of the wide spectrum of possible tradeoffs between simplicity and speed in the known approaches to both problems. We will begin at the extreme of simplicity at all cost. And, as the reader may already have guessed from the title of this Chapter, we will begin with division and multiplication algorithms that were known to the ancient Egyptians.

But first, a few small helper-function additions to LibFFA.

A convenient WBool (See Chapter 1) inverter:

w_pred.ads:

   -- Return WBool-complement of N.
   function W_Not(N : in WBool) return WBool;

w_pred.adb:

   -- Return WBool-complement of N.
   function W_Not(N : in WBool) return WBool is
   begin
      return 1 xor N;
   end W_Not;
   pragma Inline_Always(W_Not);

A convenient function for determining the bitness of an FZ integer:

fz_type.ads:

   -- A count of Bits, anywhere (cannot be 0):
   subtype Bit_Count is Positive;

fz_basic.ads:

   -- Determine the Bitness of N
   function FZ_Bitness(N : in FZ) return Bit_Count;

fz_basic.adb:

   -- Determine the Bitness of N
   function FZ_Bitness(N : in FZ) return Bit_Count is
   begin
      return N'Length * Words.Bitness;
   end FZ_Bitness;
   pragma Inline_Always(FZ_Bitness);

And finally, a mechanism for constant-time conditional addition:

fz_arith.ads:

   -- Gate = 1: Sum := X + Y; Overflow := Carry
   -- Gate = 0: Sum := X;     Overflow := 0
   procedure FZ_Add_Gated_O(X          : in  FZ;
                            Y          : in  FZ;
                            Gate       : in  WBool;
                            Sum        : out FZ;
                            Overflow   : out WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Sum'Length);

fz_arith.adb:

   procedure FZ_Add_Gated_O(X          : in  FZ;
                            Y          : in  FZ;
                            Gate       : in  WBool;
                            Sum        : out FZ;
                            Overflow   : out WBool) is
      Carry : WBool := 0;
      Mask  : constant Word := 0 - Gate;
   begin
      for i in 0 .. Word_Index(X'Length - 1) loop
         declare
            A : constant Word := X(X'First + i);
            B : constant Word := Y(Y'First + i) and Mask;
            S : constant Word := A + B + Carry;
         begin
            Sum(Sum'First + i) := S;
            Carry  := W_Carry(A, B, S);
         end;
      end loop;
      Overflow := Carry;
   end FZ_Add_Gated_O;
   pragma Inline_Always(FZ_Add_Gated_O);

This is something like a hybrid between FZ_Add (See Chapter 1) and FZ_Mux. If Gate is equal to 1, Mask is set to “maxint” of our Word, i.e. 11111….1; consequently the quantity B is unmolested, and the routine behaves exactly like FZ_Add. However, if Gate is equal to 0, Mask is then also equal to 0 and B is forced to also equal 0 during each iteration of the loop; the resulting addition is a constant-time “no-op”.

Work out the behaviour of this algorithm on a sheet of paper, as you have done for the material in earlier chapters.


And at last, we are ready for “Egyptological” division:

fz_divis.ads:

with FZ_Type; use FZ_Type;
 
 
package FZ_Divis is
 
   pragma Pure;
 
   -- Dividend is divided by Divisor, producing Quotient and Remainder.
   -- WARNING: NO div0 test here! Caller must test.
   procedure FZ_IDiv(Dividend  : in  FZ;
                     Divisor   : in  FZ;
                     Quotient  : out FZ;
                     Remainder : out FZ);
   pragma Precondition(Dividend'Length = Divisor'Length and
                         Quotient'Length = Remainder'Length and
                         Dividend'Length = Quotient'Length);
 
   -- Exactly same thing as IDiv, but keep only the Quotient
   procedure FZ_Div(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Quotient  : out FZ);
   pragma Precondition(Dividend'Length = Divisor'Length and
                         Dividend'Length = Quotient'Length);
 
   -- Exactly same thing as IDiv, but keep only the Remainder
   procedure FZ_Mod(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Remainder : out FZ);
   pragma Precondition(Dividend'Length = Divisor'Length and
                         Dividend'Length = Remainder'Length);
 
end FZ_Divis;

This algorithm is quite exactly analogous to the “long division” everyone learns in grade school. The “Egyptian” variant is also known as “restoring division” and is discussed as such in, e.g., Vol. 2 of D. E. Knuth’s “Art of Programming”. Observe that, just like every other FFA operation, this procedure does not branch on the operand bits. This is essential in any routine intended for use in cryptographic arithmetic.

The mechanics are quite simple: the Dividend is shifted into a register R “head-first”, and the Divisor is repeatedly subtracted from R. If the subtraction results in an underflow, this is taken to signify that the Divisor “did not go in”, and the subtraction is then undone, in constant-time (via the conditional-adder introduced earlier in this Chapter.) In the latter case, a 0 is revealed as the current bit of the Quotient — whose bits are found backwards, from last (highest) to first (lowest); in the opposite case, a 1 is written. The Remainder obtained from the division is found in R at the end of the procedure, which requires a number of iterations equal to the bitness of the Dividend:

fz_divis.adb:

with Words;    use Words;
with W_Pred;   use W_Pred;
with FZ_Basic; use FZ_Basic;
with FZ_Arith; use FZ_Arith;
with FZ_BitOp; use FZ_BitOp;
with FZ_Shift; use FZ_Shift;
 
 
package body FZ_Divis is
 
   -- Dividend is divided by Divisor, producing Quotient and Remainder.
   -- WARNING: NO div0 test here! Caller must test.
   procedure FZ_IDiv(Dividend  : in  FZ;
                     Divisor   : in  FZ;
                     Quotient  : out FZ;
                     Remainder : out FZ) is
 
      -- The working register
      QR : FZ(1 .. Dividend'Length + Divisor'Length);
 
      -- Bottom seg of Z will end up containing the Quotient; top - remainder
      Q  : FZ renames QR(1                   .. Dividend'Length); -- Quotient
      R  : FZ renames QR(Dividend'Length + 1 .. QR'Last);         -- Remainder
 
      C : WBool := 0; -- Borrow, from comparator
   begin
      Q := Dividend; -- Q begins with the Dividend
      FZ_Clear(R);   -- R begins empty
 
      -- For each bit of Dividend:
      for i in 1 .. FZ_Bitness(Dividend) loop
 
         -- Advance QR by 1 bit:
         FZ_ShiftLeft(QR, QR, 1);
 
         -- Subtract Divisor from R; Underflow goes into C
         FZ_Sub(X => R, Y => Divisor, Difference => R, Underflow => C);
 
         -- If C=1, subtraction underflowed, and then Divisor gets added back:
         FZ_Add_Gated(X => R, Y => Divisor, Gate => C, Sum => R);
 
         -- Current result-bit is equal to Not-C, i.e. 1 if Divisor 'went in'
         FZ_Or_W(Q, W_Not(C));
 
      end loop;
 
      Quotient  := Q; -- Output the Quotient.
      Remainder := R; -- Output the Remainder.
 
   end FZ_IDiv;
   pragma Inline_Always(FZ_IDiv);

The above is, to my knowledge, the simplest practical integer division algorithm (simple repeated-subtraction is not practical for large numbers and will not be discussed.) There are several substantial improvements that can be made to it, and we will visit them in later chapters. However, it is worth noting that Division is the most expensive of the familiar arithmetical operations, and — unlike multiplication — no order-of-magnitude speedup for it is known.

Now we will define remainderless division, and modulus operations. These can be optimized to an extent, but their simplest possible formulation is the trivial one below, written in terms of FZ_IDiv:

   -- Exactly same thing as IDiv, but keep only the Quotient
   procedure FZ_Div(Dividend  : in  FZ;
                    Divisor   : in  FZ;
                    Quotient  : out FZ) is
      Remainder : FZ(Divisor'Range);
      pragma Unreferenced(Remainder);
   begin
      FZ_IDiv(Dividend, Divisor, Quotient, Remainder);
   end FZ_Div;
   pragma Inline_Always(FZ_Div);
 
 
   -- Exactly same thing as IDiv, but keep only the Remainder
   procedure FZ_Mod(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Remainder : out FZ) is
      Quotient : FZ(Dividend'Range);
      pragma Unreferenced(Quotient);
   begin
      FZ_IDiv(Dividend, Divisor, Quotient, Remainder);
   end FZ_Mod;
   pragma Inline_Always(FZ_Mod);
 
end FZ_Divis;

And, now: “Egyptological” multiplication.

fz_mul.ads:

with FZ_Type; use FZ_Type;
 
 
package FZ_Mul is
 
   pragma Pure;
 
   -- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y.
   procedure FZ_Mul_Egyptian(X     : in  FZ;
                             Y     : in  FZ;
                             XY_Lo : out FZ;
                             XY_Hi : out FZ);
   pragma Precondition(X'Length = Y'Length and
                         XY_Lo'Length = XY_Hi'Length and
                         XY_Lo'Length = ((X'Length + Y'Length) / 2));
 
end FZ_Mul;

Observe that it is “symmetric” with the Division, i.e. the exact same process “in reverse gear”:

fz_mul.adb:

with Words;    use Words;
with FZ_Basic; use FZ_Basic;
with FZ_Pred;  use FZ_Pred;
with FZ_Arith; use FZ_Arith;
with FZ_Shift; use FZ_Shift;
 
 
package body FZ_Mul is
 
   -- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y.
   procedure FZ_Mul_Egyptian(X     : in  FZ;
                             Y     : in  FZ;
                             XY_Lo : out FZ;
                             XY_Hi : out FZ) is
 
      -- Register holding running product
      XY : FZ(1 .. X'Length + Y'Length);
 
      -- X-Slide
      XS : FZ(1 .. X'Length + Y'Length);
 
      -- Y-Slide
      YS : FZ(Y'Range) := Y;
 
   begin
      -- Product register begins empty
      FZ_Clear(XY);
 
      -- X-Slide initially equals X:
      XS(1            .. X'Length) := X;
      XS(X'Length + 1 .. XS'Last)  := (others => 0);
 
      -- For each bit of Y:
      for i in 1 .. FZ_Bitness(Y) loop
 
         -- If lowest bit of Y-Slide is 1, X-Slide is added into XY
         FZ_Add_Gated(X    => XY, Y => XS, Sum => XY,
                      Gate => FZ_OddP(YS));
 
         -- X-Slide := X-Slide * 2
         FZ_ShiftLeft(XS, XS, 1);
 
         -- Y-Slide := Y-Slide / 2
         FZ_ShiftRight(YS, YS, 1);
 
      end loop;
 
      -- Write out the Product's lower and upper FZs:
      XY_Lo := XY(1                .. XY_Lo'Length);
      XY_Hi := XY(XY_Lo'Length + 1 .. XY'Last);
 
   end FZ_Mul_Egyptian;
   pragma Inline_Always(FZ_Mul_Egyptian);
 
end FZ_Mul;

We begin with a working register XY, initially zero, wide enough to hold the product; a “X-Slide” XS, which initially equals multiplicand X; and a “Y-Slide”, YS, which initially equals multiplicand Y. XS slides “upwards”, and so is given room so as to be able to fit the uppermost possible bit of the product. YS slides “downward” and is therefore Y-sized.

The procedure requires a number of iterations equal to the bitness of Y (which, the attentive reader will observe, in current FFA will always equal the bitness of X — but why not specify the general case?) In each iteration, XS is conditionally added in constant time to the running product XY. The condition is determined by the lowest bit of current YS. At the end of each iteration, XS slides “up”, i.e. is multiplied by two, while YS slides “down”, i.e. is divided by two.

After the required number of iterations, XY will equal the product of the multiplicands X and Y. The two halves of this product, FZ integers equal in their bitness to that of the multiplicands, are finally written to XY_Lo and XY_Hi — the lower and upper halves respectively.

Observe that this multiplier runs in O(N^2) — FZ_Add being an O(N) operation, N being the current FZ bitness. This “bearded” algorithm is impractically slow for battlefield use, in e.g. RSA. However it is unbeatable in simplicity. The reader should be able to fully understand why it cannot fail, and why the result will be emitted in constant time (i.e. the required work is wholly independent of the Hamming weights of the multiplicands) and occupy exactly the allotted space.

In subsequent chapters, we will explore several other multiplication algorithms, which make tradeoffs in simplicity (but however pointedly without compromising constant-time operation!) and gain considerable improvements in speed.


Now, let’s add all of the new operations to FFACalc:

First, a helper function that prevents division by zero:

ffa_calc.adb:

      -- Ensure that a divisor is not zero
      procedure MustNotZero(D : in FZ) is
      begin
         if FZ_ZeroP(D) = 1 then
            E("Division by Zero!");
         end if;
      end MustNotZero;

The Division-with-Remainder, Integer Division, Modulus, and Multiplication operations:

First, ‘\’: expects a Dividend, and on top of it, a Divisor; and leaves the stack containing in their place a Quotient, and on top of it, a Remainder:

ffa_calc.adb:

               -- Divide and give Quotient and Remainder
            when '\' =>
               Want(2);
               MustNotZero(Stack(SP));
               FZ_IDiv(Dividend  => Stack(SP - 1),
                       Divisor   => Stack(SP),
                       Quotient  => Stack(SP - 1),
                       Remainder => Stack(SP));

Then, ‘/’: exactly the same thing as the above, but leaves only the Quotient on the stack:

               -- Divide and give Quotient only
            when '/' =>
               Want(2);
               MustNotZero(Stack(SP));
               FZ_Div(Dividend  => Stack(SP - 1),
                      Divisor   => Stack(SP),
                      Quotient  => Stack(SP - 1));
               Drop;

And, ‘%’: this is the familiar modulus operation; it performs division but leaves only the remainder on the stack:

               -- Divide and give Remainder only
            when '%' =>
               Want(2);
               MustNotZero(Stack(SP));
               FZ_Mod(Dividend  => Stack(SP - 1),
                      Divisor   => Stack(SP),
                      Remainder => Stack(SP - 1));
               Drop;

Finally, multiplication, using the very slow Egyptian method. Observe that multiplication of two integers results in a product having as its bitness the sum of their bitnesses. Therefore FFACalc multiplication places two FZ integers on the stack, the lower half of the result, followed by the upper half. If the latter is unwanted, it must be dropped using the Drop (”_”) operation (See Chapter 4.) The Overflow flag is in all cases unaffected, as there is room for the entire product — its lower and upper halves occupy the same places on the stack which formerly held the multiplicands.

               -- Multiply, give bottom and top halves
            when '*' =>
               Want(2);
               MustNotZero(Stack(SP));
               -- Ch5: slow and simple 'Egyptological' method:
               FZ_Mul_Egyptian(X     => Stack(SP - 1),
                               Y     => Stack(SP),
                               XY_Lo => Stack(SP - 1),
                               XY_Hi => Stack(SP));

Now let’s build the improved FFACalc:

cd ffacalc
gprbuild

And try out the new operations:

$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\##" | ./bin/ffa_calc 256 32
 
000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F
000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0/#" | ./bin/ffa_calc 256 32
 
000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0%#" | ./bin/ffa_calc 256 32
 
000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0\#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0/#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0%#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\.A1398B1FBC8872F908C44763C5427EBD0F={[R OK]}{[R SAD]}_[; ].101D96F0815F0853E1D8D883B93F2D9={[Q OK]}{[Q SAD]}_[; ]" | ./bin/ffa_calc 256 32
 
R OK; Q OK;
 
$ echo -n -E ".FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF * ##  " | ./bin/ffa_calc 256 32 
 
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE
0000000000000000000000000000000000000000000000000000000000000001

Homework Problems, in order of difficulty:

  • Write a FFACalc tape which obtains two numbers from the top of the stack, multiplies them, then alternatingly divides the obtained product by each of the original multiplicands, and thereby verifying that both the multiplier and the divider work as expected. The message “OK” should be printed in the event of successful test of all four new FFACalc functions, and “Sad” in case of the opposite. “Sabotage” and rebuild FFACalc to produce the necessary erroneous arithmetic for this test.
  • Find the elementary optimization that cuts the constant factor in the run-times of both FZ_Mul_Egyptian and FZ_IDiv by a third. Hint: one of the O(N) suboperations in both of them, can be replaced with a O(1) one! But which?
  • Show how to cut the required CPU time of the improved, from the previous problem, algorithm, by another quarter — at the expense of a small amount of complexity, but without fundamentally departing from either of the two “Egyptological” algorithms.
  • Prove that the provided Division and Multiplication operations, work.

~To be continued!~

This entry was written by Stanislav , posted on Saturday December 30 2017 , filed under Bitcoin, Cold Air, Computation, Cryptography, FFA, Friends, Mathematics, NonLoper, ShouldersGiants, SoftwareSucks . Bookmark the permalink . Post a comment below or leave a trackback: Trackback URL.

9 Responses to ““Finite Field Arithmetic.” Chapter 5: “Egyptological” Multiplication and Division.”

  • phf says:

    my own solution to the ch4 problem, instead of using a higher level (?) builtin operation, which i didn’t quite understand at the time, i used explicit conditionals.

    (compare two numbers
     and keep the largest)
    ``<{'_}{_}_
    (repeat)
    ``<{'_}{_}_
    ``<{'_}{_}_
    ``<{'_}{_}_
    ``<{'_}{_}_
    ``<{'_}{_}_
    #
    
  • PeterL says:

    I came up with the same solution, but I also realized it is equivalent to this slightly longer and less elegant solution:

    “<{'}__“<{'}__“<{'}__“<{'}__“<{'}__“<{'}__#

  • PeterL says:

    I imagine the ffa_calc-tape looks like spam to the website, no surprise that it gets caught in the filter.

    I did have one question: In the multiplication,
    pragma Precondition(X'Length = Y'Length and
    XY_Lo'Length = XY_Hi'Length and
    XY_Lo'Length = ((X'Length + Y'Length) / 2));

    why not just test XY_Lo’Length against X’Length?

  • Diana Coman says:

    Heh, it seems I got actually the “official” solution to the puzzle – when I finally got around to catch up with FFA.

    I don’t get something at this chapter though, at the simplest of the homework tasks: is it meant to be one single tape that starts with a stack with 2 numbers and then does everything (i.e. multiply, then divide by first number and check, divide by second number and check, output result)? If it is, there is something I don’t understand: result of multiplication will put 2 FZ on the stack. That is fine in itself, the checking division with each of the original numbers can be done of course in 2 parts, but I don’t get it how to access basically 2 steps rather than 1 single step behind on the stack to reach the divisor. Obviously, one can swap and discard in turn the high/low parts of the multiplication and then work with that but it seems horrible as approach to me. So what am I missing here?

  • Stanislav says:

    Dear Diana Coman,

    Congrats on finding the optimal answer to the Ch.4 puzzle!

    Hint: it’s perfectly fine to do the multiplication/division more than once, to get at the parts.

    Yours,
    -S

  • Diana Coman says:

    Hmmm, thank you for the hint: I was indeed trying to do it with a single operation. Making my life harder than it has to be, as usual.

    Why does FZ_Mul_Egyptian really need the YS value? Isn’t it enough to use the current bit of Y as the “gate” value? (i.e. “odd” at current step simply means current bit is 1 but there is no need for the value Y/2 itself, is there?)

    • Stanislav says:

      Dear Diana Coman,

      FZ_OddP(YS) is in fact “the current bit of Y”. YS is a copy of Y, which in Ada semantics we are not permitted to mutilate (having declared it an “in”.) FZ_ShiftRight(YS, YS, 1) moves the next bit into position. And so forth.

      In the next Chapter you will see a slightly more complicated — but considerably faster — mechanism for bitwise-walks through FZ integers.

      Yours,
      -S

  • Diana Coman says:

    Yes, my point was why do the division rather than simply walk the FZ but I understand it’ll be done at a later stage, ok.

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">