Published on: Saturday December 30 2017

*This article is part of a series of hands-on tutorials introducing FFA, or the ***Finite Field Arithmetic** library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA **with his own hands**, and at the same time grasp the purpose of each moving part therein.

Эй, сюда плывите, рыбки,

Сом Считает без ошибки!

Сом И делит,

Сом И множит,

Сом Сложить и вычесть Может!

*– Борис Заходер. СОМОМНЕНИЕ*

You will need:

Add the above *vpatch* and *seal* to your V-set, and *press* to *ffa_ch5_egypt.vpatch.*

You should end up with the same directory structure as previously.

Now compile *ffacalc*:

But **do not run it quite yet.**

First, let’s go through the “mail bag” from Chapter 4:

Reader Apeloyee observed that *FZ_Valid_Bitness_P* can be made slightly shorter and clearer. I have taken the more conservative of his two suggestions.

The rewritten procedure now looks like this:

-- Determine if a proposed FFA Bitness is valid.
function FZ_Valid_Bitness_P(B : in Positive) return Boolean is
Result : Boolean := False;
T : Natural := B;
PopCount : Natural := 0;
begin
-- Supposing we meet the minimal bitness:
if B >= FZ_Minimal_Bitness then
while T > 0 loop
PopCount := PopCount + T mod 2;
T := T / 2;
end loop;
-- Is B a power of 2?
if PopCount = 1 then
Result := True;
end if;
end if;
return Result;
end FZ_Valid_Bitness_P;

I would like to thank again the eagle-eyed Apeloyee for his attentive and skeptical reading.

And now for the solution to the Chapter 4 Puzzle:

Readers who have written solutions of their own to this puzzle, are invited to post them as comments to *this* post (please try to avoid spoiling it for those who have not yet reached Chapter 5.)

In Chapter 4, we wrote a small and very limited RPN calculator, *FFACalc*, capable only of addition, subtraction, basic I/O, and a few stack motion operations. In Chapter 5, we will give it the ability to *multiply and divide.*

The division and multiplication routines in FFA will extend across more than one chapter in this article series, on account of the wide spectrum of possible tradeoffs between simplicity and speed in the known approaches to both problems. We will begin at the extreme of *simplicity at all cost*. And, as the reader may already have guessed from the title of this Chapter, we will begin with division and multiplication algorithms that were known **to the ancient Egyptians**.

But first, a few small helper-function additions to *LibFFA*.

A convenient *WBool* (See Chapter 1) inverter:

*w_pred.ads*:

-- Return WBool-complement of N.
function W_Not(N : in WBool) return WBool;

*w_pred.adb*:

-- Return WBool-complement of N.
function W_Not(N : in WBool) return WBool is
begin
return 1 xor N;
end W_Not;
pragma Inline_Always(W_Not);

A convenient function for determining the *bitness* of an *FZ* integer:

*fz_type.ads*:

-- A count of Bits, anywhere (cannot be 0):
subtype Bit_Count is Positive;

*fz_basic.ads*:

-- Determine the Bitness of N
function FZ_Bitness(N : in FZ) return Bit_Count;

*fz_basic.adb*:

-- Determine the Bitness of N
function FZ_Bitness(N : in FZ) return Bit_Count is
begin
return N'Length * Words.Bitness;
end FZ_Bitness;
pragma Inline_Always(FZ_Bitness);

And finally, a mechanism for **constant-time conditional addition:**

*fz_arith.ads*:

-- Gate = 1: Sum := X + Y; Overflow := Carry
-- Gate = 0: Sum := X; Overflow := 0
procedure FZ_Add_Gated_O(X : in FZ;
Y : in FZ;
Gate : in WBool;
Sum : out FZ;
Overflow : out WBool);
pragma Precondition(X'Length = Y'Length and X'Length = Sum'Length);

*fz_arith.adb*:

procedure FZ_Add_Gated_O(X : in FZ;
Y : in FZ;
Gate : in WBool;
Sum : out FZ;
Overflow : out WBool) is
Carry : WBool := 0;
Mask : constant Word := 0 - Gate;
begin
for i in 0 .. Word_Index(X'Length - 1) loop
declare
A : constant Word := X(X'First + i);
B : constant Word := Y(Y'First + i) and Mask;
S : constant Word := A + B + Carry;
begin
Sum(Sum'First + i) := S;
Carry := W_Carry(A, B, S);
end;
end loop;
Overflow := Carry;
end FZ_Add_Gated_O;
pragma Inline_Always(FZ_Add_Gated_O);

This is something like a hybrid between *FZ_Add* (See Chapter 1) and *FZ_Mux*. If *Gate* is equal to 1, *Mask* is set to “maxint” of our Word, i.e. *11111….1*; consequently the quantity *B* is unmolested, and the routine behaves exactly like *FZ_Add*. However, if *Gate* is equal to 0, *Mask* is then also equal to 0 and *B* is forced to also equal 0 during each iteration of the loop; the resulting addition is a constant-time “no-op”.

Work out the behaviour of this algorithm on a sheet of paper, as you have done for the material in earlier chapters.

And at last, we are ready for “Egyptological” **division**:

*fz_divis.ads*:

with FZ_Type; use FZ_Type;
package FZ_Divis is
pragma Pure;
-- Dividend is divided by Divisor, producing Quotient and Remainder.
-- WARNING: NO div0 test here! Caller must test.
procedure FZ_IDiv(Dividend : in FZ;
Divisor : in FZ;
Quotient : out FZ;
Remainder : out FZ);
pragma Precondition(Dividend'Length = Divisor'Length and
Quotient'Length = Remainder'Length and
Dividend'Length = Quotient'Length);
-- Exactly same thing as IDiv, but keep only the Quotient
procedure FZ_Div(Dividend : in FZ;
Divisor : in FZ;
Quotient : out FZ);
pragma Precondition(Dividend'Length = Divisor'Length and
Dividend'Length = Quotient'Length);
-- Exactly same thing as IDiv, but keep only the Remainder
procedure FZ_Mod(Dividend : in FZ;
Divisor : in FZ;
Remainder : out FZ);
pragma Precondition(Dividend'Length = Divisor'Length and
Dividend'Length = Remainder'Length);
end FZ_Divis;

This algorithm is quite exactly analogous to the “long division” everyone learns in grade school. The “Egyptian” variant is also known as “restoring division” and is discussed as such in, e.g., Vol. 2 of D. E. Knuth’s “Art of Programming”. Observe that, just like every other *FFA* operation, this procedure **does not branch on the operand bits**. This is essential in any routine intended for use in cryptographic arithmetic.

The mechanics are quite simple: the *Dividend* is shifted into a register *R* “head-first”, and the *Divisor* is repeatedly subtracted from *R*. If the subtraction results in an underflow, this is taken to signify that the *Divisor* “did not go in”, and the subtraction is then undone, in *constant-time* (via the conditional-adder introduced earlier in this Chapter.) In the latter case, a **0** is revealed as the current bit of the *Quotient* — whose bits are found backwards, from last (highest) to first (lowest); in the opposite case, a **1** is written. The *Remainder* obtained from the division is found in *R* at the end of the procedure, which requires a number of iterations equal to the bitness of the *Dividend*:

*fz_divis.adb*:

with Words; use Words;
with W_Pred; use W_Pred;
with FZ_Basic; use FZ_Basic;
with FZ_Arith; use FZ_Arith;
with FZ_BitOp; use FZ_BitOp;
with FZ_Shift; use FZ_Shift;
package body FZ_Divis is
-- Dividend is divided by Divisor, producing Quotient and Remainder.
-- WARNING: NO div0 test here! Caller must test.
procedure FZ_IDiv(Dividend : in FZ;
Divisor : in FZ;
Quotient : out FZ;
Remainder : out FZ) is
-- The working register
QR : FZ(1 .. Dividend'Length + Divisor'Length);
-- Bottom seg of Z will end up containing the Quotient; top - remainder
Q : FZ renames QR(1 .. Dividend'Length); -- Quotient
R : FZ renames QR(Dividend'Length + 1 .. QR'Last); -- Remainder
C : WBool := 0; -- Borrow, from comparator
begin
Q := Dividend; -- Q begins with the Dividend
FZ_Clear(R); -- R begins empty
-- For each bit of Dividend:
for i in 1 .. FZ_Bitness(Dividend) loop
-- Advance QR by 1 bit:
FZ_ShiftLeft(QR, QR, 1);
-- Subtract Divisor from R; Underflow goes into C
FZ_Sub(X => R, Y => Divisor, Difference => R, Underflow => C);
-- If C=1, subtraction underflowed, and then Divisor gets added back:
FZ_Add_Gated(X => R, Y => Divisor, Gate => C, Sum => R);
-- Current result-bit is equal to Not-C, i.e. 1 if Divisor 'went in'
FZ_Or_W(Q, W_Not(C));
end loop;
Quotient := Q; -- Output the Quotient.
Remainder := R; -- Output the Remainder.
end FZ_IDiv;
pragma Inline_Always(FZ_IDiv);

The above is, to my knowledge, the simplest practical integer division algorithm (simple repeated-subtraction is not practical for large numbers and will not be discussed.) There are several substantial improvements that can be made to it, and we will visit them in later chapters. However, it is worth noting that Division is the **most expensive** of the familiar arithmetical operations, and — unlike multiplication — no order-of-magnitude speedup for it is known.

Now we will define remainderless division, and modulus operations. These can be optimized to an extent, but their simplest possible formulation is the trivial one below, written in terms of *FZ_IDiv*:

-- Exactly same thing as IDiv, but keep only the Quotient
procedure FZ_Div(Dividend : in FZ;
Divisor : in FZ;
Quotient : out FZ) is
Remainder : FZ(Divisor'Range);
pragma Unreferenced(Remainder);
begin
FZ_IDiv(Dividend, Divisor, Quotient, Remainder);
end FZ_Div;
pragma Inline_Always(FZ_Div);
-- Exactly same thing as IDiv, but keep only the Remainder
procedure FZ_Mod(Dividend : in FZ;
Divisor : in FZ;
Remainder : out FZ) is
Quotient : FZ(Dividend'Range);
pragma Unreferenced(Quotient);
begin
FZ_IDiv(Dividend, Divisor, Quotient, Remainder);
end FZ_Mod;
pragma Inline_Always(FZ_Mod);
end FZ_Divis;

And, now: “Egyptological” **multiplication**.

*fz_mul.ads*:

with FZ_Type; use FZ_Type;
package FZ_Mul is
pragma Pure;
-- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y.
procedure FZ_Mul_Egyptian(X : in FZ;
Y : in FZ;
XY_Lo : out FZ;
XY_Hi : out FZ);
pragma Precondition(X'Length = Y'Length and
XY_Lo'Length = XY_Hi'Length and
XY_Lo'Length = ((X'Length + Y'Length) / 2));
end FZ_Mul;

Observe that it is “symmetric” with the Division, i.e. the exact same process “in reverse gear”:

*fz_mul.adb*:

with Words; use Words;
with FZ_Basic; use FZ_Basic;
with FZ_Pred; use FZ_Pred;
with FZ_Arith; use FZ_Arith;
with FZ_Shift; use FZ_Shift;
package body FZ_Mul is
-- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y.
procedure FZ_Mul_Egyptian(X : in FZ;
Y : in FZ;
XY_Lo : out FZ;
XY_Hi : out FZ) is
-- Register holding running product
XY : FZ(1 .. X'Length + Y'Length);
-- X-Slide
XS : FZ(1 .. X'Length + Y'Length);
-- Y-Slide
YS : FZ(Y'Range) := Y;
begin
-- Product register begins empty
FZ_Clear(XY);
-- X-Slide initially equals X:
XS(1 .. X'Length) := X;
XS(X'Length + 1 .. XS'Last) := (others => 0);
-- For each bit of Y:
for i in 1 .. FZ_Bitness(Y) loop
-- If lowest bit of Y-Slide is 1, X-Slide is added into XY
FZ_Add_Gated(X => XY, Y => XS, Sum => XY,
Gate => FZ_OddP(YS));
-- X-Slide := X-Slide * 2
FZ_ShiftLeft(XS, XS, 1);
-- Y-Slide := Y-Slide / 2
FZ_ShiftRight(YS, YS, 1);
end loop;
-- Write out the Product's lower and upper FZs:
XY_Lo := XY(1 .. XY_Lo'Length);
XY_Hi := XY(XY_Lo'Length + 1 .. XY'Last);
end FZ_Mul_Egyptian;
pragma Inline_Always(FZ_Mul_Egyptian);
end FZ_Mul;

We begin with a working register *XY*, initially zero, wide enough to hold the product; a “X-Slide” *XS*, which initially equals multiplicand *X*; and a “Y-Slide”, *YS*, which initially equals multiplicand *Y*. *XS* slides “upwards”, and so is given room so as to be able to fit the uppermost possible bit of the product. *YS* slides “downward” and is therefore *Y*-sized.

The procedure requires a number of iterations equal to the bitness of *Y* (which, the attentive reader will observe, in current *FFA* will always equal the bitness of *X* — but why not specify the general case?) In each iteration, *XS* is *conditionally added* **in constant time** to the running product *XY*. The condition is determined by the lowest bit of current *YS*. At the end of each iteration, *XS* slides “up”, i.e. is multiplied by two, while *YS* slides “down”, i.e. is divided by two.

After the required number of iterations, *XY* will equal the product of the multiplicands *X* and *Y*. The two halves of this product, *FZ* integers equal in their bitness to that of the multiplicands, are finally written to *XY_Lo* and *XY_Hi* — the lower and upper halves respectively.

Observe that this multiplier runs in O(N^2) — FZ_Add being an O(N) operation, N being the current FZ bitness. This “bearded” algorithm is impractically slow for battlefield use, in e.g. RSA. However it is unbeatable in **simplicity**. The reader should be able to fully understand why it cannot fail, and why the result will be emitted in constant time (i.e. the required work is wholly independent of the Hamming weights of the multiplicands) and occupy exactly the allotted space.

In subsequent chapters, we will explore several other multiplication algorithms, which make tradeoffs in simplicity (but however pointedly **without** compromising constant-time operation!) and gain considerable improvements in speed.

Now, let’s add all of the new operations to *FFACalc:*

First, a helper function that prevents division by zero:

*ffa_calc.adb*:

-- Ensure that a divisor is not zero
procedure MustNotZero(D : in FZ) is
begin
if FZ_ZeroP(D) = 1 then
E("Division by Zero!");
end if;
end MustNotZero;

The Division-with-Remainder, Integer Division, Modulus, and Multiplication operations:

First, **‘\’**: expects a Dividend, and on top of it, a Divisor; and leaves the stack containing in their place a Quotient, and on top of it, a Remainder:

*ffa_calc.adb*:

-- Divide and give Quotient and Remainder
when '\' =>
Want(2);
MustNotZero(Stack(SP));
FZ_IDiv(Dividend => Stack(SP - 1),
Divisor => Stack(SP),
Quotient => Stack(SP - 1),
Remainder => Stack(SP));

Then, **‘/’**: exactly the same thing as the above, but leaves only the Quotient on the stack:

-- Divide and give Quotient only
when '/' =>
Want(2);
MustNotZero(Stack(SP));
FZ_Div(Dividend => Stack(SP - 1),
Divisor => Stack(SP),
Quotient => Stack(SP - 1));
Drop;

And, **‘%’**: this is the familiar *modulus* operation; it performs division but leaves only the remainder on the stack:

-- Divide and give Remainder only
when '%' =>
Want(2);
MustNotZero(Stack(SP));
FZ_Mod(Dividend => Stack(SP - 1),
Divisor => Stack(SP),
Remainder => Stack(SP - 1));
Drop;

Finally, multiplication, using the very slow Egyptian method. Observe that multiplication of two integers results in a product having as its bitness the sum of their bitnesses. Therefore *FFACalc* multiplication places **two** *FZ* integers on the stack, the lower half of the result, followed by the upper half. If the latter is unwanted, it must be dropped using the Drop *(”_”)* operation (See Chapter 4.) The Overflow flag is in all cases unaffected, as there is room for the entire product — its lower and upper halves occupy the same places on the stack which formerly held the multiplicands.

-- Multiply, give bottom and top halves
when '*' =>
Want(2);
MustNotZero(Stack(SP));
-- Ch5: slow and simple 'Egyptological' method:
FZ_Mul_Egyptian(X => Stack(SP - 1),
Y => Stack(SP),
XY_Lo => Stack(SP - 1),
XY_Hi => Stack(SP));

Now let’s build the improved *FFACalc*:

And try out the new operations:

$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\##" | ./bin/ffa_calc 256 32
000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F
000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0/#" | ./bin/ffa_calc 256 32
000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0%#" | ./bin/ffa_calc 256 32
000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0\#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0/#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0%#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\.A1398B1FBC8872F908C44763C5427EBD0F={[R OK]}{[R SAD]}_[; ].101D96F0815F0853E1D8D883B93F2D9={[Q OK]}{[Q SAD]}_[; ]" | ./bin/ffa_calc 256 32
R OK; Q OK;
$ echo -n -E ".FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF * ## " | ./bin/ffa_calc 256 32
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE
0000000000000000000000000000000000000000000000000000000000000001

**Homework Problems**, in order of difficulty:

- Write a
*FFACalc tape* which obtains two numbers from the top of the stack, multiplies them, then alternatingly divides the obtained product by each of the original multiplicands, and thereby verifying that both the multiplier and the divider work as expected. The message “OK” should be printed in the event of successful test of all four new *FFACalc* functions, and “Sad” in case of the opposite. “Sabotage” and rebuild *FFACalc* to produce the necessary erroneous arithmetic for this test.
- Find the elementary optimization that cuts the constant factor in the run-times of both
*FZ_Mul_Egyptian* and *FZ_IDiv* by **a third**. *Hint: one of the O(N) suboperations in both of them, can be replaced with a O(1) one! But which?*
- Show how to cut the required CPU time of the improved, from the previous problem, algorithm, by another quarter — at the expense of a small amount of complexity, but without fundamentally departing from either of the two “Egyptological” algorithms.
**Prove** that the provided Division and Multiplication operations, work.

*~To be continued!~*