## “Finite Field Arithmetic.” Chapter 5: “Egyptological” Multiplication and Division.

*This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.*

- Chapter 1: Genesis.
- Chapter 2: Logical and Bitwise Operations.
- Chapter 3: Shifts.
- Chapter 4: Interlude: FFACalc.
**Chapter 5: “Egyptological” Multiplication and Division.**

Эй, сюда плывите, рыбки,

Сом Считает без ошибки!

Сом И делит,

Сом И множит,

Сом Сложить и вычесть Может!

Эй, сюда плывите, рыбки,

Сом Считает без ошибки!

Сом И делит,

Сом И множит,

Сом Сложить и вычесть Может!

*– Борис Заходер. СОМОМНЕНИЕ*

You will need:

**All**of the materials from Chapters 1 – 4.- ffa_ch5_egypt.vpatch
- ffa_ch5_egypt.vpatch.asciilifeform.sig

Add the above *vpatch* and *seal* to your V-set, and *press* to *ffa_ch5_egypt.vpatch.*

You should end up with the same directory structure as previously.

Now compile *ffacalc*:

cd ffacalc gprbuild

But **do not run it quite yet.**

First, let’s go through the “mail bag” from Chapter 4:

Reader Apeloyee observed that *FZ_Valid_Bitness_P* can be made slightly shorter and clearer. I have taken the more conservative of his two suggestions.

The rewritten procedure now looks like this:

-- Determine if a proposed FFA Bitness is valid. function FZ_Valid_Bitness_P(B : in Positive) return Boolean is Result : Boolean := False; T : Natural := B; PopCount : Natural := 0; begin -- Supposing we meet the minimal bitness: if B >= FZ_Minimal_Bitness then while T > 0 loop PopCount := PopCount + T mod 2; T := T / 2; end loop; -- Is B a power of 2? if PopCount = 1 then Result := True; end if; end if; return Result; end FZ_Valid_Bitness_P;

I would like to thank again the eagle-eyed Apeloyee for his attentive and skeptical reading.

And now for the solution to the Chapter 4 Puzzle:

## ch4_official_solution.txt

Readers who have written solutions of their own to this puzzle, are invited to post them as comments to *this* post (please try to avoid spoiling it for those who have not yet reached Chapter 5.)

In Chapter 4, we wrote a small and very limited RPN calculator, *FFACalc*, capable only of addition, subtraction, basic I/O, and a few stack motion operations. In Chapter 5, we will give it the ability to *multiply and divide.*

The division and multiplication routines in FFA will extend across more than one chapter in this article series, on account of the wide spectrum of possible tradeoffs between simplicity and speed in the known approaches to both problems. We will begin at the extreme of *simplicity at all cost*. And, as the reader may already have guessed from the title of this Chapter, we will begin with division and multiplication algorithms that were known **to the ancient Egyptians**.

But first, a few small helper-function additions to *LibFFA*.

A convenient *WBool* (See Chapter 1) inverter:

*w_pred.ads*:

-- Return WBool-complement of N. function W_Not(N : in WBool) return WBool;

*w_pred.adb*:

-- Return WBool-complement of N. function W_Not(N : in WBool) return WBool is begin return 1 xor N; end W_Not; pragma Inline_Always(W_Not);

A convenient function for determining the *bitness* of an *FZ* integer:

*fz_type.ads*:

-- A count of Bits, anywhere (cannot be 0): subtype Bit_Count is Positive;

*fz_basic.ads*:

-- Determine the Bitness of N function FZ_Bitness(N : in FZ) return Bit_Count;

*fz_basic.adb*:

-- Determine the Bitness of N function FZ_Bitness(N : in FZ) return Bit_Count is begin return N'Length * Words.Bitness; end FZ_Bitness; pragma Inline_Always(FZ_Bitness);

And finally, a mechanism for **constant-time conditional addition:**

*fz_arith.ads*:

-- Gate = 1: Sum := X + Y; Overflow := Carry -- Gate = 0: Sum := X; Overflow := 0 procedure FZ_Add_Gated_O(X : in FZ; Y : in FZ; Gate : in WBool; Sum : out FZ; Overflow : out WBool); pragma Precondition(X'Length = Y'Length and X'Length = Sum'Length);

*fz_arith.adb*:

procedure FZ_Add_Gated_O(X : in FZ; Y : in FZ; Gate : in WBool; Sum : out FZ; Overflow : out WBool) is Carry : WBool := 0; Mask : constant Word := 0 - Gate; begin for i in 0 .. Word_Index(X'Length - 1) loop declare A : constant Word := X(X'First + i); B : constant Word := Y(Y'First + i) and Mask; S : constant Word := A + B + Carry; begin Sum(Sum'First + i) := S; Carry := W_Carry(A, B, S); end; end loop; Overflow := Carry; end FZ_Add_Gated_O; pragma Inline_Always(FZ_Add_Gated_O);

This is something like a hybrid between *FZ_Add* (See Chapter 1) and *FZ_Mux*. If *Gate* is equal to 1, *Mask* is set to “maxint” of our Word, i.e. *11111….1*; consequently the quantity *B* is unmolested, and the routine behaves exactly like *FZ_Add*. However, if *Gate* is equal to 0, *Mask* is then also equal to 0 and *B* is forced to also equal 0 during each iteration of the loop; the resulting addition is a constant-time “no-op”.

Work out the behaviour of this algorithm on a sheet of paper, as you have done for the material in earlier chapters.

And at last, we are ready for “Egyptological” **division**:

*fz_divis.ads*:

with FZ_Type; use FZ_Type; package FZ_Divis is pragma Pure; -- Dividend is divided by Divisor, producing Quotient and Remainder. -- WARNING: NO div0 test here! Caller must test. procedure FZ_IDiv(Dividend : in FZ; Divisor : in FZ; Quotient : out FZ; Remainder : out FZ); pragma Precondition(Dividend'Length = Divisor'Length and Quotient'Length = Remainder'Length and Dividend'Length = Quotient'Length); -- Exactly same thing as IDiv, but keep only the Quotient procedure FZ_Div(Dividend : in FZ; Divisor : in FZ; Quotient : out FZ); pragma Precondition(Dividend'Length = Divisor'Length and Dividend'Length = Quotient'Length); -- Exactly same thing as IDiv, but keep only the Remainder procedure FZ_Mod(Dividend : in FZ; Divisor : in FZ; Remainder : out FZ); pragma Precondition(Dividend'Length = Divisor'Length and Dividend'Length = Remainder'Length); end FZ_Divis;

This algorithm is quite exactly analogous to the “long division” everyone learns in grade school. The “Egyptian” variant is also known as “restoring division” and is discussed as such in, e.g., Vol. 2 of D. E. Knuth’s “Art of Programming”. Observe that, just like every other *FFA* operation, this procedure **does not branch on the operand bits**. This is essential in any routine intended for use in cryptographic arithmetic.

The mechanics are quite simple: the *Dividend* is shifted into a register *R* “head-first”, and the *Divisor* is repeatedly subtracted from *R*. If the subtraction results in an underflow, this is taken to signify that the *Divisor* “did not go in”, and the subtraction is then undone, in *constant-time* (via the conditional-adder introduced earlier in this Chapter.) In the latter case, a **0** is revealed as the current bit of the *Quotient* — whose bits are found backwards, from last (highest) to first (lowest); in the opposite case, a **1** is written. The *Remainder* obtained from the division is found in *R* at the end of the procedure, which requires a number of iterations equal to the bitness of the *Dividend*:

*fz_divis.adb*:

with Words; use Words; with W_Pred; use W_Pred; with FZ_Basic; use FZ_Basic; with FZ_Arith; use FZ_Arith; with FZ_BitOp; use FZ_BitOp; with FZ_Shift; use FZ_Shift; package body FZ_Divis is -- Dividend is divided by Divisor, producing Quotient and Remainder. -- WARNING: NO div0 test here! Caller must test. procedure FZ_IDiv(Dividend : in FZ; Divisor : in FZ; Quotient : out FZ; Remainder : out FZ) is -- The working register QR : FZ(1 .. Dividend'Length + Divisor'Length); -- Bottom seg of Z will end up containing the Quotient; top - remainder Q : FZ renames QR(1 .. Dividend'Length); -- Quotient R : FZ renames QR(Dividend'Length + 1 .. QR'Last); -- Remainder C : WBool := 0; -- Borrow, from comparator begin Q := Dividend; -- Q begins with the Dividend FZ_Clear(R); -- R begins empty -- For each bit of Dividend: for i in 1 .. FZ_Bitness(Dividend) loop -- Advance QR by 1 bit: FZ_ShiftLeft(QR, QR, 1); -- Subtract Divisor from R; Underflow goes into C FZ_Sub(X => R, Y => Divisor, Difference => R, Underflow => C); -- If C=1, subtraction underflowed, and then Divisor gets added back: FZ_Add_Gated(X => R, Y => Divisor, Gate => C, Sum => R); -- Current result-bit is equal to Not-C, i.e. 1 if Divisor 'went in' FZ_Or_W(Q, W_Not(C)); end loop; Quotient := Q; -- Output the Quotient. Remainder := R; -- Output the Remainder. end FZ_IDiv; pragma Inline_Always(FZ_IDiv);

The above is, to my knowledge, the simplest practical integer division algorithm (simple repeated-subtraction is not practical for large numbers and will not be discussed.) There are several substantial improvements that can be made to it, and we will visit them in later chapters. However, it is worth noting that Division is the **most expensive** of the familiar arithmetical operations, and — unlike multiplication — no order-of-magnitude speedup for it is known.

Now we will define remainderless division, and modulus operations. These can be optimized to an extent, but their simplest possible formulation is the trivial one below, written in terms of *FZ_IDiv*:

-- Exactly same thing as IDiv, but keep only the Quotient procedure FZ_Div(Dividend : in FZ; Divisor : in FZ; Quotient : out FZ) is Remainder : FZ(Divisor'Range); pragma Unreferenced(Remainder); begin FZ_IDiv(Dividend, Divisor, Quotient, Remainder); end FZ_Div; pragma Inline_Always(FZ_Div); -- Exactly same thing as IDiv, but keep only the Remainder procedure FZ_Mod(Dividend : in FZ; Divisor : in FZ; Remainder : out FZ) is Quotient : FZ(Dividend'Range); pragma Unreferenced(Quotient); begin FZ_IDiv(Dividend, Divisor, Quotient, Remainder); end FZ_Mod; pragma Inline_Always(FZ_Mod); end FZ_Divis;

And, now: “Egyptological” **multiplication**.

*fz_mul.ads*:

with FZ_Type; use FZ_Type; package FZ_Mul is pragma Pure; -- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y. procedure FZ_Mul_Egyptian(X : in FZ; Y : in FZ; XY_Lo : out FZ; XY_Hi : out FZ); pragma Precondition(X'Length = Y'Length and XY_Lo'Length = XY_Hi'Length and XY_Lo'Length = ((X'Length + Y'Length) / 2)); end FZ_Mul;

Observe that it is “symmetric” with the Division, i.e. the exact same process “in reverse gear”:

*fz_mul.adb*:

with Words; use Words; with FZ_Basic; use FZ_Basic; with FZ_Pred; use FZ_Pred; with FZ_Arith; use FZ_Arith; with FZ_Shift; use FZ_Shift; package body FZ_Mul is -- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y. procedure FZ_Mul_Egyptian(X : in FZ; Y : in FZ; XY_Lo : out FZ; XY_Hi : out FZ) is -- Register holding running product XY : FZ(1 .. X'Length + Y'Length); -- X-Slide XS : FZ(1 .. X'Length + Y'Length); -- Y-Slide YS : FZ(Y'Range) := Y; begin -- Product register begins empty FZ_Clear(XY); -- X-Slide initially equals X: XS(1 .. X'Length) := X; XS(X'Length + 1 .. XS'Last) := (others => 0); -- For each bit of Y: for i in 1 .. FZ_Bitness(Y) loop -- If lowest bit of Y-Slide is 1, X-Slide is added into XY FZ_Add_Gated(X => XY, Y => XS, Sum => XY, Gate => FZ_OddP(YS)); -- X-Slide := X-Slide * 2 FZ_ShiftLeft(XS, XS, 1); -- Y-Slide := Y-Slide / 2 FZ_ShiftRight(YS, YS, 1); end loop; -- Write out the Product's lower and upper FZs: XY_Lo := XY(1 .. XY_Lo'Length); XY_Hi := XY(XY_Lo'Length + 1 .. XY'Last); end FZ_Mul_Egyptian; pragma Inline_Always(FZ_Mul_Egyptian); end FZ_Mul;

We begin with a working register *XY*, initially zero, wide enough to hold the product; a “X-Slide” *XS*, which initially equals multiplicand *X*; and a “Y-Slide”, *YS*, which initially equals multiplicand *Y*. *XS* slides “upwards”, and so is given room so as to be able to fit the uppermost possible bit of the product. *YS* slides “downward” and is therefore *Y*-sized.

The procedure requires a number of iterations equal to the bitness of *Y* (which, the attentive reader will observe, in current *FFA* will always equal the bitness of *X* — but why not specify the general case?) In each iteration, *XS* is *conditionally added* **in constant time** to the running product *XY*. The condition is determined by the lowest bit of current *YS*. At the end of each iteration, *XS* slides “up”, i.e. is multiplied by two, while *YS* slides “down”, i.e. is divided by two.

After the required number of iterations, *XY* will equal the product of the multiplicands *X* and *Y*. The two halves of this product, *FZ* integers equal in their bitness to that of the multiplicands, are finally written to *XY_Lo* and *XY_Hi* — the lower and upper halves respectively.

Observe that this multiplier runs in O(N^2) — FZ_Add being an O(N) operation, N being the current FZ bitness. This “bearded” algorithm is impractically slow for battlefield use, in e.g. RSA. However it is unbeatable in **simplicity**. The reader should be able to fully understand why it cannot fail, and why the result will be emitted in constant time (i.e. the required work is wholly independent of the Hamming weights of the multiplicands) and occupy exactly the allotted space.

In subsequent chapters, we will explore several other multiplication algorithms, which make tradeoffs in simplicity (but however pointedly **without** compromising constant-time operation!) and gain considerable improvements in speed.

Now, let’s add all of the new operations to *FFACalc:*

First, a helper function that prevents division by zero:

*ffa_calc.adb*:

-- Ensure that a divisor is not zero procedure MustNotZero(D : in FZ) is begin if FZ_ZeroP(D) = 1 then E("Division by Zero!"); end if; end MustNotZero;

The Division-with-Remainder, Integer Division, Modulus, and Multiplication operations:

First, **‘\’**: expects a Dividend, and on top of it, a Divisor; and leaves the stack containing in their place a Quotient, and on top of it, a Remainder:

*ffa_calc.adb*:

-- Divide and give Quotient and Remainder when '\' => Want(2); MustNotZero(Stack(SP)); FZ_IDiv(Dividend => Stack(SP - 1), Divisor => Stack(SP), Quotient => Stack(SP - 1), Remainder => Stack(SP));

Then, **‘/’**: exactly the same thing as the above, but leaves only the Quotient on the stack:

-- Divide and give Quotient only when '/' => Want(2); MustNotZero(Stack(SP)); FZ_Div(Dividend => Stack(SP - 1), Divisor => Stack(SP), Quotient => Stack(SP - 1)); Drop;

And, **‘%’**: this is the familiar *modulus* operation; it performs division but leaves only the remainder on the stack:

-- Divide and give Remainder only when '%' => Want(2); MustNotZero(Stack(SP)); FZ_Mod(Dividend => Stack(SP - 1), Divisor => Stack(SP), Remainder => Stack(SP - 1)); Drop;

Finally, multiplication, using the very slow Egyptian method. Observe that multiplication of two integers results in a product having as its bitness the sum of their bitnesses. Therefore *FFACalc* multiplication places **two** *FZ* integers on the stack, the lower half of the result, followed by the upper half. If the latter is unwanted, it must be dropped using the Drop *(”_”)* operation (See Chapter 4.) The Overflow flag is in all cases unaffected, as there is room for the entire product — its lower and upper halves occupy the same places on the stack which formerly held the multiplicands.

-- Multiply, give bottom and top halves when '*' => Want(2); MustNotZero(Stack(SP)); -- Ch5: slow and simple 'Egyptological' method: FZ_Mul_Egyptian(X => Stack(SP - 1), Y => Stack(SP), XY_Lo => Stack(SP - 1), XY_Hi => Stack(SP));

Now let’s build the improved *FFACalc*:

cd ffacalc gprbuild

And try out the new operations:

$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\##" | ./bin/ffa_calc 256 32 000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F 000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9 $ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0/#" | ./bin/ffa_calc 256 32 000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9 $ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0%#" | ./bin/ffa_calc 256 32 000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F $ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0\#" | ./bin/ffa_calc 256 32 Pos: 67: Division by Zero! $ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0/#" | ./bin/ffa_calc 256 32 Pos: 67: Division by Zero! $ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0%#" | ./bin/ffa_calc 256 32 Pos: 67: Division by Zero! $ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\.A1398B1FBC8872F908C44763C5427EBD0F={[R OK]}{[R SAD]}_[; ].101D96F0815F0853E1D8D883B93F2D9={[Q OK]}{[Q SAD]}_[; ]" | ./bin/ffa_calc 256 32 R OK; Q OK; $ echo -n -E ".FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF * ## " | ./bin/ffa_calc 256 32 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE 0000000000000000000000000000000000000000000000000000000000000001

**Homework Problems**, in order of difficulty:

- Write a
*FFACalc tape*which obtains two numbers from the top of the stack, multiplies them, then alternatingly divides the obtained product by each of the original multiplicands, and thereby verifying that both the multiplier and the divider work as expected. The message “OK” should be printed in the event of successful test of all four new*FFACalc*functions, and “Sad” in case of the opposite. “Sabotage” and rebuild*FFACalc*to produce the necessary erroneous arithmetic for this test. - Find the elementary optimization that cuts the constant factor in the run-times of both
*FZ_Mul_Egyptian*and*FZ_IDiv*by**a third**.*Hint: one of the O(N) suboperations in both of them, can be replaced with a O(1) one! But which?* - Show how to cut the required CPU time of the improved, from the previous problem, algorithm, by another quarter — at the expense of a small amount of complexity, but without fundamentally departing from either of the two “Egyptological” algorithms.
**Prove**that the provided Division and Multiplication operations, work.

*~To be continued!~*

my own solution to the ch4 problem, instead of using a higher level (?) builtin operation, which i didn’t quite understand at the time, i used explicit conditionals.

I came up with the same solution, but I also realized it is equivalent to this slightly longer and less elegant solution:

“<{'}__“<{'}__“<{'}__“<{'}__“<{'}__“<{'}__#

I imagine the ffa_calc-tape looks like spam to the website, no surprise that it gets caught in the filter.

I did have one question: In the multiplication,

`pragma Precondition(X'Length = Y'Length and`

XY_Lo'Length = XY_Hi'Length and

XY_Lo'Length = ((X'Length + Y'Length) / 2));

why not just test XY_Lo’Length against X’Length?

Dear PeterL,

> why not just test XY_Lo’Length against X’Length?Because it doesn’t actually do the same job?

Yours,

-S

Heh, it seems I got actually the “official” solution to the puzzle – when I finally got around to catch up with FFA.

I don’t get something at this chapter though, at the simplest of the homework tasks: is it meant to be one single tape that starts with a stack with 2 numbers and then does everything (i.e. multiply, then divide by first number and check, divide by second number and check, output result)? If it is, there is something I don’t understand: result of multiplication will put 2 FZ on the stack. That is fine in itself, the checking division with each of the original numbers can be done of course in 2 parts, but I don’t get it how to access basically 2 steps rather than 1 single step behind on the stack to reach the divisor. Obviously, one can swap and discard in turn the high/low parts of the multiplication and then work with that but it seems horrible as approach to me. So what am I missing here?

Dear Diana Coman,

Congrats on finding the optimal answer to the Ch.4 puzzle!

Hint: it’s perfectly fine to do the multiplication/division more than once, to get at the parts.

Yours,

-S

Hmmm, thank you for the hint: I was indeed trying to do it with a single operation. Making my life harder than it has to be, as usual.

Why does FZ_Mul_Egyptian really need the YS value? Isn’t it enough to use the current bit of Y as the “gate” value? (i.e. “odd” at current step simply means current bit is 1 but there is no need for the value Y/2 itself, is there?)

Dear Diana Coman,

FZ_OddP(YS)is in fact “the current bit of Y”. YS is a copy of Y, which in Ada semantics we are not permitted to mutilate (having declared it an “in”.)FZ_ShiftRight(YS, YS, 1)moves the next bit into position. And so forth.In the next Chapter you will see a slightly more complicated — but considerably faster — mechanism for bitwise-walks through FZ integers.

Yours,

-S

Yes, my point was why do the division rather than simply walk the FZ but I understand it’ll be done at a later stage, ok.

Hi Stan,

I believe to have properly understood Chapter 5 and have added my signature to the code shelf. I’ll post a write-up on my blog soon, covering some of the homework.

Dear spyked,

Congrats!

I have mirrored your sig here.

Yours,

-S

Dear Stan,

As promised, I posted some work regarding egyptian div and mul on my blog. More precisely, I wrote a mathematical proof that they work, or at least something that comes very close to a precise proof that also attempts to explain the mechanism in more detail. I’m really curious to hear your thoughts on this approach, if it makes any sense at all and especially whether you think it’d be worth it to extend it to the implementations in following chapters.

Thanks again for the work behind FFA! It’s certainly changed my perspective on how quality code is supposed to work.

Dear spyked,

This is pretty neat. And I will definitely read it, and properly comment, some time this week.

Yours,

-S

neato