“Finite Field Arithmetic.” Chapter 10: Introducing Karatsuba's Multiplication.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch10_karatsuba.kv.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


First things first:

Manul Inspection.

  • The mail bag from Chapter 9 will be dealt with in Chapter 11.
  • This Chapter may require a cup of coffee.
  • This Chapter is dedicated to my most dedicated readers, such that I did not even know that I had, who publicly raged when it did not appear on schedule.


Recall the algebraic identities in Mul_Word of Chapter 9, where a single Word × Word multiplication is turned into four HalfWord × HalfWord ones. Let's consider the general case.

Suppose that b is a positive integer, and we would like to multiply a 2b-bit-wide X by a 2b-bit-wide Y. We can express the multiplicands algebraically as follows:


X = XLo + 2bXHi
Y = YLo + 2bYHi

...where NLo consists of any 2b-wide integer N's least-significant b binary digits, while NHi consists of its b most-significant binary digits. The multiplication itself can then be written as:


XY = (XLo + 2bXHi)(YLo + 2bYHi)
.. = XLoYLo + 2bXLoYHi + 2bXHiYLo + 2bXHi2bYHi
.. = XLoYLo + 2b(XLoYHi + XHiYLo) + 22bXHiYHi

...or pictured "physically", drawn to scale (suppose that the most junior bit position in a given register is at the left edge of the rectangle representing said register; the most senior -- at the right edge) :

XLo XHi
× YLo YHi
=
XLoYLo
+ XLoYHi
+ XHiYLo
+ XHiYHi
= XY

Performing this "schoolbook" 2b × 2b multiplication costs four b × b-wide multiplications (multiplications by powers of two are performed using inexpensive position shifting and do not figure towards this count) and three 2b-wide additions. This cost, as we learned in previous chapters, is proportional to the square of b. But in fact it is possible to do better -- using a subquadratic multiplication algorithm.

We will make use of the oldest, best-known, and simplest such algorithm: the classic one derived from A. A. Karatsuba's (1937 - 2008) "Multiplication of many-digital numbers by automatic computers", Dokl. Akad. Nauk SSSR, 145:2 (1962), 293–294.

After working through this Chapter, the reader will possess a fits-in-head, constant-spacetime and provably-correct computerized implementation of Karatsuba's algorithm.

First, let's specify that our multiplicands X and Y are FZ-integers, and each of them occupies exactly L Words.

Recall that in Chapter 4, we have mandated that no FZ must come into existence with a bitwise length that is not a positive power of two. Therefore, at all times, a FZ will break cleanly into two halves of equal bitnesses, as many times as desired until halves each consisting of a FZ one Word in length are formed.

And so, at every recursive (see further below) invocation of our procedure, we are able to begin by finding a positive integer k (where L = 2k) i.e. the length of a multiplicand cut in half.

Thereby we can take b = MachineBitness × k, where k = L / 2.

Let's also define, for brevity, the following recurring terms:


LL = XLoYLo
HH = XHiYHi
MM = XLoYHi + XHiYLo

And trivially obtain the equation:


XY = LL + 2bMM + 22bHH

Karatsuba's discovery is the fact that it is possible to compute the MM middle term using only one b × b multiplication in place of two. In the most common implementations of the algorithm, this appears as the equivalence:


MM = XLoYHi + XHiYLo
.. = (XLo + XHi)(YLo + YHi) - HH - LL

This algebraic identity replaces an expensive operation (multiplication) with three cheap (addition/subtraction) arithmetical operations. However, the possibility of generating a carry during addition when computing the terms of the binomial makes the mechanism slightly slower and uglier than it strictly has to be. So we will instead use another variant of the algorithm, sometimes called subtractive Karatsuba. It makes use of the following trivial equivalence:


MM = LL + HH - (XLo - XHi)(YLo - YHi)

In FFA's realization, this will take the following -- at first, seemingly gnarlier -- form:


Dx = |XLo - XHi|
Dy = |YLo - YHi|
DD = Dx × Dy
MM = LL + HH - (-1CX)(-1CY)DD

...where CX is the sign (in the convention where it equals 1 if XLo is less than XHi, and 0 otherwise) of XLo - XHi and CY is the sign (similarly) of YLo - YHi.

And it is not difficult to show that, if we take


DDSub = CX XNOR CY

then:


MM = LL + HH + (-1DDSub)DD

Or, for kindergarten:

Or, for the thick, for the impatient,
CX CY DDSub (-1DDSub)DD MM =
0 0 1 -DD LL + HH - DD
0 1 0 DD LL + HH + DD
1 0 0 DD LL + HH + DD
1 1 1 -DD LL + HH - DD

...i.e. the DD term ends up subtracted if (XLo - XHi)(YLo - YHi) is positive; otherwise it gets added. Which is as it should be. And this gives us the formula:


XY = LL + 2b(LL + HH + (-1DDSub)DD) + 22bHH

And now let's draw a possible shape for this mechanism "electrically". First, let's do the obvious Right Thing and begin by stuffing LL and HH into the lower and upper, respectively, halves of the register in which we are computing the sum. And after that, we will add the "middle" terms, at the requisite offset (i.e. at the b-th binary place) :

LL HH
+ LL 0
+ HH 0
+ (-1DDSub)DD 0
= XY

The above would work as a physical realization, but our 2b-sized additions sadly became 3b-sized, because of the need to ripple out the carries (recall that the addition of any two s-sized integers is physically s+1-sized, as there is the possibility of overflow.) But can we do anything about this, or do we have to live with it?

Turns out: there is a pill. One economical way to make the additions stay 2b-sized, is to accumulate the carries. We will put them in a Word-sized register called TC (i.e. tail carry). After the three terms of the original equation have been added into the result XY, we will make the result correct by rippling out (at the cost of walking over a b-sized space) the accumulated carry in TC into the tail (i.e. most senior quarter segment) of the result XY.

The new scheme looks like this:

LL HH TC := 0
+ LL TC += Carry
+ HH TC += Carry
+ (-1DDSub)DD TC += Carry - DDSub
+ TC
= XY

The reason why, after the third addition, we must...


TC += Carry - DDSub

...is not uninteresting, and I am quite tempted to leave it as an exercise. And in fact I think I will:


Chapter 10, Exercise 1: Why is it necessary to subtract DDSub from TC ?


Observe that we can simplify the program we are about to write, by proving the lemma:


TC >= 0

The fact that TC is greater than or equal to zero by the time it gets rippled, trivially follows from the fact that:


MM = XLoYHi + XHiYLo

There is no way for the value of above expression to become negative, given that no subtraction happens in it, and that both of the multiplicands in both of the terms, are positive! From which it necessarily follows that, given as...


MM = LL + HH + (-1DDSub)DD

...it must then at all times be true that:


LL + HH >= (-1DDSub)DD

And therefore it will never become necessary to propagate a "borrow" when rippling TC into the tail, and we can use a strictly "additive" formulation in the pertinent code.

But what is the maximum possible value of TC ? It so happens that the answer is: two. And we can show... but wait. Why not leave this as an exercise ! So :


Chapter 10, Exercise 2: Can the value of TC ever become greater than 2 ? Give an example of a Karatsubaization where it does; or if it cannot, prove that it cannot.


And now let's put all of this together into a mechanism:

FZ_Mul.adb:

   procedure Mul_Karatsuba(X  : in  FZ;
                           Y  : in  FZ;
                           XY : out FZ) is
 
      -- L is the wordness of a multiplicand. Guaranteed to be a power of two.
      L : constant Word_Count := X'Length;
 
      -- An 'LSeg' is the same length as either multiplicand.
      subtype LSeg is FZ(1 .. L);
 
      -- K is HALF of the length of a multiplicand.
      K : constant Word_Index := L / 2;
 
      -- A 'KSeg' is the same length as HALF of a multiplicand.
      subtype KSeg is FZ(1 .. K);
 
      -- The three L-sized variables of the product equation, i.e.:
      -- XY = LL + 2^(K*Bitness)(LL + HH + (-1^DD_Sub)*DD) + 2^(2*K*Bitness)HH
      LL, DD, HH : LSeg;
 
      -- K-sized terms of Dx * Dy = DD
      Dx, Dy     : KSeg;  -- Dx = abs(XLo - XHi) , Dy = abs(YLo - YHi)
 
      -- Subtraction borrows, signs of (XL - XH) and (YL - YH),
      Cx, Cy     : WBool; -- so that we can calculate (-1^DD_Sub)
 
      -- Bottom and Top K-sized halves of the multiplicand X.
      XLo        : KSeg  renames    X(  X'First       ..   X'Last - K );
      XHi        : KSeg  renames    X(  X'First + K   ..   X'Last     );
 
      -- Bottom and Top K-sized halves of the multiplicand Y.
      YLo        : KSeg  renames    Y(  Y'First       ..   Y'Last - K );
      YHi        : KSeg  renames    Y(  Y'First + K   ..   Y'Last     );
 
      -- L-sized middle segment of the product XY (+/- K from the midpoint).
      XYMid      : LSeg  renames   XY( XY'First + K   ..  XY'Last - K );
 
      -- Bottom and Top L-sized halves of the product XY.
      XYLo       : LSeg  renames   XY( XY'First       ..  XY'Last - L );
      XYHi       : LSeg  renames   XY( XY'First + L   ..  XY'Last     );
 
      -- Topmost K-sized quarter segment of the product XY, or 'tail'
      XYHiHi     : KSeg  renames XYHi( XYHi'First + K .. XYHi'Last    );
 
      -- Whether the DD term is being subtracted.
      DD_Sub     : WBool;
 
      -- Carry from individual term additions.
      C          : WBool;
 
      -- Tail-Carry accumulator, for the final ripple
      TC         : Word;
 
   begin
 
      -- Recurse: LL := XL * YL
      FZ_Multiply(XLo, YLo, LL);
 
      -- Recurse: HH := XH * YH
      FZ_Multiply(XHi, YHi, HH);
 
      -- Dx := |XL - XH| , Cx := Borrow (i.e. 1 iff XL < XH)
      FZ_Sub_Abs(X => XLo, Y => XHi, Difference => Dx, Underflow => Cx);
 
      -- Dy := |YL - YH| , Cy := Borrow (i.e. 1 iff YL < YH)
      FZ_Sub_Abs(X => YLo, Y => YHi, Difference => Dy, Underflow => Cy);
 
      -- Recurse: DD := Dx * Dy
      FZ_Multiply(Dx, Dy, DD);
 
      -- Whether (XL - XH)(YL - YH) is positive, and so DD must be subtracted:
      DD_Sub := 1 - (Cx xor Cy);
 
      -- XY := LL + 2^(2 * K * Bitness) * HH
      XYLo := LL;
      XYHi := HH;
 
      -- XY += 2^(K * Bitness) * HH, but carry goes in Tail Carry accum.
      FZ_Add_D(X => XYMid, Y => HH, Overflow => TC);
 
      -- XY += 2^(K * Bitness) * LL, ...
      FZ_Add_D(X => XYMid, Y => LL, Overflow => C);
 
      -- ... but the carry goes into the Tail Carry accumulator.
      TC := TC + C;
 
      -- XY += 2^(K * Bitness) * (-1^DD_Sub) * DD
      FZ_Not_Cond_D(N => DD, Cond => DD_Sub); -- invert DD if 2s-complementing
      FZ_Add_D(OF_In    => DD_Sub,            -- ... and then must increment
               X        => XYMid,
               Y        => DD,
               Overflow => C); -- carry will go in Tail Carry accumulator
 
      -- Compute the final Tail Carry for the ripple
      TC := TC + C - DD_Sub;
 
      -- Barring a cosmic ray, 0 < = TC <= 2 .
      pragma Assert(TC <= 2);
 
      -- Ripple the Tail Carry into the tail.
      FZ_Add_D_W(X => XYHiHi, W => TC, Overflow => C);
 
      -- Barring a cosmic ray, the tail ripple will NOT overflow.
      pragma Assert(C = 0);
 
   end Mul_Karatsuba;

But... Does this routine work? How fast does it run? What do the new procedures like FZ_Not_Cond_D do? Where and when do we recurse ? Why did Comba's Multiplier have to change at all !? And, most importantly, is the proof "proofy" enough ?? Find out in Chapter 11!


~To be continued!~

This entry was written by Stanislav , posted on Tuesday February 13 2018 , filed under Ada, Bitcoin, Cold Air, Computation, Cryptography, FFA, Friends, Mathematics, ShouldersGiants, SoftwareArchaeology, SoftwareSucks . Bookmark the permalink . Post a comment below or leave a trackback: Trackback URL.

2 Responses to ““Finite Field Arithmetic.” Chapter 10: Introducing Karatsuba's Multiplication.”

  • PeterL says:

    I was reading through the vpatch and I see that you have (http://btcbase.org/patches/ffa_ch10_karatsuba.kv#L31) two procedures, FZ_Add_D and FZ_Add_D_W for adding a FZ or a Word into a FZ. I was then wondering why you didn't use overloading and have them both called FZ_Add_D? Wouldn't it be easier on the user to have just one procedure call for adding "something" into the FZ?

    My initial thought was that you are going to answer by saying "that would make it more confusing to read the code" or maybe "overloading is evil", but I was not sure?

    • Stanislav says:

      Dear PeterL,

      Re: FZ_Add_D and FZ_Add_D_W -- overloading is IMHO evil. Unlike actions ought to have reasonably-distinguishable names.

      BTW, Phf's patches WWW seems to be down just now.

      Yours,
      -S

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre lang="" line="" escaped="" highlight="">


MANDATORY: Please prove that you are human:

79 xor 114 = ?

What is the serial baud rate of the FG device ?


Answer the riddle correctly before clicking "Submit", or comment will NOT appear! Not in moderation queue, NOWHERE!