“Finite Field Arithmetic.” Chapter 11: Tuning and Unified API.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch11_tuning_and_api.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc

But do not run it quite yet.

First things first:

During the FFA sabbatical following Chapter 10, several of my readers did some very notable supporting work:

  • Reader apeloyee observed that the modular exponentiation operation is a poor benchmark for multiplication speed, on account of the fact that the current algorithm spends the vast majority of its CPU time inside the Knuth division routine.
  • Reader spyked tried his hand at producing a proof of correctness of the mechanisms in Chapter 5.
  • Reader ave1 carefully analyzed the executables generated by AdaCore’s x86-64 GNAT, and showed that this compiler is prone to ignore Inline pragmas unless they are specified in a particular way.
  • ave1 also produced a fully-self-building, retargetable, fully-static, and glibc-free GNAT. On top of this, at the time of this writing, he is also working on a entirely delibraryized GNAT, which will produce compact and deterministic — i.e. smoothly hand-auditable executables, and will eventually allow FFA to run on embedded devices (including even such systems as classical x86 DOS.)

As originally planned, this Chapter was to concern itself with the detailed mechanics of Karatsuba’s Multiplication. However, this discussion will be postponed until Chapter 12. Chapter 11 will instead discuss the result of incorporating the efforts of the alert readership into FFA; as well as a handful of minor/cosmetic improvements.

The first order of business is ave1’s discovery concerning GNAT’s Inline_Always pragma. It turns out that AdaCore’s compiler will selectively ignore this directive unless it is given inside an .ads declaration file containing the declaration of the routine that is to be inlined. (Formerly these were placed in the .adb routine bodies.)

An interesting side-effect of making this seemingly-trivial fix, was the discovery of the (somewhat obvious in retrospect) fact that forced inlining is incompatible with the use of Ada precondition statements. A particular routine can be inlined, or subject to runtime preconditions, but not both.

I then set about to determine which routines in FFA benefit from forced inlining. However, I found it impermissible to present to the end-user routines bereft of runtime precondition enforcement. Therefore I found it necessary to expose all end-user functionality strictly via a purpose-written API, where preconditions remain enforced whether or not the implementation makes use of inlining. This API appears below:


with Words;   use Words;
with FZ_Type; use FZ_Type;
with W_Pred;
with FZ_Lim;
with FZ_Basic;
with FZ_IO;
with FZ_Cmp;
with FZ_Pred;
with FZ_BitOp;
with FZ_Divis;
with FZ_ModEx;
-- FFA Exports
package FFA is
   pragma Pure;
   --- Fundamental Types and Sizes
   subtype Word       is Words.Word;
   subtype WBool      is Words.WBool;
   subtype Nibble     is Words.Nibble;
   subtype FZ         is FZ_Type.FZ;
   subtype Indices    is FZ_Type.Indices;
   subtype Char_Count is FZ_IO.Char_Count;
   Bitness  : Positive renames Words.Bitness;
   --- Word Predicates
   -- Return 1 if N is equal to 0; otherwise return 0.
   function FFA_Word_ZeroP(N : in Word) return WBool
     renames W_Pred.W_ZeroP;
   -- Return 1 if N is unequal to 0; otherwise return 0.
   function FFA_Word_NZeroP(N : in Word) return WBool
     renames W_Pred.W_NZeroP;
   -- Return WBool-complement of N.
   function FFA_Word_Not(N : in WBool) return WBool
     renames W_Pred.W_Not;
   -- Return 1 if N is odd; otherwise return 0.
   function FFA_Word_OddP(N : in Word) return WBool
     renames W_Pred.W_OddP;
   -- Return 1 if A is equal to B ; otherwise return 0.
   function FFA_Word_EqP(A : in Word; B : in Word) return WBool
     renames W_Pred.W_EqP;
   --- FZ Limits
   FFA_Validity_Rule_Doc : String renames FZ_Lim.FZ_Validity_Rule_Doc;
   -- Determine if a proposed FFA Bitness is valid.
   function FFA_FZ_Valid_Bitness_P(B : in Positive) return Boolean
     renames FZ_Lim.FZ_Valid_Bitness_P;
   --- FZ Basics
   -- Determine the Bitness of N
   function FFA_FZ_Bitness(N : in FZ) return Bit_Count
     renames FZ_Basic.FZ_Bitness;
   -- N := 0
   procedure FFA_FZ_Clear(N : out FZ)
     renames FZ_Basic.FZ_Clear;
   -- Set given FZ to a given truth value
   procedure FFA_WBool_To_FZ(V : in WBool; N : out FZ)
     renames FZ_Basic.WBool_To_FZ;
   -- First Word of N := Source
   procedure FFA_FZ_Set_Head(N : out FZ; Source : in Word)
     renames FZ_Basic.FZ_Set_Head;
   -- First Word of N
   function FFA_FZ_Get_Head(N : in FZ) return Word
     renames FZ_Basic.FZ_Get_Head;
   -- Exchange X and Y
   procedure FFA_FZ_Swap(X : in out FZ; Y : in out FZ)
     with Pre => X'Length = Y'Length;
   -- Constant-time MUX: Sel = 0: Result := X; Sel = 1: Result := Y
   procedure FFA_FZ_Mux(X : in FZ; Y : in FZ; Result : out FZ; Sel : in WBool)
     with Pre => X'Length = Y'Length and X'Length = Result'Length;
   --- FZ IO Operations
   -- Expand FZ N by nibble D, and determine whether this operation overflowed
   procedure FFA_FZ_Insert_Bottom_Nibble(N        : in out FZ;
                                         D        : in     Nibble;
                                         Overflow : out    WBool)
     renames FZ_IO.FZ_Insert_Bottom_Nibble;
   -- Determine the number of ASCII characters required to represent N
   function FFA_FZ_ASCII_Length(N : in FZ) return Char_Count
     renames FZ_IO.FZ_ASCII_Length;
   -- Write an ASCII hex representation of N into existing string buffer S
   procedure FFA_FZ_To_Hex_String(N : in FZ; S : out String)
     renames FZ_IO.FZ_To_Hex_String;
   --- Comparison Predicate Operations on FZ
   -- 1 iff X == Y (branch-free); else 0
   function FFA_FZ_EqP(X : in FZ; Y: in FZ) return WBool
     renames FZ_Cmp.FZ_EqP;
   -- 1 iff X < Y (branch-free); else 0
   function FFA_FZ_LessThanP(X : in FZ; Y : in FZ) return WBool
     renames FZ_Cmp.FZ_LessThanP;
   -- 1 iff X > Y (branch-free); else 0
   function FFA_FZ_GreaterThanP(X : in FZ; Y : in FZ) return WBool
     renames FZ_Cmp.FZ_GreaterThanP;
   --- Fundamental Predicate Operations on FZ
   -- 1 iff N == 0 (branch-free); else 0
   function FFA_FZ_ZeroP(N : in FZ) return WBool
     renames FZ_Pred.FZ_ZeroP;
   -- 1 iff N != 0 (branch-free); else 0
   function FFA_FZ_NZeroP(N : in FZ) return WBool
     renames FZ_Pred.FZ_NZeroP;
   -- 1 iff N is odd
   function FFA_FZ_OddP(N : in FZ) return WBool
     renames FZ_Pred.FZ_OddP;
   --- Bitwise Operations on FZ
   -- Result := X & Y
   procedure FFA_FZ_And(X : in FZ; Y : in FZ; Result : out FZ)
     with Pre => X'Length = Y'Length and X'Length = Result'Length;
   -- N := N & W, W is a Word
   procedure FFA_FZ_And_W(N : in out FZ; W : in Word)
     renames FZ_BitOp.FZ_And_W;
   -- Result := X | Y
   procedure FFA_FZ_Or(X : in FZ; Y : in FZ; Result : out FZ)
     with Pre => X'Length = Y'Length and X'Length = Result'Length;
   -- N := N | W, W is a Word
   procedure FFA_FZ_Or_W(N : in out FZ; W : in Word)
     renames FZ_BitOp.FZ_Or_W;
   -- Result := X ^ Y
   procedure FFA_FZ_Xor(X : in FZ; Y : in FZ; Result : out FZ)
     with Pre => X'Length = Y'Length and X'Length = Result'Length;
   -- N := N ^ W, W is a Word
   procedure FFA_FZ_Xor_W(N : in out FZ; W : in Word)
     renames FZ_BitOp.FZ_Xor_W;
   -- NotN := ~N ('ones complement')
   procedure FFA_FZ_Not(N : in FZ; NotN : out FZ)
     with Pre => N'Length = NotN'Length;
   --- Basic Arithmetic on FZ
   -- Sum := X + Y; Overflow := Carry
   procedure FFA_FZ_Add(X          : in  FZ;
                        Y          : in  FZ;
                        Sum        : out FZ;
                        Overflow   : out WBool)
     with Pre => X'Length = Y'Length and X'Length = Sum'Length;
   -- Difference := X - Y; Underflow := Borrow
   procedure FFA_FZ_Subtract(X          : in  FZ;
                             Y          : in  FZ;
                             Difference : out FZ;
                             Underflow  : out WBool)
     with Pre => X'Length = Y'Length and X'Length = Difference'Length;
   --- Division on FZ
   -- Dividend is divided by Divisor, producing Quotient and Remainder.
   -- WARNING: NO div0 test here! Caller must test.
   procedure FFA_FZ_IDiv(Dividend  : in  FZ;
                         Divisor   : in  FZ;
                         Quotient  : out FZ;
                         Remainder : out FZ)
     renames FZ_Divis.FZ_IDiv;
   -- Exactly same thing as IDiv, but keep only the Quotient
   procedure FFA_FZ_Div(Dividend  : in FZ;
                        Divisor   : in FZ;
                        Quotient  : out FZ)
     renames FZ_Divis.FZ_Div;
   -- Modulus.
   procedure FFA_FZ_Mod(Dividend  : in FZ;
                        Divisor   : in FZ;
                        Remainder : out FZ)
     renames FZ_Divis.FZ_Mod;
   --- Multiplication on FZ
   -- Multiplier. Preserves the inputs.
   procedure FFA_FZ_Multiply(X     : in  FZ;
                             Y     : in  FZ;
                             XY_Lo : out FZ;
                             XY_Hi : out FZ)
     with Pre => X'Length = Y'Length and
     XY_Lo'Length = XY_Hi'Length and
     XY_Lo'Length = ((X'Length + Y'Length) / 2);
   --- Modular Operations on FZ
   -- Modular Multiply: Product := X*Y mod Modulus
   procedure FFA_FZ_Modular_Multiply(X        : in  FZ;
                                     Y        : in  FZ;
                                     Modulus  : in  FZ;
                                     Product  : out FZ)
     renames FZ_ModEx.FZ_Mod_Mul;
   -- Modular Exponent: Result := Base^Exponent mod Modulus
   procedure FFA_FZ_Modular_Exponentiate(Base     : in  FZ;
                                         Exponent : in  FZ;
                                         Modulus  : in  FZ;
                                         Result   : out FZ)
     renames FZ_ModEx.FZ_Mod_Exp;
end FFA;


with FZ_Arith;
with FZ_Shift;
with FZ_Mul;
-- Wrapper bodies for routines that we inline, but must enforce preconditions
-- on when called by FFA user.
package body FFA is
   --- FZ Basics
   -- Exchange X and Y
   procedure FFA_FZ_Swap(X : in out FZ; Y : in out FZ) is
      FZ_Basic.FZ_Swap(X => X, Y => Y);
   end FFA_FZ_Swap;
   -- Constant-time MUX: Sel = 0: Result := X; Sel = 1: Result := Y
   procedure FFA_FZ_Mux(X : in FZ; Y : in FZ;
                        Result : out FZ; Sel : in WBool) is
      FZ_Basic.FZ_Mux(X => X, Y => Y, Result => Result, Sel => Sel);
   end FFA_FZ_Mux;
   --- Bitwise Operations on FZ
   -- Result := X & Y
   procedure FFA_FZ_And(X : in FZ; Y : in FZ; Result : out FZ) is
      FZ_BitOp.FZ_And(X => X, Y => Y, Result => Result);
   end FFA_FZ_And;
   -- Result := X | Y
   procedure FFA_FZ_Or(X : in FZ; Y : in FZ; Result : out FZ) is
      FZ_BitOp.FZ_Or(X => X, Y => Y, Result => Result);
   end FFA_FZ_Or;
   -- Result := X ^ Y
   procedure FFA_FZ_Xor(X : in FZ; Y : in FZ; Result : out FZ) is
      FZ_BitOp.FZ_Xor(X => X, Y => Y, Result => Result);
   end FFA_FZ_Xor;
   -- NotN := ~N ('ones complement')
   procedure FFA_FZ_Not(N : in  FZ; NotN : out FZ) is
      FZ_BitOp.FZ_Not(N => N, NotN => NotN);
   end FFA_FZ_Not;
   --- Arithmetic on FZ
   -- Sum := X + Y; Overflow := Carry
   procedure FFA_FZ_Add(X          : in  FZ;
                        Y          : in  FZ;
                        Sum        : out FZ;
                        Overflow   : out WBool) is
      FZ_Arith.FZ_Add(X => X, Y => Y, Sum => Sum, Overflow => Overflow);
   end FFA_FZ_Add;
   -- Difference := X - Y; Underflow := Borrow
   procedure FFA_FZ_Subtract(X          : in  FZ;
                             Y          : in  FZ;
                             Difference : out FZ;
                             Underflow  : out WBool) is
      FZ_Arith.FZ_Sub(X => X, Y => Y, Difference => Difference,
                      Underflow => Underflow);
   end FFA_FZ_Subtract;
   --- Multiplication on FZ
   procedure FFA_FZ_Multiply(X     : in  FZ;
                             Y     : in  FZ;
                             XY_Lo : out FZ;
                             XY_Hi : out FZ) is
      FZ_Mul.FZ_Multiply_Buffered(X     => X,     Y     => Y,
                                  XY_Lo => XY_Lo, XY_Hi => XY_Hi);
   end FFA_FZ_Multiply;
end FFA;

FFA_Calc has been reworked to make use of the strictly-specified API routines, i.e.:

-- FFA
with FFA;      use FFA;
-- For the intrinsic equality operator on Words
use type FFA.Word;

… in place of the old:

-- FFA
with FZ_Lim;   use FZ_Lim;
with Words;    use Words;
with W_Pred;   use W_Pred;
with FZ_Type;  use FZ_Type;
with FZ_Basic; use FZ_Basic;
with FZ_Arith; use FZ_Arith;
with FZ_Cmp;   use FZ_Cmp;
with FZ_Pred;  use FZ_Pred;
with FZ_BitOp; use FZ_BitOp;
with FZ_Shift; use FZ_Shift;
with FZ_Divis; use FZ_Divis;
with FZ_Mul;   use FZ_Mul;
with FZ_ModEx; use FZ_ModEx;

… and elsewhere, as necessary. (All of the exported routines have been renamed, to carry a prefix of FFA_ .)

One other change to FFA_Calc is the abolition of ffa_io.ads and ffa_io.adb: their functionality has been subsumed into FFA proper, in the shape of an FZ_IO module:


with Words;    use Words;
with FZ_Type;  use FZ_Type;
with FZ_Lim;   use FZ_Lim;
package FZ_IO is
pragma Pure;
-- Expand FZ N by nibble D, and determine whether this operation overflowed
procedure FZ_Insert_Bottom_Nibble(N        : in out FZ;
D        : in     Nibble;
Overflow : out    WBool);
-- A count of ASCII chars representing a humanized FZ:
subtype Char_Count is
Positive range Nibbleness * FZ_Minimal_Wordness .. Positive'Last;
-- Determine the number of ASCII characters required to represent N
function FZ_ASCII_Length(N : in FZ) return Char_Count;
pragma Inline_Always(FZ_ASCII_Length);
-- Hex Digits (currently used only in FZ_To_Hex_String)
HexDigs : constant array(0 .. 15) of Character := "0123456789ABCDEF";
-- Write an ASCII hex representation of N into existing string buffer S
procedure FZ_To_Hex_String(N : in FZ; S : out String)
with Pre => S'Length = FZ_ASCII_Length(N);
end FZ_IO;


with W_Pred;   use W_Pred;
with W_Shifts; use W_Shifts;
with FZ_BitOp; use FZ_BitOp;
with FZ_Shift; use FZ_Shift;
package body FZ_IO is
   -- Expand FZ N by nibble D, and determine whether this operation overflowed
   procedure FZ_Insert_Bottom_Nibble(N        : in out FZ;
                                     D        : in     Nibble;
                                     Overflow : out    WBool) is
      -- The overflow, if any, from shifting N in-place leftward by 4 bits
      Shifted_N_Overflow : Word := 0;
      -- Make room in N for one additional hex digit (i.e. multiply N by 16)
      FZ_ShiftLeft_O(N        => N,
                     ShiftedN => N,
                     Count    => 4,
                     Overflow => Shifted_N_Overflow);
      -- Place the new digit into the now-vacated four bits at the bottom of N.
      FZ_Or_W(N, D);
      -- Record whether the above operation overflowed N:
      Overflow := W_NZeroP(Shifted_N_Overflow);
   end FZ_Insert_Bottom_Nibble;
   -- Determine the number of ASCII characters required to represent N
   function FZ_ASCII_Length(N : in FZ) return Char_Count is
      return N'Length * Nibbleness;
   end FZ_ASCII_Length;
   -- Write an ASCII hex representation of N into existing string buffer S
   procedure FZ_To_Hex_String(N : in FZ; S : out String) is
      -- Indices into the string S (note, String always indexes from 1)
      subtype SiRange is Natural range S'First .. S'Last;
      -- Position of current character in S being written
      Si : SiRange; -- Walks from 1 to the string length of S
      -- Step through all indices of N, regardless of how it was indexed:
      for i in 0 .. Word_Index(N'Length - 1) loop
            -- Index of current Word, walks from ~top~ Word of N to ~bottom~
            Wi : constant Word_Index := N'Last - i;
            -- Currently-selected Word of N
            W  : Word := N(Wi);
            -- For each nibble in the Word:
            for j in 1 .. Nibbleness loop
               -- Current position in S that is to be written
               Si    := (Natural(i) * Nibbleness) + j;
               -- Rotate the top nibble of W into the bottom nibble.
               W     := Rotate_Left(W, 4);
               -- Write the ASCII representation of the bottom nibble.
               S(Si) := HexDigs(Natural(W and 16#F#));
            end loop;
            -- Barring cosmic ray, W will have rotated to its initial value
            pragma Assert(W = N(Wi));
      end loop;
      -- Barring cosmic ray, the last char written was to the final pos in S,
      pragma Assert(Si = SiRange'Last); -- as S is mandatorily exactly-sized.
   end FZ_To_Hex_String;
end FZ_IO;

All invocations of FZ-to-console output in FFA_Calc now make use of the following subroutine:

      -- Emit an ASCII representation of N to the terminal
      procedure Print_FZ(N : in FZ) is
         S : String(1 .. FFA_FZ_ASCII_Length(N)); -- Mandatorily, exact size
         FFA_FZ_To_Hex_String(N, S); -- Convert N to ASCII hex
         Write_String(S);            -- Print the result to stdout
         Write_Newline;              -- Print newline, for clarity.
      end Print_FZ;

Observe that under no circumstances is a string of indeterminate length returned by a FFA subroutine: this would require the the use of the Ada secondary stack, which I have renounced as a Bad Idea universally. Instead, the I/O routine makes use of the preallocation technique illustrated in Chapter 4’s commandline param processor.

Last but not least, the project now includes a HISTORY.TXT chronology. Currently this consists of:


### HISTORY ###
"Chapter 1: Genesis."
"Chapter 2: Logical and Bitwise Operations."
"Chapter 3: Shifts."
"Chapter 4: Interlude: FFACalc."
"Chapter 5: "Egyptological" Multiplication and Division."
"Chapter 6: "Geological" RSA."
"Chapter 7: "Turbo Egyptians.""
"Chapter 8: Interlude: Randomism."
"Chapter 9: "Exodus from Egypt" with Comba’s Algorithm."
"Chapter 10: Introducing Karatsuba’s Multiplication."
"Chapter 11: Tuning and Unified API."
### YOU ARE HERE ###

The particular format of this document may change in the future, as the pertinent vtronics work is still in progress.

In Chapter 12, we will discuss in detail the substantial performance gains obtained from the use of Chapter 10’s Karatsuba’s Multiplication in modular exponentiation; and likewise from the implementation of correct inlining, described in the current Chapter. And the effect of ave1’s new GNAT on FFA perfomance will be revealed.

We will also review the exact mechanics of Karatsuba’s algorithm, and present a small optimization thereof. Additionally, a sane approach to the “Iron vs. Soft MUL” problem posed in Chapter 9 will be presented.

~To be continued!~

The Google H1 Fritz Chip.

Edit: Step-by-step replication instructions for the skeptical experimenter.

This article is a review of what I have been able to discover regarding the Google H1, aka Cr50, aka the “G Chip”, found in all Chromebooks of recent manufacture, including the Asus C101PA, my current candidate for a full delousing attempt.

To my knowledge, there has been no detailed public discussion of this NSA-imposed atrocity anywhere on the Net, aside from The Logs. This article is intended as a reference point for the aficionado, the casual explorer of pseudo-”open” hardware, and the merely curious. The Chromebooks are probably the closest thing to be had on the current market to an inexpensive, portable, and “cleanable” computer with reasonable performance.

However, the Cr50 — a recent addition to the product line — is explicitly designed to get in the way of a full “liberation”. It is an item quite similar, in purpose and scope, to Intel’s “glued on with broken glass, For Your Own Good!” on-die “ME” boobytrap.

Yet, unlike Intel, Google — in its fetishistic pseudo-openness — appears to have published at least a portion of the source for the device’s firmware, making it a somewhat more promising target for attack and demolition than Intel’s equivalent CPU-integrated turd. But we will dig deeper into this, further below. First, let’s review the “big picture”:

The Cr50 device is a classic “Fritz chip” — i.e. a hardware “policeman”, built into a computing device (typically, though not always, a consumer product), so as to specifically and deliberately act against the purchaser’s interests, by subverting the Laws of Sane Computing in these three ways:

  1. Prevention of the full control of the machine by its physical owner, typically by inhibiting attempts to install modified firmware. This practice is also known as “Tivoization”, and is often justified to the gullible under the banner of “security”.
  2. Enablement of one or more types of “NOBUS” back door (Official NSA terminology! “No One But US“, a kind of NATO Reich “Gott Mit Uns” motto.) In practice this means that the folks with the magic keys, can trivially circumvent any and all protection mechanisms on the box, locally and often remotely; while the uninitiated — including the person who bought and paid for the hardware — typically remain unaware of the backdoor’s very existence.) A Fritzed machine serves its true master — USG — first, and only secondarily serves the hapless purchaser.
  3. Last but not least: prevention of a clueful hardware owner’s attempts to “jailbreak” — to disable, remove, or circumvent the Fritz chip itself. Often there is an attempt to conceal the very existence of the mechanism. (Google is peculiar in that it is fond of deliberately, if subtly, taunting the purchaser of its pseudo-open devices with the existence of its Fritz chip.) Perpetrators will often deliberately litter the Net with disinformation regarding the nature and purpose of the Fritz chip, in an effort to discourage circumvention and spur on sales; the chumps will “buy their own cross” while there is still a semblance of choice on the market; afterwards, the choice evaporates, and only Fritz-laden products remain available. The latter process has already run its course in the x86 PC world; and is verging on completion in the low-power ARM portable market.

So, back to the Cr50: this device appears to be present in all of the currently-produced Chromebooks, and is — per the vendor’s published source — able to rewrite all firmware, under the control of an external debug snake, or other, yet-undiscovered triggers; start and stop the CPU; master the I2C bus, on which, among other things, are to be found the sound card’s microphone; upgrade its own firmware; and other interesting things that may or may not align with the machine owner’s wishes at a particular moment. Possible usage scenarios include, but are not limited to, enablement of “lol enforcement” surreptitious searches and buggings of “borrowed” machines (and this is merely one obvious scenario.)


In re “glue with broken glass”, the machine owner cannot simply remove or cut the traces to the Cr50: it has been placed in control of power supply bringup; the valuable debug interface; and other essentials.

But it is the upgrade process in particular that interests me, as it is the locked gate to potentially neutering the boobytrap. But can the end user rewrite the Cr50 firmware?

Let’s begin with the disinfo! A Google employee informed me that “nobody has the cr50 key”. Is this actually true?

How about No?

From the horse’s mouth:

static const uint32_t LOADERKEY_A[RSA_NUM_WORDS + 1] = {
0xea54076f, 0xe986c871, 0x8cffffb4, 0xd7c50bda, 0x30700ee0, 0xc023a878,
0x30e7fdf8, 0x5bb0c06f, 0x1d25d80f, 0x18e181f7, 0xfbf7a8b0, 0x331c16d4,
0xeb042379, 0x4cef13ec, 0x5b2072e7, 0xc807b01d, 0x443fb117, 0xd2e04e5b,
0xcb984393, 0x85d90d9d, 0x0332dcb8, 0xd42ccacf, 0x787e3947, 0x1975095c,
0x2d523b0b, 0xf815be95, 0x00db9a2c, 0x6c08442b, 0x57a022bb, 0x9d5c84ed,
0x46a6d275, 0x4392dcf8, 0xfa6812e3, 0xe0f3a3e6, 0xc8ff3f61, 0xd518dbac,
0xbba7376a, 0x767a219a, 0x9d153119, 0x980b16f8, 0x79eb5078, 0xb869924d,
0x2e392cc2, 0x76c04f32, 0xe35ea788, 0xcb67fa62, 0x30efec79, 0x36f04ae0,
0x2212a5fc, 0x51c41de8, 0x2b0b84db, 0x6803ca1c, 0x39a248fd, 0xa0c31ee2,
0xb1ca22b6, 0x16e54056, 0x086f6591, 0x3825208d, 0x079c157b, 0xe51c15a6,
0x0dd1c66f, 0x8267b8ae, 0xf06b4f85, 0xc68b27ab, 0x31bcd5fc, 0x34d563b7,
0xc4d2212e, 0x1e770199, 0xaf797061, 0x824d4853, 0x526e18cd, 0x4bb8a0dc,
0xeb9377fe, 0x04fda73c, 0x2933f8a6, 0xe16c0432, 0x40ea1bd5, 0x9efcd77e,
0x92be9e55, 0x003c1128, 0x48442cf9, 0x80b4fb31, 0xfe1e3df3, 0x1d28e14d,
0xe99c0f9d, 0x521d38c2, 0x0082c4f1, 0xcff25d56, 0x0d3e7186, 0xe72b98f0,
0xefaa5689, 0x74051ed5, 0x6b7e7fff, 0x822b1944, 0x77a94732, 0x8d0b9aaf,

const uint32_t LOADERKEY_B[RSA_NUM_WORDS + 1] = {
0xeea8b39f, 0xdfa457a1, 0x8b81fdc3, 0xb0204c84, 0x297b9db2, 0xaa70318d,
0x8cd41a68, 0x4aa0f9bb, 0xf63f9d69, 0xf0fe64b0, 0x96e42e2d, 0x5e494b1d,
0x066cefd0, 0xde949c16, 0xc92499ed, 0x92229990, 0x48ac3b1a, 0x1dfc2388,
0xda71d258, 0x826ddedf, 0xd0220e70, 0x6140dedf, 0x92bcdec7, 0xcdf91c22,
0xaa110aed, 0xc371c2f9, 0xa3fedf2a, 0xfd2c6a07, 0xe71aabce, 0x7f426484,
0x0ac51128, 0x4bab4ca2, 0x0162d0b9, 0x49fef7e3, 0xeda8664e, 0x14b92b7a,
0x0397dbb7, 0x5b9eb94a, 0x069b5059, 0x3851f46b, 0x45bbcaba, 0x0b812652,
0x7cd8b10b, 0xcaeccc32, 0x0ffd08e3, 0xfe6f0306, 0x8c02d5f7, 0xafdc4595,
0xe0edda47, 0x0cc821db, 0x50beeae5, 0xb9868c18, 0xefd2de11, 0xdfecd15c,
0xa8937a70, 0x223d9d95, 0x1b70848b, 0x54fa9176, 0x8bf012ef, 0xd37c1446,
0xf9a7ebeb, 0xbf2dfa9a, 0xdc6b8ea0, 0xe5f8bc4d, 0x539222b5, 0x192521e4,
0xe7088628, 0x2646bb56, 0x6fcc5d70, 0x3f1cd8e9, 0xae9cec24, 0xf53b6559,
0x6f091891, 0x5342fa61, 0xbfee50e9, 0x211ad58a, 0xd1c5aa17, 0x252dfa56,
0x17131164, 0x4630a459, 0x2f681f51, 0x3fb9ab3c, 0x6c8e0a70, 0xa34a868b,
0xe960e702, 0xa470d241, 0x00647369, 0xa4c25391, 0xd1926cf9, 0x5fce5488,
0xd171cb2e, 0x8a7c982e, 0xc89cbe39, 0xc0e019d8, 0x82cd1ebe, 0x68918fce,

Anyone with the private factors to either RSA modulus, can reflash the Cr50 firmware trivially, via the debug cable. The vendor’s flash update utility accepts any candidate update that passes the board revision and version increment check; however, the update will be written to a temporary buffer, and RSA-signature-tested prior to being copied into the “read only” (i.e. active) partition of the flash. Got the key? reflash to your heart’s delight. No key? no update. Just like in other “Tivos”, e.g., the Apple iPhone, but in this case with an extra helping of Open Sores artificial flavouring!

But this is not even the only backdoor: there are at least two. The second one known to me thus far is the “RMA unlocker”. Anyone with access to a certain elliptic key can reset the Cr50 into a manufacturing test mode, and do whatever he likes.

Google even seems to offer an accidentally-”public” API for requesting this type of reset. Let’s try it and see what happens:

$ python cr50_rma_open.py -g -i "BOB E25-A6A-A7I-E9Q-A4R"
Running cr50_rma_open version 2
SERIALNAME: 02034029-90EBD060
DEVID: 0x02034029 0x90ebd060
testing /dev/ttyUSB0
Reset flags: 0x00000800 (hard)
Reset count: 0
Chip: g cr50 B2-C
RO keyid: 0xaa66150f(prod)
RW keyid: 0xde88588d(prod)
DEV_ID: 0x02034029 0x90ebd060
Rollback: 2/2/2

found device /dev/ttyUSB0
DEVICE: /dev/ttyUSB0
RMA support added in: 0.3.3
Running Cr50 Version: 0.3.4
Cr50 setup ok
ccd: Restricted
wp: enabled
rma_auth output:

If the server fails to debug the challenge make sure the RLZ is

Sadly, the result of loading this URL was… a GMail login prompt. So I log in with a GMail account, and get:

Failed to start authentication.

Quod licet Iovi, non licet bovi! or what exactly did you expect?

And so, dear reader, if you know how to disable this landmine — or are merely interested in advancing the state of the art in vermin removal — join us on #trilema! (Ask one of the speaking folks, for voice.)

To be continued.

The secret of the “Debug Accessory Mode” Adapter.

The exact internals of Google’s proprietary “Suzy-Q” debugging device are, at the time of this writing, unknown.

However, I have found how to make an apparently-compatible device:


We connect the USB-C “business end” into a Asus C101PA machine; the USB-B end into a reasonable Linux PC, where we then:

echo 18d1 5014 > /sys/bus/usb-serial/drivers/generic/new_id

…and /dev/ttyUSB0 … 5 , the UARTs of the RK3399 chip, appear.

Theoretically, there are also Google-particular “vendor” endpoints. But we will look at these later.

Example spew on boot.

The unfortunate bit is that the output is, evidently, molested between leaving the RK3399 and emerging from the USB-C debug controller, by the machine’s embedded controller Cr50 chip: observe, the typical reset output of Rockchip (e.g., DDR init info) is not seen in the spew.

Therefore the “debug accessory” cable can be used for kernel debugging, but not for bootloader debugging. Unless we diddle the EC controller firmware, to force it to relay UART output immediately from power-on.

Edit: Read here re: the USB endpoints.

(to be continued…)

Posted in: Cold Air, Hardware, NonLoper, SoftwareSucks by Stanislav No Comments

Open Problem: “Debug Accessory Mode” on the Asus C101PA

Edit #2: Aaaand it’s solved:

echo 18d1 5014 > /sys/bus/usb-serial/drivers/generic/new_id

triggers creation of /dev/ttyUSB0 … 5 , several of which spew console log…

Example spew on boot. (Looks like RK’s UART..?)

Edit: apparently they’re USB lines! When connected as D-/D+ through a USB B-connector, to a Linux box, we get a device that enumerates with this descriptor.

The Asus C101PA is supposedly able to bring out its UART via a specially-made USB-C cable.

Google’s WWW offers a “WinAmp playlist”-style taunting mention of this magical cable, but no schematics or even any info whatsoever regarding pinout.

A few folks on the Net hint at the possible use in this device of something called “debug accessory mode”, a feature of the USB-C standard.

So, looking at the standard:



It would appear that we have, at least, the “strapping” with which to trigger the desired “debug accessory mode.”

And so, tying contacts “CC” and “VCON” (i.e. CC1 and CC2, per the standard, but go and figure this out…) to ground via 5.1K resistors:


Theoretically a UART should appear on SBU1 and SBU2 “side band” contacts given by the USB-C expander board.

And we plug this barbaric device into the C101PA (FWIW — in “dev mode”, though this should not make any difference whatsoever if the UART is indeed connected to the RK3399 CPU directly.)

But in actual practice — no dice. Logic analyzer shows a burst of something vaguely UART-like on SBU1, but only when the plug is inserted (and not, e.g., on boot, which is where one actually needs the debug UART!) And this vaguely UART-like signal, is not Rockchip’s 1500000-baud TTY (logic analyzer shows missing stop bits regardless of what baud rate is picked.) And throwing bytes into /dev/ttyS2 (Rockchip UART under the sad Chromebook linux) has no effect on SBU1/SBU2.

Nothing interesting happens on reset, either.

Dear reader, do you know the secret of Google’s “Suzy-Q” (aka “Debug Accessory Mode”) magic cable? If you do — tell me!
(it’s a USB UART! see top of page.)

Open Problem: Forcing MaskROM Mode on the Asus C101PA

The Asus C101PA is based on a Rockchip RK3399. These have a “maskrom mode”, where if the SPI EEPROM is disabled, the chip will attempt to boot from other devices: first, NAND flash, then microSD, and then finally a USB debug mode where you can attach a A-A cable and use the rkflashtool utility to manipulate the firmware.

So, quickly going from theory to practice, let’s make a SPI ROM cutoff button:

The SPI EEPROM, labeled with pinout.

Let’s make a toggle:

#CS is an active-low chip select. Button will short it to the chip’s supply rail when pressed.


The button, again.

Now, according to the theory, booting up the box with the toggle held down, will send the RK3399 to look for the boot signature on the eMMC NAND; given as it will not find it there, it will then search the SD card, if inserted, and go into USB debug mode if this fails.

However, turns out: “мазать уже можно, есть пока нельзя”: the box indeed boots with black screen and running light active, if booted with the switch held down. But — it ignores a valid firmware image (copied from SPI ROM) on the SD card, and none of the 3 USB ports (either of the two USB-C ports, or the conventional USB3 A-connector, when A-A cable with PC is used) switches into debug mode. The box can be reset with the magic two-finger salute (powerswitch plus “refresh” key) and reboots normally, if switch is released. But nothing useful happened, evidently, from disabling the SPI EEPROM…

So, what gives? If you, dear reader, know — please write in!

Edit: somebody seems to know something

Wanted: Write-Once MicroSD Card !

Allegedly these exist! — though I have only been able to find them offered for sale by the railroad car.

For certain applications, nothing else will really suffice.

If any of my readers know of (or wish to become) a vendor offering, in (for starters) mid-three-digit quantities:

  • a) One Time Programmable MicroSD card
  • b) MicroSD card with a true hardware write-protect feature (specifically not the idiot floppy-style software-read tab

…please write in!

Open Problem: Delousing the Asus C101PA.

Apr. 26 update: This article is obsolete, the pill — was found; if you have this machine, scroll to the end.

The Asus C101PA Chromebook is a very interesting device: it contains a Rockchip CPU, for which we already have a working deloused Gentoo; it also contains such marvels as a non-blobulous Marvell 802.11 card, a reasonable IPS screen, a 9-hour battery, etc.

However it also contains Google’s Fritz chip boobytrap, the firmware write-protect.

And no, it does not have the screw in the same place as the visually-identical rubbish-LCD-crippled C100P. Or, apparently, anywhere at all:


Looks like the typical Chromebook defusing screw, neh?

Well, removal does precisely nothing:


Not even after hard-reset.

And again nothing:


Let’s take the thing apart entirely, strip off the speakers, pull the mainboard, peel off the heat sink:


None of the screw pads (aside from the bottom of the first “nope”) even have anything resembling multiple contact pads that could indicate a defusing screw.

No place to put crocodiles and rewrite the flash ROM, either! All BGA.

And the complete silence on, as far as I can tell, the entire Net, re: this beast, is IMHO most peculiar.

And yes I am aware of “crouton” and various other chumpatronic rubbish offered as an alternative to flensing the Google liquishit from this device (which appears to be the only laptop on the market with 9 hour battery, IPS, and a total absence of x86 crapolade).

And yes I am well aware that it is still possible to install a new OS on this box, if one is willing to tolerate the nagware screen and nag siren on boot. Which I am not.

Verdict: the C101P is a paperweight until and unless somebody reveals the secret of removing WP# from it. Do you know how? Tell me!

Update: A query in the heathen pits yielded up the pill: pulling the battery disables the WP#.

A Clean Gentoo for ‘Rockchip’.

This recipe will re-create the hygienic gentoo installed on the RK pilot plant
at Pizarro. The resulting install will also contain the full set of
distfiles used in the rebuilding of the 'world', in /usr/portage/distfiles.

No promises are made of fitness for a particular use, or of provenance,
OTHER THAN as described above.


0 ) My PGP key

1 ) A rk3328-roc-cc machine

2 ) microSD card of at least 256MB capacity

3 ) USB3 drive of at least 8GB capacity

4 ) The prepared gentoo root fs: rk-gentoo.tar.gz

^ This is a CONVENTIONAL (i.e. not musltronic ) gentoo for arm64 architecture.

Only the SD card will contain machine-specific bootloaderisms:

5 ) The bootloaderisms: loaders.tar.gz

6 ) OPTIONAL: vendor-patched kernel source and .config:


7 ) Verify the signature and hashes of above in MANIFEST.TXT !!

SD Card for Boot

The rk3328-roc-cc can boot from either eMMC (not discussed, I do not
use it ) or microSD card (we will use the latter. ) To prepare a u-SD
card for booting, obtain a card of at least 256MB capacity, and place
into a suitable reader. The instructions below assume that this device
shows up on your system as /dev/sdb, and is writeable to by the user
as which you are working. Adjust as necessary.

WARNING: if you are careless, you WILL nuke YOUR workstation's disk!!

First, create the partitions:

1 ) parted -s /dev/sdb mklabel gpt

2 ) parted -s -a optimal /dev/sdb unit s mkpart l1 64 8063

3 ) parted -s -a optimal /dev/sdb unit s mkpart l2 16384 24575

4 ) parted -s -a optimal /dev/sdb unit s mkpart t 24576 32767

5 ) parted -s -a optimal /dev/sdb unit s mkpart boot fat16 32768 262143

6 ) parted -s -a optimal /dev/sdb set 4 boot on

7 ) sync

Now extract the loader images:

8 ) tar xvfz loaders.tar.gz

9 ) dd if=l1.bin of=/dev/sdb1

^ this is the vendor's RAM initializer. Recent uboot obsoletes it,
but I have not tested this yet. For now we use the original.

10 ) dd if=l2.bin of=/dev/sdb2

^ this is more or less ordinary u-boot, Rockchip's www has the

11 ) dd if=t.bin of=/dev/sdb3

^ this is the ARM hypervisor turd; ARM's www has the source.
AFAIK this thing does not actually do anything without OS support.

12 ) dd if=kern.bin of=/dev/sdb4

^ this is a FAT16 partition containing my kernel ( see further
below. ) It will get mounted as /boot later .

13 ) sync

14 ) Disconnect the SD reader from your workstation !

USB3 Stick for '/' Partition

Obtain a suitable USB3 stick. I recommend Samsung's.

It is also possible to use a mechanical (or SSD) SATA drive,
via a suitable adapter cable.

Do not use USB2 disks, gentooation will be painfully slow on these.

The instructions again assume that it is seen by the machine as
/dev/sdb. If this is not the case on your system, adjust the recipe.

WARNING: if you are careless, you WILL nuke YOUR workstation's disk!!

1 ) parted -s -a optimal /dev/sdb mklabel gpt -- mkpart primary ext4 1 -1

2 ) mkfs.ext4 /dev/sdb1

3 ) sync

4 ) mount /dev/sdb1 /mnt/usb

5 ) tar pxvfz rk-gentoo.tar.gz -C /mnt/usb

6 ) edit, with your favourite text editor, /mnt/usb/conf.d/net :

comment out, if required, the line with dhcp, and uncomment and
properly adjust the ones pertaining to static ip; or leave this
config alone for a dhcp on boot.

7 ) put your hostname in /mnt/usb/conf.d/hostname

8 ) sync

9 ) umount /mnt/usb

10 ) Insert the SD card and USB3 drive into the RK machine;
connect the network cable, but see below first:


%%% the root password is null! %%%
%%% the ssh host privkey is unsecret! %%%

To fix these, boot up on an isolated LAN first, and, via ssh,

11-1 ) set the root password using the passwd command.

11-2 ) generate a NEW SSH HOSTKEY :

ssh-keygen -t rsa -b 4096 -f /etc/ssh/ssh_host_rsa_key -N ""

12 ) The kit is now ready for use.

Be aware that my kernel does not support video, mice, or keyboards,
it is meant purely for headless machines. If this is not what you
want, unpack asciilifeform-kernel.tar.gz , edit the config, and
build a new kernel; then install it into /boot (i.e. your SD card.)

The Return of Phuctor!

I have the pleasure of informing my readers that…

Phuctor is back!

It — exactly as it was, but with a few minor fix-ups for browsing speed — now lives on a very spiffy 32-core Opteron at Pizarro, the ISP.

The WWW UI is already up; the factoring proper will resume later tonight.

“Finite Field Arithmetic.” Chapter 10: Introducing Karatsuba’s Multiplication.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch10_karatsuba.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc

But do not run it quite yet.

First things first:

Manul Inspection.

  • The mail bag from Chapter 9 will be dealt with in Chapter 11.
  • This Chapter may require a cup of coffee.
  • This Chapter is dedicated to my most dedicated readers, such that I did not even know that I had, who publicly raged when it did not appear on schedule.

Recall the algebraic identities in Mul_Word of Chapter 9, where a single Word × Word multiplication is turned into four HalfWord × HalfWord ones. Let’s consider the general case.

Suppose that b is a positive integer, and we would like to multiply a 2b-bit-wide X by a 2b-bit-wide Y. We can express the multiplicands algebraically as follows:

X = XLo + 2bXHi
Y = YLo + 2bYHi

…where NLo consists of any 2b-wide integer N’s least-significant b binary digits, while NHi consists of its b most-significant binary digits. The multiplication itself can then be written as:

XY = (XLo + 2bXHi)(YLo + 2bYHi)
.. = XLoYLo + 2bXLoYHi + 2bXHiYLo + 2bXHi2bYHi
.. = XLoYLo + 2b(XLoYHi + XHiYLo) + 22bXHiYHi

…or pictured “physically”, drawn to scale (suppose that the most junior bit position in a given register is at the left edge of the rectangle representing said register; the most senior — at the right edge) :

× YLo XHi
+ XLoYHi
+ XHiYLo
+ XHiYHi
= XY

Performing this “schoolbook” 2b × 2b multiplication costs four b × b-wide multiplications (multiplications by powers of two are performed using inexpensive position shifting and do not figure towards this count) and three 2b-wide additions. This cost, as we learned in previous chapters, is proportional to the square of b. But in fact it is possible to do better — using a subquadratic multiplication algorithm.

We will make use of the oldest, best-known, and simplest such algorithm: the classic one derived from A. A. Karatsuba’s (1937 – 2008) “Multiplication of many-digital numbers by automatic computers”, Dokl. Akad. Nauk SSSR, 145:2 (1962), 293–294.

After working through this Chapter, the reader will possess a fits-in-head, constant-spacetime and provably-correct computerized implementation of Karatsuba’s algorithm.

First, let’s specify that our multiplicands X and Y are FZ-integers, and each of them occupies exactly L Words.

Recall that in Chapter 4, we have mandated that no FZ must come into existence with a bitwise length that is not a positive power of two. Therefore, at all times, a FZ will break cleanly into two halves of equal bitnesses, as many times as desired until halves each consisting of a FZ one Word in length are formed.

And so, at every recursive (see further below) invocation of our procedure, we are able to begin by finding a positive integer k (where L = 2k) i.e. the length of a multiplicand cut in half.

Thereby we can take b = MachineBitness × k, where k = L / 2.

Let’s also define, for brevity, the following recurring terms:

MM = XLoYHi + XHiYLo

And trivially obtain the equation:

XY = LL + 2bMM + 22bHH

Karatsuba’s discovery is the fact that it is possible to compute the MM middle term using only one b × b multiplication in place of two. In the most common implementations of the algorithm, this appears as the equivalence:

MM = XLoYHi + XHiYLo
.. = (XLo + XHi)(YLo + YHi) - HH - LL

This algebraic identity replaces an expensive operation (multiplication) with three cheap (addition/subtraction) arithmetical operations. However, the possibility of generating a carry during addition when computing the terms of the binomial makes the mechanism slightly slower and uglier than it strictly has to be. So we will instead use another variant of the algorithm, sometimes called subtractive Karatsuba. It makes use of the following trivial equivalence:

MM = LL + HH - (XLo - XHi)(YLo - YHi)

In FFA’s realization, this will take the following — at first, seemingly gnarlier — form:

Dx = |XLo - XHi|
Dy = |YLo - YHi|
DD = Dx × Dy
MM = LL + HH - (-1CX)(-1CY)DD

…where CX is the sign (in the convention where it equals 1 if XLo is less than XHi, and 0 otherwise) of XLo – XHi and CY is the sign (similarly) of YLo – YHi.

And it is not difficult to show that, if we take



MM = LL + HH + (-1DDSub)DD

Or, for kindergarten:

Or, for the thick, for the impatient,
CX CY DDSub (-1DDSub)DD MM =
0 0 1 -DD LL + HH - DD
0 1 0 DD LL + HH + DD
1 0 0 DD LL + HH + DD
1 1 1 -DD LL + HH - DD

…i.e. the DD term ends up subtracted if (XLo – XHi)(YLo – YHi) is positive; otherwise it gets added. Which is as it should be. And this gives us the formula:

XY = LL + 2b(LL + HH + (-1DDSub)DD) + 22bHH

And now let’s draw a possible shape for this mechanism “electrically”. First, let’s do the obvious Right Thing and begin by stuffing LL and HH into the lower and upper, respectively, halves of the register in which we are computing the sum. And after that, we will add the “middle” terms, at the requisite offset (i.e. at the b-th binary place) :

+ LL 0
+ HH 0
+ (-1DDSub)DD 0
= XY

The above would work as a physical realization, but our 2b-sized additions sadly became 3b-sized, because of the need to ripple out the carries (recall that the addition of any two s-sized integers is physically s+1-sized, as there is the possibility of overflow.) But can we do anything about this, or do we have to live with it?

Turns out: there is a pill. One economical way to make the additions stay 2b-sized, is to accumulate the carries. We will put them in a Word-sized register called TC (i.e. tail carry). After the three terms of the original equation have been added into the result XY, we will make the result correct by rippling out (at the cost of walking over a b-sized space) the accumulated carry in TC into the tail (i.e. most senior quarter segment) of the result XY.

The new scheme looks like this:

LL HH TC := 0
+ LL TC += Carry
+ HH TC += Carry
+ (-1DDSub)DD TC += Carry – DDSub
+ TC
= XY

The reason why, after the third addition, we must…

TC += Carry - DDSub

…is not uninteresting, and I am quite tempted to leave it as an exercise. And in fact I think I will:

Chapter 10, Exercise 1: Why is it necessary to subtract DDSub from TC ?

Observe that we can simplify the program we are about to write, by proving the lemma:

TC >= 0

The fact that TC is greater than or equal to zero by the time it gets rippled, trivially follows from the fact that:

MM = XLoYHi + XHiYLo

There is no way for the value of above expression to become negative, given that no subtraction happens in it, and that both of the multiplicands in both of the terms, are positive! From which it necessarily follows that, given as…

MM = LL + HH + (-1DDSub)DD

…it must then at all times be true that:

LL + HH >= (-1DDSub)DD

And therefore it will never become necessary to propagate a “borrow” when rippling TC into the tail, and we can use a strictly “additive” formulation in the pertinent code.

But what is the maximum possible value of TC ? It so happens that the answer is: two. And we can show… but wait. Why not leave this as an exercise ! So :

Chapter 10, Exercise 2: Can the value of TC ever become greater than 2 ? Give an example of a Karatsubaization where it does; or if it cannot, prove that it cannot.

And now let’s put all of this together into a mechanism:


   procedure Mul_Karatsuba(X  : in  FZ;
                           Y  : in  FZ;
                           XY : out FZ) is
      -- L is the wordness of a multiplicand. Guaranteed to be a power of two.
      L : constant Word_Count := X'Length;
      -- An 'LSeg' is the same length as either multiplicand.
      subtype LSeg is FZ(1 .. L);
      -- K is HALF of the length of a multiplicand.
      K : constant Word_Index := L / 2;
      -- A 'KSeg' is the same length as HALF of a multiplicand.
      subtype KSeg is FZ(1 .. K);
      -- The three L-sized variables of the product equation, i.e.:
      -- XY = LL + 2^(K*Bitness)(LL + HH + (-1^DD_Sub)*DD) + 2^(2*K*Bitness)HH
      LL, DD, HH : LSeg;
      -- K-sized terms of Dx * Dy = DD
      Dx, Dy     : KSeg;  -- Dx = abs(XLo - XHi) , Dy = abs(YLo - YHi)
      -- Subtraction borrows, signs of (XL - XH) and (YL - YH),
      Cx, Cy     : WBool; -- so that we can calculate (-1^DD_Sub)
      -- Bottom and Top K-sized halves of the multiplicand X.
      XLo        : KSeg  renames    X(  X'First       ..   X'Last - K );
      XHi        : KSeg  renames    X(  X'First + K   ..   X'Last     );
      -- Bottom and Top K-sized halves of the multiplicand Y.
      YLo        : KSeg  renames    Y(  Y'First       ..   Y'Last - K );
      YHi        : KSeg  renames    Y(  Y'First + K   ..   Y'Last     );
      -- L-sized middle segment of the product XY (+/- K from the midpoint).
      XYMid      : LSeg  renames   XY( XY'First + K   ..  XY'Last - K );
      -- Bottom and Top L-sized halves of the product XY.
      XYLo       : LSeg  renames   XY( XY'First       ..  XY'Last - L );
      XYHi       : LSeg  renames   XY( XY'First + L   ..  XY'Last     );
      -- Topmost K-sized quarter segment of the product XY, or 'tail'
      XYHiHi     : KSeg  renames XYHi( XYHi'First + K .. XYHi'Last    );
      -- Whether the DD term is being subtracted.
      DD_Sub     : WBool;
      -- Carry from individual term additions.
      C          : WBool;
      -- Tail-Carry accumulator, for the final ripple
      TC         : Word;
      -- Recurse: LL := XL * YL
      FZ_Multiply(XLo, YLo, LL);
      -- Recurse: HH := XH * YH
      FZ_Multiply(XHi, YHi, HH);
      -- Dx := |XL - XH| , Cx := Borrow (i.e. 1 iff XL < XH)
      FZ_Sub_Abs(X => XLo, Y => XHi, Difference => Dx, Underflow => Cx);
      -- Dy := |YL - YH| , Cy := Borrow (i.e. 1 iff YL < YH)
      FZ_Sub_Abs(X => YLo, Y => YHi, Difference => Dy, Underflow => Cy);
      -- Recurse: DD := Dx * Dy
      FZ_Multiply(Dx, Dy, DD);
      -- Whether (XL - XH)(YL - YH) is positive, and so DD must be subtracted:
      DD_Sub := 1 - (Cx xor Cy);
      -- XY := LL + 2^(2 * K * Bitness) * HH
      XYLo := LL;
      XYHi := HH;
      -- XY += 2^(K * Bitness) * HH, but carry goes in Tail Carry accum.
      FZ_Add_D(X => XYMid, Y => HH, Overflow => TC);
      -- XY += 2^(K * Bitness) * LL, ...
      FZ_Add_D(X => XYMid, Y => LL, Overflow => C);
      -- ... but the carry goes into the Tail Carry accumulator.
      TC := TC + C;
      -- XY += 2^(K * Bitness) * (-1^DD_Sub) * DD
      FZ_Not_Cond_D(N => DD, Cond => DD_Sub); -- invert DD if 2s-complementing
      FZ_Add_D(OF_In    => DD_Sub,            -- ... and then must increment
               X        => XYMid,
               Y        => DD,
               Overflow => C); -- carry will go in Tail Carry accumulator
      -- Compute the final Tail Carry for the ripple
      TC := TC + C - DD_Sub;
      -- Barring a cosmic ray, 0 < = TC <= 2 .
      pragma Assert(TC <= 2);
      -- Ripple the Tail Carry into the tail.
      FZ_Add_D_W(X => XYHiHi, W => TC, Overflow => C);
      -- Barring a cosmic ray, the tail ripple will NOT overflow.
      pragma Assert(C = 0);
   end Mul_Karatsuba;

But… Does this routine work? How fast does it run? What do the new procedures like FZ_Not_Cond_D do? Where and when do we recurse ? Why did Comba’s Multiplier have to change at all !? And, most importantly, is the proof “proofy” enough ?? Find out in Chapter 11!

~To be continued!~