“Finite Field Arithmetic.” Chapter 11: Tuning and Unified API.
This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.
- Chapter 1: Genesis.
- Chapter 2: Logical and Bitwise Operations.
- Chapter 3: Shifts.
- Chapter 4: Interlude: FFACalc.
- Chapter 5: "Egyptological" Multiplication and Division.
- Chapter 6: "Geological" RSA.
- Chapter 7: "Turbo Egyptians."
- Chapter 8: Interlude: Randomism.
- Chapter 9: "Exodus from Egypt" with Comba's Algorithm.
- Chapter 10: Introducing Karatsuba's Multiplication.
- Chapter 11: Tuning and Unified API.
You will need:
- All of the materials from Chapters 1 - 10.
ffa_ch11_tuning_and_api.vpatch(see here)ffa_ch11_tuning_and_api.vpatch.asciilifeform.sig- ffa_ch11_tuning_and_api.kv.vpatch
- ffa_ch11_tuning_and_api.kv.vpatch.asciilifeform.sig
Add the above vpatch and seal to your V-set, and press to ffa_ch11_tuning_and_api.kv.vpatch.
You should end up with the same directory structure as previously.
Now compile ffacalc:
cd ffacalc gprbuild
But do not run it quite yet.
First things first:
During the FFA sabbatical following Chapter 10, several of my readers did some very notable supporting work:
- Reader apeloyee observed that the modular exponentiation operation is a poor benchmark for multiplication speed, on account of the fact that the current algorithm spends the vast majority of its CPU time inside the Knuth division routine.
- Reader spyked tried his hand at producing a proof of correctness of the mechanisms in Chapter 5.
- Reader ave1 carefully analyzed the executables generated by AdaCore's x86-64 GNAT, and showed that this compiler is prone to ignore Inline pragmas unless they are specified in a particular way.
- ave1 also produced a fully-self-building, retargetable, fully-static, and glibc-free GNAT. On top of this, at the time of this writing, he is also working on a entirely delibraryized GNAT, which will produce compact and deterministic -- i.e. smoothly hand-auditable executables, and will eventually allow FFA to run on embedded devices (including even such systems as classical x86 DOS.)
As originally planned, this Chapter was to concern itself with the detailed mechanics of Karatsuba's Multiplication. However, this discussion will be postponed until Chapter 12. Chapter 11 will instead discuss the result of incorporating the efforts of the alert readership into FFA; as well as a handful of minor/cosmetic improvements.
The first order of business is ave1's discovery concerning GNAT's Inline_Always pragma. It turns out that AdaCore's compiler will selectively ignore this directive unless it is given inside an .ads declaration file containing the declaration of the routine that is to be inlined. (Formerly these were placed in the .adb routine bodies.)
An interesting side-effect of making this seemingly-trivial fix, was the discovery of the (somewhat obvious in retrospect) fact that forced inlining is incompatible with the use of Ada precondition statements. A particular routine can be inlined, or subject to runtime preconditions, but not both.
I then set about to determine which routines in FFA benefit from forced inlining. However, I found it impermissible to present to the end-user routines bereft of runtime precondition enforcement. Therefore I found it necessary to expose all end-user functionality strictly via a purpose-written API, where preconditions remain enforced whether or not the implementation makes use of inlining. This API appears below:
ffa.ads:
with Words; use Words; with FZ_Type; use FZ_Type; with W_Pred; with FZ_Lim; with FZ_Basic; with FZ_IO; with FZ_Cmp; with FZ_Pred; with FZ_BitOp; with FZ_Divis; with FZ_ModEx; -- FFA Exports package FFA is pragma Pure; ---------------------------------------------------------------------------- --- Fundamental Types and Sizes ---------------------------------------------------------------------------- subtype Word is Words.Word; subtype WBool is Words.WBool; subtype Nibble is Words.Nibble; subtype FZ is FZ_Type.FZ; subtype Indices is FZ_Type.Indices; subtype Char_Count is FZ_IO.Char_Count; Bitness : Positive renames Words.Bitness; ---------------------------------------------------------------------------- --- Word Predicates ---------------------------------------------------------------------------- -- Return 1 if N is equal to 0; otherwise return 0. function FFA_Word_ZeroP(N : in Word) return WBool renames W_Pred.W_ZeroP; -- Return 1 if N is unequal to 0; otherwise return 0. function FFA_Word_NZeroP(N : in Word) return WBool renames W_Pred.W_NZeroP; -- Return WBool-complement of N. function FFA_Word_Not(N : in WBool) return WBool renames W_Pred.W_Not; -- Return 1 if N is odd; otherwise return 0. function FFA_Word_OddP(N : in Word) return WBool renames W_Pred.W_OddP; -- Return 1 if A is equal to B ; otherwise return 0. function FFA_Word_EqP(A : in Word; B : in Word) return WBool renames W_Pred.W_EqP; ---------------------------------------------------------------------------- --- FZ Limits ---------------------------------------------------------------------------- FFA_Validity_Rule_Doc : String renames FZ_Lim.FZ_Validity_Rule_Doc; -- Determine if a proposed FFA Bitness is valid. function FFA_FZ_Valid_Bitness_P(B : in Positive) return Boolean renames FZ_Lim.FZ_Valid_Bitness_P; ---------------------------------------------------------------------------- --- FZ Basics ---------------------------------------------------------------------------- -- Determine the Bitness of N function FFA_FZ_Bitness(N : in FZ) return Bit_Count renames FZ_Basic.FZ_Bitness; -- N := 0 procedure FFA_FZ_Clear(N : out FZ) renames FZ_Basic.FZ_Clear; -- Set given FZ to a given truth value procedure FFA_WBool_To_FZ(V : in WBool; N : out FZ) renames FZ_Basic.WBool_To_FZ; -- First Word of N := Source procedure FFA_FZ_Set_Head(N : out FZ; Source : in Word) renames FZ_Basic.FZ_Set_Head; -- First Word of N function FFA_FZ_Get_Head(N : in FZ) return Word renames FZ_Basic.FZ_Get_Head; -- Exchange X and Y procedure FFA_FZ_Swap(X : in out FZ; Y : in out FZ) with Pre => X'Length = Y'Length; -- Constant-time MUX: Sel = 0: Result := X; Sel = 1: Result := Y procedure FFA_FZ_Mux(X : in FZ; Y : in FZ; Result : out FZ; Sel : in WBool) with Pre => X'Length = Y'Length and X'Length = Result'Length; ---------------------------------------------------------------------------- --- FZ IO Operations ---------------------------------------------------------------------------- -- Expand FZ N by nibble D, and determine whether this operation overflowed procedure FFA_FZ_Insert_Bottom_Nibble(N : in out FZ; D : in Nibble; Overflow : out WBool) renames FZ_IO.FZ_Insert_Bottom_Nibble; -- Determine the number of ASCII characters required to represent N function FFA_FZ_ASCII_Length(N : in FZ) return Char_Count renames FZ_IO.FZ_ASCII_Length; -- Write an ASCII hex representation of N into existing string buffer S procedure FFA_FZ_To_Hex_String(N : in FZ; S : out String) renames FZ_IO.FZ_To_Hex_String; ---------------------------------------------------------------------------- --- Comparison Predicate Operations on FZ ---------------------------------------------------------------------------- -- 1 iff X == Y (branch-free); else 0 function FFA_FZ_EqP(X : in FZ; Y: in FZ) return WBool renames FZ_Cmp.FZ_EqP; -- 1 iff X < Y (branch-free); else 0 function FFA_FZ_LessThanP(X : in FZ; Y : in FZ) return WBool renames FZ_Cmp.FZ_LessThanP; -- 1 iff X > Y (branch-free); else 0 function FFA_FZ_GreaterThanP(X : in FZ; Y : in FZ) return WBool renames FZ_Cmp.FZ_GreaterThanP; ---------------------------------------------------------------------------- --- Fundamental Predicate Operations on FZ ---------------------------------------------------------------------------- -- 1 iff N == 0 (branch-free); else 0 function FFA_FZ_ZeroP(N : in FZ) return WBool renames FZ_Pred.FZ_ZeroP; -- 1 iff N != 0 (branch-free); else 0 function FFA_FZ_NZeroP(N : in FZ) return WBool renames FZ_Pred.FZ_NZeroP; -- 1 iff N is odd function FFA_FZ_OddP(N : in FZ) return WBool renames FZ_Pred.FZ_OddP; ---------------------------------------------------------------------------- --- Bitwise Operations on FZ ---------------------------------------------------------------------------- -- Result := X & Y procedure FFA_FZ_And(X : in FZ; Y : in FZ; Result : out FZ) with Pre => X'Length = Y'Length and X'Length = Result'Length; -- N := N & W, W is a Word procedure FFA_FZ_And_W(N : in out FZ; W : in Word) renames FZ_BitOp.FZ_And_W; -- Result := X | Y procedure FFA_FZ_Or(X : in FZ; Y : in FZ; Result : out FZ) with Pre => X'Length = Y'Length and X'Length = Result'Length; -- N := N | W, W is a Word procedure FFA_FZ_Or_W(N : in out FZ; W : in Word) renames FZ_BitOp.FZ_Or_W; -- Result := X ^ Y procedure FFA_FZ_Xor(X : in FZ; Y : in FZ; Result : out FZ) with Pre => X'Length = Y'Length and X'Length = Result'Length; -- N := N ^ W, W is a Word procedure FFA_FZ_Xor_W(N : in out FZ; W : in Word) renames FZ_BitOp.FZ_Xor_W; -- NotN := ~N ('ones complement') procedure FFA_FZ_Not(N : in FZ; NotN : out FZ) with Pre => N'Length = NotN'Length; ---------------------------------------------------------------------------- --- Basic Arithmetic on FZ ---------------------------------------------------------------------------- -- Sum := X + Y; Overflow := Carry procedure FFA_FZ_Add(X : in FZ; Y : in FZ; Sum : out FZ; Overflow : out WBool) with Pre => X'Length = Y'Length and X'Length = Sum'Length; -- Difference := X - Y; Underflow := Borrow procedure FFA_FZ_Subtract(X : in FZ; Y : in FZ; Difference : out FZ; Underflow : out WBool) with Pre => X'Length = Y'Length and X'Length = Difference'Length; ---------------------------------------------------------------------------- --- Division on FZ ---------------------------------------------------------------------------- -- Dividend is divided by Divisor, producing Quotient and Remainder. -- WARNING: NO div0 test here! Caller must test. procedure FFA_FZ_IDiv(Dividend : in FZ; Divisor : in FZ; Quotient : out FZ; Remainder : out FZ) renames FZ_Divis.FZ_IDiv; -- Exactly same thing as IDiv, but keep only the Quotient procedure FFA_FZ_Div(Dividend : in FZ; Divisor : in FZ; Quotient : out FZ) renames FZ_Divis.FZ_Div; -- Modulus. procedure FFA_FZ_Mod(Dividend : in FZ; Divisor : in FZ; Remainder : out FZ) renames FZ_Divis.FZ_Mod; ---------------------------------------------------------------------------- --- Multiplication on FZ ---------------------------------------------------------------------------- -- Multiplier. Preserves the inputs. procedure FFA_FZ_Multiply(X : in FZ; Y : in FZ; XY_Lo : out FZ; XY_Hi : out FZ) with Pre => X'Length = Y'Length and XY_Lo'Length = XY_Hi'Length and XY_Lo'Length = ((X'Length + Y'Length) / 2); ---------------------------------------------------------------------------- --- Modular Operations on FZ ---------------------------------------------------------------------------- -- Modular Multiply: Product := X*Y mod Modulus procedure FFA_FZ_Modular_Multiply(X : in FZ; Y : in FZ; Modulus : in FZ; Product : out FZ) renames FZ_ModEx.FZ_Mod_Mul; -- Modular Exponent: Result := Base^Exponent mod Modulus procedure FFA_FZ_Modular_Exponentiate(Base : in FZ; Exponent : in FZ; Modulus : in FZ; Result : out FZ) renames FZ_ModEx.FZ_Mod_Exp; end FFA;
ffa.adb:
with FZ_Arith; with FZ_Shift; with FZ_Mul; -- Wrapper bodies for routines that we inline, but must enforce preconditions -- on when called by FFA user. package body FFA is ---------------------------------------------------------------------------- --- FZ Basics ---------------------------------------------------------------------------- -- Exchange X and Y procedure FFA_FZ_Swap(X : in out FZ; Y : in out FZ) is begin FZ_Basic.FZ_Swap(X => X, Y => Y); end FFA_FZ_Swap; -- Constant-time MUX: Sel = 0: Result := X; Sel = 1: Result := Y procedure FFA_FZ_Mux(X : in FZ; Y : in FZ; Result : out FZ; Sel : in WBool) is begin FZ_Basic.FZ_Mux(X => X, Y => Y, Result => Result, Sel => Sel); end FFA_FZ_Mux; ---------------------------------------------------------------------------- --- Bitwise Operations on FZ ---------------------------------------------------------------------------- -- Result := X & Y procedure FFA_FZ_And(X : in FZ; Y : in FZ; Result : out FZ) is begin FZ_BitOp.FZ_And(X => X, Y => Y, Result => Result); end FFA_FZ_And; -- Result := X | Y procedure FFA_FZ_Or(X : in FZ; Y : in FZ; Result : out FZ) is begin FZ_BitOp.FZ_Or(X => X, Y => Y, Result => Result); end FFA_FZ_Or; -- Result := X ^ Y procedure FFA_FZ_Xor(X : in FZ; Y : in FZ; Result : out FZ) is begin FZ_BitOp.FZ_Xor(X => X, Y => Y, Result => Result); end FFA_FZ_Xor; -- NotN := ~N ('ones complement') procedure FFA_FZ_Not(N : in FZ; NotN : out FZ) is begin FZ_BitOp.FZ_Not(N => N, NotN => NotN); end FFA_FZ_Not; ---------------------------------------------------------------------------- --- Arithmetic on FZ ---------------------------------------------------------------------------- -- Sum := X + Y; Overflow := Carry procedure FFA_FZ_Add(X : in FZ; Y : in FZ; Sum : out FZ; Overflow : out WBool) is begin FZ_Arith.FZ_Add(X => X, Y => Y, Sum => Sum, Overflow => Overflow); end FFA_FZ_Add; -- Difference := X - Y; Underflow := Borrow procedure FFA_FZ_Subtract(X : in FZ; Y : in FZ; Difference : out FZ; Underflow : out WBool) is begin FZ_Arith.FZ_Sub(X => X, Y => Y, Difference => Difference, Underflow => Underflow); end FFA_FZ_Subtract; ---------------------------------------------------------------------------- --- Multiplication on FZ ---------------------------------------------------------------------------- procedure FFA_FZ_Multiply(X : in FZ; Y : in FZ; XY_Lo : out FZ; XY_Hi : out FZ) is begin FZ_Mul.FZ_Multiply_Buffered(X => X, Y => Y, XY_Lo => XY_Lo, XY_Hi => XY_Hi); end FFA_FZ_Multiply; end FFA;
FFA_Calc has been reworked to make use of the strictly-specified API routines, i.e.:
-- FFA with FFA; use FFA; -- For the intrinsic equality operator on Words use type FFA.Word;
... in place of the old:
-- FFA with FZ_Lim; use FZ_Lim; with Words; use Words; with W_Pred; use W_Pred; with FZ_Type; use FZ_Type; with FZ_Basic; use FZ_Basic; with FZ_Arith; use FZ_Arith; with FZ_Cmp; use FZ_Cmp; with FZ_Pred; use FZ_Pred; with FZ_BitOp; use FZ_BitOp; with FZ_Shift; use FZ_Shift; with FZ_Divis; use FZ_Divis; with FZ_Mul; use FZ_Mul; with FZ_ModEx; use FZ_ModEx;
... and elsewhere, as necessary. (All of the exported routines have been renamed, to carry a prefix of FFA_ .)
One other change to FFA_Calc is the abolition of ffa_io.ads and ffa_io.adb: their functionality has been subsumed into FFA proper, in the shape of an FZ_IO module:
fz_io.ads:
with Words; use Words; with FZ_Type; use FZ_Type; with FZ_Lim; use FZ_Lim; package FZ_IO is pragma Pure; -- Expand FZ N by nibble D, and determine whether this operation overflowed procedure FZ_Insert_Bottom_Nibble(N : in out FZ; D : in Nibble; Overflow : out WBool); -- A count of ASCII chars representing a humanized FZ: subtype Char_Count is Positive range Nibbleness * FZ_Minimal_Wordness .. Positive'Last; -- Determine the number of ASCII characters required to represent N function FZ_ASCII_Length(N : in FZ) return Char_Count; pragma Inline_Always(FZ_ASCII_Length); -- Hex Digits (currently used only in FZ_To_Hex_String) HexDigs : constant array(0 .. 15) of Character := "0123456789ABCDEF"; -- Write an ASCII hex representation of N into existing string buffer S procedure FZ_To_Hex_String(N : in FZ; S : out String) with Pre => S'Length = FZ_ASCII_Length(N); end FZ_IO;
fz_io.adb:
with W_Pred; use W_Pred; with W_Shifts; use W_Shifts; with FZ_BitOp; use FZ_BitOp; with FZ_Shift; use FZ_Shift; package body FZ_IO is -- Expand FZ N by nibble D, and determine whether this operation overflowed procedure FZ_Insert_Bottom_Nibble(N : in out FZ; D : in Nibble; Overflow : out WBool) is -- The overflow, if any, from shifting N in-place leftward by 4 bits Shifted_N_Overflow : Word := 0; begin -- Make room in N for one additional hex digit (i.e. multiply N by 16) FZ_ShiftLeft_O(N => N, ShiftedN => N, Count => 4, Overflow => Shifted_N_Overflow); -- Place the new digit into the now-vacated four bits at the bottom of N. FZ_Or_W(N, D); -- Record whether the above operation overflowed N: Overflow := W_NZeroP(Shifted_N_Overflow); end FZ_Insert_Bottom_Nibble; -- Determine the number of ASCII characters required to represent N function FZ_ASCII_Length(N : in FZ) return Char_Count is begin return N'Length * Nibbleness; end FZ_ASCII_Length; -- Write an ASCII hex representation of N into existing string buffer S procedure FZ_To_Hex_String(N : in FZ; S : out String) is -- Indices into the string S (note, String always indexes from 1) subtype SiRange is Natural range S'First .. S'Last; -- Position of current character in S being written Si : SiRange; -- Walks from 1 to the string length of S begin -- Step through all indices of N, regardless of how it was indexed: for i in 0 .. Word_Index(N'Length - 1) loop declare -- Index of current Word, walks from ~top~ Word of N to ~bottom~ Wi : constant Word_Index := N'Last - i; -- Currently-selected Word of N W : Word := N(Wi); begin -- For each nibble in the Word: for j in 1 .. Nibbleness loop -- Current position in S that is to be written Si := (Natural(i) * Nibbleness) + j; -- Rotate the top nibble of W into the bottom nibble. W := Rotate_Left(W, 4); -- Write the ASCII representation of the bottom nibble. S(Si) := HexDigs(Natural(W and 16#F#)); end loop; -- Barring cosmic ray, W will have rotated to its initial value pragma Assert(W = N(Wi)); end; end loop; -- Barring cosmic ray, the last char written was to the final pos in S, pragma Assert(Si = SiRange'Last); -- as S is mandatorily exactly-sized. end FZ_To_Hex_String; end FZ_IO;
All invocations of FZ-to-console output in FFA_Calc now make use of the following subroutine:
-- Emit an ASCII representation of N to the terminal procedure Print_FZ(N : in FZ) is S : String(1 .. FFA_FZ_ASCII_Length(N)); -- Mandatorily, exact size begin FFA_FZ_To_Hex_String(N, S); -- Convert N to ASCII hex Write_String(S); -- Print the result to stdout Write_Newline; -- Print newline, for clarity. end Print_FZ;
Observe that under no circumstances is a string of indeterminate length returned by a FFA subroutine: this would require the the use of the Ada secondary stack, which I have renounced as a Bad Idea universally. Instead, the I/O routine makes use of the preallocation technique illustrated in Chapter 4's commandline param processor.
Last but not least, the project now includes a HISTORY.TXT chronology. Currently this consists of:
HISTORY.TXT:
############### ### HISTORY ### ############### "Chapter 1: Genesis." ffa_ch1_genesis.vpatch http://www.loper-os.org/?p=1913 "Chapter 2: Logical and Bitwise Operations." ffa_ch2_logicals.vpatch http://www.loper-os.org/?p=2026 "Chapter 3: Shifts." ffa_ch3_shifts.vpatch http://www.loper-os.org/?p=2032 "Chapter 4: Interlude: FFACalc." ffa_ch4_ffacalc.vpatch http://www.loper-os.org/?p=2051 "Chapter 5: "Egyptological" Multiplication and Division." ffa_ch5_egypt.vpatch http://www.loper-os.org/?p=2071 "Chapter 6: "Geological" RSA." ffa_ch6_simplest_rsa.vpatch http://www.loper-os.org/?p=2105 "Chapter 7: "Turbo Egyptians."" ffa_ch7_turbo_egyptians.vpatch http://www.loper-os.org/?p=2118 "Chapter 8: Interlude: Randomism." ffa_ch8_randomism.vpatch http://www.loper-os.org/?p=2175 "Chapter 9: "Exodus from Egypt" with Comba’s Algorithm." ffa_ch9_exodus.vpatch http://www.loper-os.org/?p=2186 "Chapter 10: Introducing Karatsuba’s Multiplication." ffa_ch10_karatsuba.vpatch http://www.loper-os.org/?p=2238 "Chapter 11: Tuning and Unified API." ffa_ch11_tuning_and_api.vpatch ### YOU ARE HERE ###
The particular format of this document may change in the future, as the pertinent vtronics work is still in progress.
In Chapter 12, we will discuss in detail the substantial performance gains obtained from the use of Chapter 10's Karatsuba's Multiplication in modular exponentiation; and likewise from the implementation of correct inlining, described in the current Chapter. And the effect of ave1's new GNAT on FFA perfomance will be revealed.
We will also review the exact mechanics of Karatsuba's algorithm, and present a small optimization thereof. Additionally, a sane approach to the "Iron vs. Soft MUL" problem posed in Chapter 9 will be presented.
~To be continued!~