## "Finite Field Arithmetic." Chapter 1: Genesis.

FFA, or the *Finite Field Arithmetic* library, differs from the typical *Open Sores* abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to *read* and *fully understand* every single line. In exactly the same manner that you would understand and pack your own parachute.

*Current Table of Contents:*

- "Finite Field Arithmetic." Chapter 1: Genesis.
- “Finite Field Arithmetic.” Chapter 2: Logical and Bitwise Operations.
- “Finite Field Arithmetic.” Chapter 3: Shifts.
- “Finite Field Arithmetic.” Chapter 4: Interlude: FFACalc.
- “Finite Field Arithmetic.” Chapter 5: "Egyptological" Multiplication and Division.
- “Finite Field Arithmetic.” Chapter 6: “Geological” RSA.
- “Finite Field Arithmetic.” Chapter 7: “Turbo Egyptians.”
- “Finite Field Arithmetic.” Chapter 8: Interlude: Randomism.
- “Finite Field Arithmetic.” Chapter 9: "Exodus from Egypt" with Comba's Algorithm.
- “Finite Field Arithmetic.” Chapter 10: Introducing Karatsuba's Multiplication.
- “Finite Field Arithmetic.” Chapter 11: Tuning and Unified API.
- Hypertext Concordance for "Finite Field Arithmetic."
- “Finite Field Arithmetic” Regrind into Keccak-V Format.
- “Finite Field Arithmetic.” Chapter 12A: Karatsuba Redux. (Part 1 of 2)
- “Finite Field Arithmetic.” Chapter 12B: Karatsuba Redux. (Part 2 of 2)
- “Finite Field Arithmetic.” Chapter 13: "Width-Measure" and "Quiet Shifts."
- “Finite Field Arithmetic.” Chapter 14A: Barrett's Modular Reduction. (Part 1 of 2)
- “Finite Field Arithmetic.” Chapter 14A-Bis: Barrett's Modular Reduction. (Physical Bounds Proof.)
- “Finite Field Arithmetic.” Chapter 14B: Barrett's Modular Reduction. (Part 2 of 2.)
- “Finite Field Arithmetic” vs MPI.
- “Finite Field Arithmetic.” Chapter 15: Greatest Common Divisor.
- “Finite Field Arithmetic.” Chapter 16A: The Miller-Rabin Test.
- “Finite Field Arithmetic.” Chapter 17: Introduction to Peh.
- “Finite Field Arithmetic.” Chapter 18A: Subroutines in Peh.
- “Finite Field Arithmetic.” Chapter 18B: "Cutouts" in Peh.
- “Finite Field Arithmetic.” Chapter 18C: Peh School: Generation of Cryptographic Primes.
- “Finite Field Arithmetic.” Chapter 19: Peh Tuning and Demo Tapes.
- "Finite Field Arithmetic." Chapter 20: "Litmus", a Peh-Powered Verifier for GPG Signatures.
- "Finite Field Arithmetic." Chapter 20B: Support for Selected Ancient Hash Algos in "Litmus."
- "Finite Field Arithmetic." Chapter 20C: Support for 'Clearsigned' GPG texts in "Litmus."
- "Finite Field Arithmetic." Chapter 20D: "Litmus" Errata: Support for Nested Clearsigned Texts.
- "Finite Field Arithmetic." Chapter 21A: Extended GCD and Modular Multiplicative Inverse. (Part 1 of 3)
- "Finite Field Arithmetic." Chapter 21A-Bis: Fix for Lethal Flaw in Ch.15's Greatest Common Divisor.

This article is the first in a series, at the end of which the patient reader will have glued together, with his own hands -- and *fully fit into his head* -- a demonstrably-correct set of routines for multiple-precision arithmetic, sufficient for carrying out RSA and many other forms of number-theoretical cryptography.

The "bignum" arithmetic libraries currently in common use were, near as I can tell, all and without exception written by NSA assets and their various "useful idiots". The conclusion is difficult to avoid, given as every example known to me suffers from one or more of the following ills:

- Elephantine size (e.g. OpenSSL) to the point of effectively-total unauditability -- demanding that users place trust in people who have repeatedly proven themselves untrustworthy.
- Pointless complexity, where small improvements in performance are bought at the cost of thousands of lines of gnarly micro-optimization.
- Written in a language which makes use of pointer arithmetic, array accesses with unchecked bounds, physically capable of buffer overflow. (e.g. C, C++)
- Or, alternatively to above, written in a language which incorporates a massive, frequently-changing, and effectively-unauditable runtime into the program logic. (e.g. Python)
- Written in a vendor-controlled language with no paper standard and arbitrary future breakage. (e.g. Python, Golang, various Microsoftisms)
- Rely heavily on inline assembler and other intimately iron-specific optimizations
**Leaks secret values via timing and electromagnetic side channels.**Often mitigated with pseudo-mitigations.

FFA, on the other hand:

- Written in a heavily-restricted subset of the Ada programming language -- the only currently-existing nonproprietary statically-compiled language which permits fully bounds-checked,
*pointerolade*-free code and practically-auditable binaries. We will be using GNAT, which relies on the GCC backend. - No external dependencies or libraries are made use of; suitable for embedded systems with limited memory. Language features which bloat the generated binary (e.g. OOP) are avoided.
- No use of heap allocators. (In fact, wholly allocator-agnostic.) Memory requirement for all operations is readily calculable in advance from the width of the inputs.
- The total set of components needed for RSA occupies around 3000 lines, incl. comments, of clear and unidiomatic (i.e. containing no "perlisms") code.
- Contains no inline assembler and is married to no particular machine (i.e. can be built for microcontrollers, MS-DOS boxes, CPUs of virtually any "bitness")
**Does not branch-on-secret-bits****Does not address memory by secret values**, i.e. is cachebleed-proof- Fully unrollable, deterministic (and ergo auditable) control flow. I.e. all RSA modular-exponentiation operations will take exactly the same amount of time, via exactly the same flow of instructions; all
*algorithmic-complexity attack*vectors are ruled out.

FFA will be presented to the reader in small pieces. Each should take more than one hour, but probably fewer than two, to fully study and understand.

You will need:

- A
**V**-tron. V is not optional. If you have no idea what V is, start here. ~~ffa_ch1_genesis.vpatch~~(see here)~~ffa_ch1_genesis.vpatch.asciilifeform.sig~~- ffa_ch1_genesis.kv.vpatch
- ffa_ch1_genesis.kv.vpatch.asciilifeform.sig
- My public key.
- GNAT. It may already be on your machine, if you have GCC! Alternatively, you can use AdaCore's "official" distribution of GNAT: it is easy to install. Avoid GCC version 5 or above -- the project has fallen into the hands of questionable people.
- And, of course, a PC, running something other than MSWin (I can't think of any reason why FFA would not work there, but I haven't any notion re: why a MS victim would take an interest in FFA...)

Just as *"theater begins with the coat room"*, each FFA article begins with a V-*press*. Set up your Vtron if you have not already done so, and press the "genesis" -- "ffa_ch1_genesis".

You will end up with two directories, *libffa* and *ffademo*. The former contains the subset of FFA discussed in this chapter; the latter -- the demo routines for same. The reader is invited to modify the demo routines, experiment, learn.

Each subsequent chapter of the FFA series will contain a *V-patch* on the previous set, and the next -- another, until the final article in the series: where *"P"* -- an *FFA-tronic* GPG replacement -- is introduced.

First, verify that your V-Tron, GNAT, and genesis press are in working order, by compiling the introductory demo:

cd ffademo gprbuild

The output should resemble:

using project file ffa_demo.gpr gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections ffa_demo.adb gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections -gnatec=/home/you/ffa/libffa/restrict.adc word_ops.adb gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections -gnatec=/home/you/ffa/libffa/restrict.adc fz_type.ads gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections -gnatec=/home/you/ffa/libffa/restrict.adc fz_arith.adb gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections -gnatec=/home/you/ffa/libffa/restrict.adc w_shifts.ads gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections -gnatec=/home/you/ffa/libffa/restrict.adc iron.ads gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections -gnatec=/home/you/ffa/libffa/restrict.adc words.ads gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections demo_ch1.adb gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections ffa_io.adb gprlib FFA.lexch ar cr /home/you/ffa/libffa/lib/libFFA.a /home/you/ffa/libffa/obj/word_ops.o /home/you/ffa/libffa/obj/fz_type.o ... ranlib libFFA.a gprbind ffa_demo.bexch gnatbind ffa_demo.ali gcc -c b__ffa_demo.adb gcc ffa_demo.o -Wl,--gc-sections -static -o ffa_demo

Observe that the contents of the 'libffa' directory build recursively.

If you are an Apple victim, your OS prohibits static linking, and you may see barf instead. You may use the debug (dynamically linked) variant:

gprbuild -Xmode=debug

Do not run the demo yet. (you read programs before running, don't you?) Instead, let's take a careful look at what is inside.

Let's begin with the *restriction pragmas*. These define a particular, constrained subset of the Ada programming language, by prohibiting the use of certain features. GNAT will balk if it encounters a prohibited feature in FFA or the demo routines.

The logic behind the restrictions was a combination of "guilty until proven innocent" -- if we can do without a construct, we will -- and "enable all safety features":

*restrict.adc*:

pragma Restrictions(Immediate_Reclamation); pragma Restrictions(Max_Asynchronous_Select_Nesting => 0); pragma Restrictions(Max_Protected_Entries => 0); pragma Restrictions(Max_Select_Alternatives => 0); pragma Restrictions(Max_Task_Entries => 0); pragma Restrictions(Max_Tasks => 0); pragma Restrictions(No_Abort_Statements); pragma Restrictions(No_Access_Parameter_Allocators); pragma Restrictions(No_Allocators); pragma Restrictions(No_Asynchronous_Control); pragma Restrictions(No_Calendar); pragma Restrictions(No_Coextensions); pragma Restrictions(No_Default_Stream_Attributes); pragma Restrictions(No_Delay); pragma Restrictions(No_Dispatch); pragma Restrictions(No_Dispatching_Calls); pragma Restrictions(No_Dynamic_Attachment); pragma Restrictions(No_Dynamic_Priorities); pragma Restrictions(No_Entry_Calls_In_Elaboration_Code); pragma Restrictions(No_Entry_Queue); pragma Restrictions(No_Enumeration_Maps); pragma Restrictions(No_Exception_Propagation); pragma Restrictions(No_Exception_Registration); pragma Restrictions(No_Finalization); pragma Restrictions(No_Fixed_Io); pragma Restrictions(No_Floating_Point); pragma Restrictions(No_Implementation_Aspect_Specifications); pragma Restrictions(No_Implementation_Units); pragma Restrictions(No_Implicit_Aliasing); pragma Restrictions(No_Implicit_Conditionals); pragma Restrictions(No_Implicit_Dynamic_Code); pragma Restrictions(No_Implicit_Heap_Allocations); pragma Restrictions(No_Implicit_Protected_Object_Allocations); pragma Restrictions(No_Implicit_Task_Allocations); pragma Restrictions(No_Initialize_Scalars); pragma Restrictions(No_Local_Protected_Objects); pragma Restrictions(No_Local_Timing_Events); pragma Restrictions(No_Multiple_Elaboration); pragma Restrictions(No_Nested_Finalization); pragma Restrictions(No_Protected_Type_Allocators); pragma Restrictions(No_Protected_Types); pragma Restrictions(No_Relative_Delay); pragma Restrictions(No_Requeue_Statements); pragma Restrictions(No_Secondary_Stack); pragma Restrictions(No_Select_Statements); pragma Restrictions(No_Specific_Termination_Handlers); pragma Restrictions(No_Standard_Allocators_After_Elaboration); pragma Restrictions(No_Stream_Optimizations); pragma Restrictions(No_Streams); pragma Restrictions(No_Task_Allocators); pragma Restrictions(No_Task_At_Interrupt_Priority); pragma Restrictions(No_Task_Attributes_Package); pragma Restrictions(No_Task_Hierarchy); pragma Restrictions(No_Tasking); pragma Restrictions(No_Task_Termination); pragma Restrictions(No_Terminate_Alternatives); pragma Restrictions(No_Unchecked_Access); pragma Restrictions(No_Unchecked_Conversion); pragma Restrictions(No_Unchecked_Deallocation); pragma Restrictions(No_Wide_Characters); pragma Restrictions(Pure_Barriers); pragma Restrictions(Simple_Barriers); pragma Restrictions(Static_Priorities); pragma Restrictions(Static_Storage_Size); pragma Validity_Checks(ALL_CHECKS);

Towards the end of this article series, we will investigate the impact of such "fascism" on program performance, and the possibility of cautiously replacing certain bounds checks with formal proof of nonoverflowability.

Let's proceed to the builder. Observe that GnuMake was not listed in the list of required programs for this tutorial. Instead we make use of GNAT's "GprBuild", a substantially cleaner and better-behaved item:

*ffa.gpr*:

project FFA is for Object_Dir use "obj"; type Mode_Type is ("debug", "release"); Mode : Mode_Type := external ("mode", "release"); for Languages use ("Ada"); for Source_Dirs use ("."); for Library_Dir use "lib"; for Library_Name use "FFA"; for Library_Kind use "static"; package Binder is case Mode is when "debug" => for Switches ("Ada") use (); when "release" => for Switches ("Ada") use ("-static", "-r"); end case; end Binder; package Builder is for Switches ("Ada") use ("-nostdlib"); end Builder; package Compiler is case Mode is when "debug" => for Switches ("Ada") use ("-g"); when "release" => for Switches ("Ada") use ("-O2", "-fdump-scos", "-gnata", "-fstack-check", "-fdata-sections", "-ffunction-sections", "-gnatec=" & FFA'Project_Dir & "restrict.adc"); end case; end Compiler; end FFA;

Now let's proceed to the only machine-dependent section of FFA.

The supplied *iron.ads* pre-supposes eight-bit bytes, and a 64-bit CPU:

*iron.ads*:

package Iron is pragma Pure; -------------------------------------- -------- For a 64-bit system: -------- -------------------------------------- MachineBitness : constant Positive := 64; MachineBitnessLog2 : constant Positive := 6; -- log2(64) -------------------------------------- -- Bits per Byte ByteBits : constant Positive := 8; end Iron;

If this is not what you have, change the constants, e.g.:

------------------------------------ ------ For a 32-bit system: -------- ------------------------------------ MachineBitness : constant Positive := 32; MachineBitnessLog2 : constant Positive := 5; -- log2(32) ------------------------------------

The purpose of *MachineBitnessLog2* will become apparent later; ignore it for now.

Let's proceed to the most fundamental building block of FFA, the machine *Word*. A *Word*, from this point on, is simply a single machine word on your particular machine. It is expressed as an Ada "modular type".

*words.ads*:

with Iron; package Words is pragma Pure; -- The most fundamental fact about Word: width. Bitness : constant Positive := Iron.MachineBitness; -- It is possible to calculate BitnessLog2 at elaboration time, -- but we will avoid having ~any~ elaboration at all in FFA. BitnessLog2 : constant Positive := Iron.MachineBitnessLog2; -- The Word width, expressed in ~bytes~: Byteness : constant Positive := Bitness / Iron.ByteBits; -- What kind of words to use. Must be machine-word or smaller. type Word is mod 2**Bitness; for Word'Size use Bitness; -- The very same Word, but its only legal values are 0 and 1. subtype WBool is Word range 0 .. 1; -- When we must refer to individual bit positions of a machine word: subtype WBit_Index is Natural range 0 .. Bitness - 1; end Words;

A *WBool* is simply a *Word* which is permitted to contain strictly Boolean (0, 1) values. The compiler and runtime will enforce this restriction.

GNAT needs a little bit of help in making the full set of bitwise operations available for use on *Word*.

For some peculiar reason, the Ada standards group made the fundamental Shift and Rotate bitwise ops into optional components.

Fortunately, on GNAT we can force them to exist (On a non-GNAT compiler, you're on your own) :

*w_shifts.ads*:

with Words; use Words; package W_Shifts is pragma Pure; function Shift_Left (Value : Word; Amount : Natural) return Word; pragma Import(Intrinsic, Shift_Left); function Shift_Right (Value : Word; Amount : Natural) return Word; pragma Import(Intrinsic, Shift_Right); function Rotate_Left (Value : Word; Amount : Natural) return Word; pragma Import(Intrinsic, Rotate_Left); function Rotate_Right (Value : Word; Amount : Natural) return Word; pragma Import(Intrinsic, Rotate_Right); end W_Shifts;

Now let's proceed to the most fundamental arithmetic operations on *Word*s:

*word_ops.ads*:

with Words; use Words; -- Fundamental Operations on Words: package Word_Ops is pragma Pure; -- Branch-free calculation of 'carry' from a machine-word addition. function W_Carry(A : in Word; B : in Word; S : in Word) return WBool; -- Branch-free calculation of 'borrow' from a machine-word subtraction. function W_Borrow(A : in Word; B : in Word; D : in Word) return WBool; -- Without any branching: if Sel == 0, return A; if Sel == 1, return B. function W_Mux(A : in Word; B : in Word; Sel : in WBool) return Word; -- Exchange A and B. procedure W_Swap(A : in out Word; B : in out Word); end Word_Ops;

*word_ops.adb*:

with W_Shifts; use W_Shifts; -- Fundamental Operations on Words: package body Word_Ops is

But wait.

Ada -- like C -- does not give us any *portable* method of determining the over/underflow from output from machine arithmetic. And, it turns out, there existed in the past -- and apparently exist today -- CPUs made by idiots and wreckers (e.g. 'RISC-V') that do not have flags at all!

However, for multi-word addition, subtraction, the inner loop of Comba's multiplication, and for a handful of other ops, we demand it.

So we derive the Carry or Borrow at the 'eldest' binary position, without using any inline assembly or special compiler features:

-- Find the Carry, from an addition where it is known that A + B == S: function W_Carry(A : in Word; B : in Word; S : in Word) return WBool is begin return WBool(Shift_Right((A and B) or ((A or B) and (not S)), Bitness - 1)); end W_Carry; pragma Inline_Always(W_Carry); -- Find the Borrow, from a subtraction where it is known that A - B == D: function W_Borrow(A : in Word; B : in Word; D : in Word) return WBool is begin return WBool(Shift_Right(((not A) and B) or (((not A) or B) and D), Bitness - 1)); end W_Borrow; pragma Inline_Always(W_Borrow);

To derive the Carry from the addition *S* of words *A* and *B*, we take the highest bit ( i.e. *Shift_Right( ... , Bitness - 1)* ) of the result of *(A and B) or ((A or B) and (not S)*.

To derive the Borrow from the subtraction *D* of words *A* and *B*, we take the highest bit ( i.e. *Shift_Right( ... , Bitness - 1)* ) of the result of *((not A) and B) or (((not A) or B) and D)*.

This derivation of the flags, strictly from the inputs and outputs of a machine-word addition or subtraction, is the single most tricky item in this article. Be prepared to spend some time drawing the circuit (of a *full adder*, with carry-in and carry-out) to verify the offered proof:

A+B+C is the output bit of 1-bit adder; C is carry-in; A-B-C is the output bit of 1-bit subber; C is borrow-in. Observe that A+B+C is equal to A-B-C for all A,B,C ! +-+-+-+-----+--------------+-----+----------------+ | | 'Carry' out: | | 'Borrow' out: | +-+-+-+-----+--------------+-----+----------------+ | | | | |(a and b) or | |(~a and b) or | |A|B|C|A+B+C| ((a or b) and|A-B-C| ((~a or b) and | | | | | | ~(A+B+C)) | | (A-B-C)) | +-+-+-+-----+--------------+-----+----------------+ |0|0|0| 0 | 0 | 0 | 0 | +-+-+-+-----+--------------+-----+----------------+ |1|0|0| 1 | 0 | 1 | 0 | +-+-+-+-----+--------------+-----+----------------+ |0|1|0| 1 | 0 | 1 | 1 | +-+-+-+-----+--------------+-----+----------------+ |1|1|0| 0 | 1 | 0 | 0 | +-+-+-+-----+--------------+-----+----------------+ |0|0|1| 1 | 0 | 1 | 1 | +-+-+-+-----+--------------+-----+----------------+ |1|0|1| 0 | 1 | 0 | 0 | +-+-+-+-----+--------------+-----+----------------+ |0|1|1| 0 | 1 | 0 | 1 | +-+-+-+-----+--------------+-----+----------------+ |1|1|1| 1 | 1 | 1 | 1 | +-+-+-+-----+--------------+-----+----------------+ --- This base case extends to any N bit register, since --- both equations depend ~strictly~ on A, B, and C.

Now study how the Mux and Swap operations work. Observe that no transfer-of-control CPU instructions are required:

-- Without any branching: if Sel == 0, return A; if Sel == 1, return B. function W_Mux(A : in Word; B : in Word; Sel : in WBool) return Word is begin return B xor ((Sel - 1) and (A xor B)); end W_Mux; pragma Inline_Always(W_Mux); -- Exchange A and B. procedure W_Swap(A : in out Word; B : in out Word) is begin A := A xor B; B := A xor B; A := A xor B; end W_Swap; pragma Inline_Always(W_Swap); end Word_Ops;

Now we are finally ready to define *FZ* -- the fundamental FFA type: an unsigned integer of fixed width, i.e. a contiguous array of machine words.

*fz_type.ads*:

with Words; use Words; package FZ_Type is pragma Pure; -- Indices of all indexable items: type Indices is new Natural; -- Index of a particular Word in an FZ: subtype Word_Index is Indices; -- The FZ, in person! I.e. a bignum of permanently fixed bitness. type FZ is array(Word_Index range <>) of Word; -- A count of Words in an FZ (cannot be 0): subtype Word_Count is Indices range 1 .. Indices'Last; -- An index of a particular ~bit~ in an FZ: subtype FZBit_Index is Indices; end FZ_Type;

And finally, our first multi-word FFA arithmetical operations: ordinary addition and subtraction, with *carry output*.

Observe that a program which invokes FFA is expected to allocate memory of the proper size for each FFA integer; and that no such thing as normalization is permitted or permissible in FFA -- it would result in branches-on-secrets, i.e. timing side-channels. An addition or a subtraction of a pair of FFA integers with no bits set (i.e. all bits, in each word, equal 0) takes exactly the same amount of time, and is performed via the same sequence of CPU instructions, as of two FFA integers with all bits set (i.e. all bits, in each word, equal 1), or of any other possible integers of the same bitness (i.e. number of words.) In other words, the algorithmic complexity of FFA's arithmetical operations is always independent of the Hamming weights of the given operands.

*fz_arith.ads*:

with Words; use Words; with FZ_Type; use FZ_Type; -- Fundamental Arithmetic operators on FZ: package FZ_Arith is pragma Pure; -- Sum := X + Y; Overflow := Carry procedure FZ_Add(X : in FZ; Y : in FZ; Sum : out FZ; Overflow : out WBool); pragma Precondition(X'Length = Y'Length and X'Length = Sum'Length); -- Difference := X - Y; Underflow := Borrow procedure FZ_Sub(X : in FZ; Y : in FZ; Difference : out FZ; Underflow : out WBool); pragma Precondition(X'Length = Y'Length and X'Length = Difference'Length); end FZ_Arith;

Observe that the preconditions for addition and subtraction demand equal bitnesses for pairwise FFA operators. This pattern will likewise hold in the algorithms we will discuss in later articles.

And, now at last, the FZ adder and FZ subtracter:

*fz_arith.adb*:

with Word_Ops; use Word_Ops; -- Fundamental Arithmetic operators on FZ: package body FZ_Arith is -- Sum := X + Y; Overflow := Carry procedure FZ_Add(X : in FZ; Y : in FZ; Sum : out FZ; Overflow : out WBool) is Carry : WBool := 0; begin for i in X'Range loop declare A : constant Word := X(I); B : constant Word := Y(I); S : constant Word := A + B + Carry; begin Sum(i) := S; Carry := W_Carry(A, B, S); end; end loop; Overflow := Carry; end FZ_Add; pragma Inline_Always(FZ_Add); -- Difference := X - Y; Underflow := Borrow procedure FZ_Sub(X : in FZ; Y : in FZ; Difference : out FZ; Underflow : out WBool) is Borrow : WBool := 0; begin for i in 0 .. Word_Index(X'Length - 1) loop declare A : constant Word := X(X'First + i); B : constant Word := Y(Y'First + i); S : constant Word := A - B - Borrow; begin Difference(Difference'First + i) := S; Borrow := W_Borrow(A, B, S); end; end loop; Underflow := Borrow; end FZ_Sub; pragma Inline_Always(FZ_Sub); end FZ_Arith;

*The alert reader might ask why the iteration schemes differ. This will be discussed in a subsequent chapter, when we walk through multiplication and the peculiarities of Ada's "array slicing" construct.*

Now let's define a simple mechanism for printing FFA integers to the console:

*ffa_io.ads*:

with Words; use Words; with FZ_Type; use FZ_Type; package FFA_IO is -- Character representation of a Word type WChars is array(1 .. 2 * Byteness) of Character; -- Obtain the WChars corresponding to the given Word function W_To_WChars(N : Word) return WChars; -- Display a hex representation of W to stdout procedure Dump(W : in Word); -- Display a hex representation of N to stdout procedure Dump(N : in FZ); end FFA_IO;

*ffa_io.adb*:

with Ada.Text_IO; use Ada.Text_IO; with Words; use Words; with W_Shifts; use W_Shifts; with FZ_Type; use FZ_Type; package body FFA_IO is -- Obtain the WChars corresponding to the given Word function W_To_WChars(N : Word) return WChars is H : constant array(0 .. 15) of Character := "0123456789ABCDEF"; W : Word := N; Result : WChars; begin for b in WChars'Range loop -- From bottom to top: Result(B) := H(Natural(W and 16#F#)); -- Get current nibble. W := Shift_Right(W, 4); -- Get the next nibble. end loop; return Result; end W_To_WChars; -- Display a hex representation of W to stdout procedure Dump(W : in Word) is T : WChars := W_To_WChars(W); begin for i in reverse T'Range loop Put(T(i)); end loop; end Dump; -- Display a hex representation of N to stdout procedure Dump(N : in FZ) is begin for i in reverse N'Range loop Dump(N(i)); end loop; end Dump; end FFA_IO;

Nothing particularly complicated here.

Now for Chapter 1's demo routine itself:

*demo_ch1.ads*:

package Demo_Ch1 is procedure Demo_Add_Sub; end Demo_Ch1;

In this Chapter, we do not yet have a means for programmatically setting the value of an FFA integer other than via Ada's array assignment construct. But it will suffice for now, as we wish to simply explore addition and subtraction:

*demo_ch1.adb*:

-- From Ada: with Ada.Text_IO; use Ada.Text_IO; -- From FFA: with Words; use Words; with FZ_Type; use FZ_Type; with FZ_Arith; use FZ_Arith; -- From the Demo: with FFA_IO; use FFA_IO; package body Demo_Ch1 is procedure Demo_Add_Sub is -- We are using 64-bit Words (see iron.ads). -- We'll begin with an elementary demo, using 256-bit FZ: X : FZ(1 .. 4) := (0, 0, 0, 0); Y : FZ(1 .. 4) := (16#5555#, 0, 0, 0); Z : FZ(1 .. 4) := (0, 0, 0, 0); -- Carry. C : WBool := 0; begin Put_Line("X ="); Dump(X); New_Line; Put_Line("Y ="); Dump(Y); New_Line; FZ_Add(X, Y, Z, C); Put_Line("X + Y ="); Dump(Z); New_Line; Put_Line("C = " & WBool'Image(C)); FZ_Sub(X, Y, Z, C); Put_Line("X - Y ="); Dump(Z); New_Line; Put_Line("C = " & WBool'Image(C)); end Demo_Add_Sub; end Demo_Ch1;

Our "main" :

*ffa_demo.adb*:

with Demo_Ch1; use Demo_Ch1; procedure FFA_Demo is begin Demo_Add_Sub; end FFA_Demo;

And finally, the "builder" in the *ffademo* directory.

This illustrates how the static library produced by *libffa*'s builder is put to use:

*ffa_demo.gpr*:

with "../libffa/ffa.gpr"; project FFA_Demo is for Object_Dir use "obj"; type Mode_Type is ("debug", "release"); Mode : Mode_Type := external ("mode", "release"); for Languages use ("Ada"); for Source_Dirs use ("."); for Exec_Dir use "bin"; for Main use ("ffa_demo.adb"); package Compiler is case Mode is when "debug" => for Switches ("Ada") use ("-g"); when "release" => for Switches ("Ada") use ("-O2", "-fdump-scos", "-gnata", "-fstack-check", "-fdata-sections", "-ffunction-sections"); end case; end Compiler; package Binder is case Mode is when "debug" => for Switches ("Ada") use (); when "release" => for Switches ("Ada") use ("-static"); end case; end Binder; package Linker is case Mode is when "debug" => for Switches ("Ada") use (); when "release" => for Switches ("Ada") use ("-Wl,--gc-sections", "-static"); end case; end Linker; end FFA_Demo;

At this point, the reader should be reasonably confident that the Chapter 1 demo will not format his HDD; and perhaps even has a reasonable supposition as to what the output will be.

Let's run it:

./bin/ffa_demo

And this is what you should see:

X = 0000000000000000000000000000000000000000000000000000000000000000 Y = 0000000000000000000000000000000000000000000000000000000000005555 X + Y = 0000000000000000000000000000000000000000000000000000000000005555 C = 0 X - Y = FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAAAB C = 1

Now it should be clear to the reader, that we are simply emulating a "stretched" version of something resembling a typical CPU's ALU: a mechanism which arithmetizes on registers of fixed physical size.

*~To be continued!~*

Hi,

Great post, eagerly awating its sequels.

However, given the requirement for no "perlisms", your decision to

implement the swap operation with xor instead of using a temporary

variable confuses me. Is it because of "cache-bleed" shenanigans

or am I way off the mark?

Cheers

João

Dear jooliv,

The fast-swap-with-xor trick dates to the 1950s, and somehow I assumed it were known -- and immediately comprehensible -- to everyone who has ever programmed anything. To the point where I see swaps via temp variable as peculiar and in need of justification.

Yours,

-S

> To the point where I see swaps via temp variable as peculiar and in need of justification.

xor swap seems to me a rake, which would hit you if the swapped values happen to be stored in the same location. This seems to be against fit-in-head philosophy.

Dear apeloyee,

I will point out that W_Mux, abused in this way, will also fail in exactly the same way. And, by extension, FZ_Mux.

But

whywould you call either W_Swap(X, X) or W_Mux(X, X, ...) ? Seems to me thatthatwould be the placement of the rake on the ground and the stepping thereon at the same moment.I banned pointers ("Access variables", in Ada-land), ergo the situation cannot arise spontaneously at run time. I wonder if GNAT is smart enough to take a precondition banning identical args for these without shitting out runtime checks into the built binary.

Incidentally, after reading your comment I went to look where in FFA W_Swap is actually used, and found that it is used nowhere other than in FZ_Swap, which itself is used nowhere (in FFA, that is.)

Possibly FFA has no business offering swap at all, the user can create it himself as FZ are simply arrays, which are assignable/copyable entities in Ada. Perhaps FZ_Swap ought to be thrown out entirely?

Edit: and now also a thread.Yours,

-S

> Possibly FFA has no business offering swap at all, the user can create it himself as FZ are simply arrays, which are assignable/copyable entities in Ada. Perhaps FZ_Swap ought to be thrown out entirely?

I agree.

> I will point out that W_Mux, abused in this way, will also fail in exactly the same way. And, by extension, FZ_Mux.

W_Mux is a function, therefore uses temporary vars and will, in fact, work. FZ_Mux therefore also.

The xor swap technique is well-known, as you say. It is also well-known that it produces incorrect answers when the two inputs are the same register (again, as is discussed). To my mind, that would be reason enough to avoid it: The construct being used is known to be incorrect in some cases, meaning that correctness analysis then has to look further afield to the places where that code is called, rather than being self-contained.

Personally, I find the temporary variable version clearer in intent, particularly when other "bit magic" things are happening. However, I feel there is a more significant reason for using a temporary variable: it frequently reduces down to no work at all. Since the function is inlined, the compiler has full knowledge of the registers containing the specific values. Propagating the register analysis through the swap function (when written as three assignments using the temporary variable) simply results in an adjustment of the compilers internal variable register mapping at that point. In contrast, the xor version requires the compiler to emit three instructions unless its analysis is sophisticated.

Dear FTWW,

Xor swap was abolished in Ch. 2, for the reasons you described.

Yours,

-S

* a pure function

Hello Stanislav,

I had problems compiling the code using gnat-gpl-2017 (AdaCore) from Gentoo repositories. Turned out, `pragma Inline_Always(...)` statements must be moved from .adb to .ads files. With GNAT 2016 from AdaCore (gcc 4.9) everything compiles fine.

On the other hand, looks that GNAT 2017 actually did the inlining when compiling the fixed sources (https://pastebin.com/DMkB8q2t), unlike GNAT 2016 with the original sources (https://pastebin.com/fS8BYrZS). With the fixed sources, GNAT 2016 does inlining as well.

Dear BT,

The paste has vanished?

Yours,

-S

The pastes did not vanish, but the blog engine considers closing parentheses to be part of URL -- they are not.

To complete the report:

1. Patch that I applied to sources: http://p.bvulpes.com/pastes/t9cyX/?raw=true

2. Error logs from gprbuild and nm output on libFFA.a. GNAT 2017 removes bodies of inlined functions from the static library.

2.1 Original source, GNAT 2016: http://p.bvulpes.com/pastes/D95yq/?raw=true

2.2 Original source, GNAT 2017: http://p.bvulpes.com/pastes/yWrT8/?raw=true

2.3 Fixed source, GNAT 2016: http://p.bvulpes.com/pastes/PHiLb/?raw=true

2.4 Fixed source, GNAT 2017: http://p.bvulpes.com/pastes/NhkCU/?raw=true

3. Objdump outputs for the bin/ffademo (only relevant parts):

3.1 Original source, GNAT 2016: http://p.bvulpes.com/pastes/EBXjr/?raw=true

3.2 Original source, GNAT 2017: build failed.

3.3 Fixed source, GNAT 2016: http://p.bvulpes.com/pastes/yMvy1/?raw=true

3.4 Fixed source, GNAT 2017: http://p.bvulpes.com/pastes/ryNPg/?raw=true

Dear BT,

Thank you for running this test.

I did know that 2016 GNATs -- GCC and AdaCore both -- leave inlined routine bodies sitting in the binary. (They actually do the inlining, I disasmed the output and looked. But they

alsoleave the routines sitting, as if they had not been also inlined.) What I did not know is that this has been fixed in 2017. This is actually a desirable behaviour from the point of view of acompleteFFA library -- the routines that actually comprise the user-callable interface, will not have inline pragmas. But for the demo articles, a GNAT that snips off inlined routines is a problem.I will have to think about how to handle this -- but a solution that leaves the thing unbuildable or ill-behaved with GNAT-2016 will not be considered.

Yours,

-S

Minor formatting nitpick

Each of the source files in the genesis has a copy-pasted license header, for many composing most of the lines in the file, for almost all composing most of the words in the file. This seems somewhat obscene for a piece of text that boils down to "the concept of license headers is statist braindamage, these words cannot bind you".

Are there strong drawbacks to consolidating them into a single LICENSE file, at least?

Dear Nonymous,

Read ( and consider joining ) the discussion of this, here.

And it may please you to learn that many of the files will get slightly longer.

Yours,

-S

The files will get longer, though hopefully not too much for individual tests / iron / w_shifts / etc; the individual first-chapter patch, kind of by definition, will not.

I will grant that as a spoonful of tar to find at the top of the file when you open it, the suggested ascii pr0n would be more distracting; and possibly useful if the browser presentation was hard to amend to display an equivalent graphic in 24-bit True Color.

I'm curious what happens with xor swaps when the integers are close to max for the word size, is there an overflow flag in Gnat?

This type of bootstrapping up a useful system is very useful independent of the application, as to my mind, we get to explicitly find out our all our mistaken assumptions, so I'll be following this a closely as I can.

Dear Dave,

Try as I might, I am unable to make proper sense of your question.

On what planet does XOR overflow ? It isn't an arithmetic op.

Wordis a "modular type", analogous to "unsigned int" in C. When we add and subtract, portably determining overflow requires the elaborate dance shown in this chapter. But what does this have to do with XOR ?Yours,

-S

My mistake, when I first saw this I thought that you were packing all the bits of both numbers into one location, hence the comment on overflow. Now I've looked it up I see how it works.

signature ffa_ch1_genesis.vpatch.peterl.sig

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1

iQIcBAABAgAGBQJaNUCPAAoJELTi4Nystj0lifQP/2+VBlFeAG+expdmgy86HhES

WAKlz9Cj/aRITopzTai5qDOqqfZo9YgSNxii5L4JTSIu6gmvaBkPP3uSp68lyUn8

pr8CMIm8L2LPTeX0QXZzjxJI4Oz94HPc+4WOLuSrOxezXnvezndCairnCTAKl7Eh

J1qnNJ7uF9Cd7B46lLzhZW7Esm9Y83xnV5ccew+KoT+6Vm091997O/9Q/6WM72Ys

hEgzq7QFSPtmd1EqkpLkQ1rN6JwzlAPSN43KlP+pd/TrH/J853qdVZ8IXGyaVHsx

P859k6M1/wuwl8LcHwsCP+1PRxtnC43vp+r03HaIpaetl7wapvd+cLLg0OpnDT3W

q4cxcFSROm7chqOQAISKTtEGhPimaOAnTi29fu14uJ1VOVAAF0+hw+QHpb0VJTZ6

MKeTREzuoA1/9mgbnL9zn1U5SkAqbo8Vgzpjxb0kpNahZORC+Mv4IcDEYE5jx5Xs

kYRJehFj5dAl9EwHXLBjB2WsT15Yj58V6NGUno5Q+gY4u3OY0/LEnOs4SfbGpm3S

RFFBVPwn6Y/P2TP2FB11CJVB5W25s6Vf9iPsR5Uhu2SJpzlcO2zVF0ycnW1/AJgC

qtISofNGfYU6bP9DjEqVk+QBzk7xV9rNDt8T2mQJwjULysFbIUbIoo9fkrSFYuMG

mCmDxi2TyLt03gbzNaIM

=jASF

-----END PGP SIGNATURE-----

Dear PeterL,

Congrats on becoming the first Ch. 1 graduate !

I have perma-hosted your seal here.

Yours,

-S

Great stuff, Stan.

I have pressed this patch, compiled the code, and checked the demo's output.

Dear BenVulpes,

Congrats!

I have perma-hosted your seal here.

Yours,

-S

Thank you for writing this, Stan! I have added my signature below:

-----BEGIN PGP SIGNATURE-----

`iQIcBAABCgAGBQJaRPeAAAoJEL2unQUaPTuVcsAP/RVIdK4oHD88V/ksZ1wZLvYp`

UP6Q7C6iwjQUOGBRXmdNACek+XmpOLl7zwl3waipeBqTkTkjnJHfF4iGXf/0J6Yy

OI7mw+V9X+A9YsFVNuI2CuRRySa4MSYLgvPtnq0/MzgwUFjUaKtvuRAng5q8fcEB

nGs5Wd0BK0nHto72pHa3QwJiNh+sCQZu3EhNBb2W53ypE7TGxsCJQs1H6CZ2ZHur

iRUpjdw6ZA9Kh3JSZbN6qWOEOuukjTw5es84OkDc4SDLt6MYmrsKneiZ2kdVkx02

MqQDwzOafND+nZO0Oi4xmXWHM1ZwVOviLPvWacmThPfmlk+VSr+TmWk8jetSQx9x

FSCxvZjfC3nDRxInYbJS71PBd+inAH7MwA2svwA/ug9+hp7kt20CJ5JlcIt5PPCY

A6DkZcUiV9/1C7sjkzIYbgE5ZyMRIV4o44SDcyzNBVQK6F3Pb4OE0kNIabQSSisa

IIjOIqeiIJ4JHZ4sTdjk/mHABEIDhQTq8WHcgqPxQLN1pdIVAO+MKB2nL7kaZlI2

sHKqfktarXzU04J8HFEgfHpzDckHZCwd8IpVTqCgiohK+iiFvofqZwlx6uIqc2vB

eSO+BrMye/bFFHeAbQ7YQLeZ5pTgT1DWkLTEkb3Q5uyW2AzoyKPiA6t/gzAVzoR5

ftKEZi8n7Chi4vwwtRJU

=DX0Y

-----END PGP SIGNATURE-----

Dear spyked,

Congrats!

I have perma-hosted your seal here.

Yours,

-S

Dear Stanislav,

Thank you for this article!, I can build it fine on a MUSL GNAT installation.

I've put my notes and comment on reading this article here..

Cheers,

Arjen

It seems the inlining pragma needs to be in the specification not in the implementation file, report here.

Cheers,

Arjen

I feel late to the party; oh well. This is already a nice learning resource for someone green to Ada, as I'm. This is your first article categorized under

Adaand I haven't read all of your previous articles, so how did you go about learning not only Ada, but GPRbuild and other such things, Stanislav?My main thoughts on FFA so far are the following; I'm commenting without being familiar with any of the later articles in this series, of course:

Why use GNAT to provide Shift_Left and Shift_Right when you can easily use multiplication and division by powers of two for this?

Shouldn't there be a better means to get non-branching behavior in W_Mux, such that a sufficiently clever compiler can't notice and optimize it into a conditional instruction anyway? Also, an alternative implementation that comes to mind, although not necessarily fulfilling this, is to treat it like APL and have a two-element vector filled with those parameters that you select the zeroeth or first element from.

Why is the type FZ not private? Since there's no normalization to the smallest size required, you don't open yourself to erroneous situations with it, but this seems like something that no program using FFA should need know the inner workings of.

Why do you use Pragma for preconditions instead of the Pre aspect?

I've already asked why you don't use child packages and you've already told me it was a personal preference, if I recall correctly and in sufficient detail.

Dear Verisimilitude,

...so how did you go about learning not only Ada, but GPRbuild and other such things...See Mocky's review, he collected many of the relevant conversations in one place.

> Why use GNAT to provide Shift_Left and Shift_Right when you can easily use multiplication and division by powers of two for this?Neither GNAT nor the GCC backend it is based on offers any hard guarantee that X * 2^k is compiled into a left shift, or X / 2^k into a right shift.

> is to treat it like APL and have a two-element vector filled with those parameters that you select the zeroeth or first element fromIf you had read

thisfirst chapter attentively, you would know thatindexing memory by secret values is banned in FFA.Permanently, with no exceptions.> Why is the type FZ not private?If you had read any of the subsequent chapters, you would know the answer to this.

> Why do you use Pragma for preconditions instead of the Pre aspect?Ditto, as above.

Yours,

-S

> Neither GNAT nor the GCC backend it is based on offers any hard guarantee that X * 2^k is compiled into a left shift, or X / 2^k into a right shift.

I suppose you want to avoid the hardware multiplier for this, then.

> If you had read this first chapter attentively, you would know that indexing memory by secret values is banned in FFA. Permanently, with no exceptions.

I figured that thought would likely result in code that was internally conditional or had side-channel issues. That makes sense. I suppose the only way to be certain of some properties is to inspect the compiled code.

I'll make my way through the next chapters, eventually, and continue commenting in turn.

Dear Verisimilitude,

> I suppose you want to avoid the hardware multiplierMultiplication and division by powers of 2 have no business tying up the iron multiplier, correct.

> code that was internally conditional or had side-channel issuesRead up re

cachebleedand related effects.> inspect the compiled codeDefinitely mandatory prior to any battlefield use.

> I’ll make my way through the next chaptersEnjoy. I recommend to also do the hands-on exercises.

Yours,

-S

Here is my signature for Ch.1, read, pressed, and demo ran as expected. I will mirror all patches and subsequent seals on my site here: http://btc.info.gf/devel/ada/ffa/

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEJg+le85nelwEv2C6SnWIPMGx00wFAl3QHH0ACgkQSnWIPMGx

00xsURAAjdvpPNueE+PjfVmtnA4sGcAt/nBiQ5ATiaP0tLjkG/YR59G6kRR1Svjr

9FNX2vOyXvJWfGUhkfZphL8cfanSmpIOplEPEx000I4m70OxVY2esglKvRX9tFtO

WkysofkDPvo5/DWNJfb14K0o0SyjEmAkBzIrnocL4B+V1BXjAcp9tjMPXgRBtIgk

WA0gGBKKdXfe0JJXGhc8Ny94/4uozb4or5L9v3GaUaovfaEF6Gr84LRYtD+nB9I6

dYC/IXMM6AAeJMwv+QN8PAeqWk8Vr+xhviJ9CgeyQsKTyR75vV34ivtjS0ibC6AM

rfNzv2bDWjbtN7ycM+DvMjjjURmAk29VsAhygXMiYkKUjHZsMsxLyTWjHUgUE7Kl

NWJOIEmmDG7tl+osotzubYi+8Y0tksl9TrJ2FSrlq80Z8+gHtj/2+jC3SobxKwo+

bsz89nElL/zgzTq6umb6dpdYoaRjA5fC7rCDhl55CMUi/s2O9zf0dW2X1cY9GcNw

k6i7boGdbG+O8nhphWy7UFCQYc9l+047BzqZGvSmbl6XQofrt6uKt83JQh05UqMd

kWqbEw9LJMmOhAmSfR7iB2FLzx83soApnCd7cwb1/UesDKXixDHEG0GMNoeS8gS0

Vw00eiBkWTw56OvXuwTmad3GKV1g3wtIenbLt18XR6dQFTauKd0=

=9fjm

-----END PGP SIGNATURE-----

Dear shinohai,

Congrats. I will mirror this sig with the others.

Yours,

-S

This morning asciilifeform pointed out that ffa patches got a keccak regrind, so I am resubmitting my signatures. I will update my mirror to remove old sigs as well.

ffa_ch1_genesis.kv.vpatch:

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEJg+le85nelwEv2C6SnWIPMGx00wFAl3Rr1IACgkQSnWIPMGx

00zX6Q//ZVYsLBCMxcYyYUoGJJCS5otFnHAKphVga8Q7br56tU7N7p+hxxk/8/Nf

oAvQuF5ko0fuSxdpaqKLxQ14OcJboumryHeV1yiPJWI4yGjMp5Wd4GJd8GqMyuzT

ReQJmb0Pf2Y8CrVASu0NfsPVAdmTVxkHN5+qVUQzWommSd+ifDx7S3VV5hkKzLbv

qBzyduKE9OMFr0x+Xa5m9Ax98mTVmrt34cQv4KXeDA20xsEIRgEn2b589oad8f/I

Rnns7wBRMcnE6H5wJHgo3hArjNglkij7UeeaBZvYwn6jEpy34LzTGIwvo6z0um2q

WC8vO3c8zpbIdh29tC9bPDzROXbX8p7YrmnU7f1M9HTxl4lvRjsofDljt6OhhfpT

9hueU4JYdGkvZD7OwErsX3bEvA/9U2PA9wCanp+DBGjscow1ZgUlqkyh0LmHYjpG

OZ02GTvFhQhHvL3Xh7FpYanCrMN/CP9je0LDE23rXLEKGiU6rckuvP6sY6sC/AVD

Xjvs1sn0PfeY3/2rn+tXZp0dc4OC3MIL5akL1UsnjqbyAVLyLDfxywyvF/YtqeP5

dUyQHKCumnU4E9pjT96h1r8cb7tT8OHTjMO4eWSgwgDs1VTCCocQS+//kA5R34jn

c65T3sBNxNzzL7NS62m3kmDyJHJtJFqjecokaGLEScn80Op0XWE=

=mYKy

-----END PGP SIGNATURE-----