“Finite Field Arithmetic.” Chapter 1: Genesis.

FFA, or the Finite Field Arithmetic library, differs from the typical Open Sores abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute.

This article is the first in a series, at the end of which the patient reader will have glued together, with his own hands — and fully fit into his head — a demonstrably-correct set of routines for multiple-precision arithmetic, sufficient for carrying out RSA and many other forms of number-theoretical cryptography.

The “bignum” arithmetic libraries currently in common use were, near as I can tell, all and without exception written by NSA assets and their various “useful idiots”. The conclusion is difficult to avoid, given as every example known to me suffers from one or more of the following ills:

FFA, on the other hand:

  • Written in a heavily-restricted subset of the Ada programming language — the only currently-existing nonproprietary statically-compiled language which permits fully bounds-checked, pointerolade-free code and practically-auditable binaries. We will be using GNAT, which relies on the GCC backend.
  • No external dependencies or libraries are made use of; suitable for embedded systems with limited memory. Language features which bloat the generated binary (e.g. OOP) are avoided.
  • No use of heap allocators. (In fact, wholly allocator-agnostic.) Memory requirement for all operations is readily calculable in advance from the width of the inputs.
  • The total set of components needed for RSA occupies around 3000 lines, incl. comments, of clear and unidiomatic (i.e. containing no “perlisms”) code.
  • Contains no inline assembler and is married to no particular machine (i.e. can be built for microcontrollers, MS-DOS boxes, CPUs of virtually any “bitness”)
  • Does not branch-on-secret-bits
  • Does not address memory by secret values, i.e. is cachebleed-proof
  • Fully unrollable, deterministic (and ergo auditable) control flow. I.e. all RSA modular-exponentiation operations will take exactly the same amount of time, via exactly the same flow of instructions; all algorithmic-complexity attack vectors are ruled out.

FFA will be presented to the reader in small pieces. Each should take more than one hour, but probably fewer than two, to fully study and understand.

You will need:

Just as “theater begins with the coat room”, each FFA article begins with a V-press. Set up your Vtron if you have not already done so, and press the “genesis” — “ffa_ch1_genesis”.

You will end up with two directories, libffa and ffademo. The former contains the subset of FFA discussed in this chapter; the latter — the demo routines for same. The reader is invited to modify the demo routines, experiment, learn.

Each subsequent chapter of the FFA series will contain a V-patch on the previous set, and the next — another, until the final article in the series: where “P” — an FFA-tronic GPG replacement — is introduced.

First, verify that your V-Tron, GNAT, and genesis press are in working order, by compiling the introductory demo:

cd ffademo
gprbuild

The output should resemble:

using project file ffa_demo.gpr
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections ffa_demo.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc word_ops.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc fz_type.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc fz_arith.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc w_shifts.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc iron.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc words.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections demo_ch1.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections ffa_io.adb
gprlib FFA.lexch
ar cr /home/you/ffa/libffa/lib/libFFA.a /home/you/ffa/libffa/obj/word_ops.o
/home/you/ffa/libffa/obj/fz_type.o ...
ranlib libFFA.a
gprbind ffa_demo.bexch
gnatbind ffa_demo.ali
gcc -c b__ffa_demo.adb
gcc ffa_demo.o -Wl,--gc-sections -static -o ffa_demo

Observe that the contents of the ‘libffa’ directory build recursively.

If you are an Apple victim, your OS prohibits static linking, and you may see barf instead. You may use the debug (dynamically linked) variant:

gprbuild -Xmode=debug

Do not run the demo yet. (you read programs before running, don’t you?) Instead, let’s take a careful look at what is inside.

Let’s begin with the restriction pragmas. These define a particular, constrained subset of the Ada programming language, by prohibiting the use of certain features. GNAT will balk if it encounters a prohibited feature in FFA or the demo routines.

The logic behind the restrictions was a combination of “guilty until proven innocent” — if we can do without a construct, we will — and “enable all safety features”:

restrict.adc:

pragma Restrictions(Immediate_Reclamation);
pragma Restrictions(Max_Asynchronous_Select_Nesting => 0);
pragma Restrictions(Max_Protected_Entries => 0);
pragma Restrictions(Max_Select_Alternatives => 0);
pragma Restrictions(Max_Task_Entries => 0);
pragma Restrictions(Max_Tasks => 0);
pragma Restrictions(No_Abort_Statements);
pragma Restrictions(No_Access_Parameter_Allocators);
pragma Restrictions(No_Allocators);
pragma Restrictions(No_Asynchronous_Control);
pragma Restrictions(No_Calendar);
pragma Restrictions(No_Coextensions);
pragma Restrictions(No_Default_Stream_Attributes);
pragma Restrictions(No_Delay);
pragma Restrictions(No_Dispatch);
pragma Restrictions(No_Dispatching_Calls);
pragma Restrictions(No_Dynamic_Attachment);
pragma Restrictions(No_Dynamic_Priorities);
pragma Restrictions(No_Entry_Calls_In_Elaboration_Code);
pragma Restrictions(No_Entry_Queue);
pragma Restrictions(No_Enumeration_Maps);
pragma Restrictions(No_Exception_Propagation);
pragma Restrictions(No_Exception_Registration);
pragma Restrictions(No_Finalization);
pragma Restrictions(No_Fixed_Io);
pragma Restrictions(No_Floating_Point);
pragma Restrictions(No_Implementation_Aspect_Specifications);
pragma Restrictions(No_Implementation_Units);
pragma Restrictions(No_Implicit_Aliasing);
pragma Restrictions(No_Implicit_Conditionals);
pragma Restrictions(No_Implicit_Dynamic_Code);
pragma Restrictions(No_Implicit_Heap_Allocations);
pragma Restrictions(No_Implicit_Protected_Object_Allocations);
pragma Restrictions(No_Implicit_Task_Allocations);
pragma Restrictions(No_Initialize_Scalars);
pragma Restrictions(No_Local_Protected_Objects);
pragma Restrictions(No_Local_Timing_Events);
pragma Restrictions(No_Multiple_Elaboration);
pragma Restrictions(No_Nested_Finalization);
pragma Restrictions(No_Protected_Type_Allocators);
pragma Restrictions(No_Protected_Types);
pragma Restrictions(No_Relative_Delay);
pragma Restrictions(No_Requeue_Statements);
pragma Restrictions(No_Secondary_Stack);
pragma Restrictions(No_Select_Statements);
pragma Restrictions(No_Specific_Termination_Handlers);
pragma Restrictions(No_Standard_Allocators_After_Elaboration);
pragma Restrictions(No_Stream_Optimizations);
pragma Restrictions(No_Streams);
pragma Restrictions(No_Task_Allocators);
pragma Restrictions(No_Task_At_Interrupt_Priority);
pragma Restrictions(No_Task_Attributes_Package);
pragma Restrictions(No_Task_Hierarchy);
pragma Restrictions(No_Tasking);
pragma Restrictions(No_Task_Termination);
pragma Restrictions(No_Terminate_Alternatives);
pragma Restrictions(No_Unchecked_Access);
pragma Restrictions(No_Unchecked_Conversion);
pragma Restrictions(No_Unchecked_Deallocation);
pragma Restrictions(No_Wide_Characters);
pragma Restrictions(Pure_Barriers);
pragma Restrictions(Simple_Barriers);
pragma Restrictions(Static_Priorities);
pragma Restrictions(Static_Storage_Size);
pragma Validity_Checks(ALL_CHECKS);

Towards the end of this article series, we will investigate the impact of such “fascism” on program performance, and the possibility of cautiously replacing certain bounds checks with formal proof of nonoverflowability.

Let’s proceed to the builder. Observe that GnuMake was not listed in the list of required programs for this tutorial. Instead we make use of GNAT’s “GprBuild”, a substantially cleaner and better-behaved item:

ffa.gpr:

project FFA is
 
  for Object_Dir use "obj";
 
  type Mode_Type is ("debug", "release");
  Mode : Mode_Type := external ("mode", "release");
 
  for Languages         use ("Ada");
  for Source_Dirs       use (".");
  for Library_Dir       use "lib";
  for Library_Name      use "FFA";
  for Library_Kind      use "static";
 
  package Binder is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ();
        when "release" =>
           for Switches ("Ada")
             use ("-static", "-r");
     end case;
  end Binder;
 
  package Builder is
     for Switches ("Ada")
       use ("-nostdlib");
  end Builder;
 
  package Compiler is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ("-g");
        when "release" =>
           for Switches ("Ada")
             use ("-O2", "-fdump-scos", "-gnata", "-fstack-check",
                  "-fdata-sections", "-ffunction-sections",
                  "-gnatec=" & FFA'Project_Dir & "restrict.adc");
     end case;
  end Compiler;
 
end FFA;

Now let’s proceed to the only machine-dependent section of FFA.

The supplied iron.ads pre-supposes eight-bit bytes, and a 64-bit CPU:

iron.ads:

package Iron is
 
   pragma Pure;
 
   --------------------------------------
   -------- For a 64-bit system: --------
   --------------------------------------
   MachineBitness     : constant Positive := 64;
   MachineBitnessLog2 : constant Positive := 6; -- log2(64)
   --------------------------------------
 
   -- Bits per Byte
   ByteBits           : constant Positive := 8;
 
end Iron;

If this is not what you have, change the constants, e.g.:

   ------------------------------------
   ------ For a 32-bit system: --------
   ------------------------------------
     MachineBitness     : constant Positive := 32;
     MachineBitnessLog2 : constant Positive := 5; -- log2(32)
   ------------------------------------

The purpose of MachineBitnessLog2 will become apparent later; ignore it for now.

Let’s proceed to the most fundamental building block of FFA, the machine Word. A Word, from this point on, is simply a single machine word on your particular machine. It is expressed as an Ada “modular type”.

words.ads:

with Iron;
 
package Words is
 
   pragma Pure;
 
   -- The most fundamental fact about Word: width.
   Bitness     : constant Positive := Iron.MachineBitness;
 
   -- It is possible to calculate BitnessLog2 at elaboration time,
   -- but we will avoid having ~any~ elaboration at all in FFA.
   BitnessLog2 : constant Positive := Iron.MachineBitnessLog2;
 
   -- The Word width, expressed in ~bytes~:
   Byteness    : constant Positive := Bitness / Iron.ByteBits;
 
   -- What kind of words to use. Must be machine-word or smaller.
   type Word is mod 2**Bitness;
   for Word'Size use Bitness;
 
   -- The very same Word, but its only legal values are 0 and 1.
   subtype WBool is Word range 0 .. 1;
 
   -- When we must refer to individual bit positions of a machine word:
   subtype WBit_Index is Natural range 0 .. Bitness - 1;
 
end Words;

A WBool is simply a Word which is permitted to contain strictly Boolean (0, 1) values. The compiler and runtime will enforce this restriction.

GNAT needs a little bit of help in making the full set of bitwise operations available for use on Word.

For some peculiar reason, the Ada standards group made the fundamental Shift and Rotate bitwise ops into optional components.

Fortunately, on GNAT we can force them to exist (On a non-GNAT compiler, you’re on your own) :

w_shifts.ads:

with Words; use Words;
 
package W_Shifts is
 
   pragma Pure;
 
   function Shift_Left
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Shift_Left);
 
   function Shift_Right
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Shift_Right);
 
   function Rotate_Left
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Rotate_Left);
 
   function Rotate_Right
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Rotate_Right);
 
end W_Shifts;

Now let’s proceed to the most fundamental arithmetic operations on Words:

word_ops.ads:

with Words; use Words;
 
-- Fundamental Operations on Words:
package Word_Ops is
 
   pragma Pure;
 
   -- Branch-free calculation of 'carry' from a machine-word addition.
   function W_Carry(A : in Word; B : in Word; S : in Word)
                   return WBool;
 
   -- Branch-free calculation of 'borrow' from a machine-word subtraction.
   function W_Borrow(A : in Word; B : in Word; D : in Word)
                    return WBool;
 
   -- Without any branching: if Sel == 0, return A; if Sel == 1, return B.
   function W_Mux(A : in Word; B : in Word; Sel : in WBool)
                 return Word;
 
   -- Exchange A and B.
   procedure W_Swap(A : in out Word; B : in out Word);
 
end Word_Ops;

word_ops.adb:

with W_Shifts; use W_Shifts;
 
-- Fundamental Operations on Words:
package body Word_Ops is

But wait.

Ada — like C — does not give us any portable method of determining the over/underflow from output from machine arithmetic. And, it turns out, there existed in the past — and apparently exist today — CPUs made by idiots and wreckers (e.g. ‘RISC-V’) that do not have flags at all!

However, for multi-word addition, subtraction, the inner loop of Comba’s multiplication, and for a handful of other ops, we demand it.

So we derive the Carry or Borrow at the ‘eldest’ binary position, without using any inline assembly or special compiler features:

   -- Find the Carry, from an addition where it is known that A + B == S:
   function W_Carry(A : in Word; B : in Word; S : in Word)
                   return WBool is
   begin
      return WBool(Shift_Right((A and B) or ((A or B) and (not S)),
                               Bitness - 1));
   end W_Carry;
   pragma Inline_Always(W_Carry);
 
   -- Find the Borrow, from a subtraction where it is known that A - B == D:
   function W_Borrow(A : in Word; B : in Word; D : in Word)
                    return WBool is
   begin
      return WBool(Shift_Right(((not A) and B) or (((not A) or B) and D),
                               Bitness - 1));
   end W_Borrow;
   pragma Inline_Always(W_Borrow);

To derive the Carry from the addition S of words A and B, we take the highest bit ( i.e. Shift_Right( … , Bitness – 1) ) of the result of (A and B) or ((A or B) and (not S).

To derive the Borrow from the subtraction D of words A and B, we take the highest bit ( i.e. Shift_Right( … , Bitness – 1) ) of the result of ((not A) and B) or (((not A) or B) and D).

This derivation of the flags, strictly from the inputs and outputs of a machine-word addition or subtraction, is the single most tricky item in this article. Be prepared to spend some time drawing the circuit (of a full adder, with carry-in and carry-out) to verify the offered proof:

A+B+C is the output bit of 1-bit adder; C is carry-in;
A-B-C is the output bit of 1-bit subber; C is borrow-in.
Observe that A+B+C is equal to A-B-C for all A,B,C !
+-+-+-+-----+--------------+-----+----------------+
|           | 'Carry' out: |     | 'Borrow' out:  |
+-+-+-+-----+--------------+-----+----------------+
| | | |     |(a and b) or  |     |(~a and b) or   |
|A|B|C|A+B+C| ((a or b) and|A-B-C| ((~a or b) and |
| | | |     |    ~(A+B+C)) |     |    (A-B-C))    |
+-+-+-+-----+--------------+-----+----------------+
|0|0|0|  0  |      0       |  0  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|1|0|0|  1  |      0       |  1  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|0|1|0|  1  |      0       |  1  |       1        |
+-+-+-+-----+--------------+-----+----------------+
|1|1|0|  0  |      1       |  0  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|0|0|1|  1  |      0       |  1  |       1        |
+-+-+-+-----+--------------+-----+----------------+
|1|0|1|  0  |      1       |  0  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|0|1|1|  0  |      1       |  0  |       1        |
+-+-+-+-----+--------------+-----+----------------+
|1|1|1|  1  |      1       |  1  |       1        |
+-+-+-+-----+--------------+-----+----------------+
--- This base case extends to any N bit register, since
--- both equations depend ~strictly~ on A, B, and C.

Now study how the Mux and Swap operations work. Observe that no transfer-of-control CPU instructions are required:

   -- Without any branching: if Sel == 0, return A; if Sel == 1, return B.
   function W_Mux(A : in Word; B : in Word; Sel : in WBool) return Word is
   begin
      return B xor ((Sel - 1) and (A xor B));
   end W_Mux;
   pragma Inline_Always(W_Mux);
 
   -- Exchange A and B.
   procedure W_Swap(A : in out Word; B : in out Word) is
   begin
      A := A xor B;
      B := A xor B;
      A := A xor B;
   end W_Swap;
   pragma Inline_Always(W_Swap);
 
end Word_Ops;

Now we are finally ready to define FZ — the fundamental FFA type: an unsigned integer of fixed width, i.e. a contiguous array of machine words.

fz_type.ads:

with Words; use Words;
 
package FZ_Type is
 
   pragma Pure;
 
   -- Indices of all indexable items:
   type Indices is new Natural;
 
   -- Index of a particular Word in an FZ:
   subtype Word_Index is Indices;
 
   -- The FZ, in person! I.e. a bignum of permanently fixed bitness.
   type FZ is array(Word_Index range <>) of Word;
 
   -- A count of Words in an FZ (cannot be 0):
   subtype Word_Count is Indices range 1 .. Indices'Last;
 
   -- An index of a particular ~bit~ in an FZ:
   subtype FZBit_Index is Indices;
 
end FZ_Type;

And finally, our first multi-word FFA arithmetical operations: ordinary addition and subtraction, with carry output.

Observe that a program which invokes FFA is expected to allocate memory of the proper size for each FFA integer; and that no such thing as normalization is permitted or permissible in FFA — it would result in branches-on-secrets, i.e. timing side-channels. An addition or a subtraction of a pair of FFA integers with no bits set (i.e. all bits, in each word, equal 0) takes exactly the same amount of time, and is performed via the same sequence of CPU instructions, as of two FFA integers with all bits set (i.e. all bits, in each word, equal 1), or of any other possible integers of the same bitness (i.e. number of words.) In other words, the algorithmic complexity of FFA’s arithmetical operations is always independent of the Hamming weights of the given operands.

fz_arith.ads:

with Words;   use Words;
with FZ_Type; use FZ_Type;
 
-- Fundamental Arithmetic operators on FZ:
package FZ_Arith is
 
   pragma Pure;
 
   -- Sum := X + Y; Overflow := Carry
   procedure FZ_Add(X          : in    FZ;
                    Y          : in    FZ;
                    Sum        : out   FZ;
                    Overflow   : out   WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Sum'Length);
 
   -- Difference := X - Y; Underflow := Borrow
   procedure FZ_Sub(X          : in  FZ;
                    Y          : in  FZ;
                    Difference : out FZ;
                    Underflow  : out WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Difference'Length);
 
end FZ_Arith;

Observe that the preconditions for addition and subtraction demand equal bitnesses for pairwise FFA operators. This pattern will likewise hold in the algorithms we will discuss in later articles.

And, now at last, the FZ adder and FZ subtracter:

fz_arith.adb:

with Word_Ops; use Word_Ops;
 
-- Fundamental Arithmetic operators on FZ:
package body FZ_Arith is
 
   -- Sum := X + Y; Overflow := Carry
   procedure FZ_Add(X          : in  FZ;
                    Y          : in  FZ;
                    Sum        : out FZ;
                    Overflow   : out WBool) is
      Carry : WBool := 0;
   begin
      for i in X'Range loop
         declare
            A : constant Word := X(I);
            B : constant Word := Y(I);
            S : constant Word := A + B + Carry;
         begin
            Sum(i) := S;
            Carry  := W_Carry(A, B, S);
         end;
      end loop;
      Overflow := Carry;
   end FZ_Add;
   pragma Inline_Always(FZ_Add);
 
   -- Difference := X - Y; Underflow := Borrow
   procedure FZ_Sub(X          : in  FZ;
                    Y          : in  FZ;
                    Difference : out FZ;
                    Underflow  : out WBool) is
      Borrow : WBool := 0;
   begin
      for i in 0 .. Word_Index(X'Length - 1) loop
         declare
            A : constant Word := X(X'First + i);
            B : constant Word := Y(Y'First + i);
            S : constant Word := A - B - Borrow;
         begin
            Difference(Difference'First + i) := S;
            Borrow := W_Borrow(A, B, S);
         end;
      end loop;
      Underflow := Borrow;
   end FZ_Sub;
   pragma Inline_Always(FZ_Sub);
 
end FZ_Arith;

The alert reader might ask why the iteration schemes differ. This will be discussed in a subsequent chapter, when we walk through multiplication and the peculiarities of Ada’s “array slicing” construct.

Now let’s define a simple mechanism for printing FFA integers to the console:

ffa_io.ads:

with Words;   use Words;
with FZ_Type; use FZ_Type;
 
package FFA_IO is
 
   -- Character representation of a Word
   type WChars is array(1 .. 2 * Byteness) of Character;
 
   -- Obtain the WChars corresponding to the given Word
   function W_To_WChars(N : Word) return WChars;
 
   -- Display a hex representation of W to stdout
   procedure Dump(W : in Word);
 
   -- Display a hex representation of N to stdout
   procedure Dump(N : in FZ);
 
end FFA_IO;

ffa_io.adb:

with Ada.Text_IO; use Ada.Text_IO;
 
with Words;    use Words;
with W_Shifts; use W_Shifts;
with FZ_Type;  use FZ_Type;
 
package body FFA_IO is
 
   -- Obtain the WChars corresponding to the given Word
   function W_To_WChars(N : Word) return WChars is
      H      : constant array(0 .. 15) of Character := "0123456789ABCDEF";
      W      : Word := N;
      Result : WChars;
   begin
      for b in WChars'Range loop               -- From bottom to top:
         Result(B) := H(Natural(W and 16#F#)); -- Get current nibble.
         W         := Shift_Right(W, 4);       -- Get the next nibble.
      end loop;
      return Result;
   end W_To_WChars;
 
   -- Display a hex representation of W to stdout
   procedure Dump(W : in Word) is
      T : WChars := W_To_WChars(W);
   begin
      for i in reverse T'Range loop
         Put(T(i));
      end loop;
   end Dump;
 
   -- Display a hex representation of N to stdout
   procedure Dump(N : in FZ) is
   begin
      for i in reverse N'Range loop
         Dump(N(i));
      end loop;
   end Dump;
 
end FFA_IO;

Nothing particularly complicated here.

Now for Chapter 1’s demo routine itself:

demo_ch1.ads:

package Demo_Ch1 is
 
   procedure Demo_Add_Sub;
 
end Demo_Ch1;

In this Chapter, we do not yet have a means for programmatically setting the value of an FFA integer other than via Ada’s array assignment construct. But it will suffice for now, as we wish to simply explore addition and subtraction:

demo_ch1.adb:

-- From Ada:
with Ada.Text_IO; use Ada.Text_IO;
 
-- From FFA:
with Words;    use Words;
with FZ_Type;  use FZ_Type;
with FZ_Arith; use FZ_Arith;
 
-- From the Demo:
with FFA_IO;   use FFA_IO;
 
package body Demo_Ch1 is
 
   procedure Demo_Add_Sub is
 
      -- We are using 64-bit Words (see iron.ads).
      -- We'll begin with an elementary demo, using 256-bit FZ:
      X : FZ(1 .. 4) := (0,        0, 0, 0);
      Y : FZ(1 .. 4) := (16#5555#, 0, 0, 0);
      Z : FZ(1 .. 4) := (0,        0, 0, 0);
 
      -- Carry.
      C : WBool := 0;
 
   begin
 
      Put_Line("X         =");
      Dump(X);
      New_Line;
 
      Put_Line("Y         =");
      Dump(Y);
      New_Line;
 
      FZ_Add(X, Y, Z, C);
      Put_Line("X + Y     =");
      Dump(Z);
      New_Line;
      Put_Line("C         = " & WBool'Image(C));
 
      FZ_Sub(X, Y, Z, C);
      Put_Line("X - Y     =");
      Dump(Z);
      New_Line;
      Put_Line("C         = " & WBool'Image(C));
 
   end Demo_Add_Sub;
 
end Demo_Ch1;

Our “main” :

ffa_demo.adb:

with Demo_Ch1; use Demo_Ch1;
 
procedure FFA_Demo is
begin
 
   Demo_Add_Sub;
 
end FFA_Demo;

And finally, the “builder” in the ffademo directory.
This illustrates how the static library produced by libffa’s builder is put to use:

ffa_demo.gpr:

with "../libffa/ffa.gpr";
 
project FFA_Demo is
 
  for Object_Dir use "obj";
 
  type Mode_Type is ("debug", "release");
  Mode : Mode_Type := external ("mode", "release");
 
  for Languages   use ("Ada");
  for Source_Dirs use (".");
  for Exec_Dir    use "bin";
  for Main        use ("ffa_demo.adb");
 
  package Compiler is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ("-g");
        when "release" =>
           for Switches ("Ada")
             use ("-O2", "-fdump-scos", "-gnata", "-fstack-check",
                  "-fdata-sections", "-ffunction-sections");
     end case;
  end Compiler;
 
  package Binder is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ();
        when "release" =>
           for Switches ("Ada")
             use ("-static");
     end case;
  end Binder;
 
  package Linker is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ();
        when "release" =>
           for Switches ("Ada")
             use ("-Wl,--gc-sections",
                  "-static");
     end case;
  end Linker;
 
end FFA_Demo;

At this point, the reader should be reasonably confident that the Chapter 1 demo will not format his HDD; and perhaps even has a reasonable supposition as to what the output will be.

Let’s run it:

./bin/ffa_demo

And this is what you should see:

X         =
0000000000000000000000000000000000000000000000000000000000000000
Y         =
0000000000000000000000000000000000000000000000000000000000005555
X + Y     =
0000000000000000000000000000000000000000000000000000000000005555
C         =  0
X - Y     =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAAAB
C         =  1

Now it should be clear to the reader, that we are simply emulating a “stretched” version of something resembling a typical CPU’s ALU: a mechanism which arithmetizes on registers of fixed physical size.

~To be continued!~

This entry was written by Stanislav , posted on Friday December 01 2017 , filed under Bitcoin, Cold Air, Computation, Cryptography, FFA, Mathematics, NonLoper, ShouldersGiants, SoftwareSucks . Bookmark the permalink . Post a comment below or leave a trackback: Trackback URL.

16 Responses to ““Finite Field Arithmetic.” Chapter 1: Genesis.”

  • jooliv says:

    Hi,

    Great post, eagerly awating its sequels.
    However, given the requirement for no “perlisms”, your decision to
    implement the swap operation with xor instead of using a temporary
    variable confuses me. Is it because of “cache-bleed” shenanigans
    or am I way off the mark?

    Cheers
    João

    • Stanislav says:

      Dear jooliv,

      The fast-swap-with-xor trick dates to the 1950s, and somehow I assumed it were known — and immediately comprehensible — to everyone who has ever programmed anything. To the point where I see swaps via temp variable as peculiar and in need of justification.

      Yours,
      -S

      • apeloyee says:

        > To the point where I see swaps via temp variable as peculiar and in need of justification.

        xor swap seems to me a rake, which would hit you if the swapped values happen to be stored in the same location. This seems to be against fit-in-head philosophy.

        • Stanislav says:

          Dear apeloyee,

          I will point out that W_Mux, abused in this way, will also fail in exactly the same way. And, by extension, FZ_Mux.

          But why would you call either W_Swap(X, X) or W_Mux(X, X, …) ? Seems to me that that would be the placement of the rake on the ground and the stepping thereon at the same moment.

          I banned pointers (”Access variables”, in Ada-land), ergo the situation cannot arise spontaneously at run time. I wonder if GNAT is smart enough to take a precondition banning identical args for these without shitting out runtime checks into the built binary.

          Incidentally, after reading your comment I went to look where in FFA W_Swap is actually used, and found that it is used nowhere other than in FZ_Swap, which itself is used nowhere (in FFA, that is.)

          Possibly FFA has no business offering swap at all, the user can create it himself as FZ are simply arrays, which are assignable/copyable entities in Ada. Perhaps FZ_Swap ought to be thrown out entirely?

          Edit: and now also a thread.

          Yours,
          -S

          • apeloyee says:

            > Possibly FFA has no business offering swap at all, the user can create it himself as FZ are simply arrays, which are assignable/copyable entities in Ada. Perhaps FZ_Swap ought to be thrown out entirely?

            I agree.

            > I will point out that W_Mux, abused in this way, will also fail in exactly the same way. And, by extension, FZ_Mux.

            W_Mux is a function, therefore uses temporary vars and will, in fact, work. FZ_Mux therefore also.

  • apeloyee says:

    * a pure function

  • BT says:

    Hello Stanislav,

    I had problems compiling the code using gnat-gpl-2017 (AdaCore) from Gentoo repositories. Turned out, `pragma Inline_Always(…)` statements must be moved from .adb to .ads files. With GNAT 2016 from AdaCore (gcc 4.9) everything compiles fine.

    On the other hand, looks that GNAT 2017 actually did the inlining when compiling the fixed sources (https://pastebin.com/DMkB8q2t), unlike GNAT 2016 with the original sources (https://pastebin.com/fS8BYrZS). With the fixed sources, GNAT 2016 does inlining as well.

  • BT says:

    The pastes did not vanish, but the blog engine considers closing parentheses to be part of URL — they are not.
    To complete the report:
    1. Patch that I applied to sources: http://p.bvulpes.com/pastes/t9cyX/?raw=true
    2. Error logs from gprbuild and nm output on libFFA.a. GNAT 2017 removes bodies of inlined functions from the static library.
    2.1 Original source, GNAT 2016: http://p.bvulpes.com/pastes/D95yq/?raw=true
    2.2 Original source, GNAT 2017: http://p.bvulpes.com/pastes/yWrT8/?raw=true
    2.3 Fixed source, GNAT 2016: http://p.bvulpes.com/pastes/PHiLb/?raw=true
    2.4 Fixed source, GNAT 2017: http://p.bvulpes.com/pastes/NhkCU/?raw=true
    3. Objdump outputs for the bin/ffademo (only relevant parts):
    3.1 Original source, GNAT 2016: http://p.bvulpes.com/pastes/EBXjr/?raw=true
    3.2 Original source, GNAT 2017: build failed.
    3.3 Fixed source, GNAT 2016: http://p.bvulpes.com/pastes/yMvy1/?raw=true
    3.4 Fixed source, GNAT 2017: http://p.bvulpes.com/pastes/ryNPg/?raw=true

    • Stanislav says:

      Dear BT,

      Thank you for running this test.

      I did know that 2016 GNATs — GCC and AdaCore both — leave inlined routine bodies sitting in the binary. (They actually do the inlining, I disasmed the output and looked. But they also leave the routines sitting, as if they had not been also inlined.) What I did not know is that this has been fixed in 2017. This is actually a desirable behaviour from the point of view of a complete FFA library — the routines that actually comprise the user-callable interface, will not have inline pragmas. But for the demo articles, a GNAT that snips off inlined routines is a problem.

      I will have to think about how to handle this — but a solution that leaves the thing unbuildable or ill-behaved with GNAT-2016 will not be considered.

      Yours,
      -S

  • Nonymous says:

    Minor formatting nitpick

    Each of the source files in the genesis has a copy-pasted license header, for many composing most of the lines in the file, for almost all composing most of the words in the file. This seems somewhat obscene for a piece of text that boils down to “the concept of license headers is statist braindamage, these words cannot bind you”.

    Are there strong drawbacks to consolidating them into a single LICENSE file, at least?

  • Nonymous says:

    The files will get longer, though hopefully not too much for individual tests / iron / w_shifts / etc; the individual first-chapter patch, kind of by definition, will not.

    I will grant that as a spoonful of tar to find at the top of the file when you open it, the suggested ascii pr0n would be more distracting; and possibly useful if the browser presentation was hard to amend to display an equivalent graphic in 24-bit True Color.

  • Dave says:

    I’m curious what happens with xor swaps when the integers are close to max for the word size, is there an overflow flag in Gnat?
    This type of bootstrapping up a useful system is very useful independent of the application, as to my mind, we get to explicitly find out our all our mistaken assumptions, so I’ll be following this a closely as I can.

    • Stanislav says:

      Dear Dave,

      Try as I might, I am unable to make proper sense of your question.

      On what planet does XOR overflow ? It isn’t an arithmetic op.

      Word is a “modular type”, analogous to “unsigned int” in C. When we add and subtract, portably determining overflow requires the elaborate dance shown in this chapter. But what does this have to do with XOR ?

      Yours,
      -S

      • Dave says:

        My mistake, when I first saw this I thought that you were packing all the bits of both numbers into one location, hence the comment on overflow. Now I’ve looked it up I see how it works.

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">