“Finite Field Arithmetic.” Chapter 3: Shifts.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch3_shifts.vpatch.

Just like before, you will end up with two directories, libffa and ffademo.

Now compile the demo, just as in Chapters 1 and 2:

cd ffademo
gprbuild

And, as before, do not run it quite yet.


First, let’s go through the “mail bag” from Chapter 2:

Reader Apeloyee observed : “A bit odd to see FZ_Neg to be named, unlike other SIMD bit operations, differently from the corresponding Ada operation on words. I personally would expect that “Neg” is arithmetic negation, since other operations aren’t named FZ_Conj or FZ_Disj.”

   -- NotN := ~N
   procedure FZ_Neg(N    : in FZ;
                    NotN : out FZ) is
   begin
      for i in N'Range loop
         NotN(i) := not N(i);
      end loop;
   end FZ_Neg;
   pragma Inline_Always(FZ_Neg);

He’s quite right, and henceforth the FFA ones-complement operator will be called FZ_Not.

Reader Diana Coman noticed the duplicate assignment to T in FZ_Swap:

   -- Exchange X and Y
   procedure FZ_Swap(X : in out FZ; Y : in out FZ) is
      T : FZ(X'Range) := X;
   begin
      T := X;
      X := Y;
      Y := T;
   end FZ_Swap;
   pragma Inline_Always(FZ_Swap);

The renaming of the ones-complement operator, and the removal of the redundant assignment to T in FZ_Swap appear in this Chapter’s vpatch.

I would like to thank Apeloyee and Diana Coman for their careful reading; and ask all of my readers not to hesitate in pointing out potentially troublesome (or merely confusing) algorithms, typos, obvious mistakes, etc.


Now let’s proceed to new material: shift operators for FZ integers :

fz_shift.ads:

 
with Words;   use Words;
with FZ_Type; use FZ_Type;
 
 
package FZ_Shift is
 
   pragma Pure;
 
   --------------------------------------------------------------
   -- Shift Right
   --------------------------------------------------------------
 
   -- ShiftedN := N >> Count (with Overflow Input and Output)
   procedure FZ_ShiftRight_O_I(N        : in  FZ;
                               ShiftedN : out FZ;
                               Count    : in  WBit_Index;
                               Overflow : out Word;
                               OF_in    : in  Word);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   -- ShiftedN := N >> Count (with Overflow Output only)
   procedure FZ_ShiftRight_O(N        : in  FZ;
                             ShiftedN : out FZ;
                             Count    : in  WBit_Index;
                             Overflow : out Word);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   -- ShiftedN := N >> Count (no Overflow output or input)
   procedure FZ_ShiftRight(N        : in  FZ;
                           ShiftedN : out FZ;
                           Count    : in  WBit_Index);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   --------------------------------------------------------------
   -- Shift Left
   --------------------------------------------------------------
 
   -- ShiftedN := N < < Count (with Overflow Input and Output)
   procedure FZ_ShiftLeft_O_I(N        : in  FZ;
                              ShiftedN : out FZ;
                              Count    : in  WBit_Index;
                              Overflow : out Word;
                              OF_in    : in  Word);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   -- ShiftedN := N << Count (with Overflow Output only)
   procedure FZ_ShiftLeft_O(N        : in  FZ;
                            ShiftedN : out FZ;
                            Count    : in  WBit_Index;
                            Overflow : out Word);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   -- ShiftedN := N << Count (no Overflow output or input)
   procedure FZ_ShiftLeft(N        : in  FZ;
                          ShiftedN : out FZ;
                          Count    : in  WBit_Index);
   pragma Precondition(N'Length = ShiftedN'Length);
 
end FZ_Shift;

The basic building blocks for shifts are FZ_ShiftRight_O_I and FZ_ShiftLeft_O_I. The other operators shown are simply shorthand notation for discarding the ability to take the overflow from, or place incoming shifted-in bits into, one of these operations. This convention is followed in order to avoid littering FFA and user code with declarations for unused variables and the pragma Unreferenced(…) statements to go with them.

Note that the overflow inputs and outputs are Words, rather than WBools. The reason for this will become apparent shortly.

Observe that, thus far, it is only possible to shift an FZ by a WBit_Index quantity. Recall from words.ads in Chapter 1:

   -- When we must refer to individual bit positions of a machine word:
   subtype WBit_Index is Natural range 0 .. Bitness - 1;

This means that, in the 64-bit example code provided, left- and right- shifts from 0 to 63 binary positions are permissible.

In a later chapter we will see arbitrary (i.e. up to the full width of an FZ) constant-time shifting algorithms. However for most of FFA's functionality, these are not necessary, so there is no need to introduce them to the reader quite yet.

Let's proceed to the essential moving parts of the Right and Left subword shifts:

fz_shift.adb:

with W_Shifts; use W_Shifts;
 
 
package body FZ_Shift is
 
   -- ShiftedN := N >> Count (with Overflow Input and Output)
   procedure FZ_ShiftRight_O_I(N        : in  FZ;
                               ShiftedN : out FZ;
                               Count    : in  WBit_Index;
                               Overflow : out Word;
                               OF_in    : in  Word) is
      Ni    : Word;
      Carry : Word := OF_in;
   begin
      for i in reverse N'Range loop
         Ni := N(i);
         ShiftedN(i) := Shift_Right(Ni, Count) or Carry;
         Carry := Shift_Left(Ni, Bitness - Count);
      end loop;
      Overflow := Carry;
   end FZ_ShiftRight_O_I;
   pragma Inline_Always(FZ_ShiftRight_O_I);

Shifting a FZ integer to the right by Count binary places, is done in the following way. We walk the Words comprising the FZ from highest to lowest (hence the reverse in the looping operator.)

Inside the loop, we make a temporary copy of the current word N(i), Ni. Then we shift Ni by the requested number of binary places, Count; and bitwise-OR the current Carry into its high bits. This Carry is equal to OF_in initially (typically zero, if nothing is being shifted in). The result of the OR is saved to ShiftedN(i) -- we have obtained the current Word of the final output.

For every Word after the first one operated on, Carry consists of that previous Word's "dropped" bits, shifted left so as to end up OR-ed into the upper end of the next Word. The Carry obtained from the last Word's shifting is written to Overflow.

Observe that it is entirely legal to call FZ_ShiftRight_O_I "in-place", i.e. with the input and output locations being the same FZ.

Now we'll define an operator that wraps the above, but does not require bits being "shifted in" to be provided:

   -- ShiftedN := N >> Count (with Overflow Output only)
   procedure FZ_ShiftRight_O(N        : in  FZ;
                             ShiftedN : out FZ;
                             Count    : in  WBit_Index;
                             Overflow : out Word) is
   begin
      FZ_ShiftRight_O_I(N, ShiftedN, Count, Overflow, 0);
   end FZ_ShiftRight_O;
   pragma Inline_Always(FZ_ShiftRight_O);

And here is one that gives neither Overflow output nor demands shifted-in input:

   -- ShiftedN := N >> Count (no Overflow output or input)
   procedure FZ_ShiftRight(N        : in  FZ;
                           ShiftedN : out FZ;
                           Count    : in  WBit_Index) is
      Overflow : Word;
      pragma Unreferenced(Overflow);
   begin
      FZ_ShiftRight_O_I(N, ShiftedN, Count, Overflow, 0);
   end FZ_ShiftRight;
   pragma Inline_Always(FZ_ShiftRight);

The process of shifting a FZ integer to the left by Count binary places, is exactly analogous to the right-shift, with one exception: we begin with the lowest Word in the FZ, and proceed "up" :

   -- ShiftedN := N < < Count (with Overflow Input and Output)
   procedure FZ_ShiftLeft_O_I(N        : in  FZ;
                              ShiftedN : out FZ;
                              Count    : in  WBit_Index;
                              Overflow : out Word;
                              OF_in    : in  Word) is
      Ni    : Word;
      Carry : Word := OF_in;
   begin
      for i in N'Range loop
         Ni := N(i);
         ShiftedN(i) := Shift_Left(Ni, Count) or Carry;
         Carry := Shift_Right(Ni, Bitness - Count);
      end loop;
      Overflow := Carry;
   end FZ_ShiftLeft_O_I;
   pragma Inline_Always(FZ_ShiftLeft_O_I);

It is very much a "mirror image" of the FZ_ShiftRight_O_I algorithm.

Now for the "wrappers", in exactly the same spirit as earlier:

   -- ShiftedN := N < < Count (with Overflow Output only)
   procedure FZ_ShiftLeft_O(N        : in  FZ;
                            ShiftedN : out FZ;
                            Count    : in  WBit_Index;
                            Overflow : out Word) is
   begin
      FZ_ShiftLeft_O_I(N, ShiftedN, Count, Overflow, 0);
   end FZ_ShiftLeft_O;
   pragma Inline_Always(FZ_ShiftLeft_O);
 
 
   -- ShiftedN := N << Count (no Overflow output or input)
   procedure FZ_ShiftLeft(N        : in  FZ;
                          ShiftedN : out FZ;
                          Count    : in  WBit_Index) is
      Overflow : Word;
      pragma Unreferenced(Overflow);
   begin
      FZ_ShiftLeft_O_I(N, ShiftedN, Count, Overflow, 0);
   end FZ_ShiftLeft;
   pragma Inline_Always(FZ_ShiftLeft);
 
end FZ_Shift;

Just as in prior Chapters, I advise the reader to make abundant use of grid paper and pen, to properly fit the mechanics of the given algorithms into his head -- until they are as comfortably understood as a favourite armchair.


Now, let's make a demo for the routines in this Chapter:

demo_ch3.ads:

package Demo_Ch3 is
 
   procedure Demo_Shifts;
 
end Demo_Ch3;

demo_ch3.adb:

-- From Ada:
with Ada.Text_IO; use Ada.Text_IO;
 
-- From FFA:
with Words;    use Words;
with FZ_Type;  use FZ_Type;
 
-- FFA Ch. 3
with FZ_Shift; use FZ_Shift;
 
-- From the Demo:
with FFA_IO;   use FFA_IO;
 
 
package body Demo_Ch3 is
 
   procedure Demo_Shifts is
 
      X : FZ(1 .. 4) := ( 16#083e16f27091f65f#,  16#74c01a9c3ce54f80#,
                          16#9fd0913737c3fcbe#,  16#fa55f3f5459a9e79# );
 
      Z : FZ(1 .. 4) := ( 0,  0,  0,  0 );
 
      -- Overflow
      O : Word := 0;
 
   begin
 
      Put_Line("~~~ Ch. 3 : Shifts ~~~");
      New_Line;
 
      Put_Line("X         =");
      Dump(X);
      New_Line;
      New_Line;
 
      -- Left Shifts:
 
      FZ_ShiftLeft_O(X, Z, 1, O);
      Put_Line("X < < 1    =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      FZ_ShiftLeft_O(X, Z, 13, O);
      Put_Line("X << 13   =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      FZ_ShiftLeft_O(X, Z, 40, O);
      Put_Line("X << 40   =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      -- Right Shifts:
 
      FZ_ShiftRight_O(X, Z, 1, O);
      Put_Line("X >> 1    =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      FZ_ShiftRight_O(X, Z, 13, O);
      Put_Line("X >> 13   =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      FZ_ShiftRight_O(X, Z, 40, O);
      Put_Line("X >> 40   =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
   end Demo_Shifts;
 
end Demo_Ch3;

ffa_demo.adb:

with Demo_Ch1; use Demo_Ch1;
with Demo_Ch2; use Demo_Ch2;
with Demo_Ch3; use Demo_Ch3;
 
procedure FFA_Demo is
begin
 
   -- Ch. 1
   Demo_Add_Sub;
 
   -- Ch. 2
   Demo_Word_Preds;
   Demo_FZ_Basics;
   Demo_FZ_Various_Ops;
 
   -- Ch. 3
   Demo_Shifts;
 
end FFA_Demo;

Finally, let's run it:

./bin/ffa_demo

And this is what you should see:

~~~ ... Output from earlier Chapters snipped ... ~~~
~~~ ............................................ ~~~
 
~~~ Ch. 3 : Shifts ~~~
 
X         =
FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F
 
X < < 1    =
F4ABE7EA8B353CF33FA1226E6F87F97CE980353879CA9F00107C2DE4E123ECBE
Overflow  = 
0000000000000001
X << 13   =
BE7EA8B353CF33FA1226E6F87F97CE980353879CA9F00107C2DE4E123ECBE000
Overflow  = 
0000000000001F4A
X << 40   =
9A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F0000000000
Overflow  = 
000000FA55F3F545
X >> 1    =
7D2AF9FAA2CD4F3CCFE8489B9BE1FE5F3A600D4E1E72A7C0041F0B793848FB2F
Overflow  = 
8000000000000000
X >> 13   =
0007D2AF9FAA2CD4F3CCFE8489B9BE1FE5F3A600D4E1E72A7C0041F0B793848F
Overflow  = 
B2F8000000000000
X >> 40   =
0000000000FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16
Overflow  = 
F27091F65F000000

~To be continued!~

“Finite Field Arithmetic.” Chapter 2: Logical and Bitwise Operations.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch2_logicals.vpatch.

Just as in Chapter 1, you will end up with two directories, libffa and ffademo, but this time with somewhat different contents. (If you are new to V, read the vpatch with your naked eye; you will notice that it is almost exactly the same thing as an ordinary Unix diff, and quite readable.)

Now compile the demo, just as in Chapter 1:

cd ffademo
gprbuild

And, as before, do not run it quite yet.


First, let’s go through the “mail bag” from Chapter 1:

Reader Apeloyee observed that “xor swap seems to me a rake, which would hit you if the swapped values happen to be stored in the same location.”

   -- Exchange A and B.
   procedure W_Swap(A : in out Word; B : in out Word) is
   begin
      A := A xor B;
      B := A xor B;
      A := A xor B;
   end W_Swap;
   pragma Inline_Always(W_Swap);

Naturally, it is possible to answer this “Doctor, it hurts when I do that!” with the proverbial “So stop doing that.” But “rakes” are not acceptable in safety-critical systems.

Reader Mircea Popescu proposed to insert a stern warning, e.g.

   pragma Restrictions(No_Implicit_Heap_Allocations);
   -- Removing this may render the functioning of W_Swap and W_Mux dangerous,
   -- see http://btcbase.org/log/2017-12-02#1745513 discussion for details

Unfortunately, a dig through the GNAT docs revealed that there is apparently no reliable way to enforce a permanent guarantee that W_Swap will not be under any circumstances called on overlapping objects allocated explicitly; or used on pairs of variables made to refer to the same memory location via the renames construct.

Therefore W_Swap has been removed; this is reflected in the new vpatch; and FZ_Swap, introduced in this chapter, uses a temporary copy, as seen in kindergarten schoolbooks.

I would like to thank Apeloyee and Mircea Popescu for their careful reading; and ask all of my readers not to hesitate in pointing out potentially troublesome (or merely confusing) algorithms, typos, obvious mistakes, etc.


Now let’s proceed to some new material.

There are certain predicates we will need later for use on Words:

w_pred.ads:

with Words; use Words;
 
package W_Pred is
 
   pragma Pure;
 
   -- Return 1 if N is equal to 0; otherwise return 0.
   function W_ZeroP(N : in Word) return WBool;
 
   -- Return 1 if N is unequal to 0; otherwise return 0.
   function W_NZeroP(N : in Word) return WBool;
 
   -- Return 1 if N is odd; otherwise return 0.
   function W_OddP(N : in Word) return WBool;
 
   -- Return 1 if A is equal to B ; otherwise return 0.
   function W_EqP(A : in Word; B : in Word) return WBool;
 
end W_Pred;

w_pred.adb:

with Word_Ops; use Word_Ops;
 
package body W_Pred is
 
   -- Return 1 if N is equal to 0; otherwise return 0.
   function W_ZeroP(N : in Word) return WBool is
   begin
      return W_Borrow(N, 1, N - 1);
   end W_ZeroP;
   pragma Inline_Always(W_ZeroP);
 
 
   -- Return 1 if N is unequal to 0; otherwise return 0.
   function W_NZeroP(N : in Word) return WBool is
   begin
      return 1 xor W_ZeroP(N);
   end W_NZeroP;
   pragma Inline_Always(W_NZeroP);
 
 
   -- Return 1 if N is odd; otherwise return 0.
   function W_OddP(N : in Word) return WBool is
   begin
      return 1 and N;
   end W_OddP;
   pragma Inline_Always(W_OddP);
 
 
   -- Return 1 if A is equal to B ; otherwise return 0.
   function W_EqP(A : in Word; B : in Word) return WBool is
   begin
      return W_ZeroP(A xor B);
   end W_EqP;
   pragma Inline_Always(W_EqP);
 
end W_Pred;

Observe that conditional branches are avoided. W_ZeroP is the fundamental operation here; expand the equation on a sheet of paper, referring to the W_Borrow from Chapter 1 if you do not remember how the latter works.


Now let’s discuss a few more fundamental operations on FZ integers:

fz_basic.ads:

with Words; use Words;
with FZ_Type; use FZ_Type;
 
 
package FZ_Basic is
 
   pragma Pure;
 
   -- N := 0
   procedure FZ_Clear(N : out FZ);
 
   -- First word of N := Source
   procedure FZ_Set_Head(N : out FZ; Source : in Word);
 
   -- First word of N
   function FZ_Get_Head(N : in FZ) return Word;
 
   -- Exchange X and Y
   procedure FZ_Swap(X : in out FZ; Y : in out FZ);
   pragma Precondition(X'Length = Y'Length);
 
   -- Constant-time MUX: Sel = 0: Result := X; Sel = 1: Result := Y
   procedure FZ_Mux(X : in FZ; Y : in FZ; Result : out FZ; Sel : in WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Result'Length);
 
end FZ_Basic;

We will go over these one at a time:

fz_basic.adb:

with Word_Ops; use Word_Ops;
 
 
package body FZ_Basic is

The FZ_Clear operation illustrates the use of Ada’s array assignment construct. It simply sets a given FZ to zero.

   -- N := 0
   procedure FZ_Clear(N : out FZ) is
   begin
      N := (others => 0);
   end FZ_Clear;
   pragma Inline_Always(FZ_Clear);

The FZ_Set_Head operation changes only the head, or “youngest” Word, of a given FZ, to the given Word value.

   -- First word of N := Source
   procedure FZ_Set_Head(N : out FZ; Source : in Word) is
   begin
      N(N'First) := Source;
   end FZ_Set_Head;
   pragma Inline_Always(FZ_Set_Head);

FZ_Get_Head returns the “youngest” (lowest, bottom-most) word of an FZ.

   -- First word of N
   function FZ_Get_Head(N : in FZ) return Word is
   begin
      return N(N'First);
   end FZ_Get_Head;
   pragma Inline_Always(FZ_Get_Head);

This is the Swap operation, as discussed previously in the “mail bag” section of this article.

   -- Exchange X and Y
   procedure FZ_Swap(X : in out FZ; Y : in out FZ) is
      T : FZ(X'Range) := X;
   begin
      T := X;
      X := Y;
      Y := T;
   end FZ_Swap;
   pragma Inline_Always(FZ_Swap);

It is absolutely vital to understand exactly how and why FZ_Mux works. This operation takes two FZ inputs, X and Y, and a WBool selector, Sel; it assigns X to the Result if Sel equals 0; alternatively, Y if Sel equals 1. Refer to the W_Mux material in Chapter 1 to understand how this is done. The Mux operation is a fundamental building block of constant-time (i.e. branch-free) arithmetic as seen in FFA.

   -- Constant-time MUX: Sel = 0: Result := X; Sel = 1: Result := Y
   procedure FZ_Mux(X : in FZ; Y : in FZ; Result : out FZ; Sel : in WBool) is
   begin
      for i in X'Range loop
         Result(i) := W_Mux(X(i), Y(i), Sel);
      end loop;
   end FZ_Mux;
   pragma Inline_Always(FZ_Mux);
 
end FZ_Basic;

Now let’s introduce some unary predicate operators that work on FZ integers:

fz_pred.ads:

with Words; use Words;
with FZ_Type; use FZ_Type;
 
 
package FZ_Pred is
 
   pragma Pure;
 
   -- 1 iff N == 0 (branch-free); else 0
   function FZ_ZeroP(N : in FZ) return WBool;
 
   -- 1 iff N is odd
   function FZ_OddP(N : in FZ) return WBool;
 
end FZ_Pred;

fz_pred.adb:

with W_Pred; use W_Pred;
 
 
package body FZ_Pred is
 
   -- 1 iff N == 0 (branch-free); else 0
   function FZ_ZeroP(N : in FZ) return WBool is
      A : WBool := 1;
   begin
      for i in N'Range loop
         A := A and W_ZeroP(N(i));
      end loop;
      return A;
   end FZ_ZeroP;
   pragma Inline_Always(FZ_ZeroP);
 
 
   -- 1 iff N is odd
   function FZ_OddP(N : in FZ) return WBool is
   begin
      return W_OddP(N(N'First));
   end FZ_OddP;
   pragma Inline_Always(FZ_OddP);
 
end FZ_Pred;

As before, draw this on a sheet of paper if the mechanism does not immediately stand up in front of your mind’s eye.


Now for some binary comparison predicates on FZ:

fz_cmp.ads:

with Words; use Words;
with FZ_Type; use FZ_Type;
 
 
package FZ_Cmp is
 
   pragma Pure;
 
   -- 1 iff X == Y (branch-free); else 0
   function FZ_EqP(X : in FZ; Y: in FZ) return WBool;
   pragma Precondition(X'Length = Y'Length);
 
   -- 1 iff X < Y (branch-free); else 0
   function FZ_LessThanP(X : in FZ; Y : in FZ) return WBool;
   pragma Precondition(X'Length = Y'Length);
 
   -- 1 iff X > Y (branch-free); else 0
   function FZ_GreaterThanP(X : in FZ; Y : in FZ) return WBool;
   pragma Precondition(X'Length = Y'Length);
 
end FZ_Cmp;

fz_cmp.adb:

with W_Pred;   use W_Pred;
with FZ_Arith; use FZ_Arith;
 
 
package body FZ_Cmp is
 
   -- 1 iff X == Y (branch-free); else 0
   function FZ_EqP(X : in FZ; Y : in FZ) return WBool is
      A : WBool := 1;
   begin
      for i in X'Range loop
         A := A and W_EqP(X(i), Y(i));
      end loop;
      return A;
   end FZ_EqP;
   pragma Inline_Always(FZ_EqP);
 
 
   -- 1 iff X < Y (branch-free); else 0
   function FZ_LessThanP(X : in FZ; Y : in FZ) return WBool is
      Scratch : FZ(X'Range);
      Borrow  : WBool := 0;
   begin
      FZ_Sub(X, Y, Scratch, Borrow);
      return Borrow;
   end FZ_LessThanP;
   pragma Inline_Always(FZ_LessThanP);
 
 
   -- 1 iff X > Y (branch-free); else 0
   function FZ_GreaterThanP(X : in FZ; Y: in FZ) return WBool is
      Scratch : FZ(X'Range);
      Borrow  : WBool := 0;
   begin
      FZ_Sub(Y, X, Scratch, Borrow);
      return Borrow;
   end FZ_GreaterThanP;
   pragma Inline_Always(FZ_GreaterThanP);
 
end FZ_Cmp;

Here we make use of W_EqP from earlier in this chapter.

Observe that these work in exactly the same way as a commonplace CPU’s ALU does. Refer to the material in Chapter 1 concerning branch-free subtraction.


Now for some bitwise operators on FZ integers:

fz_bitop.ads:

with FZ_Type; use FZ_Type;
with Words; use Words;
 
 
package FZ_BitOp is
 
   pragma Pure;
 
   -- Result := X & Y
   procedure FZ_And(X : in FZ; Y : in FZ; Result : out FZ);
   pragma Precondition(X'Length = Y'Length and X'Length = Result'Length);
 
   -- N := N & W, W is a word
   procedure FZ_And_W(N : in out FZ; W : in Word);
 
   -- Result := X | Y
   procedure FZ_Or(X : in FZ; Y : in FZ; Result : out FZ);
   pragma Precondition(X'Length = Y'Length and X'Length = Result'Length);
 
   -- N := N | W, W is a word
   procedure FZ_Or_W(N : in out FZ; W : in Word);
 
   -- Result := X ^ Y
   procedure FZ_Xor(X : in FZ; Y : in FZ; Result : out FZ);
   pragma Precondition(X'Length = Y'Length and X'Length = Result'Length);
 
   -- N := N ^ W, W is a word
   procedure FZ_Xor_W(N : in out FZ; W : in Word);
 
   -- NotN := ~N
   procedure FZ_Neg(N : in FZ; NotN : out FZ);
   pragma Precondition(N'Length = NotN'Length);
 
end FZ_BitOp;

fz_bitop.adb:

package body FZ_BitOp is
 
   -- Result := X & Y
   procedure FZ_And(X : in FZ; Y : in FZ; Result : out FZ) is
   begin
      for i in X'Range loop
         Result(i) := X(i) and Y(i);
      end loop;
   end FZ_And;
   pragma Inline_Always(FZ_And);
 
 
   -- N := N & W, W is a word
   procedure FZ_And_W(N : in out FZ; W : in Word) is
   begin
      N(N'First) := N(N'First) and W;
   end FZ_And_W;
   pragma Inline_Always(FZ_And_W);
 
 
   -- Result := X | Y
   procedure FZ_Or(X : in FZ; Y : in FZ; Result : out FZ) is
   begin
      for i in X'Range loop
         Result(i) := X(i) or Y(i);
      end loop;
   end FZ_Or;
   pragma Inline_Always(FZ_Or);
 
 
   -- N := N | W, W is a word
   procedure FZ_Or_W(N : in out FZ; W : in Word) is
   begin
      N(N'First) := N(N'First) or W;
   end FZ_Or_W;
   pragma Inline_Always(FZ_Or_W);
 
 
   -- Result := X ^ Y
   procedure FZ_Xor(X : in FZ; Y : in FZ; Result : out FZ) is
   begin
      for i in X'Range loop
         Result(i) := X(i) xor Y(i);
      end loop;
   end FZ_Xor;
   pragma Inline_Always(FZ_Xor);
 
 
   -- N := N ^ W, W is a word
   procedure FZ_Xor_W(N : in out FZ; W : in Word) is
   begin
      N(N'First) := N(N'First) xor W;
   end FZ_Xor_W;
   pragma Inline_Always(FZ_Xor_W);
 
 
   -- NotN := ~N
   procedure FZ_Neg(N    : in FZ;
                    NotN : out FZ) is
   begin
      for i in N'Range loop
         NotN(i) := not N(i);
      end loop;
   end FZ_Neg;
   pragma Inline_Always(FZ_Neg);
 
end FZ_BitOp;

None of these make use of any kind of “cleverness”.

All but the unary FZ_Neg operator are defined both for pairs of FZ integers and also for an FZ and Word pairing; in the latter case, the operation is performed against the “head” of the FZ.


Now, let’s make a demo for the routines in this Chapter:

demo_ch2.ads:

package Demo_Ch2 is
 
   procedure Demo_Word_Preds;
   procedure Demo_FZ_Basics;
   procedure Demo_FZ_Various_Ops;
 
end Demo_Ch2;

demo_ch2.adb:

-- From Ada:
with Ada.Text_IO; use Ada.Text_IO;
 
-- From FFA:
with Words;    use Words;
with FZ_Type;  use FZ_Type;
with FZ_Arith; use FZ_Arith;
 
-- FFA Ch. 2
with W_Pred;   use W_Pred;
with FZ_Basic; use FZ_Basic;
with FZ_Pred;  use FZ_Pred;
with FZ_BitOp; use FZ_BitOp;
with FZ_Cmp;   use FZ_Cmp;
 
-- From the Demo:
with FFA_IO;   use FFA_IO;
 
 
package body Demo_Ch2 is
 
   procedure Demo_Word_Preds is
 
   begin
 
      Put_Line("~~~ Ch. 2 : Word Predicates ~~~");
      New_Line;
 
      Put_Line("W_ZeroP(0) :");
      Dump(W_ZeroP(0));
      New_Line;
      New_Line;
 
      Put_Line("W_ZeroP(1) :");
      Dump(W_ZeroP(1));
      New_Line;
      New_Line;
 
      Put_Line("W_NZeroP(1) :");
      Dump(W_NZeroP(1));
      New_Line;
      New_Line;
 
      Put_Line("W_OddP(3) :");
      Dump(W_OddP(3));
      New_Line;
      New_Line;
 
      Put_Line("W_EqP(1, 1) :");
      Dump(W_EqP(1, 1));
      New_Line;
      New_Line;
 
      Put_Line("W_EqP(1, 0) :");
      Dump(W_EqP(1, 0));
      New_Line;
      New_Line;
 
   end Demo_Word_Preds;
 
 
   procedure Demo_FZ_Basics is
 
      X : FZ(1 .. 4) := ( 0,  0,  0,  0);
 
      -- Shorthand for 'maxint'
      Y : FZ(1 .. 4) := (-1, -1, -1, -1);
 
      Z : FZ(1 .. 4) := ( 0,  0,  0,  0);
 
      -- Flag.
      F : WBool := 0;
 
   begin
 
      Put_Line("~~~ Ch. 2 : FZ Basics ~~~");
      New_Line;
 
      Put_Line("X         =");
      Dump(X);
      New_Line;
      New_Line;
 
      Put_Line("Y         =");
      Dump(Y);
      New_Line;
      New_Line;
 
      Put_Line("FZ_ZeroP(X) :");
      Dump(FZ_ZeroP(X));
      New_Line;
      New_Line;
 
      Put_Line("FZ_ZeroP(Y) :");
      Dump(FZ_ZeroP(Y));
      New_Line;
      New_Line;
 
      FZ_Swap(X, Y);
      Put_Line("After FZ_Swap(X, Y) :");
      Put_Line("X         =");
      Dump(X);
      New_Line;
      New_Line;
 
      Put_Line("Y         =");
      Dump(Y);
      New_Line;
      New_Line;
 
      F := 0;
      FZ_Mux(X, Y, Z, F);
      Put_Line("After F := 0 , FZ_Mux(X, Y, Z, F) : ");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      F := 1;
      FZ_Mux(X, Y, Z, F);
      Put_Line("After F := 1 , FZ_Mux(X, Y, Z, F) : ");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
   end Demo_FZ_Basics;
 
 
   procedure Demo_FZ_Various_Ops is
 
      X : FZ(1 .. 4) := ( 16#083e16f27091f65f#,  16#74c01a9c3ce54f80#,
                          16#9fd0913737c3fcbe#,  16#fa55f3f5459a9e79# );
 
      Y : FZ(1 .. 4) := ( 16#d16b9d65bf3991d0#,  16#8a509a858786b366#,
                          16#d39d23682728f4f8#,  16#6c571dbc646bf513# );
 
      Z : FZ(1 .. 4) := ( 0,  0,  0,  0 );
 
   begin
 
      Put_Line("~~~ Ch. 2 : FZ Bit Ops ~~~");
      New_Line;
      New_Line;
 
      Put_Line("X         =");
      Dump(X);
      New_Line;
      New_Line;
 
      Put_Line("Y         =");
      Dump(Y);
      New_Line;
      New_Line;
 
      FZ_Neg(X, Z);
      Put_Line("After FZ_Neg(X, Z):");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      FZ_And(X, Y, Z);
      Put_Line("After FZ_And(X, Y, Z) :");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      FZ_Or(X, Y, Z);
      Put_Line("After FZ_Or(X, Y, Z) :");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      FZ_Xor(X, Y, Z);
      Put_Line("After FZ_Xor(X, Y, Z) :");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      Put_Line("FZ_EqP(X, X) :");
      Dump(FZ_EqP(X, X));
      New_Line;
      New_Line;
 
      Put_Line("FZ_EqP(X, Y) :");
      Dump(FZ_EqP(X, Y));
      New_Line;
      New_Line;
 
      Put_Line("FZ_LessThanP(X, Y) :");
      Dump(FZ_LessThanP(X, Y));
      New_Line;
      New_Line;
 
      Put_Line("FZ_LessThanP(Y, X) :");
      Dump(FZ_LessThanP(Y, X));
      New_Line;
      New_Line;
 
      Put_Line("FZ_GreaterThanP(X, Y) :");
      Dump(FZ_GreaterThanP(X, Y));
      New_Line;
      New_Line;
 
      Put_Line("FZ_GreaterThanP(Y, X) :");
      Dump(FZ_GreaterThanP(Y, X));
      New_Line;
      New_Line;
 
   end Demo_FZ_Various_Ops;
 
 
end Demo_Ch2;

ffa_demo.adb:

with Demo_Ch1; use Demo_Ch1;
with Demo_Ch2; use Demo_Ch2;
 
procedure FFA_Demo is
begin
 
   -- Ch. 1
   Demo_Add_Sub;
 
   -- Ch. 2
   Demo_Word_Preds;
   Demo_FZ_Basics;
   Demo_FZ_Various_Ops;
 
end FFA_Demo;

Finally, let’s run it:

./bin/ffa_demo

And this is what you should see:

~~~ Ch. 1 : Add, Sub ~~~
 
X         =
0000000000000000000000000000000000000000000000000000000000000000
Y         =
0000000000000000000000000000000000000000000000000000000000005555
X + Y     =
0000000000000000000000000000000000000000000000000000000000005555
C         =  0
X - Y     =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAAAB
C         =  1
 
~~~ Ch. 2 : Word Predicates ~~~
 
W_ZeroP(0) :
0000000000000001
 
W_ZeroP(1) :
0000000000000000
 
W_NZeroP(1) :
0000000000000001
 
W_OddP(3) :
0000000000000001
 
W_EqP(1, 1) :
0000000000000001
 
W_EqP(1, 0) :
0000000000000000
 
~~~ Ch. 2 : FZ Basics ~~~
 
X         =
0000000000000000000000000000000000000000000000000000000000000000
 
Y         =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 
FZ_ZeroP(X) :
0000000000000001
 
FZ_ZeroP(Y) :
0000000000000000
 
After FZ_Swap(X, Y) :
X         =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 
Y         =
0000000000000000000000000000000000000000000000000000000000000000
 
After F := 0 , FZ_Mux(X, Y, Z, F) : 
Z         =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 
After F := 1 , FZ_Mux(X, Y, Z, F) : 
Z         =
0000000000000000000000000000000000000000000000000000000000000000
 
~~~ Ch. 2 : FZ Bit Ops ~~~
 
 
X         =
FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F
 
Y         =
6C571DBC646BF513D39D23682728F4F88A509A858786B366D16B9D65BF3991D0
 
After FZ_Neg(X, Z):
Z         =
05AA0C0ABA656186602F6EC8C83C03418B3FE563C31AB07FF7C1E90D8F6E09A0
 
After FZ_And(X, Y, Z) :
Z         =
685511B4440A9411939001202700F4B800401A8404840300002A146030119050
 
After FZ_Or(X, Y, Z) :
Z         =
FE57FFFD65FBFF7BDFDDB37F37EBFCFEFED09A9DBFE7FFE6D97F9FF7FFB9F7DF
 
After FZ_Xor(X, Y, Z) :
Z         =
9602EE4921F16B6A4C4DB25F10EB0846FE908019BB63FCE6D9558B97CFA8678F
 
FZ_EqP(X, X) :
0000000000000001
 
FZ_EqP(X, Y) :
0000000000000000
 
FZ_LessThanP(X, Y) :
0000000000000000
 
FZ_LessThanP(Y, X) :
0000000000000001
 
FZ_GreaterThanP(X, Y) :
0000000000000001
 
FZ_GreaterThanP(Y, X) :
0000000000000000

~To be continued!~

“Finite Field Arithmetic.” Chapter 1: Genesis.

FFA, or the Finite Field Arithmetic library, differs from the typical Open Sores abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute.

This article is the first in a series, at the end of which the patient reader will have glued together, with his own hands — and fully fit into his head — a demonstrably-correct set of routines for multiple-precision arithmetic, sufficient for carrying out RSA and many other forms of number-theoretical cryptography.

The “bignum” arithmetic libraries currently in common use were, near as I can tell, all and without exception written by NSA assets and their various “useful idiots”. The conclusion is difficult to avoid, given as every example known to me suffers from one or more of the following ills:

FFA, on the other hand:

  • Written in a heavily-restricted subset of the Ada programming language — the only currently-existing nonproprietary statically-compiled language which permits fully bounds-checked, pointerolade-free code and practically-auditable binaries. We will be using GNAT, which relies on the GCC backend.
  • No external dependencies or libraries are made use of; suitable for embedded systems with limited memory. Language features which bloat the generated binary (e.g. OOP) are avoided.
  • No use of heap allocators. (In fact, wholly allocator-agnostic.) Memory requirement for all operations is readily calculable in advance from the width of the inputs.
  • The total set of components needed for RSA occupies around 3000 lines, incl. comments, of clear and unidiomatic (i.e. containing no “perlisms”) code.
  • Contains no inline assembler and is married to no particular machine (i.e. can be built for microcontrollers, MS-DOS boxes, CPUs of virtually any “bitness”)
  • Does not branch-on-secret-bits
  • Does not address memory by secret values, i.e. is cachebleed-proof
  • Fully unrollable, deterministic (and ergo auditable) control flow. I.e. all RSA modular-exponentiation operations will take exactly the same amount of time, via exactly the same flow of instructions; all algorithmic-complexity attack vectors are ruled out.

FFA will be presented to the reader in small pieces. Each should take more than one hour, but probably fewer than two, to fully study and understand.

You will need:

Just as “theater begins with the coat room”, each FFA article begins with a V-press. Set up your Vtron if you have not already done so, and press the “genesis” — “ffa_ch1_genesis”.

You will end up with two directories, libffa and ffademo. The former contains the subset of FFA discussed in this chapter; the latter — the demo routines for same. The reader is invited to modify the demo routines, experiment, learn.

Each subsequent chapter of the FFA series will contain a V-patch on the previous set, and the next — another, until the final article in the series: where “P” — an FFA-tronic GPG replacement — is introduced.

First, verify that your V-Tron, GNAT, and genesis press are in working order, by compiling the introductory demo:

cd ffademo
gprbuild

The output should resemble:

using project file ffa_demo.gpr
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections ffa_demo.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc word_ops.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc fz_type.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc fz_arith.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc w_shifts.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc iron.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc words.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections demo_ch1.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections ffa_io.adb
gprlib FFA.lexch
ar cr /home/you/ffa/libffa/lib/libFFA.a /home/you/ffa/libffa/obj/word_ops.o
/home/you/ffa/libffa/obj/fz_type.o ...
ranlib libFFA.a
gprbind ffa_demo.bexch
gnatbind ffa_demo.ali
gcc -c b__ffa_demo.adb
gcc ffa_demo.o -Wl,--gc-sections -static -o ffa_demo

Observe that the contents of the ‘libffa’ directory build recursively.

If you are an Apple victim, your OS prohibits static linking, and you may see barf instead. You may use the debug (dynamically linked) variant:

gprbuild -Xmode=debug

Do not run the demo yet. (you read programs before running, don’t you?) Instead, let’s take a careful look at what is inside.

Let’s begin with the restriction pragmas. These define a particular, constrained subset of the Ada programming language, by prohibiting the use of certain features. GNAT will balk if it encounters a prohibited feature in FFA or the demo routines.

The logic behind the restrictions was a combination of “guilty until proven innocent” — if we can do without a construct, we will — and “enable all safety features”:

restrict.adc:

pragma Restrictions(Immediate_Reclamation);
pragma Restrictions(Max_Asynchronous_Select_Nesting => 0);
pragma Restrictions(Max_Protected_Entries => 0);
pragma Restrictions(Max_Select_Alternatives => 0);
pragma Restrictions(Max_Task_Entries => 0);
pragma Restrictions(Max_Tasks => 0);
pragma Restrictions(No_Abort_Statements);
pragma Restrictions(No_Access_Parameter_Allocators);
pragma Restrictions(No_Allocators);
pragma Restrictions(No_Asynchronous_Control);
pragma Restrictions(No_Calendar);
pragma Restrictions(No_Coextensions);
pragma Restrictions(No_Default_Stream_Attributes);
pragma Restrictions(No_Delay);
pragma Restrictions(No_Dispatch);
pragma Restrictions(No_Dispatching_Calls);
pragma Restrictions(No_Dynamic_Attachment);
pragma Restrictions(No_Dynamic_Priorities);
pragma Restrictions(No_Entry_Calls_In_Elaboration_Code);
pragma Restrictions(No_Entry_Queue);
pragma Restrictions(No_Enumeration_Maps);
pragma Restrictions(No_Exception_Propagation);
pragma Restrictions(No_Exception_Registration);
pragma Restrictions(No_Finalization);
pragma Restrictions(No_Fixed_Io);
pragma Restrictions(No_Floating_Point);
pragma Restrictions(No_Implementation_Aspect_Specifications);
pragma Restrictions(No_Implementation_Units);
pragma Restrictions(No_Implicit_Aliasing);
pragma Restrictions(No_Implicit_Conditionals);
pragma Restrictions(No_Implicit_Dynamic_Code);
pragma Restrictions(No_Implicit_Heap_Allocations);
pragma Restrictions(No_Implicit_Protected_Object_Allocations);
pragma Restrictions(No_Implicit_Task_Allocations);
pragma Restrictions(No_Initialize_Scalars);
pragma Restrictions(No_Local_Protected_Objects);
pragma Restrictions(No_Local_Timing_Events);
pragma Restrictions(No_Multiple_Elaboration);
pragma Restrictions(No_Nested_Finalization);
pragma Restrictions(No_Protected_Type_Allocators);
pragma Restrictions(No_Protected_Types);
pragma Restrictions(No_Relative_Delay);
pragma Restrictions(No_Requeue_Statements);
pragma Restrictions(No_Secondary_Stack);
pragma Restrictions(No_Select_Statements);
pragma Restrictions(No_Specific_Termination_Handlers);
pragma Restrictions(No_Standard_Allocators_After_Elaboration);
pragma Restrictions(No_Stream_Optimizations);
pragma Restrictions(No_Streams);
pragma Restrictions(No_Task_Allocators);
pragma Restrictions(No_Task_At_Interrupt_Priority);
pragma Restrictions(No_Task_Attributes_Package);
pragma Restrictions(No_Task_Hierarchy);
pragma Restrictions(No_Tasking);
pragma Restrictions(No_Task_Termination);
pragma Restrictions(No_Terminate_Alternatives);
pragma Restrictions(No_Unchecked_Access);
pragma Restrictions(No_Unchecked_Conversion);
pragma Restrictions(No_Unchecked_Deallocation);
pragma Restrictions(No_Wide_Characters);
pragma Restrictions(Pure_Barriers);
pragma Restrictions(Simple_Barriers);
pragma Restrictions(Static_Priorities);
pragma Restrictions(Static_Storage_Size);
pragma Validity_Checks(ALL_CHECKS);

Towards the end of this article series, we will investigate the impact of such “fascism” on program performance, and the possibility of cautiously replacing certain bounds checks with formal proof of nonoverflowability.

Let’s proceed to the builder. Observe that GnuMake was not listed in the list of required programs for this tutorial. Instead we make use of GNAT’s “GprBuild”, a substantially cleaner and better-behaved item:

ffa.gpr:

project FFA is
 
  for Object_Dir use "obj";
 
  type Mode_Type is ("debug", "release");
  Mode : Mode_Type := external ("mode", "release");
 
  for Languages         use ("Ada");
  for Source_Dirs       use (".");
  for Library_Dir       use "lib";
  for Library_Name      use "FFA";
  for Library_Kind      use "static";
 
  package Binder is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ();
        when "release" =>
           for Switches ("Ada")
             use ("-static", "-r");
     end case;
  end Binder;
 
  package Builder is
     for Switches ("Ada")
       use ("-nostdlib");
  end Builder;
 
  package Compiler is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ("-g");
        when "release" =>
           for Switches ("Ada")
             use ("-O2", "-fdump-scos", "-gnata", "-fstack-check",
                  "-fdata-sections", "-ffunction-sections",
                  "-gnatec=" & FFA'Project_Dir & "restrict.adc");
     end case;
  end Compiler;
 
end FFA;

Now let’s proceed to the only machine-dependent section of FFA.

The supplied iron.ads pre-supposes eight-bit bytes, and a 64-bit CPU:

iron.ads:

package Iron is
 
   pragma Pure;
 
   --------------------------------------
   -------- For a 64-bit system: --------
   --------------------------------------
   MachineBitness     : constant Positive := 64;
   MachineBitnessLog2 : constant Positive := 6; -- log2(64)
   --------------------------------------
 
   -- Bits per Byte
   ByteBits           : constant Positive := 8;
 
end Iron;

If this is not what you have, change the constants, e.g.:

   ------------------------------------
   ------ For a 32-bit system: --------
   ------------------------------------
     MachineBitness     : constant Positive := 32;
     MachineBitnessLog2 : constant Positive := 5; -- log2(32)
   ------------------------------------

The purpose of MachineBitnessLog2 will become apparent later; ignore it for now.

Let’s proceed to the most fundamental building block of FFA, the machine Word. A Word, from this point on, is simply a single machine word on your particular machine. It is expressed as an Ada “modular type”.

words.ads:

with Iron;
 
package Words is
 
   pragma Pure;
 
   -- The most fundamental fact about Word: width.
   Bitness     : constant Positive := Iron.MachineBitness;
 
   -- It is possible to calculate BitnessLog2 at elaboration time,
   -- but we will avoid having ~any~ elaboration at all in FFA.
   BitnessLog2 : constant Positive := Iron.MachineBitnessLog2;
 
   -- The Word width, expressed in ~bytes~:
   Byteness    : constant Positive := Bitness / Iron.ByteBits;
 
   -- What kind of words to use. Must be machine-word or smaller.
   type Word is mod 2**Bitness;
   for Word'Size use Bitness;
 
   -- The very same Word, but its only legal values are 0 and 1.
   subtype WBool is Word range 0 .. 1;
 
   -- When we must refer to individual bit positions of a machine word:
   subtype WBit_Index is Natural range 0 .. Bitness - 1;
 
end Words;

A WBool is simply a Word which is permitted to contain strictly Boolean (0, 1) values. The compiler and runtime will enforce this restriction.

GNAT needs a little bit of help in making the full set of bitwise operations available for use on Word.

For some peculiar reason, the Ada standards group made the fundamental Shift and Rotate bitwise ops into optional components.

Fortunately, on GNAT we can force them to exist (On a non-GNAT compiler, you’re on your own) :

w_shifts.ads:

with Words; use Words;
 
package W_Shifts is
 
   pragma Pure;
 
   function Shift_Left
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Shift_Left);
 
   function Shift_Right
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Shift_Right);
 
   function Rotate_Left
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Rotate_Left);
 
   function Rotate_Right
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Rotate_Right);
 
end W_Shifts;

Now let’s proceed to the most fundamental arithmetic operations on Words:

word_ops.ads:

with Words; use Words;
 
-- Fundamental Operations on Words:
package Word_Ops is
 
   pragma Pure;
 
   -- Branch-free calculation of 'carry' from a machine-word addition.
   function W_Carry(A : in Word; B : in Word; S : in Word)
                   return WBool;
 
   -- Branch-free calculation of 'borrow' from a machine-word subtraction.
   function W_Borrow(A : in Word; B : in Word; D : in Word)
                    return WBool;
 
   -- Without any branching: if Sel == 0, return A; if Sel == 1, return B.
   function W_Mux(A : in Word; B : in Word; Sel : in WBool)
                 return Word;
 
   -- Exchange A and B.
   procedure W_Swap(A : in out Word; B : in out Word);
 
end Word_Ops;

word_ops.adb:

with W_Shifts; use W_Shifts;
 
-- Fundamental Operations on Words:
package body Word_Ops is

But wait.

Ada — like C — does not give us any portable method of determining the over/underflow from output from machine arithmetic. And, it turns out, there existed in the past — and apparently exist today — CPUs made by idiots and wreckers (e.g. ‘RISC-V’) that do not have flags at all!

However, for multi-word addition, subtraction, the inner loop of Comba’s multiplication, and for a handful of other ops, we demand it.

So we derive the Carry or Borrow at the ‘eldest’ binary position, without using any inline assembly or special compiler features:

   -- Find the Carry, from an addition where it is known that A + B == S:
   function W_Carry(A : in Word; B : in Word; S : in Word)
                   return WBool is
   begin
      return WBool(Shift_Right((A and B) or ((A or B) and (not S)),
                               Bitness - 1));
   end W_Carry;
   pragma Inline_Always(W_Carry);
 
   -- Find the Borrow, from a subtraction where it is known that A - B == D:
   function W_Borrow(A : in Word; B : in Word; D : in Word)
                    return WBool is
   begin
      return WBool(Shift_Right(((not A) and B) or (((not A) or B) and D),
                               Bitness - 1));
   end W_Borrow;
   pragma Inline_Always(W_Borrow);

To derive the Carry from the addition S of words A and B, we take the highest bit ( i.e. Shift_Right( … , Bitness – 1) ) of the result of (A and B) or ((A or B) and (not S).

To derive the Borrow from the subtraction D of words A and B, we take the highest bit ( i.e. Shift_Right( … , Bitness – 1) ) of the result of ((not A) and B) or (((not A) or B) and D).

This derivation of the flags, strictly from the inputs and outputs of a machine-word addition or subtraction, is the single most tricky item in this article. Be prepared to spend some time drawing the circuit (of a full adder, with carry-in and carry-out) to verify the offered proof:

A+B+C is the output bit of 1-bit adder; C is carry-in;
A-B-C is the output bit of 1-bit subber; C is borrow-in.
Observe that A+B+C is equal to A-B-C for all A,B,C !
+-+-+-+-----+--------------+-----+----------------+
|           | 'Carry' out: |     | 'Borrow' out:  |
+-+-+-+-----+--------------+-----+----------------+
| | | |     |(a and b) or  |     |(~a and b) or   |
|A|B|C|A+B+C| ((a or b) and|A-B-C| ((~a or b) and |
| | | |     |    ~(A+B+C)) |     |    (A-B-C))    |
+-+-+-+-----+--------------+-----+----------------+
|0|0|0|  0  |      0       |  0  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|1|0|0|  1  |      0       |  1  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|0|1|0|  1  |      0       |  1  |       1        |
+-+-+-+-----+--------------+-----+----------------+
|1|1|0|  0  |      1       |  0  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|0|0|1|  1  |      0       |  1  |       1        |
+-+-+-+-----+--------------+-----+----------------+
|1|0|1|  0  |      1       |  0  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|0|1|1|  0  |      1       |  0  |       1        |
+-+-+-+-----+--------------+-----+----------------+
|1|1|1|  1  |      1       |  1  |       1        |
+-+-+-+-----+--------------+-----+----------------+
--- This base case extends to any N bit register, since
--- both equations depend ~strictly~ on A, B, and C.

Now study how the Mux and Swap operations work. Observe that no transfer-of-control CPU instructions are required:

   -- Without any branching: if Sel == 0, return A; if Sel == 1, return B.
   function W_Mux(A : in Word; B : in Word; Sel : in WBool) return Word is
   begin
      return B xor ((Sel - 1) and (A xor B));
   end W_Mux;
   pragma Inline_Always(W_Mux);
 
   -- Exchange A and B.
   procedure W_Swap(A : in out Word; B : in out Word) is
   begin
      A := A xor B;
      B := A xor B;
      A := A xor B;
   end W_Swap;
   pragma Inline_Always(W_Swap);
 
end Word_Ops;

Now we are finally ready to define FZ — the fundamental FFA type: an unsigned integer of fixed width, i.e. a contiguous array of machine words.

fz_type.ads:

with Words; use Words;
 
package FZ_Type is
 
   pragma Pure;
 
   -- Indices of all indexable items:
   type Indices is new Natural;
 
   -- Index of a particular Word in an FZ:
   subtype Word_Index is Indices;
 
   -- The FZ, in person! I.e. a bignum of permanently fixed bitness.
   type FZ is array(Word_Index range <>) of Word;
 
   -- A count of Words in an FZ (cannot be 0):
   subtype Word_Count is Indices range 1 .. Indices'Last;
 
   -- An index of a particular ~bit~ in an FZ:
   subtype FZBit_Index is Indices;
 
end FZ_Type;

And finally, our first multi-word FFA arithmetical operations: ordinary addition and subtraction, with carry output.

Observe that a program which invokes FFA is expected to allocate memory of the proper size for each FFA integer; and that no such thing as normalization is permitted or permissible in FFA — it would result in branches-on-secrets, i.e. timing side-channels. An addition or a subtraction of a pair of FFA integers with no bits set (i.e. all bits, in each word, equal 0) takes exactly the same amount of time, and is performed via the same sequence of CPU instructions, as of two FFA integers with all bits set (i.e. all bits, in each word, equal 1), or of any other possible integers of the same bitness (i.e. number of words.) In other words, the algorithmic complexity of FFA’s arithmetical operations is always independent of the Hamming weights of the given operands.

fz_arith.ads:

with Words;   use Words;
with FZ_Type; use FZ_Type;
 
-- Fundamental Arithmetic operators on FZ:
package FZ_Arith is
 
   pragma Pure;
 
   -- Sum := X + Y; Overflow := Carry
   procedure FZ_Add(X          : in    FZ;
                    Y          : in    FZ;
                    Sum        : out   FZ;
                    Overflow   : out   WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Sum'Length);
 
   -- Difference := X - Y; Underflow := Borrow
   procedure FZ_Sub(X          : in  FZ;
                    Y          : in  FZ;
                    Difference : out FZ;
                    Underflow  : out WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Difference'Length);
 
end FZ_Arith;

Observe that the preconditions for addition and subtraction demand equal bitnesses for pairwise FFA operators. This pattern will likewise hold in the algorithms we will discuss in later articles.

And, now at last, the FZ adder and FZ subtracter:

fz_arith.adb:

with Word_Ops; use Word_Ops;
 
-- Fundamental Arithmetic operators on FZ:
package body FZ_Arith is
 
   -- Sum := X + Y; Overflow := Carry
   procedure FZ_Add(X          : in  FZ;
                    Y          : in  FZ;
                    Sum        : out FZ;
                    Overflow   : out WBool) is
      Carry : WBool := 0;
   begin
      for i in X'Range loop
         declare
            A : constant Word := X(I);
            B : constant Word := Y(I);
            S : constant Word := A + B + Carry;
         begin
            Sum(i) := S;
            Carry  := W_Carry(A, B, S);
         end;
      end loop;
      Overflow := Carry;
   end FZ_Add;
   pragma Inline_Always(FZ_Add);
 
   -- Difference := X - Y; Underflow := Borrow
   procedure FZ_Sub(X          : in  FZ;
                    Y          : in  FZ;
                    Difference : out FZ;
                    Underflow  : out WBool) is
      Borrow : WBool := 0;
   begin
      for i in 0 .. Word_Index(X'Length - 1) loop
         declare
            A : constant Word := X(X'First + i);
            B : constant Word := Y(Y'First + i);
            S : constant Word := A - B - Borrow;
         begin
            Difference(Difference'First + i) := S;
            Borrow := W_Borrow(A, B, S);
         end;
      end loop;
      Underflow := Borrow;
   end FZ_Sub;
   pragma Inline_Always(FZ_Sub);
 
end FZ_Arith;

The alert reader might ask why the iteration schemes differ. This will be discussed in a subsequent chapter, when we walk through multiplication and the peculiarities of Ada’s “array slicing” construct.

Now let’s define a simple mechanism for printing FFA integers to the console:

ffa_io.ads:

with Words;   use Words;
with FZ_Type; use FZ_Type;
 
package FFA_IO is
 
   -- Character representation of a Word
   type WChars is array(1 .. 2 * Byteness) of Character;
 
   -- Obtain the WChars corresponding to the given Word
   function W_To_WChars(N : Word) return WChars;
 
   -- Display a hex representation of W to stdout
   procedure Dump(W : in Word);
 
   -- Display a hex representation of N to stdout
   procedure Dump(N : in FZ);
 
end FFA_IO;

ffa_io.adb:

with Ada.Text_IO; use Ada.Text_IO;
 
with Words;    use Words;
with W_Shifts; use W_Shifts;
with FZ_Type;  use FZ_Type;
 
package body FFA_IO is
 
   -- Obtain the WChars corresponding to the given Word
   function W_To_WChars(N : Word) return WChars is
      H      : constant array(0 .. 15) of Character := "0123456789ABCDEF";
      W      : Word := N;
      Result : WChars;
   begin
      for b in WChars'Range loop               -- From bottom to top:
         Result(B) := H(Natural(W and 16#F#)); -- Get current nibble.
         W         := Shift_Right(W, 4);       -- Get the next nibble.
      end loop;
      return Result;
   end W_To_WChars;
 
   -- Display a hex representation of W to stdout
   procedure Dump(W : in Word) is
      T : WChars := W_To_WChars(W);
   begin
      for i in reverse T'Range loop
         Put(T(i));
      end loop;
   end Dump;
 
   -- Display a hex representation of N to stdout
   procedure Dump(N : in FZ) is
   begin
      for i in reverse N'Range loop
         Dump(N(i));
      end loop;
   end Dump;
 
end FFA_IO;

Nothing particularly complicated here.

Now for Chapter 1’s demo routine itself:

demo_ch1.ads:

package Demo_Ch1 is
 
   procedure Demo_Add_Sub;
 
end Demo_Ch1;

In this Chapter, we do not yet have a means for programmatically setting the value of an FFA integer other than via Ada’s array assignment construct. But it will suffice for now, as we wish to simply explore addition and subtraction:

demo_ch1.adb:

-- From Ada:
with Ada.Text_IO; use Ada.Text_IO;
 
-- From FFA:
with Words;    use Words;
with FZ_Type;  use FZ_Type;
with FZ_Arith; use FZ_Arith;
 
-- From the Demo:
with FFA_IO;   use FFA_IO;
 
package body Demo_Ch1 is
 
   procedure Demo_Add_Sub is
 
      -- We are using 64-bit Words (see iron.ads).
      -- We'll begin with an elementary demo, using 256-bit FZ:
      X : FZ(1 .. 4) := (0,        0, 0, 0);
      Y : FZ(1 .. 4) := (16#5555#, 0, 0, 0);
      Z : FZ(1 .. 4) := (0,        0, 0, 0);
 
      -- Carry.
      C : WBool := 0;
 
   begin
 
      Put_Line("X         =");
      Dump(X);
      New_Line;
 
      Put_Line("Y         =");
      Dump(Y);
      New_Line;
 
      FZ_Add(X, Y, Z, C);
      Put_Line("X + Y     =");
      Dump(Z);
      New_Line;
      Put_Line("C         = " & WBool'Image(C));
 
      FZ_Sub(X, Y, Z, C);
      Put_Line("X - Y     =");
      Dump(Z);
      New_Line;
      Put_Line("C         = " & WBool'Image(C));
 
   end Demo_Add_Sub;
 
end Demo_Ch1;

Our “main” :

ffa_demo.adb:

with Demo_Ch1; use Demo_Ch1;
 
procedure FFA_Demo is
begin
 
   Demo_Add_Sub;
 
end FFA_Demo;

And finally, the “builder” in the ffademo directory.
This illustrates how the static library produced by libffa’s builder is put to use:

ffa_demo.gpr:

with "../libffa/ffa.gpr";
 
project FFA_Demo is
 
  for Object_Dir use "obj";
 
  type Mode_Type is ("debug", "release");
  Mode : Mode_Type := external ("mode", "release");
 
  for Languages   use ("Ada");
  for Source_Dirs use (".");
  for Exec_Dir    use "bin";
  for Main        use ("ffa_demo.adb");
 
  package Compiler is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ("-g");
        when "release" =>
           for Switches ("Ada")
             use ("-O2", "-fdump-scos", "-gnata", "-fstack-check",
                  "-fdata-sections", "-ffunction-sections");
     end case;
  end Compiler;
 
  package Binder is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ();
        when "release" =>
           for Switches ("Ada")
             use ("-static");
     end case;
  end Binder;
 
  package Linker is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ();
        when "release" =>
           for Switches ("Ada")
             use ("-Wl,--gc-sections",
                  "-static");
     end case;
  end Linker;
 
end FFA_Demo;

At this point, the reader should be reasonably confident that the Chapter 1 demo will not format his HDD; and perhaps even has a reasonable supposition as to what the output will be.

Let’s run it:

./bin/ffa_demo

And this is what you should see:

X         =
0000000000000000000000000000000000000000000000000000000000000000
Y         =
0000000000000000000000000000000000000000000000000000000000005555
X + Y     =
0000000000000000000000000000000000000000000000000000000000005555
C         =  0
X - Y     =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAAAB
C         =  1

Now it should be clear to the reader, that we are simply emulating a “stretched” version of something resembling a typical CPU’s ALU: a mechanism which arithmetizes on registers of fixed physical size.

~To be continued!~

The Peculiarly Quiet Decline and Fall of the KVM

Yes, that familiar little PC peripheral. The one that toggles a single keyboard/mouse/display ensemble between two or more connected machines.

They’ve quietly vanished from the market. And the fact appears to be discussed nowhere on the Net, at all. So I have seen it fit to make a note of it here.

What’s this, you say, they are still available? I’ll bite: who and where sells a KVM box that reliably works with recent ( 3840 × 2160 x 60 Hz ) display, and costs less than simply adding another such display would cost?

Every single claimed counterexample I’ve turned up, either has a kilometer of “this is rubbish, doesn’t work at all at the claimed 4K res” reviews hanging from it; or no reviews at all, on account of no one being stupid enough to purchase a KVM that costs more than adding two or three additional high-quality displays to the workstation.

I have no idea re: just why the KVM has gone into extinction. But it seems to have! And said demise is rather difficult to explain in purely technical terms, because signal switches with arbitrary bandwidth remain available: analogue relays (as found in, e.g., inexpensive oscilloscopes and elsewhere.) So what gives?


Edit: To all of the “switching an analogue 500MHz takes exotic and expensive FETs” people: somehow a mid-range — 300-400 $ — LCD quite routinely takes input from 2 (sometimes more) selectable Displayport jacks. But to get exactly the same functionality in an external KVM, is 700-1000 ? Why?

Posted in: Chumpatronics, Hardware, NonLoper by Stanislav 74 Comments

Sage SmartProbe FAQ


What’s a Sage SmartProbe?

See previous articles,
A Complete Pill for the Sage SmartProbe.
The Care and Feeding of the Sage SmartProbe.


Where do I get a Sage SmartProbe?

Possibly here — while supplies last. (Warning: I have no relation with this vendor and have no idea whether they still include the probe with “Gizmo 1″ kit. I do know for a fact that their latest “Gizmo 2″ product does not include it.)

SmartProbe is no longer made and at some point will only be available secondhand.


Who is selling devices comparable to SmartProbe ?

AFAIK nobody! Several vendors offer similar products but they do not support plain old GDB interface, and instead require you to buy multi-thousand-$ MS Windows closed shitware — sometimes with a mandatory maintenance contract !


Can anyone help me with sage_pill.py? When I run it, it gets through programming the FPGA, but then dies

This is the correct behaviour. The probe resets itself after it is cured by the pill and your serial connection to it will break.


How do I verify that the pill worked?

screen /dev/ttyACM0
(Or wherever your probe is connected) and hit return key a few times. You will see the string SmartProbe: 3.2_3743 (the last firmware revision released before the manufacturer went out of business.) And your probe will no longer die after N hours of use.


Do you have a pill for the Sage EDK software?

The EDK is rubbish, Java shitware, forget about it; use plain old GDB.


I got GDB working with the probe, but I can’t seem to get the probe to work with the board. If I plug in the probe to the board, the board hangs. If I try to continue I just get a bunch of ffffffs. If I do “monitor run” it says the probe is not connected ?

You plugged the cable into your motherboard backwards!


Help! GDB does strange things instead of working !

You must use version 7.8 of GDB or above, for the probe to work. Earlier versions had problems with the switches between 16, 32, and 64-bit modes of the AMD CPU.


Will Sage SmartProbe work with a PcEngines APU1 computer ?

Yes. Though you will have to solder the debug connector on yourself. (It is a 5 minute job. The header is a metric-spaced one, however, make sure to buy the right kind.)


Will Sage SmartProbe work with the recently-released PcEngines APU2 computer ?

No! on account of AMD’s PSP NSA-mandated Fritz Chip.


Does the Sage SmartProbe work with Intel CPU and motherboard?

No.


Will you help me with Intel’s XDP debugger for Intel CPU and motherboard?

No.


Why did Sage go out of business?

I don’t know. But AMD’s successful campaign of docs-withholding, stonewalling, and Fritz-chipping, to strangle Coreboot — probably did not help Sage’s life expectancy. Sage was a consultancy mainly specializing in Coreboot and custom third-party BIOS work.


Were you affiliated with Sage? Can you put me in touch with ex-Sage people ?

No. And no.


I have questions!

Try asking — politely — here.


Practical Lead to Gold Conversion; or an Intro to Opteron Resurrection.

It so happens that AMD resisted the informal NSA ban on the manufacture of LinuxBIOS-capable x86 CPUs slightly longer than Intel did. Or perhaps they were simply slower to succumb to Microsoft’s Fritz chip decree. Whichever may be the case, vintage (pre-2011, with some caveats, inquire within) AMD Opteron motherboards are a precious and increasingly scarce non-renewable resource. Is it possible to prolong the life of these devices? And when they finally die, to bring them back to life? Turns out, in some cases, the answer is yes!

Our patient today is a Tyan S2915 with nearly 11 years of continuous service. A true champ — but the sad day came: it would not come back up after a scheduled reboot; then, a week later, already staring at the dumpster, it deigned to switch on if “asked nicely”, sporadically. The culprit, in this case, was the same as in nearly every other “smoked” mobo: aluminum electrolytic caps. But which?

One useful fact to remember: wet-electrolytic capacitors age in direct proportion to temperature.

So we take a cheap thermal camera:

And, unsurprisingly, we find a bulging and slightly-leaky electrolytic near the hot (~80C, if running on a desk, 70C or so in well-ventilated case) northbridge:

And another, similarly-sad one near the PCI-E slot where the video card was mounted:

Now, this board is a vintage one, but unfortunately the lead-free solder scam was already in full swing at the time it was made:

And so what would otherwise have been a kindergarten-level repair job — desoldering and replacing two large capacitors — turns into a somewhat tricky and quite time-consuming exercise.

Even setting the iron to its maximum temperature (~250C):

… I found that the solder on the leads would not visibly begin to melt at all !

The melting point of Pb-free solder is considerably higher (how much higher — varies, by particular composition) than of traditional alloys, and a commonplace adjustable soldering iron simply will not work. (And if you were to use a specially-made, hotter iron, you would quite possibly damage the surrounding tracks on the PCB.)

So now let’s try a cheap but effective pill against the Eurocrat idiocy. We take a tube of good, old-fashioned pre-ban leaded solder, some wick, new capacitors:

Now add leaded solder, repeatedly, then wick it away. Again, and again. We end up slowly adding lead to the alloy until the melting point is manageable. It took me four shots of this, per capacitor leg, before either of the two sad caps would budge.

Now it is possible to install the new ones:

And we’re back in business! Here’s to another 11 years, old friend.

Still Tenser, Said the Censor.

Stay classy, YCombinator.

The Care and Feeding of the Sage SmartProbe.

Note: Please read the FAQ!!!

If you cured your Sage SmartProbe of its congenital disease as per the last article on the subject, you may now be wondering what to do with it.

The vendor supplied a massive Java shitware with the thing, which does not merit any discussion whatsoever. Instead, we will use the probe’s very spiffy GDB-compatible interface. Configure your GDB as follows:

gdbinit.txt:

### log all instructions
set logging on
set logging file gdb_out.txt
 
### only if you want to see the raw gdb packets...
#set debug remote 1
 
### if you're debugging the BIOS
set architecture i386
 
### if you are debugging a warmed-up OS
# set architecture i386:x86-64
 
### where the probe is:
target remote /dev/ttyACM0
 
### or, if it is connected to your LAN,
### let's say at 192.168.1.111,
# target remote 192.168.1.111:2159
 
# Show instructions on single-step
set disassemble-next-line on
 
# Disable evil, heretical GAS syntax
#set disassembly-flavor intel
 
### if you want the ncurses gui in gdb
# layout asm

So, for instance, let’s connect to a freshly-booted AMD G-series box spinning in Coreboot’s boot selector menu,

$ gdb --command=gdbinit.txt
GNU gdb (Gentoo 7.8.1 vanilla) 7.8.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word".
The target architecture is assumed to be i386
(gdb) si
0x001015ae in ?? ()
=&gt; 0x001015ae:  c3      ret
(gdb)
0x001035a9 in ?? ()
=&gt; 0x001035a9:  29 f0   sub    %esi,%eax
(gdb)
0x001035ab in ?? ()
=&gt; 0x001035ab:  19 fa   sbb    %edi,%edx
(gdb)
0x001035ad in ?? ()
=&gt; 0x001035ad:  39 ea   cmp    %ebp,%edx
(gdb)
0x001035af in ?? ()
=&gt; 0x001035af:  72 f3   jb     0x1035a4
(gdb)
0x001035b1 in ?? ()
=&gt; 0x001035b1:  77 04   ja     0x1035b7
(gdb)
0x001035b3 in ?? ()
=&gt; 0x001035b3:  39 d8   cmp    %ebx,%eax
(gdb)
0x001035b5 in ?? ()
=&gt; 0x001035b5:  72 ed   jb     0x1035a4
(gdb)
0x001035b7 in ?? ()
=&gt; 0x001035b7:  83 c4 0c        add    $0xc,%esp
(gdb)
0x001035ba in ?? ()
=&gt; 0x001035ba:  5b      pop    %ebx
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x001035b5 in ?? ()
=&gt; 0x001035b5:  72 ed   jb     0x1035a4
(gdb) q
A debugging session is active.
 
        Inferior 1 [Remote target] will be killed.
 
Quit anyway? (y or n) y

I will add that the probe also works great with IDA Pro’s GDB interface. Or whatever other, similar front-end you might fancy.

But! In order to make proper use of the probe, you will need the vendor-specific command set for manipulating the DC power, PCI bus, JTAG chain, and so forth. These were at one point published on the vendor’s site, which has vanished without a trace. I have made a cleaned-up HTML version:

Sage Smartprobe GDB Command Manual.

Note that I have discovered certain undocumented commands. These will be the subject of a later article!

How to Make Your Own Lamport Parachute from Common Household Materials.

The thought began, as many good things begin, in #trilema.

Users of the WOT, of V, and other systems where your cryptographic identity is wholly in your own hands1 live with a certain risk of “cryptographic death” – i.e. the compromise of one’s signing key. A conscientious user of public key crypto might keep the thought of this calamity somewhere in the back of his head, filed right next to “piano falling on my head” and other misfortunes from which there can be no return.

A correctly-generated RSA key is unlikely to be broken via cryptoanalytic means within the owner’s lifetime. But “never say never!” And there is always the possibility of a mass slaughter of keys, a “cryptocalypse”, where the public key cryptosystem of your choice is broken (or, via whatever mathematical advances, weakens, and the cost of deriving your private key from your public key becomes tractably low.)

The popular notion of key revocation is a questionable business because it attempts to impose a political structure onto an uncooperative mathematical reality — in the jargon of #trilema, it is “promisetronic, not protocolic.” Someone who takes possession of your signing key can walk around and befoul your reputation, ‘Who steals my purse steals trash; ’tis something, nothing; ‘Twas mine, ’tis his, and has been slave to thousands; But he that filches from me my good name…’ without regard to time or space – he can sign whatever he likes and attribute it to you. And those who have not received your revocation signal will accept the forgeries as the genuine article. Nothing whatsoever can be done to rescue a compromised signing key. But might it be possible to rescue its owner from WOT death by establishing an unforgeable continuity of identity ?

The answer is: “possibly.” The Bitcoin blockchain2 gives us a very strong mechanism for cryptographic timestamping3 – i.e. the act of certifying, to a skeptical observer, that some particular string of bits had indeed been in existence at a particular point in time. The current state-of-the-art implementation of this concept is called Deedbot, and the reader is invited to become familiar with it. If you make an arrangement with your associates in advance that a certain public key Deedbotted at a certain time is to be regarded as your emergency fallback, you have a kind of “parachute” on which you may conceivably float safely to the ground in the event of your plane’s wings, if you will, breaking off — i.e. a signing key compromise.4

However, the obvious caveat is that the “parachute” keypair must exist at the time of the Deedbotting, and its public key must be, well, public. This can be a problem in the scenario where the cryptosystem itself- e.g., RSA – is publicly broken. The enemy can now do as he pleases with your primary and fallback keys, and you – the cryptographic you – have died a permanent death. And all known public key signature schemes rely on unproven5 number-theoretic conjectures, and may conceivably fall this very night. All except for one…

L. Lamport’s 1979 “one time signature” algorithm rests on no number-theoretic conjecture, but merely on the strength of a collision-resistant hash of the operator’s choosing. On account of certain disadvantages which will soon become apparent to the reader, it is used virtually nowhere. But it so happens that it is entirely perfect for our “parachute” scenario.

A complete and usable implementation of this “Lamport parachute” is presented in this article. It is written in 68 69 lines of Bash and makes use strictly of commonplace userland utilities, such as may be met with on a typical Linux box. This somewhat strange choice of implementation language (it certainly does not shine, for instance, speedwise) is deliberate: the user is expected to understand each individual part, so that the function of the whole becomes unambiguously apparent, like 2+2. As the field of cryptography is slowly reconquered from the Schneiers and other scumbags presently maggoting on top of it, expect this type of didactic presentation to become ordinary practice.

A Lamport public key consists of two lists, let’s call them A and B, of randomly-generated bitstrings that have been hashed with a collision-resistant hash and thereafter published. To make use of such a key, the owner hashes the payload being signed, and for each bit that is equal to 0 in the resulting hash, reveals the original pre-hash string from column A, whereas if it is equal to 1 he reveals a pre-hash string from column B. For reasons which I hope are quite obvious to the alert reader, a Lamport public key is to be made use of only once. And such a key is quite bulky, as is the signature resulting from its use.6

But enough words! It is time for the deeds!

We shall generate7 our Lamport Parachute like this:

lamport_mkpriv.sh

#!/bin/bash
 
if [ $# -lt 2 ]
then
  echo "Usage: ./`basename $0` PAYLOADBITS STRENGTHBITS"
  exit 1
fi
 
for i in $(seq 1 $(($1 * 2))) ; do
    echo `od -N $(($2 / 8)) -An -t x1 -v /dev/random | tr -d " \n"` ;
done | xargs -n 2

All this does is to give us two columns of STRENGTHBITS-sized random integers, times PAYLOADBITS rows.

Specifically:

$ ./lamport_mkpriv.sh 256 512 > privkey.txt

This will create a parachute private key suitable for use with sha256 – i.e. it is able to sign a 256-bit payload. The strength – or length of the random strings – in this example is 512 bits. It is pointlessly suicidal to use a strength value below the output length of your chosen hash algorithm, but otherwise the choice is unconstrained, so long as it byte-aligns.8

Now we would like to generate the corresponding public key:

lamport_priv2pub.sh

#!/bin/bash
 
if [ $# -lt 1 ]
then
  echo "Usage: ./`basename $0` HASHUTIL < PRIVKEY.TXT > PUBKEY.TXT"
  exit 1
fi
 
for x in $(cat) ; do
    echo -n $x | xxd -r -p | $1 | cut -d ' ' -f1 ;
done | xargs -n 2

We simply apply the HASHUTIL of choice to the bitstrings produced by lamport_mkpriv.sh.

And it is done like this:

$ ./lamport_priv2pub.sh sha256sum < privkey.txt > pubkey.txt

Now we have the public key.

If you are generating a “battlefield”9 key, it is now time to consider its intended usage scenario. If the payload of the signature is known in advance – e.g., a “warrant canary”, or some other such thing – you will generate its signature immediately, and the private key may then be destroyed immediately.10

Otherwise the private key is to be retained, shielded stegatronically from prying eyes, and hidden safely far away from your usual places of business or habitation, or other locations liable to be searched by the enemy.

Let’s make a second public/private keypair of the same type:

$ ./lamport_mkpriv.sh 256 512 > another_privkey.txt
$ ./lamport_priv2pub.sh sha256sum < another_privkey.txt > another_pubkey.txt

And now we should like to make use of our Lamport Parachute.

Let’s sign a message:

lamport_encode.sh

#!/bin/bash
 
if [ $# -lt 1 ]
then
  echo "Usage: ./`basename $0` PRIVKEY.TXT < HEXPAYLOAD > ENCODED.TXT"
  exit 1
fi
 
payload=$(cat | tr '[:lower:]' '[:upper:]')
len=$((${#payload} * 4))
bits=$(printf "%*s" $len $(echo "ibase=16;obase=2;$payload" | bc | tr -d '\\\n') | tr ' ' 0)
 
while IFS= read -r p; do
    bit=${bits:0:1};
    bits=${bits:1};
    echo $p | cut -d ' ' -f$(($bit + 1));
done < "$1"

Here we simply iterate over the bits of the payload, choosing the unhashed original private strings from column A or B as discussed earlier.

Remember, we generated a private key that can carry a 256-bit payload. It does not necessarily have to be a hash, but in this example we will take one, e.g.:

$ echo "Attack at dawn." | sha256sum | cut -d ' ' -f1
 
1d6c270d7cc7e82a816ffb7bc3797d213b24d9d17af48f4b3b8d01fb43ed15c3
 
echo "Attack at dawn." | sha256sum | cut -d ' ' -f1 | ./lamport_encode.sh privkey.txt > encoded.txt

And we get a signature.

NOW YOU MUST DESTROY THE PRIVATE KEY! IT SHALL NEVER BE USED AGAIN!

Quite analogously to a Vernam "One Time Pad" -- the re-use of a Lamport private key is cryptographically suicidal.

And now some counterparty, with whom you have taken care to share your parachute's public key, wishes to verify the signature. It happens like this:

lamport_decode.sh

#!/bin/bash
 
if [ $# -lt 2 ]
then
  echo "Usage: ./`basename $0` HASHUTIL PUBKEY.TXT < ENCODED.TXT > HEXPAYLOAD"
  exit 1
fi
 
bits=""
while read -u 3 pkl; do
    if ! read el
    then
        break
    fi
    bits="$bits$(
        case $(echo -n $el | xxd -r -p | $1 | cut -d ' ' -f1) in
            $(echo $pkl | cut -d ' ' -f1)) echo "0" ;;
            $(echo $pkl | cut -d ' ' -f2)) echo "1" ;;
            *) exit 1; break ;;
            esac)"
    if [[ $? == 1 ]]
    then
        echo False >&2;
        exit 1
    fi
done 3< $2
 
padlen=$(od -N 1 /dev/zero | $1 | cut -d ' ' -f1 | tr -d " \n" | wc -c)
printf "%*s\n" $padlen $(echo "ibase=2;obase=10000;$bits" | bc  | tr -d '\\\n' | tr '[:upper:]' '[:lower:]') | tr ' ' 0

Here we simply take each bitstring from an encoded parachute message (the signature) -- hash it with the pre-agreed hashing algo; and determine which column of each respective row of the public key the result is found in. (If the answer is "neither", the decoder terminates and warns us.)

So, for instance:

$ ./lamport_decode.sh sha256sum pubkey.txt < encoded.txt
 
1d6c270d7cc7e82a816ffb7bc3797d213b24d9d17af48f4b3b8d01fb43ed15c3

We have decoded a payload. If we had agreed, in advance, that this payload is the sha256 hash of a document, e.g.:

$ diff < (./lamport_decode.sh sha256sum pubkey.txt < encoded.txt) <(echo "Attack at dawn." | sha256sum | cut -d ' ' -f1)

We are able to verify it.

Now let's try the same, but with the wrong public key:

$ diff < (./lamport_decode.sh sha256sum another_pubkey.txt < encoded.txt) <(echo "Attack at dawn." | sha256sum | cut -d ' ' -f1)
 
False
0a1
> 1d6c270d7cc7e82a816ffb7bc3797d213b24d9d17af48f4b3b8d01fb43ed15c3

Observe that it is possible to make use of any hashing algorithm11 you may happen to like, so long as you have it available as a command-line program.


Epilogue.

A V-genesis for these didactic examples is available12 here; Seal - here..



  1. As opposed to... the other kind. Where your self-appointed masters could impersonate you at their leisure if they so choose. 
  2. There exists, and can exist, precisely one blockchain.
  3. All other cryptographic timestamp schemes currently known rely on a "trusted third party." And a sane user trusts "third parties" no further than he can throw them! 
  4. If it so happens that you learn of your private key having been compromised before the consequences become irreparably grave, you should consider yourself extraordinarily lucky. Just ask, e.g., Admiral Isoroku Yamamoto. 
  5. The "hardness" -- i.e. the amount of computation required to break an RSA key, of whatever length, is unknown. And if your university professor taught you that it is known, or -- more egregiously -- that "it is equivalent to integer-factoring", then he is not merely an ignoramus but a particular kind of liar
  6. There exist certain questionable schemes to somewhat reduce this bulk, but I have deliberately eschewed them in light of the gravity of the task at hand. Good men may well die from your "reasonable" optimizations, your "negligible" losses of strength, loathesome hucksters
  7. Chances are that Lamport key generation, being an entropy-hungry affair, will take an unacceptably long time on your box. The temptation to replace "/dev/random" with "/dev/urandom" will be great. Needless to say that if the key is being generated for battlefield use, such a substitution is impermissible. But if you are following along simply for study, go ahead. Just remember to switch it back before crafting that missile launch key... 
  8. I.e. a multiple of 8. 
  9. I.e. for use in practice. 
  10. Along, if your situation is sufficiently grave to merit this, you will also cremate the very machine used in these operations! 
  11. One interesting tidbit is that public key signatures are only possible if trapdoor functions exist -- see Rompel, 1990. So, today, or 100 years from now, you're stuck choosing a hash algo. 
  12. Strictly under Popescu's License

Phuctored SSH Public Keys.

Phuctored SSH Public Keys to date.

Keys were obtained from a scan of the complete IPv4 space. We have gone approximately 20% of the way through the data set at the time of this writing.

Click on the IP addresses to view a key in Phuctor, or on the SSH hello string to view pertinent discussions.

A snapshot of ALL Phuctored keys at the time of writing can be viewed here, if the main site is slow under load.

(Feel free to ask questions !)


IP SSH Hello SSL Hello

112.16.4.21

SSH-1.99-Comware-5.20

112.16.65.245

112.16.65.247

112.16.95.66

SSH-1.99-RGOS_SSH_1.0

177.234.0.97

SSH-1.99-HUAWEI-1.5

177.234.11.229

SSH-1.99-HUAWEI-1.5

177.234.13.179

SSH-1.99-DOPRA-1.5

177.234.13.241

SSH-1.99-HUAWEI-1.5

177.234.15.27

177.234.1.73

SSH-1.99-HUAWEI-1.5

177.234.1.91

SSH-1.99-DOPRA-1.5

177.234.4.97

SSH-1.99-HUAWEI-1.5

177.234.6.21

SSH-1.99-HUAWEI-1.5

177.234.9.13

SSH-1.99-HUAWEI-1.5

183.246.69.24

SSH-1.99-Comware-5.20

187.188.126.28

187.72.155.221

SSH-2.0-AudioCodes
subject=/CN=ACL_3353352

187.72.216.78

SSH-2.0-AudioCodes
subject=/CN=ACL_3353177

189.112.138.254

189.203.181.149

SSH-1.99-DOPRA-1.5

189.203.181.7

SSH-1.99-DOPRA-1.5

189.203.72.147

SSH-1.99-DOPRA-1.5

190.39.56.107

197.221.61.38

SSH-2.0-OpenSSH_3.8.1p1 Debian-8.sarge.4

197.221.63.150

200.146.242.241

SSH-2.0-AudioCodes
subject=/CN=ACL_3348823

201.101.37.44

SSH-2.0-OpenSSH_6.0p1 Debian-4+deb7u2

201.16.189.38

201.249.207.150

SSH-1.99-HUAWEI-1.5
subject=/name=AR157-Self-Signed-Certificate-210235384810E4001320

202.166.221.118

2.116.209.1

SSH-1.99-DOPRA-1.5

217.57.196.177

SSH-1.99-DOPRA-1.5

217.57.196.179

SSH-1.99-DOPRA-1.5

223.94.78.217

SSH-1.99-Comware-5.20

59.21.182.242

SSH-2.0-OpenSSH_5.9
subject=/C=US/ST=California/L=Sunnyvale/O=Aerohive/OU=Default/CN=HiveAP

66.233.213.117

SSH-2.0-OpenSSH_5.2

68.14.242.210

SSH-2.0-OpenSSH_6.6
subject=/C=US/ST=Florida/L=Orlando/O=Creative Manager – AZ Office/CN=gnatbox.creative-manager.com/emailAddress=support@iccsllc.com

71.39.252.162

74.45.0.60

SSH-2.0-OpenSSH_4.1
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

74.45.228.159

SSH-2.0-OpenSSH_5.2
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

74.45.228.160

SSH-2.0-OpenSSH_5.2
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

74.45.228.49

SSH-2.0-OpenSSH_5.2
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

74.45.228.86

SSH-2.0-OpenSSH_5.2
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

74.45.228.97

SSH-2.0-OpenSSH_5.2
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

74.45.229.217

SSH-2.0-OpenSSH_5.2
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

74.45.231.125

SSH-2.0-OpenSSH_5.2
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

74.45.231.136

SSH-2.0-OpenSSH_5.2
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

74.45.231.156

SSH-2.0-OpenSSH_5.2
subject=/C=US/ST=California/L=Sunnyvale/O=Tropos Networks/OU=Manufacturing/CN=Tropos Router/emailAddress=support@tropos.com

78.188.162.184

SSH-2.0-ROSSSH

85.14.248.152

SSH-2.0-OpenSSH_6.0p1 Debian-4+deb7u3

85.218.41.103

SSH-2.0-OpenSSH_5.9
subject=/C=US/ST=California/L=Sunnyvale/O=Aerohive/OU=Default/CN=HiveAP

91.229.251.116

SSH-2.0-ROSSSH

94.141.150.121

SSH-2.0-ROSSSH

96.24.7.172

SSH-2.0-OpenSSH_5.2