“Finite Field Arithmetic.” Chapter 6: “Geological” RSA.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch6_simplest_rsa.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


The “mail bag” from Chapter 5… is empty. Congratulations, however, to the successful solvers of the Chapter 4 puzzle!


Now, errata! In the FFACalc extensions of Chapter 5, we have:

ffa_calc.adb:

               -- Multiply, give bottom and top halves
            when '*' =>
               Want(2);
               MustNotZero(Stack(SP));
               -- Ch5: slow and simple 'Egyptological' method:
               FZ_Mul_Egyptian(X     => Stack(SP - 1),
                               Y     => Stack(SP),
                               XY_Lo => Stack(SP - 1),
                               XY_Hi => Stack(SP));

After removal of the obvious bug, this becomes:

ffa_calc.adb:

               -- Multiply, give bottom and top halves
            when '*' =>
               Want(2);
               -- Ch5: slow and simple 'Egyptological' method:
               FZ_Mul_Egyptian(X     => Stack(SP - 1),
                               Y     => Stack(SP),
                               XY_Lo => Stack(SP - 1),
                               XY_Hi => Stack(SP));

In this Chapter, we will add a modular multiplication routine to FFA. This in turn will immediately let us write a modular exponentiation routine, which is the fundamental operation of the RSA system of asymmetric cryptography.

Concretely:

FZ_ModEx.ads:

with FZ_Type; use FZ_Type;
 
 
package FZ_ModEx is
 
   pragma Pure;
 
   -- Modular Multiply: Product := X*Y mod Modulus
   procedure FZ_Mod_Mul(X        : in  FZ;
                        Y        : in  FZ;
                        Modulus  : in  FZ;
                        Product  : out FZ);
   pragma Precondition(X'Length = Y'Length and
                         Modulus'Length = X'Length and
                         Product'Length = Modulus'Length);
 
   -- Modular Exponent: Result := Base^Exponent mod Modulus
   procedure FZ_Mod_Exp(Base     : in  FZ;
                        Exponent : in  FZ;
                        Modulus  : in  FZ;
                        Result   : out FZ);
   pragma Precondition(Base'Length = Exponent'Length and
                         Base'Length = Result'Length and
                         Base'Length = Modulus'Length);
 
end FZ_ModEx;

Because of the extremely simplistic and wholly-unoptimized way in which we implemented Multiplication and Division in Chapter 5, this will be a “geological-time” RSAtron, rather than a battlefield-ready one (though I can conceive of a few speed-insensitive applications where even an algorithm such as this one could suffice.) And at the end of this Chapter, we will answer the question of “exactly how geological” ?

The bulk of the material in the remaining Chapters in this series will concern itself with demonstrably-correct and safe (i.e. constant-spacetimeness-preserving) methods for speeding up Multiplication and Division — and consequently Modular Exponentiation — so as to transform this “geological” RSA into a generally-applicable one.

And so, without further delay, let’s define the new modular multiplication and modular exponentiation operators in the FFACalc system:

ffa_calc.adb:

               -- Modular Multiplication
            when 'M' =>
               Want(3);
               MustNotZero(Stack(SP));
               FZ_Mod_Mul(X       => Stack(SP - 2),
                          Y       => Stack(SP - 1),
                          Modulus => Stack(SP),
                          Product => Stack(SP - 2));
               Drop;
               Drop;
 
               -- Modular Exponentiation
            when 'X' =>
               Want(3);
               MustNotZero(Stack(SP));
               FZ_Mod_Exp(Base     => Stack(SP - 2),
                          Exponent => Stack(SP - 1),
                          Modulus  => Stack(SP),
                          Result   => Stack(SP - 2));
               Drop;
               Drop;

And now let’s implement the new unit, FZ_ModEx:

FZ_ModEx.adb:

with FZ_Basic; use FZ_Basic;
with FZ_Pred;  use FZ_Pred;
with FZ_Shift; use FZ_Shift;
with FZ_Mul;   use FZ_Mul;
with FZ_Divis; use FZ_Divis;
 
package body FZ_ModEx is

We will begin with the modular multiplication algorithm. It is deliberately made in the most straightforward way I could think of, making use only of “schoolboy” methods:

   -- Modular Multiply: Product := X*Y mod Modulus
   procedure FZ_Mod_Mul(X        : in  FZ;
                        Y        : in  FZ;
                        Modulus  : in  FZ;
                        Product  : out FZ) is
 
      -- The wordness of all three operands is equal:
      L     : constant Indices := X'Length;
 
      -- Double-width register for multiplication and modulus operations
      XY    : FZ(1 .. L * 2);
 
      -- To refer to the lower and upper halves of the working register:
      XY_Lo : FZ renames XY(1     .. L);
      XY_Hi : FZ renames XY(L + 1 .. XY'Last);
 
      -- A zero-padded double-wide copy of Modulus, to satisfy Ch.5's FZ_Mod
      M     : FZ(XY'Range);
 
   begin
      -- Place the Modulus in a double-width M, as FZ_Mod currently demands
      M(Modulus'Range)   := Modulus;
      M(L + 1 .. M'Last) := (others => 0);
 
      -- XY_Lo:XY_Hi := X * Y
      FZ_Mul_Egyptian(X, Y, XY_Lo, XY_Hi);
 
      -- XY := XY mod M
      FZ_Mod(XY, M, XY);
 
      -- The bottom half of XY is our modular product; top half is always 0
      Product := XY_Lo;
   end FZ_Mod_Mul;
   pragma Inline_Always(FZ_Mod_Mul);

This FZ_Mod_Mul multiplies X and Y using the barbaric FZ_Mul_Egyptian algorithm from Chapter 5; then subsequently divides XY — their product — by Modulus, and returns the remainder — using the FZ_Mod from Chapter 5.

Some of the wastefulness of the above FZ_Mod_Mul will immediately stand out to the reader. In subsequent Chapters, we will methodically strip it away. For now, we are concerned strictly with obtaining a minimal and correct baseline against which to work.

Now we can define a modular exponentiation algorithm:

   -- Modular Exponent: Result := Base^Exponent mod Modulus
   procedure FZ_Mod_Exp(Base     : in  FZ;
                        Exponent : in  FZ;
                        Modulus  : in  FZ;
                        Result   : out FZ) is
 
      -- Working register for the squaring
      B : FZ(Base'Range)     := Base;
 
      -- Register for cycling through the bits of E
      E : FZ(Exponent'Range) := Exponent;
 
      -- Register for the Mux operation
      T : FZ(Result'Range);
 
   begin
      -- Result := 1
      WBool_To_FZ(1, Result);
 
      -- For each bit of Result width:
      for i in 1 .. FZ_Bitness(Result) loop
 
         -- T := Result * B mod Modulus
         FZ_Mod_Mul(X => Result, Y => B, Modulus => Modulus,
                    Product => T);
 
         -- Sel is the current low bit of E;
         --    When Sel=0 -> Result := Result;
         --    When Sel=1 -> Result := T
         FZ_Mux(X => Result, Y => T, Result => Result,
                Sel => FZ_OddP(E));
 
         -- Advance to the next bit of E
         FZ_ShiftRight(E, E, 1);
 
         -- B := B*B mod Modulus
         FZ_Mod_Mul(X => B, Y => B, Modulus => Modulus,
                    Product => B);
 
      end loop;
 
   end FZ_Mod_Exp;
   pragma Inline_Always(FZ_Mod_Exp);

This is the traditional right-to-left binary method, as seen in e.g. Vol. 2 of D. E. Knuth’s AOCP. The only substantial departure from the schoolbook algorithm is the elimination of branching on secret bits. The multiplication step is always carried out, and FZ_Mux is used to either keep, or throw away the answer.

Specifically: Result is initially equal to 1. B initially equals the Base; E — the Exponent. Now we iterate for a fixed number of cycles, equal to the bitness of Result (which is also equal to the bitness of all three operands.)

In each iteration:

… We multiply Result with the current B, modulo the Modulus, and assign the result to temporary register T.

… And now, if the current bottom bit (determined via FZ_OddP) of E is a zero, we end up discarding (in constant-time) T — the output of the modular multiplication in the previous step, via FZ_Mux (See Chapter 1). But if this bottom bit is a one, then T is not discarded, and becomes the new value of Result.

… Now E, where we had placed the Exponent at the beginning, is cranked right-wise by one bit (i.e. divided by 2). The next highest bit will now be found at the bottom position, for use in the next iteration.

… Finally, the register B is squared modulo the Modulus.

After the final iteration, Result will contain Base raised to the Exponent power modulo the Modulus.

And that’s all for the new unit:

 
end FZ_ModEx;

Now let’s remember that, as in every other FFA operation, the CPU time consumed by our new modular multiplication and modular exponentiation algorithms is independent of the Hamming weights of the operands. Concretely, e.g., 1 to the 1st power mod 1:

time echo -E ".~.~.~.1.1.1X" | ./bin/ffa_calc 1024 32

… will take — and for any given bitness — the same, within the margin of error of your Unix time utility, amount of time to compute, as maxint to the maxint-th power modulo maxint:

time echo -E ".1.1.1.~.~.~X" | ./bin/ffa_calc 1024 32

Verify this to your satisfaction.

And now, for some empirical data. On a certain 3 GHz Opteron, timed and plotted logarithmically:


Cost of Ch.6 'X' Operation, vs FFA Bitness.
(Click here to see the raw numbers)

Perhaps not quite as “geological” as the reader expected! We can actually fire this primitive RSAtron in anger! Let’s verify the RSA seal of ffa_ch6_simplest_rsa.vpatch, the Chapter 6 code itself, using itself:

First, take the seal:

$ pgpdump -i ffa_ch6_simplest_rsa.vpatch.asciilifeform.sig

… and dump it to a human-readable form:

Old: Signature Packet(tag 2)(284 bytes)
	Ver 4 - new
	Sig type - Signature of a binary document(0x00).
	Pub alg - RSA Encrypt or Sign(pub 1)
	Hash alg - SHA512(hash 10)
	Hashed Sub: signature creation time(sub 2)(4 bytes)
		Time - Sat Jan 6 11:47:07 EST 2018
	Sub: issuer key ID(sub 16)(8 bytes)
		Key ID - 0xB98228A001ABFFC7
	Hash left 2 bytes - 62 55
	RSA m^d mod n(2047 bits) - 5b 6a 8a 0a cf 4f 4d b3 f8 2e ac 2d
20 25 5e 4d f3 e4 b7 c7 99 60 32 10 76 6f 26 ef 87 c8 98 0e 73 75 79
ec 08 e6 50 5a 51 d1 96 54 c2 6d 80 6b af 1b 62 f9 c0 32 e0 b1 3d 02
af 99 f7 31 3b fc fd 68 da 46 83 6e ca 52 9d 73 60 94 85 50 f9 82 c6
47 6c 05 4a 97 fd 01 63 5a b4 4b fb db e2 a9 0b e0 6f 79 84 ac 85 34
c3 86 13 74 7f 34 0c 18 17 6e 6d 5f 0c 10 24 6a 2f ce 3a 66 8e ac b6
16 5c 20 52 49 7c a2 ee 48 3f 4f d8 d0 6a 99 11 bd 97 e9 b6 72 05 21
d8 72 bd 08 ff 8d a1 1a 1b 8d b1 47 f2 52 e4 e6 9a e6 20 1d 3b 37 4b
17 1d f4 45 ef 2b f5 09 d4 68 fd 57 ce b5 84 03 49 b1 4c 6e 2a aa 19
4d 95 31 d2 38 b8 5b 8f 0d d3 52 d1 e5 96 71 53 9b 42 98 49 e5 d9 65
e4 38 bf 9e ff c3 38 df 9a ad f3 04 c4 13 0d 5a 05 e0 06 ed 85 5f 37
a0 62 42 28 09 7e f9 2f 6e 78 ca e0 cb 97
		-> PKCS-1

Now, obtain my public modulus:

$ pgpdump -i ~/.wot/asciilifeform.asc
Old: Public Key Packet(tag 6)(269 bytes)
	Ver 4 - new
	Public key creation time - Thu Dec 20 12:49:24 EST 2012
	Pub alg - RSA Encrypt or Sign(pub 1)
	RSA n(2048 bits) - cd d4 9a 67 4b af 76 d3 b7 3e 25 bc 6d f6
6e f3 ab ed dc a4 61 d3 cc b6 41 67 93 e3 43 7c 78 06 56 26 94 73 c2
21 2d 5f d5 ee d1 7a a0 67 fe c0 01 d8 e7 6e c9 01 ed ed f9 60 30 4f
89 1b d3 ca d7 f9 a3 35 d1 a2 ec 37 ea be ff 3f be 6d 3c 72 6d c6 8e
59 9e bf e5 45 6e f1 98 13 39 8c d7 d5 48 d7 46 a3 0a a4 7d 42 93 96
8b fb af cb f6 5a 90 df fc 87 81 6f ee 2a 01 e1 dc 69 9f 4d da bb 84
96 55 14 c0 d9 09 d5 4f da 70 62 a2 03 7b 50 b7 71 c1 53 d5 42 9b a4
ba 33 5e ab 84 0f 95 51 e9 cd 9d f8 bb 4a 6d c3 ed 13 18 ff 39 69 f7
b9 9d 9f b9 0c ab 96 88 13 f8 ad 4f 9a 06 9c 96 39 a7 4d 70 a6 59 c6
9c 29 69 25 67 ce 86 3b 88 e1 91 cc 95 35 b9 1b 41 7d 0a f1 4b e0 9c
78 b5 3a f9 c5 f4 94 bc f2 c6 03 49 ff a9 3c 81 e8 17 ac 68 2f 00 55
a6 07 bb 56 d6 a2 81 c1 a0 4c ef e1 
	RSA e(17 bits) - 01 00 01 
 
...~~~crapola snipped~~~...

But let us now “thank” the malignant idiocy of W. Koch (see also)… Because we will have to patch GPG in order to learn the RFC4880 turd that it uses as “the signature” to compare against when verifying a seal:

diff -uNr a/gnupg-1.4.10/cipher/rsa.c b/gnupg-1.4.10/cipher/rsa.c
--- a/gnupg-1.4.10/cipher/rsa.c	2008-12-11 11:40:06.000000000 -0500
+++ b/gnupg-1.4.10/cipher/rsa.c	2018-01-06 15:15:55.000000000 -0500
@@ -444,6 +444,16 @@
     result = mpi_alloc ( mpi_nlimb_hint_from_nbits (160) );
     public( result, data[0], &pk );
     rc = mpi_cmp( result, hash )? G10ERR_BAD_SIGN:0;
+
+    /* Enable dumping of raw sig verification operands: */
+    if( DBG_CIPHER ) {
+        log_mpidump("  data  = ", data[0] );
+        log_mpidump("  pk.n  = ", pk.n );
+        log_mpidump("  pk.e  = ", pk.e );
+        log_mpidump("  hash  = ", hash );
+        log_mpidump("  result= ", result );
+    }
+
     mpi_free(result);
 
     return rc;

So we built it; and now let’s run the seal verification with this debuggism:

$ ~/gpg-patched/gnupg-1.4.10/g10/gpg --debug-all --verify ch6.vpatch.sig ch6.vpatch

And obtain:

(From this point on, lines were wrapped and reindented for clarity:)

...~~~crapola snipped~~~...
 
gpg:   
data  =
5B6A8A0ACF4F4DB3F82EAC2D20255E4DF3E4B7C799603210766F26EF87C8980E737579
EC08E6505A51D19654C26D806BAF1B62F9C032E0B13D02AF99F7313BFCFD68DA46836E
CA529D7360948550F982C6476C054A97FD01635AB44BFBDBE2A90BE06F7984AC8534C3
8613747F340C18176E6D5F0C10246A2FCE3A668EACB6165C2052497CA2EE483F4FD8D0
6A9911BD97E9B6720521D872BD08FF8DA11A1B8DB147F252E4E69AE6201D3B374B171D
F445EF2BF509D468FD57CEB5840349B14C6E2AAA194D9531D238B85B8F0DD352D1E596
71539B429849E5D965E438BF9EFFC338DF9AADF304C4130D5A05E006ED855F37A06242
28097EF92F6E78CAE0CB97
 
gpg:   pk.n  =
 
CDD49A674BAF76D3B73E25BC6DF66EF3ABEDDCA461D3CCB6416793E3437C7806562694
73C2212D5FD5EED17AA067FEC001D8E76EC901EDEDF960304F891BD3CAD7F9A335D1A2
EC37EABEFF3FBE6D3C726DC68E599EBFE5456EF19813398CD7D548D746A30AA47D4293
968BFBAFCBF65A90DFFC87816FEE2A01E1DC699F4DDABB84965514C0D909D54FDA7062
A2037B50B771C153D5429BA4BA335EAB840F9551E9CD9DF8BB4A6DC3ED1318FF3969F7
B99D9FB90CAB968813F8AD4F9A069C9639A74D70A659C69C29692567CE863B88E191CC
9535B91B417D0AF14BE09C78B53AF9C5F494BCF2C60349FFA93C81E817AC682F0055A6
07BB56D6A281C1A04CEFE1 
 
gpg:   pk.e  = 10001 
 
gpg:
hash  =
1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF003051300
D0609608648016503040203050004406255509399A3AF322C486C770C5F7F6E05E18FC
3E2219A03CA56C7501426A597187468B2F71B4A198C807171B73D0E7DBC3EEF6EA6AFF
693DE58E18FF84395BE 
 
gpg:
result=
1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF003051300
D0609608648016503040203050004406255509399A3AF322C486C770C5F7F6E05E18FC
3E2219A03CA56C7501426A597187468B2F71B4A198C807171B73D0E7DBC3EEF6EA6AFF
693DE58E18FF84395BE
 
...~~~crapola snipped~~~...

And finally, let’s turn this into a FFACalc tape:

( the seal of ffa_ch6_simplest_rsa.vpatch.asciilifeform.sig : )
.
5B6A8A0ACF4F4DB3F82EAC2D20255E4DF3E4B7C799603210766F26EF87C8980E737579
EC08E6505A51D19654C26D806BAF1B62F9C032E0B13D02AF99F7313BFCFD68DA46836E
CA529D7360948550F982C6476C054A97FD01635AB44BFBDBE2A90BE06F7984AC8534C3
8613747F340C18176E6D5F0C10246A2FCE3A668EACB6165C2052497CA2EE483F4FD8D0
6A9911BD97E9B6720521D872BD08FF8DA11A1B8DB147F252E4E69AE6201D3B374B171D
F445EF2BF509D468FD57CEB5840349B14C6E2AAA194D9531D238B85B8F0DD352D1E596
71539B429849E5D965E438BF9EFFC338DF9AADF304C4130D5A05E006ED855F37A06242
28097EF92F6E78CAE0CB97
 
( my public rsa exponent : )
.
10001
 
( my public rsa modulus : )
.
CDD49A674BAF76D3B73E25BC6DF66EF3ABEDDCA461D3CCB6416793E3437C7806562694
73C2212D5FD5EED17AA067FEC001D8E76EC901EDEDF960304F891BD3CAD7F9A335D1A2
EC37EABEFF3FBE6D3C726DC68E599EBFE5456EF19813398CD7D548D746A30AA47D4293
968BFBAFCBF65A90DFFC87816FEE2A01E1DC699F4DDABB84965514C0D909D54FDA7062
A2037B50B771C153D5429BA4BA335EAB840F9551E9CD9DF8BB4A6DC3ED1318FF3969F7
B99D9FB90CAB968813F8AD4F9A069C9639A74D70A659C69C29692567CE863B88E191CC
9535B91B417D0AF14BE09C78B53AF9C5F494BCF2C60349FFA93C81E817AC682F0055A6
07BB56D6A281C1A04CEFE1 
 
( now modularly-exponentiate it : )
X
 
( ... and print the output .)
#

… and play the tape! :

$ time  cat ch6_rsa_tape.txt | ./bin/ffa_calc 2048 32

… and unsurprisingly:

 
0001FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF003051
300D0609608648016503040203050004406255509399A3AF322C486C770C5F7F6E05E1
8FC3E2219A03CA56C7501426A597187468B2F71B4A198C807171B73D0E7DBC3EEF6EA6
AFF693DE58E18FF84395BE
 
 
real    0m27.956s 
user    0m27.927s 
sys     0m0.003s

Guess what? The correct answer. Took an entire half-minute, however, to complete on this (same as where the log plot was made) machine.

In the next Chapter, we will investigate some elementary improvements in efficiency!

~To be continued!~

“Finite Field Arithmetic.” Chapter 5: “Egyptological” Multiplication and Division.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.



Эй, сюда плывите, рыбки,
Сом Считает без ошибки!
Сом И делит,
Сом И множит,
Сом Сложить и вычесть Может!

Борис Заходер. СОМОМНЕНИЕ


You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch5_egypt.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


First, let’s go through the “mail bag” from Chapter 4:

Reader Apeloyee observed that FZ_Valid_Bitness_P can be made slightly shorter and clearer. I have taken the more conservative of his two suggestions.

The rewritten procedure now looks like this:

   -- Determine if a proposed FFA Bitness is valid.
   function FZ_Valid_Bitness_P(B : in Positive) return Boolean is
      Result   : Boolean := False;
      T        : Natural := B;
      PopCount : Natural := 0;
   begin
      -- Supposing we meet the minimal bitness:
      if B >= FZ_Minimal_Bitness then
         while T > 0 loop
            PopCount := PopCount + T mod 2;
            T := T / 2;
         end loop;
 
         -- Is B a power of 2?
         if PopCount = 1 then
            Result := True;
         end if;
      end if;
 
      return Result;
   end FZ_Valid_Bitness_P;

I would like to thank again the eagle-eyed Apeloyee for his attentive and skeptical reading.


And now for the solution to the Chapter 4 Puzzle:

ch4_official_solution.txt

Readers who have written solutions of their own to this puzzle, are invited to post them as comments to this post (please try to avoid spoiling it for those who have not yet reached Chapter 5.)


In Chapter 4, we wrote a small and very limited RPN calculator, FFACalc, capable only of addition, subtraction, basic I/O, and a few stack motion operations. In Chapter 5, we will give it the ability to multiply and divide.

The division and multiplication routines in FFA will extend across more than one chapter in this article series, on account of the wide spectrum of possible tradeoffs between simplicity and speed in the known approaches to both problems. We will begin at the extreme of simplicity at all cost. And, as the reader may already have guessed from the title of this Chapter, we will begin with division and multiplication algorithms that were known to the ancient Egyptians.

But first, a few small helper-function additions to LibFFA.

A convenient WBool (See Chapter 1) inverter:

w_pred.ads:

   -- Return WBool-complement of N.
   function W_Not(N : in WBool) return WBool;

w_pred.adb:

   -- Return WBool-complement of N.
   function W_Not(N : in WBool) return WBool is
   begin
      return 1 xor N;
   end W_Not;
   pragma Inline_Always(W_Not);

A convenient function for determining the bitness of an FZ integer:

fz_type.ads:

   -- A count of Bits, anywhere (cannot be 0):
   subtype Bit_Count is Positive;

fz_basic.ads:

   -- Determine the Bitness of N
   function FZ_Bitness(N : in FZ) return Bit_Count;

fz_basic.adb:

   -- Determine the Bitness of N
   function FZ_Bitness(N : in FZ) return Bit_Count is
   begin
      return N'Length * Words.Bitness;
   end FZ_Bitness;
   pragma Inline_Always(FZ_Bitness);

And finally, a mechanism for constant-time conditional addition:

fz_arith.ads:

   -- Gate = 1: Sum := X + Y; Overflow := Carry
   -- Gate = 0: Sum := X;     Overflow := 0
   procedure FZ_Add_Gated_O(X          : in  FZ;
                            Y          : in  FZ;
                            Gate       : in  WBool;
                            Sum        : out FZ;
                            Overflow   : out WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Sum'Length);

fz_arith.adb:

   procedure FZ_Add_Gated_O(X          : in  FZ;
                            Y          : in  FZ;
                            Gate       : in  WBool;
                            Sum        : out FZ;
                            Overflow   : out WBool) is
      Carry : WBool := 0;
      Mask  : constant Word := 0 - Gate;
   begin
      for i in 0 .. Word_Index(X'Length - 1) loop
         declare
            A : constant Word := X(X'First + i);
            B : constant Word := Y(Y'First + i) and Mask;
            S : constant Word := A + B + Carry;
         begin
            Sum(Sum'First + i) := S;
            Carry  := W_Carry(A, B, S);
         end;
      end loop;
      Overflow := Carry;
   end FZ_Add_Gated_O;
   pragma Inline_Always(FZ_Add_Gated_O);

This is something like a hybrid between FZ_Add (See Chapter 1) and FZ_Mux. If Gate is equal to 1, Mask is set to “maxint” of our Word, i.e. 11111….1; consequently the quantity B is unmolested, and the routine behaves exactly like FZ_Add. However, if Gate is equal to 0, Mask is then also equal to 0 and B is forced to also equal 0 during each iteration of the loop; the resulting addition is a constant-time “no-op”.

Work out the behaviour of this algorithm on a sheet of paper, as you have done for the material in earlier chapters.


And at last, we are ready for “Egyptological” division:

fz_divis.ads:

with FZ_Type; use FZ_Type;
 
 
package FZ_Divis is
 
   pragma Pure;
 
   -- Dividend is divided by Divisor, producing Quotient and Remainder.
   -- WARNING: NO div0 test here! Caller must test.
   procedure FZ_IDiv(Dividend  : in  FZ;
                     Divisor   : in  FZ;
                     Quotient  : out FZ;
                     Remainder : out FZ);
   pragma Precondition(Dividend'Length = Divisor'Length and
                         Quotient'Length = Remainder'Length and
                         Dividend'Length = Quotient'Length);
 
   -- Exactly same thing as IDiv, but keep only the Quotient
   procedure FZ_Div(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Quotient  : out FZ);
   pragma Precondition(Dividend'Length = Divisor'Length and
                         Dividend'Length = Quotient'Length);
 
   -- Exactly same thing as IDiv, but keep only the Remainder
   procedure FZ_Mod(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Remainder : out FZ);
   pragma Precondition(Dividend'Length = Divisor'Length and
                         Dividend'Length = Remainder'Length);
 
end FZ_Divis;

This algorithm is quite exactly analogous to the “long division” everyone learns in grade school. The “Egyptian” variant is also known as “restoring division” and is discussed as such in, e.g., Vol. 2 of D. E. Knuth’s “Art of Programming”. Observe that, just like every other FFA operation, this procedure does not branch on the operand bits. This is essential in any routine intended for use in cryptographic arithmetic.

The mechanics are quite simple: the Dividend is shifted into a register R “head-first”, and the Divisor is repeatedly subtracted from R. If the subtraction results in an underflow, this is taken to signify that the Divisor “did not go in”, and the subtraction is then undone, in constant-time (via the conditional-adder introduced earlier in this Chapter.) In the latter case, a 0 is revealed as the current bit of the Quotient — whose bits are found backwards, from last (highest) to first (lowest); in the opposite case, a 1 is written. The Remainder obtained from the division is found in R at the end of the procedure, which requires a number of iterations equal to the bitness of the Dividend:

fz_divis.adb:

with Words;    use Words;
with W_Pred;   use W_Pred;
with FZ_Basic; use FZ_Basic;
with FZ_Arith; use FZ_Arith;
with FZ_BitOp; use FZ_BitOp;
with FZ_Shift; use FZ_Shift;
 
 
package body FZ_Divis is
 
   -- Dividend is divided by Divisor, producing Quotient and Remainder.
   -- WARNING: NO div0 test here! Caller must test.
   procedure FZ_IDiv(Dividend  : in  FZ;
                     Divisor   : in  FZ;
                     Quotient  : out FZ;
                     Remainder : out FZ) is
 
      -- The working register
      QR : FZ(1 .. Dividend'Length + Divisor'Length);
 
      -- Bottom seg of Z will end up containing the Quotient; top - remainder
      Q  : FZ renames QR(1                   .. Dividend'Length); -- Quotient
      R  : FZ renames QR(Dividend'Length + 1 .. QR'Last);         -- Remainder
 
      C : WBool := 0; -- Borrow, from comparator
   begin
      Q := Dividend; -- Q begins with the Dividend
      FZ_Clear(R);   -- R begins empty
 
      -- For each bit of Dividend:
      for i in 1 .. FZ_Bitness(Dividend) loop
 
         -- Advance QR by 1 bit:
         FZ_ShiftLeft(QR, QR, 1);
 
         -- Subtract Divisor from R; Underflow goes into C
         FZ_Sub(X => R, Y => Divisor, Difference => R, Underflow => C);
 
         -- If C=1, subtraction underflowed, and then Divisor gets added back:
         FZ_Add_Gated(X => R, Y => Divisor, Gate => C, Sum => R);
 
         -- Current result-bit is equal to Not-C, i.e. 1 if Divisor 'went in'
         FZ_Or_W(Q, W_Not(C));
 
      end loop;
 
      Quotient  := Q; -- Output the Quotient.
      Remainder := R; -- Output the Remainder.
 
   end FZ_IDiv;
   pragma Inline_Always(FZ_IDiv);

The above is, to my knowledge, the simplest practical integer division algorithm (simple repeated-subtraction is not practical for large numbers and will not be discussed.) There are several substantial improvements that can be made to it, and we will visit them in later chapters. However, it is worth noting that Division is the most expensive of the familiar arithmetical operations, and — unlike multiplication — no order-of-magnitude speedup for it is known.

Now we will define remainderless division, and modulus operations. These can be optimized to an extent, but their simplest possible formulation is the trivial one below, written in terms of FZ_IDiv:

   -- Exactly same thing as IDiv, but keep only the Quotient
   procedure FZ_Div(Dividend  : in  FZ;
                    Divisor   : in  FZ;
                    Quotient  : out FZ) is
      Remainder : FZ(Divisor'Range);
      pragma Unreferenced(Remainder);
   begin
      FZ_IDiv(Dividend, Divisor, Quotient, Remainder);
   end FZ_Div;
   pragma Inline_Always(FZ_Div);
 
 
   -- Exactly same thing as IDiv, but keep only the Remainder
   procedure FZ_Mod(Dividend  : in FZ;
                    Divisor   : in FZ;
                    Remainder : out FZ) is
      Quotient : FZ(Dividend'Range);
      pragma Unreferenced(Quotient);
   begin
      FZ_IDiv(Dividend, Divisor, Quotient, Remainder);
   end FZ_Mod;
   pragma Inline_Always(FZ_Mod);
 
end FZ_Divis;

And, now: “Egyptological” multiplication.

fz_mul.ads:

with FZ_Type; use FZ_Type;
 
 
package FZ_Mul is
 
   pragma Pure;
 
   -- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y.
   procedure FZ_Mul_Egyptian(X     : in  FZ;
                             Y     : in  FZ;
                             XY_Lo : out FZ;
                             XY_Hi : out FZ);
   pragma Precondition(X'Length = Y'Length and
                         XY_Lo'Length = XY_Hi'Length and
                         XY_Lo'Length = ((X'Length + Y'Length) / 2));
 
end FZ_Mul;

Observe that it is “symmetric” with the Division, i.e. the exact same process “in reverse gear”:

fz_mul.adb:

with Words;    use Words;
with FZ_Basic; use FZ_Basic;
with FZ_Pred;  use FZ_Pred;
with FZ_Arith; use FZ_Arith;
with FZ_Shift; use FZ_Shift;
 
 
package body FZ_Mul is
 
   -- 'Egyptological' multiplier. XY_Lo and XY_Hi hold result of X*Y.
   procedure FZ_Mul_Egyptian(X     : in  FZ;
                             Y     : in  FZ;
                             XY_Lo : out FZ;
                             XY_Hi : out FZ) is
 
      -- Register holding running product
      XY : FZ(1 .. X'Length + Y'Length);
 
      -- X-Slide
      XS : FZ(1 .. X'Length + Y'Length);
 
      -- Y-Slide
      YS : FZ(Y'Range) := Y;
 
   begin
      -- Product register begins empty
      FZ_Clear(XY);
 
      -- X-Slide initially equals X:
      XS(1            .. X'Length) := X;
      XS(X'Length + 1 .. XS'Last)  := (others => 0);
 
      -- For each bit of Y:
      for i in 1 .. FZ_Bitness(Y) loop
 
         -- If lowest bit of Y-Slide is 1, X-Slide is added into XY
         FZ_Add_Gated(X    => XY, Y => XS, Sum => XY,
                      Gate => FZ_OddP(YS));
 
         -- X-Slide := X-Slide * 2
         FZ_ShiftLeft(XS, XS, 1);
 
         -- Y-Slide := Y-Slide / 2
         FZ_ShiftRight(YS, YS, 1);
 
      end loop;
 
      -- Write out the Product's lower and upper FZs:
      XY_Lo := XY(1                .. XY_Lo'Length);
      XY_Hi := XY(XY_Lo'Length + 1 .. XY'Last);
 
   end FZ_Mul_Egyptian;
   pragma Inline_Always(FZ_Mul_Egyptian);
 
end FZ_Mul;

We begin with a working register XY, initially zero, wide enough to hold the product; a “X-Slide” XS, which initially equals multiplicand X; and a “Y-Slide”, YS, which initially equals multiplicand Y. XS slides “upwards”, and so is given room so as to be able to fit the uppermost possible bit of the product. YS slides “downward” and is therefore Y-sized.

The procedure requires a number of iterations equal to the bitness of Y (which, the attentive reader will observe, in current FFA will always equal the bitness of X — but why not specify the general case?) In each iteration, XS is conditionally added in constant time to the running product XY. The condition is determined by the lowest bit of current YS. At the end of each iteration, XS slides “up”, i.e. is multiplied by two, while YS slides “down”, i.e. is divided by two.

After the required number of iterations, XY will equal the product of the multiplicands X and Y. The two halves of this product, FZ integers equal in their bitness to that of the multiplicands, are finally written to XY_Lo and XY_Hi — the lower and upper halves respectively.

Observe that this multiplier runs in O(N^2) — FZ_Add being an O(N) operation, N being the current FZ bitness. This “bearded” algorithm is impractically slow for battlefield use, in e.g. RSA. However it is unbeatable in simplicity. The reader should be able to fully understand why it cannot fail, and why the result will be emitted in constant time (i.e. the required work is wholly independent of the Hamming weights of the multiplicands) and occupy exactly the allotted space.

In subsequent chapters, we will explore several other multiplication algorithms, which make tradeoffs in simplicity (but however pointedly without compromising constant-time operation!) and gain considerable improvements in speed.


Now, let’s add all of the new operations to FFACalc:

First, a helper function that prevents division by zero:

ffa_calc.adb:

      -- Ensure that a divisor is not zero
      procedure MustNotZero(D : in FZ) is
      begin
         if FZ_ZeroP(D) = 1 then
            E("Division by Zero!");
         end if;
      end MustNotZero;

The Division-with-Remainder, Integer Division, Modulus, and Multiplication operations:

First, ‘\’: expects a Dividend, and on top of it, a Divisor; and leaves the stack containing in their place a Quotient, and on top of it, a Remainder:

ffa_calc.adb:

               -- Divide and give Quotient and Remainder
            when '\' =>
               Want(2);
               MustNotZero(Stack(SP));
               FZ_IDiv(Dividend  => Stack(SP - 1),
                       Divisor   => Stack(SP),
                       Quotient  => Stack(SP - 1),
                       Remainder => Stack(SP));

Then, ‘/’: exactly the same thing as the above, but leaves only the Quotient on the stack:

               -- Divide and give Quotient only
            when '/' =>
               Want(2);
               MustNotZero(Stack(SP));
               FZ_Div(Dividend  => Stack(SP - 1),
                      Divisor   => Stack(SP),
                      Quotient  => Stack(SP - 1));
               Drop;

And, ‘%’: this is the familiar modulus operation; it performs division but leaves only the remainder on the stack:

               -- Divide and give Remainder only
            when '%' =>
               Want(2);
               MustNotZero(Stack(SP));
               FZ_Mod(Dividend  => Stack(SP - 1),
                      Divisor   => Stack(SP),
                      Remainder => Stack(SP - 1));
               Drop;

Finally, multiplication, using the very slow Egyptian method. Observe that multiplication of two integers results in a product having as its bitness the sum of their bitnesses. Therefore FFACalc multiplication places two FZ integers on the stack, the lower half of the result, followed by the upper half. If the latter is unwanted, it must be dropped using the Drop (”_”) operation (See Chapter 4.) The Overflow flag is in all cases unaffected, as there is room for the entire product — its lower and upper halves occupy the same places on the stack which formerly held the multiplicands.

               -- Multiply, give bottom and top halves
            when '*' =>
               Want(2);
               MustNotZero(Stack(SP));
               -- Ch5: slow and simple 'Egyptological' method:
               FZ_Mul_Egyptian(X     => Stack(SP - 1),
                               Y     => Stack(SP),
                               XY_Lo => Stack(SP - 1),
                               XY_Hi => Stack(SP));

Now let’s build the improved FFACalc:

cd ffacalc
gprbuild

And try out the new operations:

$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\##" | ./bin/ffa_calc 256 32
 
000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F
000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0/#" | ./bin/ffa_calc 256 32
 
000000000000000000000000000000000101D96F0815F0853E1D8D883B93F2D9
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0%#" | ./bin/ffa_calc 256 32
 
000000000000000000000000000000A1398B1FBC8872F908C44763C5427EBD0F
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0\#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0/#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.0%#" | ./bin/ffa_calc 256 32
Pos: 67: Division by Zero!
 
$ echo -n -E ".FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F.F88A509A858786B366D16B9D65BF3991D0\.A1398B1FBC8872F908C44763C5427EBD0F={[R OK]}{[R SAD]}_[; ].101D96F0815F0853E1D8D883B93F2D9={[Q OK]}{[Q SAD]}_[; ]" | ./bin/ffa_calc 256 32
 
R OK; Q OK;
 
$ echo -n -E ".FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF * ##  " | ./bin/ffa_calc 256 32 
 
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE
0000000000000000000000000000000000000000000000000000000000000001

Homework Problems, in order of difficulty:

  • Write a FFACalc tape which obtains two numbers from the top of the stack, multiplies them, then alternatingly divides the obtained product by each of the original multiplicands, and thereby verifying that both the multiplier and the divider work as expected. The message “OK” should be printed in the event of successful test of all four new FFACalc functions, and “Sad” in case of the opposite. “Sabotage” and rebuild FFACalc to produce the necessary erroneous arithmetic for this test.
  • Find the elementary optimization that cuts the constant factor in the run-times of both FZ_Mul_Egyptian and FZ_IDiv by a third. Hint: one of the O(N) suboperations in both of them, can be replaced with a O(1) one! But which?
  • Show how to cut the required CPU time of the improved, from the previous problem, algorithm, by another quarter — at the expense of a small amount of complexity, but without fundamentally departing from either of the two “Egyptological” algorithms.
  • Prove that the provided Division and Multiplication operations, work.

~To be continued!~

“Finite Field Arithmetic.” Chapter 4: Interlude: FFACalc.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ch4_ffacalc.vpatch.

Just like before, you will end up with two directories, libffa and ffademo.
However you will also see a new one, ffacalc.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


As the title of this chapter suggests, it will not introduce fundamentally new FFA material. Instead, you will meet FFACalc — a program which makes practical use of the routines presented in Chapters 1, 2, and 3. Henceforth every Chapter in this series will build on FFACalc, rather than continuing to expand the rather-uninteresting ffademo. When we reach the final Chapter, the reader will notice that… but let’s not spoil it!

For now, FFACalc is exactly what the name implies: a FFAtronic RPN calculator, capable strictly of addition, subtraction, basic bitwise ops, numeric comparison, a small number of simple stack manipulations (a la Forth), and some elementary I/O.

But first, a few small helper-function additions to libFFA.

Calculators need to accept numeric input, and it is to be processed one hexadecimal digit at a time. Therefore a nibble subtype of Word is called for:

words.ads:

   -- Word, restricted to Nibble range.
   subtype Nibble is Word range 0 .. 16#F#;

Predicate operators produce strictly WBool (see Chapter 1) outputs. FFACalc will operate strictly in FZ integers. Therefore a conversion is required:

fz_basic.ads:

   -- Set given FZ to a given truth value
   procedure WBool_To_FZ(V : in WBool; N : out FZ);

fz_basic.adb:

   -- Set given FZ to a given truth value
   procedure WBool_To_FZ(V : in WBool; N : out FZ) is
   begin
      FZ_Clear(N);
      FZ_Set_Head(N, V);
   end WBool_To_FZ;
   pragma Inline_Always(WBool_To_FZ);

Sometimes, we will need to go in the other direction, and produce a WBool from an FZ, based on the Boolean meaning of its value (i.e. whether it is a nonzero.) This will look like this:

w_pred.ads:

   -- Return 1 if N is unequal to 0; otherwise return 0.
   function W_NZeroP(N : in Word) return WBool;

w_pred.adb:

   -- Return 1 if N is unequal to 0; otherwise return 0.
   function W_NZeroP(N : in Word) return WBool is
   begin
      return 1 xor W_ZeroP(N);
   end W_NZeroP;
   pragma Inline_Always(W_NZeroP);

Now since we are introducing a program where the user is able to control the width of an instantiated FFAtron, we will need a validity predicate. FFA “bitness” is not an arbitrary positive number, but must be an integer multiple of all historical machine word sizes, and additionally a power of two (the reason for the latter restriction will become apparent in the Sub-Quadratic Multiplication Chapter.) And so:

fz_lim.ads:

package FZ_Lim is
 
   pragma Pure;
 
   FZ_Minimal_Bitness : constant Positive := 256;
 
   FZ_Validity_Rule_Doc : constant String
     := "Must be greater than or equal to 256, and a power of 2.";
 
   -- Determine if a proposed FFA Bitness is valid.
   function FZ_Valid_Bitness_P(B : in Positive) return Boolean;
 
end FZ_Lim;

fz_lim.adb:

package body FZ_Lim is
 
   -- Determine if a proposed FFA Bitness is valid.
   function FZ_Valid_Bitness_P(B : in Positive) return Boolean is
      Result   : Boolean := False;
      T        : Natural := B;
      PopCount : Natural := 0;
   begin
      -- Supposing we meet the minimal bitness:
      if B >= FZ_Minimal_Bitness then
         while T > 0 loop
            if T mod 2 = 1 then
               PopCount := PopCount + 1;
            end if;
            T := T / 2;
         end loop;
 
         -- Is B a power of 2?
         if PopCount = 1 then
            Result := True;
         end if;
      end if;
 
      return Result;
   end FZ_Valid_Bitness_P;
 
end FZ_Lim;

In a power of 2, the “popcount” (number of 1s) is necessarily 1. The mechanics of this routine should be apparent to the alert reader, nothing more will be said of it.

And that’s all for LibFFA, for the time being. Now returning to FFACalc: we will shed the old ffademo’s dependence on Ada’s standard I/O library, in favour of a more compact approach:

os.ads:

with Interfaces;   use Interfaces;
with Interfaces.C; use Interfaces.C;
 
 
package OS is
 
   -- Receive a character from the TTY, and True if success (False if EOF)
   function Read_Char(C : out Character) return Boolean;
 
   -- Send a character to the TTY.
   procedure Write_Char(C : in Character);
 
   -- Send a Newline to the TTY.
   procedure Write_Newline;
 
   -- Exit with an error condition report.
   procedure Eggog(M : String);
 
   procedure Quit(Return_Code : Integer);
   pragma Import
     (Convention    => C,
      Entity        => Quit,
      External_Name => "exit");
 
private
 
   -- POSIX stdio:   
   EOF : constant int := -1;
 
   function GetChar return int;
   pragma Import(C, getchar);
 
   function PutChar(item: int) return int;
   pragma Import(C, putchar);
 
   -- GNATistic
   procedure To_Stderr(C : Character);
   pragma Import(Ada, To_Stderr, "__gnat_to_stderr_char");
 
   Sadness_Code : constant Integer := -1;
 
end OS;

os.adb:

package body OS is
 
   -- Receive a character from the TTY, and True if success (False if EOF)
   function Read_Char(C : out Character) return Boolean is
      i : int;
      Result : Boolean := False;
   begin
      i := GetChar;
      if i /= EOF then
         C := Character'Val(i);
         Result := True;
      end if;
      return Result;
   end Read_Char;
 
 
   -- Send a character to the TTY.
   procedure Write_Char(C : in Character) is
      R : int;
      pragma Unreferenced(R);
   begin
      R := PutChar(int(Character'Pos(C)));
   end Write_Char;
 
 
   -- Send a Newline to the TTY.
   procedure Write_Newline is
   begin
      Write_Char(Character'Val(16#A#));
   end Write_Newline;
 
 
   -- Exit with an error condition report.
   procedure Eggog(M : String) is
   begin
      for i in 1 .. M'Length loop
         To_Stderr(M(I));
      end loop;
 
      -- Emit LF
      To_Stderr(Character'Val(16#A#));
 
      -- Exit
      Quit(Sadness_Code);
   end;
 
end OS;

Now for a bit of surprise. Did you know that it is impossible to make use of the standard Ada.Command_Line functionality in a program where the

pragma Restrictions(No_Secondary_Stack);

… restriction is in force?

The reason for this becomes apparent in a careful reading of the Standard, where we find the following turd:

function Argument (Number : in Positive) return String;

Indeed, a completely unnecessary invocation of the secondary stack ! Why did the authors of the Standard do this ? I have no idea, but we will have to correct their mistake! Unfortunately it is quite impossible to do this without invoking some GNATisms. And so, we must:

cmdline.ads:

with System;
 
package CmdLine is
 
   -- IMHO this is reasonable.
   CmdLineArg_Length : constant Positive := 256;
 
   subtype CmdLineArg is String(1 .. CmdLineArg_Length);
 
   function Initialized return Boolean;
 
   function Arg_Count return Natural;
   pragma Import(C, Arg_Count, "__gnat_arg_count");
 
   procedure Get_Argument(Number : in  Natural;
                          Result : out String);
 
private
 
   procedure Fill_Arg (A : System.Address; Arg_Num : Integer);
   pragma Import(C, Fill_Arg, "__gnat_fill_arg");
 
   function Len_Arg (Arg_Num : Integer) return Integer;
   pragma Import(C, Len_Arg, "__gnat_len_arg");
 
end CmdLine;

cmdline.adb:

with System; use System;
 
package body CmdLine is
 
   -- Test if GNAT's cmdline mechanism is available
   function Initialized return Boolean is
      gnat_argv : System.Address;
      pragma Import (C, gnat_argv, "gnat_argv");
 
   begin
      return gnat_argv /= System.Null_Address;
   end Initialized;
 
 
   -- Fill the provided string with the text of Number-th cmdline arg
   procedure Get_Argument(Number : in  Natural;
                          Result : out String) is
   begin
      if Number >= Arg_Count or (not Initialized) then
         raise Constraint_Error;
      end if;
 
      declare
         L   : constant Integer := Len_Arg(Number);
         Arg : aliased String(1 .. L);
      begin
         -- Will it fit into the available space?
         if L > Result'Length then
            raise Constraint_Error;
         end if;
 
         -- Get this arg string from where GNAT stowed it
         Fill_Arg(Arg'Address, Number);
 
         -- Copy it to Result:
         Result := (others => ' ');
         Result(Arg'Range) := Arg;
      end;
   end Get_Argument;
 
end CmdLine;

How this is invoked, will soon become quite apparent. Let’s at last proceed to FFACalc !

We make use here of nearly everything we have seen in Chapters 1-3:

ffa_calc.adb:

-- Basics
with OS;          use OS;
with CmdLine;     use CmdLine;
 
-- FFA
with FZ_Lim;   use FZ_Lim;
with Words;    use Words;
with W_Pred;   use W_Pred;
with FZ_Type;  use FZ_Type;
with FZ_Basic; use FZ_Basic;
with FZ_Arith; use FZ_Arith;
with FZ_Cmp;   use FZ_Cmp;
with FZ_Pred;  use FZ_Pred;
with FZ_BitOp; use FZ_BitOp;
with FZ_Shift; use FZ_Shift;
 
-- For Output
with FFA_IO;   use FFA_IO;
procedure FFA_Calc is
 
   Width  : Positive; -- Desired FFA Width
   Height : Positive; -- Desired Height of Stack
 
begin
   if Arg_Count /= 3 then
      Eggog("Usage: ./ffa_calc WIDTH HEIGHT");
   end if;
 
   declare
      Arg1 : CmdLineArg;
      Arg2 : CmdLineArg;
   begin
      -- Get commandline args:
      Get_Argument(1, Arg1); -- First arg
      Get_Argument(2, Arg2); -- Second arg
 
      -- Parse into Positives:
      Width  := Positive'Value(Arg1);
      Height := Positive'Value(Arg2);
   exception
      when others =>
         Eggog("Invalid arguments!");
   end;
 
   -- Test if proposed Width is permissible:
   if not FZ_Valid_Bitness_P(Width) then
      Eggog("Invalid Width: " & FZ_Validity_Rule_Doc);
   end if;

Above, we see how our replacement for Ada’s standard command-line argument reader works. Instead of demanding the secondary stack, we make use of pre-allocated strings, into which each argument is copied. The reader is invited to try and overflow these: the resulting death is a clean one.

Now for the calculator…

   -- The Calculator itself:
   declare
 
      -- The number of Words required to make a FZ of the given Bitness.
      Wordness : Indices := Indices(Width / Bitness);
 
      --------------------------------------------------------
      -- State --
      --------------------------------------------------------
      -- The Stack:
      subtype Stack_Positions is Natural range 0 .. Height;
      type Stacks is array(Stack_Positions range <>) of FZ(1 .. Wordness);
      Stack      : Stacks(Stack_Positions'Range);
 
      -- Stack Pointer:
      SP         : Stack_Positions := Stack_Positions'First;
 
      -- Carry/Borrow Flag:
      Flag       : WBool   := 0;
 
      -- Odometer:
      Pos        : Natural := 0;
 
      -- The current levels of the three types of nestedness:
      QuoteLevel : Natural := 0;
      CommLevel  : Natural := 0;
      CondLevel  : Natural := 0;
      --------------------------------------------------------

Observe that the FORTH-like stack is allocated on the stack (your machine’s, that is), and its height is determined by the HEIGHT parameter given in the second command line argument. The width of the FZ integers comprising the elements of this stack, is in turn given by WIDTH, the first command line argument.

Now for some elementary stack-manipulation routines:

      -- Clear the stack and set SP to bottom.
      procedure Zap is
      begin
         -- Clear the stack
         for i in Stack'Range loop
            FZ_Clear(Stack(i));
         end loop;
         -- Set SP to bottom
         SP   := Stack_Positions'First;
         -- Clear Overflow flag
         Flag := 0;
      end Zap;
 
 
      -- Report a fatal error condition at the current symbol
      procedure E(S : in String) is
      begin
         Eggog("Pos:" & Natural'Image(Pos) & ": " & S);
      end E;
 
 
      -- Move SP up
      procedure Push is
      begin
         if SP = Stack_Positions'Last then
            E("Stack Overflow!");
         else
            SP := SP + 1;
         end if;
      end Push;
 
 
      -- Discard the top of the stack
      procedure Drop is
      begin
         FZ_Clear(Stack(SP));
         SP := SP - 1;
      end Drop;
 
 
      -- Check if stack has the necessary N items
      procedure Want(N : in Positive) is
      begin
         if SP < N then
            E("Stack Underflow!");
         end if;
      end Want;

Here we make use of the FZ_ShiftLeft operation we implemented in Chapter 3:

      -- Slide a new hex digit into the FZ on top of stack
      procedure Ins_Hex_Digit(N : in out FZ;
                              D : in Nibble) is
         Overflow : Word := 0;
      begin
         -- Make room in this FZ for one additional hex digit
         FZ_ShiftLeft_O(N        => N,
                        ShiftedN => N,
                        Count    => 4,
                        Overflow => Overflow);
 
         -- Constants which exceed the Width are forbidden:
         if W_NZeroP(Overflow) = 1 then
            E("Constant Exceeds Bitness!");
         end if;
 
         -- Set the new digit
         FZ_Or_W(N, D);
      end;

And now for the "opcodes" comprising our stack machine:

      -- Execute a Normal Op
      procedure Op_Normal(C : in Character) is
 
         -- Over/underflow output from certain ops
         F : Word;
 
      begin
 
         case C is
 
            --------------
            -- Stickies --
            --------------
            -- Enter Commented
            when '(' =>
               CommLevel := 1;
 
               -- Exit Commented (but we aren't in it!)
            when ')' =>
               E("Mismatched close-comment parenthesis !");
 
               -- Enter Quoted
            when '[' =>
               QuoteLevel := 1;
 
               -- Exit Quoted (but we aren't in it!)
            when ']' =>
               E("Mismatched close-quote bracket !");
 
               -- Enter a ~taken~ Conditional branch:
            when '{' =>
               Want(1);
               if FZ_ZeroP(Stack(SP)) = 1 then
                  CondLevel := 1;
               end if;
               Drop;
 
               -- Exit from a ~non-taken~ Conditional branch:
               -- ... we push a 0, to suppress the 'else' clause
            when '}' =>
               Push;
               WBool_To_FZ(0, Stack(SP));
 
               ----------------
               -- Immediates --
               ----------------
 
               -- These operate on the FZ ~currently~ at top of the stack;
               -- and this means that the stack may NOT be empty.
 
            when '0' .. '9' =>
               Want(1);
               Ins_Hex_Digit(Stack(SP),
                             Character'Pos(C) - Character'Pos('0'));
 
            when 'A' .. 'F' =>
               Want(1);
               Ins_Hex_Digit(Stack(SP),
                             10 + Character'Pos(C) - Character'Pos('A'));
 
            when 'a' .. 'f' =>
               Want(1);
               Ins_Hex_Digit(Stack(SP),
                             10 + Character'Pos(C) - Character'Pos('a'));
 
               ------------------
               -- Stack Motion --
               ------------------
 
               -- Push a 0 onto the stack
            when '.' =>
               Push;
               FZ_Clear(Stack(SP));
 
               -- Dup
            when '″' =>
               Want(1);
               Push;
               Stack(SP) := Stack(SP - 1);
 
               -- Drop
            when '_' =>
               Want(1);
               Drop;
 
               -- Swap
            when ''' =>
               Want(2);
               FZ_Swap(Stack(SP), Stack(SP - 1));
 
               -- Over
            when '`' =>
               Want(2);
               Push;
               Stack(SP) := Stack(SP - 2);
 
               ----------------
               -- Predicates --
               ----------------
 
               -- Equality
            when '=' =>
               Want(2);
               WBool_To_FZ(FZ_Eqp(X => Stack(SP),
                                  Y => Stack(SP - 1)),
                           Stack(SP - 1));
               Drop;
 
               -- Less-Than
            when '< ' =>
               Want(2);
               WBool_To_FZ(FZ_LessThanP(X => Stack(SP - 1),
                                        Y => Stack(SP)),
                           Stack(SP - 1));
               Drop;
 
               -- Greater-Than
            when '>' =>
               Want(2);
               WBool_To_FZ(FZ_GreaterThanP(X => Stack(SP - 1),
                                           Y => Stack(SP)),
                           Stack(SP - 1));
               Drop;
 
               ----------------
               -- Arithmetic --
               ----------------
 
               -- Subtract
            when '-' =>
               Want(2);
               FZ_Sub(X          => Stack(SP - 1),
                      Y          => Stack(SP),
                      Difference => Stack(SP - 1),
                      Underflow  => F);
               Flag := W_NZeroP(F);
               Drop;
 
               -- Add
            when '+' =>
               Want(2);
               FZ_Add(X        => Stack(SP - 1),
                      Y        => Stack(SP),
                      Sum      => Stack(SP - 1),
                      Overflow => F);
               Flag := W_NZeroP(F);
               Drop;
 
               -----------------
               -- Bitwise Ops --
               -----------------
 
               -- Bitwise-And
            when '&' =>
               Want(2);
               FZ_And(X      => Stack(SP - 1),
                      Y      => Stack(SP),
                      Result => Stack(SP - 1));
               Drop;
 
               -- Bitwise-Or
            when '|' =>
               Want(2);
               FZ_Or(X      => Stack(SP - 1),
                     Y      => Stack(SP),
                     Result => Stack(SP - 1));
               Drop;
 
               -- Bitwise-Xor
            when '^' =>
               Want(2);
               FZ_Xor(X      => Stack(SP - 1),
                      Y      => Stack(SP),
                      Result => Stack(SP - 1));
               Drop;
 
               -- Bitwise-Not (1s-Complement)
            when '~' =>
               Want(1);
               FZ_Not(Stack(SP), Stack(SP));
 
               -----------
               -- Other --
               -----------
 
               -- mUx
            when 'U' =>
               Want(3);
               FZ_Mux(X      => Stack(SP - 2),
                      Y      => Stack(SP - 1),
                      Result => Stack(SP - 2),
                      Sel    => FZ_NZeroP(Stack(SP)));
               Drop;
               Drop;
 
               -- Put the Overflow flag on the stack
            when 'O' =>
               Push;
               WBool_To_FZ(Flag, Stack(SP));
 
               -- Print the FZ on the top of the stack
            when '#' =>
               Want(1);
               Dump(Stack(SP));
               Drop;
 
               -- Zap (reset)
            when 'Z' =>
               Zap;
 
               -- Quit with Stack Trace
            when 'Q' =>
               for I in reverse Stack'First + 1 .. SP loop
                  Dump(Stack(I));
               end loop;
               Quit(0);
 
               ----------
               -- NOPs --
               ----------
 
               -- Ops we have not yet spoken of -- do nothing
            when others =>
               null;
 
         end case;
 
      end Op_Normal;
 
 
      -- Process a Symbol
      procedure Op(C : in Character) is
      begin
         -- First, see whether we are in a state of nestedness:
 
         -- ... in a Comment block:
         if CommLevel > 0 then
            case C is
               when ')' =>  -- Drop a nesting level:
                  CommLevel := CommLevel - 1;
               when '(' =>  -- Add a nesting level:
                  CommLevel := CommLevel + 1;
               when others =>
                  null; -- Other symbols have no effect at all
            end case;
 
            -- ... in a Quote block:
         elsif QuoteLevel > 0 then
            case C is
               when ']' =>   -- Drop a nesting level:
                  QuoteLevel := QuoteLevel - 1;
               when '[' =>   -- Add a nesting level:
                  QuoteLevel := QuoteLevel + 1;
               when others =>
                  null; -- Other symbols have no effect on the level
            end case;
 
            -- If we aren't the mode-exiting ']', print current symbol:
            if QuoteLevel > 0 then
               Write_Char(C);
            end if;
 
            --- ... in a ~taken~ Conditional branch:
         elsif CondLevel > 0 then
            case C is
               when '}' =>   -- Drop a nesting level:
                  CondLevel := CondLevel - 1;
 
                  -- If we exited the Conditional as a result,
                  -- we push a 1 to trigger the possible 'else' clause:
                  if CondLevel = 0 then
                     Push;
                     WBool_To_FZ(1, Stack(SP));
                  end if;
 
               when '{' =>   -- Add a nesting level:
                  CondLevel := CondLevel + 1;
               when others =>
                  null; -- Other symbols have no effect on the level
            end case;
         else
            -- This is a Normal Op, so proceed with the normal rules.
            Op_Normal(C);
         end if;
 
      end Op;
 
 
      -- Current Character
      C : Character;
 
   begin
      -- Reset the Calculator      
      Zap;
      -- Process characters until EOF:
      loop
         if Read_Char(C) then
            -- Execute Op:
            Op(C);
            -- Advance Odometer
            Pos := Pos + 1;
         else
            Zap;
            Quit(0); -- if EOF, we're done
         end if;
      end loop;
   end;
 
end FFA_Calc;

But rather than describing in detail the operation of FFACalc, I will invite the reader to build it, and solve the following puzzle:


Write a FFACalc tape that will take seven numbers, presumed to be on the top of the stack, and return the largest.

Your answer should work with any legal WIDTH, and any stack HEIGHT large enough to hold the working set.

For instance, suppose file numbers.txt were to contain:

.9.1.7.5.1.1.0

Then the following example invocation:

cd ffacalc
gprbuild
cat numbers.txt youranswer.txt |  ./bin/ffa_calc 256 16

... should produce the output:

0000000000000000000000000000000000000000000000000000000000000009

... and similarly for any other seven numbers.

A solution will be posted in the next Chapter.


~To be continued!~

“Finite Field Arithmetic.” Chapter 3: Shifts.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch3_shifts.vpatch.

Just like before, you will end up with two directories, libffa and ffademo.

Now compile the demo, just as in Chapters 1 and 2:

cd ffademo
gprbuild

And, as before, do not run it quite yet.


First, let’s go through the “mail bag” from Chapter 2:

Reader Apeloyee observed : “A bit odd to see FZ_Neg to be named, unlike other SIMD bit operations, differently from the corresponding Ada operation on words. I personally would expect that “Neg” is arithmetic negation, since other operations aren’t named FZ_Conj or FZ_Disj.”

   -- NotN := ~N
   procedure FZ_Neg(N    : in FZ;
                    NotN : out FZ) is
   begin
      for i in N'Range loop
         NotN(i) := not N(i);
      end loop;
   end FZ_Neg;
   pragma Inline_Always(FZ_Neg);

He’s quite right, and henceforth the FFA ones-complement operator will be called FZ_Not.

Reader Diana Coman noticed the duplicate assignment to T in FZ_Swap:

   -- Exchange X and Y
   procedure FZ_Swap(X : in out FZ; Y : in out FZ) is
      T : FZ(X'Range) := X;
   begin
      T := X;
      X := Y;
      Y := T;
   end FZ_Swap;
   pragma Inline_Always(FZ_Swap);

The renaming of the ones-complement operator, and the removal of the redundant assignment to T in FZ_Swap appear in this Chapter’s vpatch.

I would like to thank Apeloyee and Diana Coman for their careful reading; and ask all of my readers not to hesitate in pointing out potentially troublesome (or merely confusing) algorithms, typos, obvious mistakes, etc.


Now let’s proceed to new material: shift operators for FZ integers :

fz_shift.ads:

 
with Words;   use Words;
with FZ_Type; use FZ_Type;
 
 
package FZ_Shift is
 
   pragma Pure;
 
   --------------------------------------------------------------
   -- Shift Right
   --------------------------------------------------------------
 
   -- ShiftedN := N >> Count (with Overflow Input and Output)
   procedure FZ_ShiftRight_O_I(N        : in  FZ;
                               ShiftedN : out FZ;
                               Count    : in  WBit_Index;
                               Overflow : out Word;
                               OF_in    : in  Word);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   -- ShiftedN := N >> Count (with Overflow Output only)
   procedure FZ_ShiftRight_O(N        : in  FZ;
                             ShiftedN : out FZ;
                             Count    : in  WBit_Index;
                             Overflow : out Word);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   -- ShiftedN := N >> Count (no Overflow output or input)
   procedure FZ_ShiftRight(N        : in  FZ;
                           ShiftedN : out FZ;
                           Count    : in  WBit_Index);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   --------------------------------------------------------------
   -- Shift Left
   --------------------------------------------------------------
 
   -- ShiftedN := N < < Count (with Overflow Input and Output)
   procedure FZ_ShiftLeft_O_I(N        : in  FZ;
                              ShiftedN : out FZ;
                              Count    : in  WBit_Index;
                              Overflow : out Word;
                              OF_in    : in  Word);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   -- ShiftedN := N << Count (with Overflow Output only)
   procedure FZ_ShiftLeft_O(N        : in  FZ;
                            ShiftedN : out FZ;
                            Count    : in  WBit_Index;
                            Overflow : out Word);
   pragma Precondition(N'Length = ShiftedN'Length);
 
   -- ShiftedN := N << Count (no Overflow output or input)
   procedure FZ_ShiftLeft(N        : in  FZ;
                          ShiftedN : out FZ;
                          Count    : in  WBit_Index);
   pragma Precondition(N'Length = ShiftedN'Length);
 
end FZ_Shift;

The basic building blocks for shifts are FZ_ShiftRight_O_I and FZ_ShiftLeft_O_I. The other operators shown are simply shorthand notation for discarding the ability to take the overflow from, or place incoming shifted-in bits into, one of these operations. This convention is followed in order to avoid littering FFA and user code with declarations for unused variables and the pragma Unreferenced(…) statements to go with them.

Note that the overflow inputs and outputs are Words, rather than WBools. The reason for this will become apparent shortly.

Observe that, thus far, it is only possible to shift an FZ by a WBit_Index quantity. Recall from words.ads in Chapter 1:

   -- When we must refer to individual bit positions of a machine word:
   subtype WBit_Index is Natural range 0 .. Bitness - 1;

This means that, in the 64-bit example code provided, left- and right- shifts from 0 to 63 binary positions are permissible.

In a later chapter we will see arbitrary (i.e. up to the full width of an FZ) constant-time shifting algorithms. However for most of FFA's functionality, these are not necessary, so there is no need to introduce them to the reader quite yet.

Let's proceed to the essential moving parts of the Right and Left subword shifts:

fz_shift.adb:

with W_Shifts; use W_Shifts;
 
 
package body FZ_Shift is
 
   -- ShiftedN := N >> Count (with Overflow Input and Output)
   procedure FZ_ShiftRight_O_I(N        : in  FZ;
                               ShiftedN : out FZ;
                               Count    : in  WBit_Index;
                               Overflow : out Word;
                               OF_in    : in  Word) is
      Ni    : Word;
      Carry : Word := OF_in;
   begin
      for i in reverse N'Range loop
         Ni := N(i);
         ShiftedN(i) := Shift_Right(Ni, Count) or Carry;
         Carry := Shift_Left(Ni, Bitness - Count);
      end loop;
      Overflow := Carry;
   end FZ_ShiftRight_O_I;
   pragma Inline_Always(FZ_ShiftRight_O_I);

Shifting a FZ integer to the right by Count binary places, is done in the following way. We walk the Words comprising the FZ from highest to lowest (hence the reverse in the looping operator.)

Inside the loop, we make a temporary copy of the current word N(i), Ni. Then we shift Ni by the requested number of binary places, Count; and bitwise-OR the current Carry into its high bits. This Carry is equal to OF_in initially (typically zero, if nothing is being shifted in). The result of the OR is saved to ShiftedN(i) -- we have obtained the current Word of the final output.

For every Word after the first one operated on, Carry consists of that previous Word's "dropped" bits, shifted left so as to end up OR-ed into the upper end of the next Word. The Carry obtained from the last Word's shifting is written to Overflow.

Observe that it is entirely legal to call FZ_ShiftRight_O_I "in-place", i.e. with the input and output locations being the same FZ.

Now we'll define an operator that wraps the above, but does not require bits being "shifted in" to be provided:

   -- ShiftedN := N >> Count (with Overflow Output only)
   procedure FZ_ShiftRight_O(N        : in  FZ;
                             ShiftedN : out FZ;
                             Count    : in  WBit_Index;
                             Overflow : out Word) is
   begin
      FZ_ShiftRight_O_I(N, ShiftedN, Count, Overflow, 0);
   end FZ_ShiftRight_O;
   pragma Inline_Always(FZ_ShiftRight_O);

And here is one that gives neither Overflow output nor demands shifted-in input:

   -- ShiftedN := N >> Count (no Overflow output or input)
   procedure FZ_ShiftRight(N        : in  FZ;
                           ShiftedN : out FZ;
                           Count    : in  WBit_Index) is
      Overflow : Word;
      pragma Unreferenced(Overflow);
   begin
      FZ_ShiftRight_O_I(N, ShiftedN, Count, Overflow, 0);
   end FZ_ShiftRight;
   pragma Inline_Always(FZ_ShiftRight);

The process of shifting a FZ integer to the left by Count binary places, is exactly analogous to the right-shift, with one exception: we begin with the lowest Word in the FZ, and proceed "up" :

   -- ShiftedN := N < < Count (with Overflow Input and Output)
   procedure FZ_ShiftLeft_O_I(N        : in  FZ;
                              ShiftedN : out FZ;
                              Count    : in  WBit_Index;
                              Overflow : out Word;
                              OF_in    : in  Word) is
      Ni    : Word;
      Carry : Word := OF_in;
   begin
      for i in N'Range loop
         Ni := N(i);
         ShiftedN(i) := Shift_Left(Ni, Count) or Carry;
         Carry := Shift_Right(Ni, Bitness - Count);
      end loop;
      Overflow := Carry;
   end FZ_ShiftLeft_O_I;
   pragma Inline_Always(FZ_ShiftLeft_O_I);

It is very much a "mirror image" of the FZ_ShiftRight_O_I algorithm.

Now for the "wrappers", in exactly the same spirit as earlier:

   -- ShiftedN := N < < Count (with Overflow Output only)
   procedure FZ_ShiftLeft_O(N        : in  FZ;
                            ShiftedN : out FZ;
                            Count    : in  WBit_Index;
                            Overflow : out Word) is
   begin
      FZ_ShiftLeft_O_I(N, ShiftedN, Count, Overflow, 0);
   end FZ_ShiftLeft_O;
   pragma Inline_Always(FZ_ShiftLeft_O);
 
 
   -- ShiftedN := N << Count (no Overflow output or input)
   procedure FZ_ShiftLeft(N        : in  FZ;
                          ShiftedN : out FZ;
                          Count    : in  WBit_Index) is
      Overflow : Word;
      pragma Unreferenced(Overflow);
   begin
      FZ_ShiftLeft_O_I(N, ShiftedN, Count, Overflow, 0);
   end FZ_ShiftLeft;
   pragma Inline_Always(FZ_ShiftLeft);
 
end FZ_Shift;

Just as in prior Chapters, I advise the reader to make abundant use of grid paper and pen, to properly fit the mechanics of the given algorithms into his head -- until they are as comfortably understood as a favourite armchair.


Now, let's make a demo for the routines in this Chapter:

demo_ch3.ads:

package Demo_Ch3 is
 
   procedure Demo_Shifts;
 
end Demo_Ch3;

demo_ch3.adb:

-- From Ada:
with Ada.Text_IO; use Ada.Text_IO;
 
-- From FFA:
with Words;    use Words;
with FZ_Type;  use FZ_Type;
 
-- FFA Ch. 3
with FZ_Shift; use FZ_Shift;
 
-- From the Demo:
with FFA_IO;   use FFA_IO;
 
 
package body Demo_Ch3 is
 
   procedure Demo_Shifts is
 
      X : FZ(1 .. 4) := ( 16#083e16f27091f65f#,  16#74c01a9c3ce54f80#,
                          16#9fd0913737c3fcbe#,  16#fa55f3f5459a9e79# );
 
      Z : FZ(1 .. 4) := ( 0,  0,  0,  0 );
 
      -- Overflow
      O : Word := 0;
 
   begin
 
      Put_Line("~~~ Ch. 3 : Shifts ~~~");
      New_Line;
 
      Put_Line("X         =");
      Dump(X);
      New_Line;
      New_Line;
 
      -- Left Shifts:
 
      FZ_ShiftLeft_O(X, Z, 1, O);
      Put_Line("X < < 1    =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      FZ_ShiftLeft_O(X, Z, 13, O);
      Put_Line("X << 13   =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      FZ_ShiftLeft_O(X, Z, 40, O);
      Put_Line("X << 40   =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      -- Right Shifts:
 
      FZ_ShiftRight_O(X, Z, 1, O);
      Put_Line("X >> 1    =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      FZ_ShiftRight_O(X, Z, 13, O);
      Put_Line("X >> 13   =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
      FZ_ShiftRight_O(X, Z, 40, O);
      Put_Line("X >> 40   =");
      Dump(Z);
      New_Line;
      Put_Line("Overflow  = ");
      Dump(O);
      New_Line;
 
   end Demo_Shifts;
 
end Demo_Ch3;

ffa_demo.adb:

with Demo_Ch1; use Demo_Ch1;
with Demo_Ch2; use Demo_Ch2;
with Demo_Ch3; use Demo_Ch3;
 
procedure FFA_Demo is
begin
 
   -- Ch. 1
   Demo_Add_Sub;
 
   -- Ch. 2
   Demo_Word_Preds;
   Demo_FZ_Basics;
   Demo_FZ_Various_Ops;
 
   -- Ch. 3
   Demo_Shifts;
 
end FFA_Demo;

Finally, let's run it:

./bin/ffa_demo

And this is what you should see:

~~~ ... Output from earlier Chapters snipped ... ~~~
~~~ ............................................ ~~~
 
~~~ Ch. 3 : Shifts ~~~
 
X         =
FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F
 
X < < 1    =
F4ABE7EA8B353CF33FA1226E6F87F97CE980353879CA9F00107C2DE4E123ECBE
Overflow  = 
0000000000000001
X << 13   =
BE7EA8B353CF33FA1226E6F87F97CE980353879CA9F00107C2DE4E123ECBE000
Overflow  = 
0000000000001F4A
X << 40   =
9A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F0000000000
Overflow  = 
000000FA55F3F545
X >> 1    =
7D2AF9FAA2CD4F3CCFE8489B9BE1FE5F3A600D4E1E72A7C0041F0B793848FB2F
Overflow  = 
8000000000000000
X >> 13   =
0007D2AF9FAA2CD4F3CCFE8489B9BE1FE5F3A600D4E1E72A7C0041F0B793848F
Overflow  = 
B2F8000000000000
X >> 40   =
0000000000FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16
Overflow  = 
F27091F65F000000

~To be continued!~

“Finite Field Arithmetic.” Chapter 2: Logical and Bitwise Operations.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch2_logicals.vpatch.

Just as in Chapter 1, you will end up with two directories, libffa and ffademo, but this time with somewhat different contents. (If you are new to V, read the vpatch with your naked eye; you will notice that it is almost exactly the same thing as an ordinary Unix diff, and quite readable.)

Now compile the demo, just as in Chapter 1:

cd ffademo
gprbuild

And, as before, do not run it quite yet.


First, let’s go through the “mail bag” from Chapter 1:

Reader Apeloyee observed that “xor swap seems to me a rake, which would hit you if the swapped values happen to be stored in the same location.”

   -- Exchange A and B.
   procedure W_Swap(A : in out Word; B : in out Word) is
   begin
      A := A xor B;
      B := A xor B;
      A := A xor B;
   end W_Swap;
   pragma Inline_Always(W_Swap);

Naturally, it is possible to answer this “Doctor, it hurts when I do that!” with the proverbial “So stop doing that.” But “rakes” are not acceptable in safety-critical systems.

Reader Mircea Popescu proposed to insert a stern warning, e.g.

   pragma Restrictions(No_Implicit_Heap_Allocations);
   -- Removing this may render the functioning of W_Swap and W_Mux dangerous,
   -- see http://btcbase.org/log/2017-12-02#1745513 discussion for details

Unfortunately, a dig through the GNAT docs revealed that there is apparently no reliable way to enforce a permanent guarantee that W_Swap will not be under any circumstances called on overlapping objects allocated explicitly; or used on pairs of variables made to refer to the same memory location via the renames construct.

Therefore W_Swap has been removed; this is reflected in the new vpatch; and FZ_Swap, introduced in this chapter, uses a temporary copy, as seen in kindergarten schoolbooks.

I would like to thank Apeloyee and Mircea Popescu for their careful reading; and ask all of my readers not to hesitate in pointing out potentially troublesome (or merely confusing) algorithms, typos, obvious mistakes, etc.


Now let’s proceed to some new material.

There are certain predicates we will need later for use on Words:

w_pred.ads:

with Words; use Words;
 
package W_Pred is
 
   pragma Pure;
 
   -- Return 1 if N is equal to 0; otherwise return 0.
   function W_ZeroP(N : in Word) return WBool;
 
   -- Return 1 if N is unequal to 0; otherwise return 0.
   function W_NZeroP(N : in Word) return WBool;
 
   -- Return 1 if N is odd; otherwise return 0.
   function W_OddP(N : in Word) return WBool;
 
   -- Return 1 if A is equal to B ; otherwise return 0.
   function W_EqP(A : in Word; B : in Word) return WBool;
 
end W_Pred;

w_pred.adb:

with Word_Ops; use Word_Ops;
 
package body W_Pred is
 
   -- Return 1 if N is equal to 0; otherwise return 0.
   function W_ZeroP(N : in Word) return WBool is
   begin
      return W_Borrow(N, 1, N - 1);
   end W_ZeroP;
   pragma Inline_Always(W_ZeroP);
 
 
   -- Return 1 if N is unequal to 0; otherwise return 0.
   function W_NZeroP(N : in Word) return WBool is
   begin
      return 1 xor W_ZeroP(N);
   end W_NZeroP;
   pragma Inline_Always(W_NZeroP);
 
 
   -- Return 1 if N is odd; otherwise return 0.
   function W_OddP(N : in Word) return WBool is
   begin
      return 1 and N;
   end W_OddP;
   pragma Inline_Always(W_OddP);
 
 
   -- Return 1 if A is equal to B ; otherwise return 0.
   function W_EqP(A : in Word; B : in Word) return WBool is
   begin
      return W_ZeroP(A xor B);
   end W_EqP;
   pragma Inline_Always(W_EqP);
 
end W_Pred;

Observe that conditional branches are avoided. W_ZeroP is the fundamental operation here; expand the equation on a sheet of paper, referring to the W_Borrow from Chapter 1 if you do not remember how the latter works.


Now let’s discuss a few more fundamental operations on FZ integers:

fz_basic.ads:

with Words; use Words;
with FZ_Type; use FZ_Type;
 
 
package FZ_Basic is
 
   pragma Pure;
 
   -- N := 0
   procedure FZ_Clear(N : out FZ);
 
   -- First word of N := Source
   procedure FZ_Set_Head(N : out FZ; Source : in Word);
 
   -- First word of N
   function FZ_Get_Head(N : in FZ) return Word;
 
   -- Exchange X and Y
   procedure FZ_Swap(X : in out FZ; Y : in out FZ);
   pragma Precondition(X'Length = Y'Length);
 
   -- Constant-time MUX: Sel = 0: Result := X; Sel = 1: Result := Y
   procedure FZ_Mux(X : in FZ; Y : in FZ; Result : out FZ; Sel : in WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Result'Length);
 
end FZ_Basic;

We will go over these one at a time:

fz_basic.adb:

with Word_Ops; use Word_Ops;
 
 
package body FZ_Basic is

The FZ_Clear operation illustrates the use of Ada’s array assignment construct. It simply sets a given FZ to zero.

   -- N := 0
   procedure FZ_Clear(N : out FZ) is
   begin
      N := (others => 0);
   end FZ_Clear;
   pragma Inline_Always(FZ_Clear);

The FZ_Set_Head operation changes only the head, or “youngest” Word, of a given FZ, to the given Word value.

   -- First word of N := Source
   procedure FZ_Set_Head(N : out FZ; Source : in Word) is
   begin
      N(N'First) := Source;
   end FZ_Set_Head;
   pragma Inline_Always(FZ_Set_Head);

FZ_Get_Head returns the “youngest” (lowest, bottom-most) word of an FZ.

   -- First word of N
   function FZ_Get_Head(N : in FZ) return Word is
   begin
      return N(N'First);
   end FZ_Get_Head;
   pragma Inline_Always(FZ_Get_Head);

This is the Swap operation, as discussed previously in the “mail bag” section of this article.

   -- Exchange X and Y
   procedure FZ_Swap(X : in out FZ; Y : in out FZ) is
      T : FZ(X'Range) := X;
   begin
      T := X;
      X := Y;
      Y := T;
   end FZ_Swap;
   pragma Inline_Always(FZ_Swap);

It is absolutely vital to understand exactly how and why FZ_Mux works. This operation takes two FZ inputs, X and Y, and a WBool selector, Sel; it assigns X to the Result if Sel equals 0; alternatively, Y if Sel equals 1. Refer to the W_Mux material in Chapter 1 to understand how this is done. The Mux operation is a fundamental building block of constant-time (i.e. branch-free) arithmetic as seen in FFA.

   -- Constant-time MUX: Sel = 0: Result := X; Sel = 1: Result := Y
   procedure FZ_Mux(X : in FZ; Y : in FZ; Result : out FZ; Sel : in WBool) is
   begin
      for i in X'Range loop
         Result(i) := W_Mux(X(i), Y(i), Sel);
      end loop;
   end FZ_Mux;
   pragma Inline_Always(FZ_Mux);
 
end FZ_Basic;

Now let’s introduce some unary predicate operators that work on FZ integers:

fz_pred.ads:

with Words; use Words;
with FZ_Type; use FZ_Type;
 
 
package FZ_Pred is
 
   pragma Pure;
 
   -- 1 iff N == 0 (branch-free); else 0
   function FZ_ZeroP(N : in FZ) return WBool;
 
   -- 1 iff N is odd
   function FZ_OddP(N : in FZ) return WBool;
 
end FZ_Pred;

fz_pred.adb:

with W_Pred; use W_Pred;
 
 
package body FZ_Pred is
 
   -- 1 iff N == 0 (branch-free); else 0
   function FZ_ZeroP(N : in FZ) return WBool is
      A : WBool := 1;
   begin
      for i in N'Range loop
         A := A and W_ZeroP(N(i));
      end loop;
      return A;
   end FZ_ZeroP;
   pragma Inline_Always(FZ_ZeroP);
 
 
   -- 1 iff N is odd
   function FZ_OddP(N : in FZ) return WBool is
   begin
      return W_OddP(N(N'First));
   end FZ_OddP;
   pragma Inline_Always(FZ_OddP);
 
end FZ_Pred;

As before, draw this on a sheet of paper if the mechanism does not immediately stand up in front of your mind’s eye.


Now for some binary comparison predicates on FZ:

fz_cmp.ads:

with Words; use Words;
with FZ_Type; use FZ_Type;
 
 
package FZ_Cmp is
 
   pragma Pure;
 
   -- 1 iff X == Y (branch-free); else 0
   function FZ_EqP(X : in FZ; Y: in FZ) return WBool;
   pragma Precondition(X'Length = Y'Length);
 
   -- 1 iff X < Y (branch-free); else 0
   function FZ_LessThanP(X : in FZ; Y : in FZ) return WBool;
   pragma Precondition(X'Length = Y'Length);
 
   -- 1 iff X > Y (branch-free); else 0
   function FZ_GreaterThanP(X : in FZ; Y : in FZ) return WBool;
   pragma Precondition(X'Length = Y'Length);
 
end FZ_Cmp;

fz_cmp.adb:

with W_Pred;   use W_Pred;
with FZ_Arith; use FZ_Arith;
 
 
package body FZ_Cmp is
 
   -- 1 iff X == Y (branch-free); else 0
   function FZ_EqP(X : in FZ; Y : in FZ) return WBool is
      A : WBool := 1;
   begin
      for i in X'Range loop
         A := A and W_EqP(X(i), Y(i));
      end loop;
      return A;
   end FZ_EqP;
   pragma Inline_Always(FZ_EqP);
 
 
   -- 1 iff X < Y (branch-free); else 0
   function FZ_LessThanP(X : in FZ; Y : in FZ) return WBool is
      Scratch : FZ(X'Range);
      Borrow  : WBool := 0;
   begin
      FZ_Sub(X, Y, Scratch, Borrow);
      return Borrow;
   end FZ_LessThanP;
   pragma Inline_Always(FZ_LessThanP);
 
 
   -- 1 iff X > Y (branch-free); else 0
   function FZ_GreaterThanP(X : in FZ; Y: in FZ) return WBool is
      Scratch : FZ(X'Range);
      Borrow  : WBool := 0;
   begin
      FZ_Sub(Y, X, Scratch, Borrow);
      return Borrow;
   end FZ_GreaterThanP;
   pragma Inline_Always(FZ_GreaterThanP);
 
end FZ_Cmp;

Here we make use of W_EqP from earlier in this chapter.

Observe that these work in exactly the same way as a commonplace CPU’s ALU does. Refer to the material in Chapter 1 concerning branch-free subtraction.


Now for some bitwise operators on FZ integers:

fz_bitop.ads:

with FZ_Type; use FZ_Type;
with Words; use Words;
 
 
package FZ_BitOp is
 
   pragma Pure;
 
   -- Result := X & Y
   procedure FZ_And(X : in FZ; Y : in FZ; Result : out FZ);
   pragma Precondition(X'Length = Y'Length and X'Length = Result'Length);
 
   -- N := N & W, W is a word
   procedure FZ_And_W(N : in out FZ; W : in Word);
 
   -- Result := X | Y
   procedure FZ_Or(X : in FZ; Y : in FZ; Result : out FZ);
   pragma Precondition(X'Length = Y'Length and X'Length = Result'Length);
 
   -- N := N | W, W is a word
   procedure FZ_Or_W(N : in out FZ; W : in Word);
 
   -- Result := X ^ Y
   procedure FZ_Xor(X : in FZ; Y : in FZ; Result : out FZ);
   pragma Precondition(X'Length = Y'Length and X'Length = Result'Length);
 
   -- N := N ^ W, W is a word
   procedure FZ_Xor_W(N : in out FZ; W : in Word);
 
   -- NotN := ~N
   procedure FZ_Neg(N : in FZ; NotN : out FZ);
   pragma Precondition(N'Length = NotN'Length);
 
end FZ_BitOp;

fz_bitop.adb:

package body FZ_BitOp is
 
   -- Result := X & Y
   procedure FZ_And(X : in FZ; Y : in FZ; Result : out FZ) is
   begin
      for i in X'Range loop
         Result(i) := X(i) and Y(i);
      end loop;
   end FZ_And;
   pragma Inline_Always(FZ_And);
 
 
   -- N := N & W, W is a word
   procedure FZ_And_W(N : in out FZ; W : in Word) is
   begin
      N(N'First) := N(N'First) and W;
   end FZ_And_W;
   pragma Inline_Always(FZ_And_W);
 
 
   -- Result := X | Y
   procedure FZ_Or(X : in FZ; Y : in FZ; Result : out FZ) is
   begin
      for i in X'Range loop
         Result(i) := X(i) or Y(i);
      end loop;
   end FZ_Or;
   pragma Inline_Always(FZ_Or);
 
 
   -- N := N | W, W is a word
   procedure FZ_Or_W(N : in out FZ; W : in Word) is
   begin
      N(N'First) := N(N'First) or W;
   end FZ_Or_W;
   pragma Inline_Always(FZ_Or_W);
 
 
   -- Result := X ^ Y
   procedure FZ_Xor(X : in FZ; Y : in FZ; Result : out FZ) is
   begin
      for i in X'Range loop
         Result(i) := X(i) xor Y(i);
      end loop;
   end FZ_Xor;
   pragma Inline_Always(FZ_Xor);
 
 
   -- N := N ^ W, W is a word
   procedure FZ_Xor_W(N : in out FZ; W : in Word) is
   begin
      N(N'First) := N(N'First) xor W;
   end FZ_Xor_W;
   pragma Inline_Always(FZ_Xor_W);
 
 
   -- NotN := ~N
   procedure FZ_Neg(N    : in FZ;
                    NotN : out FZ) is
   begin
      for i in N'Range loop
         NotN(i) := not N(i);
      end loop;
   end FZ_Neg;
   pragma Inline_Always(FZ_Neg);
 
end FZ_BitOp;

None of these make use of any kind of “cleverness”.

All but the unary FZ_Neg operator are defined both for pairs of FZ integers and also for an FZ and Word pairing; in the latter case, the operation is performed against the “head” of the FZ.


Now, let’s make a demo for the routines in this Chapter:

demo_ch2.ads:

package Demo_Ch2 is
 
   procedure Demo_Word_Preds;
   procedure Demo_FZ_Basics;
   procedure Demo_FZ_Various_Ops;
 
end Demo_Ch2;

demo_ch2.adb:

-- From Ada:
with Ada.Text_IO; use Ada.Text_IO;
 
-- From FFA:
with Words;    use Words;
with FZ_Type;  use FZ_Type;
with FZ_Arith; use FZ_Arith;
 
-- FFA Ch. 2
with W_Pred;   use W_Pred;
with FZ_Basic; use FZ_Basic;
with FZ_Pred;  use FZ_Pred;
with FZ_BitOp; use FZ_BitOp;
with FZ_Cmp;   use FZ_Cmp;
 
-- From the Demo:
with FFA_IO;   use FFA_IO;
 
 
package body Demo_Ch2 is
 
   procedure Demo_Word_Preds is
 
   begin
 
      Put_Line("~~~ Ch. 2 : Word Predicates ~~~");
      New_Line;
 
      Put_Line("W_ZeroP(0) :");
      Dump(W_ZeroP(0));
      New_Line;
      New_Line;
 
      Put_Line("W_ZeroP(1) :");
      Dump(W_ZeroP(1));
      New_Line;
      New_Line;
 
      Put_Line("W_NZeroP(1) :");
      Dump(W_NZeroP(1));
      New_Line;
      New_Line;
 
      Put_Line("W_OddP(3) :");
      Dump(W_OddP(3));
      New_Line;
      New_Line;
 
      Put_Line("W_EqP(1, 1) :");
      Dump(W_EqP(1, 1));
      New_Line;
      New_Line;
 
      Put_Line("W_EqP(1, 0) :");
      Dump(W_EqP(1, 0));
      New_Line;
      New_Line;
 
   end Demo_Word_Preds;
 
 
   procedure Demo_FZ_Basics is
 
      X : FZ(1 .. 4) := ( 0,  0,  0,  0);
 
      -- Shorthand for 'maxint'
      Y : FZ(1 .. 4) := (-1, -1, -1, -1);
 
      Z : FZ(1 .. 4) := ( 0,  0,  0,  0);
 
      -- Flag.
      F : WBool := 0;
 
   begin
 
      Put_Line("~~~ Ch. 2 : FZ Basics ~~~");
      New_Line;
 
      Put_Line("X         =");
      Dump(X);
      New_Line;
      New_Line;
 
      Put_Line("Y         =");
      Dump(Y);
      New_Line;
      New_Line;
 
      Put_Line("FZ_ZeroP(X) :");
      Dump(FZ_ZeroP(X));
      New_Line;
      New_Line;
 
      Put_Line("FZ_ZeroP(Y) :");
      Dump(FZ_ZeroP(Y));
      New_Line;
      New_Line;
 
      FZ_Swap(X, Y);
      Put_Line("After FZ_Swap(X, Y) :");
      Put_Line("X         =");
      Dump(X);
      New_Line;
      New_Line;
 
      Put_Line("Y         =");
      Dump(Y);
      New_Line;
      New_Line;
 
      F := 0;
      FZ_Mux(X, Y, Z, F);
      Put_Line("After F := 0 , FZ_Mux(X, Y, Z, F) : ");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      F := 1;
      FZ_Mux(X, Y, Z, F);
      Put_Line("After F := 1 , FZ_Mux(X, Y, Z, F) : ");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
   end Demo_FZ_Basics;
 
 
   procedure Demo_FZ_Various_Ops is
 
      X : FZ(1 .. 4) := ( 16#083e16f27091f65f#,  16#74c01a9c3ce54f80#,
                          16#9fd0913737c3fcbe#,  16#fa55f3f5459a9e79# );
 
      Y : FZ(1 .. 4) := ( 16#d16b9d65bf3991d0#,  16#8a509a858786b366#,
                          16#d39d23682728f4f8#,  16#6c571dbc646bf513# );
 
      Z : FZ(1 .. 4) := ( 0,  0,  0,  0 );
 
   begin
 
      Put_Line("~~~ Ch. 2 : FZ Bit Ops ~~~");
      New_Line;
      New_Line;
 
      Put_Line("X         =");
      Dump(X);
      New_Line;
      New_Line;
 
      Put_Line("Y         =");
      Dump(Y);
      New_Line;
      New_Line;
 
      FZ_Neg(X, Z);
      Put_Line("After FZ_Neg(X, Z):");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      FZ_And(X, Y, Z);
      Put_Line("After FZ_And(X, Y, Z) :");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      FZ_Or(X, Y, Z);
      Put_Line("After FZ_Or(X, Y, Z) :");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      FZ_Xor(X, Y, Z);
      Put_Line("After FZ_Xor(X, Y, Z) :");
      Put_Line("Z         =");
      Dump(Z);
      New_Line;
      New_Line;
 
      Put_Line("FZ_EqP(X, X) :");
      Dump(FZ_EqP(X, X));
      New_Line;
      New_Line;
 
      Put_Line("FZ_EqP(X, Y) :");
      Dump(FZ_EqP(X, Y));
      New_Line;
      New_Line;
 
      Put_Line("FZ_LessThanP(X, Y) :");
      Dump(FZ_LessThanP(X, Y));
      New_Line;
      New_Line;
 
      Put_Line("FZ_LessThanP(Y, X) :");
      Dump(FZ_LessThanP(Y, X));
      New_Line;
      New_Line;
 
      Put_Line("FZ_GreaterThanP(X, Y) :");
      Dump(FZ_GreaterThanP(X, Y));
      New_Line;
      New_Line;
 
      Put_Line("FZ_GreaterThanP(Y, X) :");
      Dump(FZ_GreaterThanP(Y, X));
      New_Line;
      New_Line;
 
   end Demo_FZ_Various_Ops;
 
 
end Demo_Ch2;

ffa_demo.adb:

with Demo_Ch1; use Demo_Ch1;
with Demo_Ch2; use Demo_Ch2;
 
procedure FFA_Demo is
begin
 
   -- Ch. 1
   Demo_Add_Sub;
 
   -- Ch. 2
   Demo_Word_Preds;
   Demo_FZ_Basics;
   Demo_FZ_Various_Ops;
 
end FFA_Demo;

Finally, let’s run it:

./bin/ffa_demo

And this is what you should see:

~~~ Ch. 1 : Add, Sub ~~~
 
X         =
0000000000000000000000000000000000000000000000000000000000000000
Y         =
0000000000000000000000000000000000000000000000000000000000005555
X + Y     =
0000000000000000000000000000000000000000000000000000000000005555
C         =  0
X - Y     =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAAAB
C         =  1
 
~~~ Ch. 2 : Word Predicates ~~~
 
W_ZeroP(0) :
0000000000000001
 
W_ZeroP(1) :
0000000000000000
 
W_NZeroP(1) :
0000000000000001
 
W_OddP(3) :
0000000000000001
 
W_EqP(1, 1) :
0000000000000001
 
W_EqP(1, 0) :
0000000000000000
 
~~~ Ch. 2 : FZ Basics ~~~
 
X         =
0000000000000000000000000000000000000000000000000000000000000000
 
Y         =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 
FZ_ZeroP(X) :
0000000000000001
 
FZ_ZeroP(Y) :
0000000000000000
 
After FZ_Swap(X, Y) :
X         =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 
Y         =
0000000000000000000000000000000000000000000000000000000000000000
 
After F := 0 , FZ_Mux(X, Y, Z, F) : 
Z         =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 
After F := 1 , FZ_Mux(X, Y, Z, F) : 
Z         =
0000000000000000000000000000000000000000000000000000000000000000
 
~~~ Ch. 2 : FZ Bit Ops ~~~
 
 
X         =
FA55F3F5459A9E799FD0913737C3FCBE74C01A9C3CE54F80083E16F27091F65F
 
Y         =
6C571DBC646BF513D39D23682728F4F88A509A858786B366D16B9D65BF3991D0
 
After FZ_Neg(X, Z):
Z         =
05AA0C0ABA656186602F6EC8C83C03418B3FE563C31AB07FF7C1E90D8F6E09A0
 
After FZ_And(X, Y, Z) :
Z         =
685511B4440A9411939001202700F4B800401A8404840300002A146030119050
 
After FZ_Or(X, Y, Z) :
Z         =
FE57FFFD65FBFF7BDFDDB37F37EBFCFEFED09A9DBFE7FFE6D97F9FF7FFB9F7DF
 
After FZ_Xor(X, Y, Z) :
Z         =
9602EE4921F16B6A4C4DB25F10EB0846FE908019BB63FCE6D9558B97CFA8678F
 
FZ_EqP(X, X) :
0000000000000001
 
FZ_EqP(X, Y) :
0000000000000000
 
FZ_LessThanP(X, Y) :
0000000000000000
 
FZ_LessThanP(Y, X) :
0000000000000001
 
FZ_GreaterThanP(X, Y) :
0000000000000001
 
FZ_GreaterThanP(Y, X) :
0000000000000000

~To be continued!~

“Finite Field Arithmetic.” Chapter 1: Genesis.

FFA, or the Finite Field Arithmetic library, differs from the typical Open Sores abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute.

This article is the first in a series, at the end of which the patient reader will have glued together, with his own hands — and fully fit into his head — a demonstrably-correct set of routines for multiple-precision arithmetic, sufficient for carrying out RSA and many other forms of number-theoretical cryptography.

The “bignum” arithmetic libraries currently in common use were, near as I can tell, all and without exception written by NSA assets and their various “useful idiots”. The conclusion is difficult to avoid, given as every example known to me suffers from one or more of the following ills:

FFA, on the other hand:

  • Written in a heavily-restricted subset of the Ada programming language — the only currently-existing nonproprietary statically-compiled language which permits fully bounds-checked, pointerolade-free code and practically-auditable binaries. We will be using GNAT, which relies on the GCC backend.
  • No external dependencies or libraries are made use of; suitable for embedded systems with limited memory. Language features which bloat the generated binary (e.g. OOP) are avoided.
  • No use of heap allocators. (In fact, wholly allocator-agnostic.) Memory requirement for all operations is readily calculable in advance from the width of the inputs.
  • The total set of components needed for RSA occupies around 3000 lines, incl. comments, of clear and unidiomatic (i.e. containing no “perlisms”) code.
  • Contains no inline assembler and is married to no particular machine (i.e. can be built for microcontrollers, MS-DOS boxes, CPUs of virtually any “bitness”)
  • Does not branch-on-secret-bits
  • Does not address memory by secret values, i.e. is cachebleed-proof
  • Fully unrollable, deterministic (and ergo auditable) control flow. I.e. all RSA modular-exponentiation operations will take exactly the same amount of time, via exactly the same flow of instructions; all algorithmic-complexity attack vectors are ruled out.

FFA will be presented to the reader in small pieces. Each should take more than one hour, but probably fewer than two, to fully study and understand.

You will need:

Just as “theater begins with the coat room”, each FFA article begins with a V-press. Set up your Vtron if you have not already done so, and press the “genesis” — “ffa_ch1_genesis”.

You will end up with two directories, libffa and ffademo. The former contains the subset of FFA discussed in this chapter; the latter — the demo routines for same. The reader is invited to modify the demo routines, experiment, learn.

Each subsequent chapter of the FFA series will contain a V-patch on the previous set, and the next — another, until the final article in the series: where “P” — an FFA-tronic GPG replacement — is introduced.

First, verify that your V-Tron, GNAT, and genesis press are in working order, by compiling the introductory demo:

cd ffademo
gprbuild

The output should resemble:

using project file ffa_demo.gpr
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections ffa_demo.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc word_ops.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc fz_type.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc fz_arith.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc w_shifts.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc iron.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections
-gnatec=/home/you/ffa/libffa/restrict.adc words.ads
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections demo_ch1.adb
gcc -c -O2 -fdump-scos -gnata -fstack-check -fdata-sections -ffunction-sections ffa_io.adb
gprlib FFA.lexch
ar cr /home/you/ffa/libffa/lib/libFFA.a /home/you/ffa/libffa/obj/word_ops.o
/home/you/ffa/libffa/obj/fz_type.o ...
ranlib libFFA.a
gprbind ffa_demo.bexch
gnatbind ffa_demo.ali
gcc -c b__ffa_demo.adb
gcc ffa_demo.o -Wl,--gc-sections -static -o ffa_demo

Observe that the contents of the ‘libffa’ directory build recursively.

If you are an Apple victim, your OS prohibits static linking, and you may see barf instead. You may use the debug (dynamically linked) variant:

gprbuild -Xmode=debug

Do not run the demo yet. (you read programs before running, don’t you?) Instead, let’s take a careful look at what is inside.

Let’s begin with the restriction pragmas. These define a particular, constrained subset of the Ada programming language, by prohibiting the use of certain features. GNAT will balk if it encounters a prohibited feature in FFA or the demo routines.

The logic behind the restrictions was a combination of “guilty until proven innocent” — if we can do without a construct, we will — and “enable all safety features”:

restrict.adc:

pragma Restrictions(Immediate_Reclamation);
pragma Restrictions(Max_Asynchronous_Select_Nesting => 0);
pragma Restrictions(Max_Protected_Entries => 0);
pragma Restrictions(Max_Select_Alternatives => 0);
pragma Restrictions(Max_Task_Entries => 0);
pragma Restrictions(Max_Tasks => 0);
pragma Restrictions(No_Abort_Statements);
pragma Restrictions(No_Access_Parameter_Allocators);
pragma Restrictions(No_Allocators);
pragma Restrictions(No_Asynchronous_Control);
pragma Restrictions(No_Calendar);
pragma Restrictions(No_Coextensions);
pragma Restrictions(No_Default_Stream_Attributes);
pragma Restrictions(No_Delay);
pragma Restrictions(No_Dispatch);
pragma Restrictions(No_Dispatching_Calls);
pragma Restrictions(No_Dynamic_Attachment);
pragma Restrictions(No_Dynamic_Priorities);
pragma Restrictions(No_Entry_Calls_In_Elaboration_Code);
pragma Restrictions(No_Entry_Queue);
pragma Restrictions(No_Enumeration_Maps);
pragma Restrictions(No_Exception_Propagation);
pragma Restrictions(No_Exception_Registration);
pragma Restrictions(No_Finalization);
pragma Restrictions(No_Fixed_Io);
pragma Restrictions(No_Floating_Point);
pragma Restrictions(No_Implementation_Aspect_Specifications);
pragma Restrictions(No_Implementation_Units);
pragma Restrictions(No_Implicit_Aliasing);
pragma Restrictions(No_Implicit_Conditionals);
pragma Restrictions(No_Implicit_Dynamic_Code);
pragma Restrictions(No_Implicit_Heap_Allocations);
pragma Restrictions(No_Implicit_Protected_Object_Allocations);
pragma Restrictions(No_Implicit_Task_Allocations);
pragma Restrictions(No_Initialize_Scalars);
pragma Restrictions(No_Local_Protected_Objects);
pragma Restrictions(No_Local_Timing_Events);
pragma Restrictions(No_Multiple_Elaboration);
pragma Restrictions(No_Nested_Finalization);
pragma Restrictions(No_Protected_Type_Allocators);
pragma Restrictions(No_Protected_Types);
pragma Restrictions(No_Relative_Delay);
pragma Restrictions(No_Requeue_Statements);
pragma Restrictions(No_Secondary_Stack);
pragma Restrictions(No_Select_Statements);
pragma Restrictions(No_Specific_Termination_Handlers);
pragma Restrictions(No_Standard_Allocators_After_Elaboration);
pragma Restrictions(No_Stream_Optimizations);
pragma Restrictions(No_Streams);
pragma Restrictions(No_Task_Allocators);
pragma Restrictions(No_Task_At_Interrupt_Priority);
pragma Restrictions(No_Task_Attributes_Package);
pragma Restrictions(No_Task_Hierarchy);
pragma Restrictions(No_Tasking);
pragma Restrictions(No_Task_Termination);
pragma Restrictions(No_Terminate_Alternatives);
pragma Restrictions(No_Unchecked_Access);
pragma Restrictions(No_Unchecked_Conversion);
pragma Restrictions(No_Unchecked_Deallocation);
pragma Restrictions(No_Wide_Characters);
pragma Restrictions(Pure_Barriers);
pragma Restrictions(Simple_Barriers);
pragma Restrictions(Static_Priorities);
pragma Restrictions(Static_Storage_Size);
pragma Validity_Checks(ALL_CHECKS);

Towards the end of this article series, we will investigate the impact of such “fascism” on program performance, and the possibility of cautiously replacing certain bounds checks with formal proof of nonoverflowability.

Let’s proceed to the builder. Observe that GnuMake was not listed in the list of required programs for this tutorial. Instead we make use of GNAT’s “GprBuild”, a substantially cleaner and better-behaved item:

ffa.gpr:

project FFA is
 
  for Object_Dir use "obj";
 
  type Mode_Type is ("debug", "release");
  Mode : Mode_Type := external ("mode", "release");
 
  for Languages         use ("Ada");
  for Source_Dirs       use (".");
  for Library_Dir       use "lib";
  for Library_Name      use "FFA";
  for Library_Kind      use "static";
 
  package Binder is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ();
        when "release" =>
           for Switches ("Ada")
             use ("-static", "-r");
     end case;
  end Binder;
 
  package Builder is
     for Switches ("Ada")
       use ("-nostdlib");
  end Builder;
 
  package Compiler is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ("-g");
        when "release" =>
           for Switches ("Ada")
             use ("-O2", "-fdump-scos", "-gnata", "-fstack-check",
                  "-fdata-sections", "-ffunction-sections",
                  "-gnatec=" & FFA'Project_Dir & "restrict.adc");
     end case;
  end Compiler;
 
end FFA;

Now let’s proceed to the only machine-dependent section of FFA.

The supplied iron.ads pre-supposes eight-bit bytes, and a 64-bit CPU:

iron.ads:

package Iron is
 
   pragma Pure;
 
   --------------------------------------
   -------- For a 64-bit system: --------
   --------------------------------------
   MachineBitness     : constant Positive := 64;
   MachineBitnessLog2 : constant Positive := 6; -- log2(64)
   --------------------------------------
 
   -- Bits per Byte
   ByteBits           : constant Positive := 8;
 
end Iron;

If this is not what you have, change the constants, e.g.:

   ------------------------------------
   ------ For a 32-bit system: --------
   ------------------------------------
     MachineBitness     : constant Positive := 32;
     MachineBitnessLog2 : constant Positive := 5; -- log2(32)
   ------------------------------------

The purpose of MachineBitnessLog2 will become apparent later; ignore it for now.

Let’s proceed to the most fundamental building block of FFA, the machine Word. A Word, from this point on, is simply a single machine word on your particular machine. It is expressed as an Ada “modular type”.

words.ads:

with Iron;
 
package Words is
 
   pragma Pure;
 
   -- The most fundamental fact about Word: width.
   Bitness     : constant Positive := Iron.MachineBitness;
 
   -- It is possible to calculate BitnessLog2 at elaboration time,
   -- but we will avoid having ~any~ elaboration at all in FFA.
   BitnessLog2 : constant Positive := Iron.MachineBitnessLog2;
 
   -- The Word width, expressed in ~bytes~:
   Byteness    : constant Positive := Bitness / Iron.ByteBits;
 
   -- What kind of words to use. Must be machine-word or smaller.
   type Word is mod 2**Bitness;
   for Word'Size use Bitness;
 
   -- The very same Word, but its only legal values are 0 and 1.
   subtype WBool is Word range 0 .. 1;
 
   -- When we must refer to individual bit positions of a machine word:
   subtype WBit_Index is Natural range 0 .. Bitness - 1;
 
end Words;

A WBool is simply a Word which is permitted to contain strictly Boolean (0, 1) values. The compiler and runtime will enforce this restriction.

GNAT needs a little bit of help in making the full set of bitwise operations available for use on Word.

For some peculiar reason, the Ada standards group made the fundamental Shift and Rotate bitwise ops into optional components.

Fortunately, on GNAT we can force them to exist (On a non-GNAT compiler, you’re on your own) :

w_shifts.ads:

with Words; use Words;
 
package W_Shifts is
 
   pragma Pure;
 
   function Shift_Left
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Shift_Left);
 
   function Shift_Right
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Shift_Right);
 
   function Rotate_Left
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Rotate_Left);
 
   function Rotate_Right
     (Value  : Word;
      Amount : Natural)
     return    Word;
   pragma Import(Intrinsic, Rotate_Right);
 
end W_Shifts;

Now let’s proceed to the most fundamental arithmetic operations on Words:

word_ops.ads:

with Words; use Words;
 
-- Fundamental Operations on Words:
package Word_Ops is
 
   pragma Pure;
 
   -- Branch-free calculation of 'carry' from a machine-word addition.
   function W_Carry(A : in Word; B : in Word; S : in Word)
                   return WBool;
 
   -- Branch-free calculation of 'borrow' from a machine-word subtraction.
   function W_Borrow(A : in Word; B : in Word; D : in Word)
                    return WBool;
 
   -- Without any branching: if Sel == 0, return A; if Sel == 1, return B.
   function W_Mux(A : in Word; B : in Word; Sel : in WBool)
                 return Word;
 
   -- Exchange A and B.
   procedure W_Swap(A : in out Word; B : in out Word);
 
end Word_Ops;

word_ops.adb:

with W_Shifts; use W_Shifts;
 
-- Fundamental Operations on Words:
package body Word_Ops is

But wait.

Ada — like C — does not give us any portable method of determining the over/underflow from output from machine arithmetic. And, it turns out, there existed in the past — and apparently exist today — CPUs made by idiots and wreckers (e.g. ‘RISC-V’) that do not have flags at all!

However, for multi-word addition, subtraction, the inner loop of Comba’s multiplication, and for a handful of other ops, we demand it.

So we derive the Carry or Borrow at the ‘eldest’ binary position, without using any inline assembly or special compiler features:

   -- Find the Carry, from an addition where it is known that A + B == S:
   function W_Carry(A : in Word; B : in Word; S : in Word)
                   return WBool is
   begin
      return WBool(Shift_Right((A and B) or ((A or B) and (not S)),
                               Bitness - 1));
   end W_Carry;
   pragma Inline_Always(W_Carry);
 
   -- Find the Borrow, from a subtraction where it is known that A - B == D:
   function W_Borrow(A : in Word; B : in Word; D : in Word)
                    return WBool is
   begin
      return WBool(Shift_Right(((not A) and B) or (((not A) or B) and D),
                               Bitness - 1));
   end W_Borrow;
   pragma Inline_Always(W_Borrow);

To derive the Carry from the addition S of words A and B, we take the highest bit ( i.e. Shift_Right( … , Bitness – 1) ) of the result of (A and B) or ((A or B) and (not S).

To derive the Borrow from the subtraction D of words A and B, we take the highest bit ( i.e. Shift_Right( … , Bitness – 1) ) of the result of ((not A) and B) or (((not A) or B) and D).

This derivation of the flags, strictly from the inputs and outputs of a machine-word addition or subtraction, is the single most tricky item in this article. Be prepared to spend some time drawing the circuit (of a full adder, with carry-in and carry-out) to verify the offered proof:

A+B+C is the output bit of 1-bit adder; C is carry-in;
A-B-C is the output bit of 1-bit subber; C is borrow-in.
Observe that A+B+C is equal to A-B-C for all A,B,C !
+-+-+-+-----+--------------+-----+----------------+
|           | 'Carry' out: |     | 'Borrow' out:  |
+-+-+-+-----+--------------+-----+----------------+
| | | |     |(a and b) or  |     |(~a and b) or   |
|A|B|C|A+B+C| ((a or b) and|A-B-C| ((~a or b) and |
| | | |     |    ~(A+B+C)) |     |    (A-B-C))    |
+-+-+-+-----+--------------+-----+----------------+
|0|0|0|  0  |      0       |  0  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|1|0|0|  1  |      0       |  1  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|0|1|0|  1  |      0       |  1  |       1        |
+-+-+-+-----+--------------+-----+----------------+
|1|1|0|  0  |      1       |  0  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|0|0|1|  1  |      0       |  1  |       1        |
+-+-+-+-----+--------------+-----+----------------+
|1|0|1|  0  |      1       |  0  |       0        |
+-+-+-+-----+--------------+-----+----------------+
|0|1|1|  0  |      1       |  0  |       1        |
+-+-+-+-----+--------------+-----+----------------+
|1|1|1|  1  |      1       |  1  |       1        |
+-+-+-+-----+--------------+-----+----------------+
--- This base case extends to any N bit register, since
--- both equations depend ~strictly~ on A, B, and C.

Now study how the Mux and Swap operations work. Observe that no transfer-of-control CPU instructions are required:

   -- Without any branching: if Sel == 0, return A; if Sel == 1, return B.
   function W_Mux(A : in Word; B : in Word; Sel : in WBool) return Word is
   begin
      return B xor ((Sel - 1) and (A xor B));
   end W_Mux;
   pragma Inline_Always(W_Mux);
 
   -- Exchange A and B.
   procedure W_Swap(A : in out Word; B : in out Word) is
   begin
      A := A xor B;
      B := A xor B;
      A := A xor B;
   end W_Swap;
   pragma Inline_Always(W_Swap);
 
end Word_Ops;

Now we are finally ready to define FZ — the fundamental FFA type: an unsigned integer of fixed width, i.e. a contiguous array of machine words.

fz_type.ads:

with Words; use Words;
 
package FZ_Type is
 
   pragma Pure;
 
   -- Indices of all indexable items:
   type Indices is new Natural;
 
   -- Index of a particular Word in an FZ:
   subtype Word_Index is Indices;
 
   -- The FZ, in person! I.e. a bignum of permanently fixed bitness.
   type FZ is array(Word_Index range <>) of Word;
 
   -- A count of Words in an FZ (cannot be 0):
   subtype Word_Count is Indices range 1 .. Indices'Last;
 
   -- An index of a particular ~bit~ in an FZ:
   subtype FZBit_Index is Indices;
 
end FZ_Type;

And finally, our first multi-word FFA arithmetical operations: ordinary addition and subtraction, with carry output.

Observe that a program which invokes FFA is expected to allocate memory of the proper size for each FFA integer; and that no such thing as normalization is permitted or permissible in FFA — it would result in branches-on-secrets, i.e. timing side-channels. An addition or a subtraction of a pair of FFA integers with no bits set (i.e. all bits, in each word, equal 0) takes exactly the same amount of time, and is performed via the same sequence of CPU instructions, as of two FFA integers with all bits set (i.e. all bits, in each word, equal 1), or of any other possible integers of the same bitness (i.e. number of words.) In other words, the algorithmic complexity of FFA’s arithmetical operations is always independent of the Hamming weights of the given operands.

fz_arith.ads:

with Words;   use Words;
with FZ_Type; use FZ_Type;
 
-- Fundamental Arithmetic operators on FZ:
package FZ_Arith is
 
   pragma Pure;
 
   -- Sum := X + Y; Overflow := Carry
   procedure FZ_Add(X          : in    FZ;
                    Y          : in    FZ;
                    Sum        : out   FZ;
                    Overflow   : out   WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Sum'Length);
 
   -- Difference := X - Y; Underflow := Borrow
   procedure FZ_Sub(X          : in  FZ;
                    Y          : in  FZ;
                    Difference : out FZ;
                    Underflow  : out WBool);
   pragma Precondition(X'Length = Y'Length and X'Length = Difference'Length);
 
end FZ_Arith;

Observe that the preconditions for addition and subtraction demand equal bitnesses for pairwise FFA operators. This pattern will likewise hold in the algorithms we will discuss in later articles.

And, now at last, the FZ adder and FZ subtracter:

fz_arith.adb:

with Word_Ops; use Word_Ops;
 
-- Fundamental Arithmetic operators on FZ:
package body FZ_Arith is
 
   -- Sum := X + Y; Overflow := Carry
   procedure FZ_Add(X          : in  FZ;
                    Y          : in  FZ;
                    Sum        : out FZ;
                    Overflow   : out WBool) is
      Carry : WBool := 0;
   begin
      for i in X'Range loop
         declare
            A : constant Word := X(I);
            B : constant Word := Y(I);
            S : constant Word := A + B + Carry;
         begin
            Sum(i) := S;
            Carry  := W_Carry(A, B, S);
         end;
      end loop;
      Overflow := Carry;
   end FZ_Add;
   pragma Inline_Always(FZ_Add);
 
   -- Difference := X - Y; Underflow := Borrow
   procedure FZ_Sub(X          : in  FZ;
                    Y          : in  FZ;
                    Difference : out FZ;
                    Underflow  : out WBool) is
      Borrow : WBool := 0;
   begin
      for i in 0 .. Word_Index(X'Length - 1) loop
         declare
            A : constant Word := X(X'First + i);
            B : constant Word := Y(Y'First + i);
            S : constant Word := A - B - Borrow;
         begin
            Difference(Difference'First + i) := S;
            Borrow := W_Borrow(A, B, S);
         end;
      end loop;
      Underflow := Borrow;
   end FZ_Sub;
   pragma Inline_Always(FZ_Sub);
 
end FZ_Arith;

The alert reader might ask why the iteration schemes differ. This will be discussed in a subsequent chapter, when we walk through multiplication and the peculiarities of Ada’s “array slicing” construct.

Now let’s define a simple mechanism for printing FFA integers to the console:

ffa_io.ads:

with Words;   use Words;
with FZ_Type; use FZ_Type;
 
package FFA_IO is
 
   -- Character representation of a Word
   type WChars is array(1 .. 2 * Byteness) of Character;
 
   -- Obtain the WChars corresponding to the given Word
   function W_To_WChars(N : Word) return WChars;
 
   -- Display a hex representation of W to stdout
   procedure Dump(W : in Word);
 
   -- Display a hex representation of N to stdout
   procedure Dump(N : in FZ);
 
end FFA_IO;

ffa_io.adb:

with Ada.Text_IO; use Ada.Text_IO;
 
with Words;    use Words;
with W_Shifts; use W_Shifts;
with FZ_Type;  use FZ_Type;
 
package body FFA_IO is
 
   -- Obtain the WChars corresponding to the given Word
   function W_To_WChars(N : Word) return WChars is
      H      : constant array(0 .. 15) of Character := "0123456789ABCDEF";
      W      : Word := N;
      Result : WChars;
   begin
      for b in WChars'Range loop               -- From bottom to top:
         Result(B) := H(Natural(W and 16#F#)); -- Get current nibble.
         W         := Shift_Right(W, 4);       -- Get the next nibble.
      end loop;
      return Result;
   end W_To_WChars;
 
   -- Display a hex representation of W to stdout
   procedure Dump(W : in Word) is
      T : WChars := W_To_WChars(W);
   begin
      for i in reverse T'Range loop
         Put(T(i));
      end loop;
   end Dump;
 
   -- Display a hex representation of N to stdout
   procedure Dump(N : in FZ) is
   begin
      for i in reverse N'Range loop
         Dump(N(i));
      end loop;
   end Dump;
 
end FFA_IO;

Nothing particularly complicated here.

Now for Chapter 1’s demo routine itself:

demo_ch1.ads:

package Demo_Ch1 is
 
   procedure Demo_Add_Sub;
 
end Demo_Ch1;

In this Chapter, we do not yet have a means for programmatically setting the value of an FFA integer other than via Ada’s array assignment construct. But it will suffice for now, as we wish to simply explore addition and subtraction:

demo_ch1.adb:

-- From Ada:
with Ada.Text_IO; use Ada.Text_IO;
 
-- From FFA:
with Words;    use Words;
with FZ_Type;  use FZ_Type;
with FZ_Arith; use FZ_Arith;
 
-- From the Demo:
with FFA_IO;   use FFA_IO;
 
package body Demo_Ch1 is
 
   procedure Demo_Add_Sub is
 
      -- We are using 64-bit Words (see iron.ads).
      -- We'll begin with an elementary demo, using 256-bit FZ:
      X : FZ(1 .. 4) := (0,        0, 0, 0);
      Y : FZ(1 .. 4) := (16#5555#, 0, 0, 0);
      Z : FZ(1 .. 4) := (0,        0, 0, 0);
 
      -- Carry.
      C : WBool := 0;
 
   begin
 
      Put_Line("X         =");
      Dump(X);
      New_Line;
 
      Put_Line("Y         =");
      Dump(Y);
      New_Line;
 
      FZ_Add(X, Y, Z, C);
      Put_Line("X + Y     =");
      Dump(Z);
      New_Line;
      Put_Line("C         = " & WBool'Image(C));
 
      FZ_Sub(X, Y, Z, C);
      Put_Line("X - Y     =");
      Dump(Z);
      New_Line;
      Put_Line("C         = " & WBool'Image(C));
 
   end Demo_Add_Sub;
 
end Demo_Ch1;

Our “main” :

ffa_demo.adb:

with Demo_Ch1; use Demo_Ch1;
 
procedure FFA_Demo is
begin
 
   Demo_Add_Sub;
 
end FFA_Demo;

And finally, the “builder” in the ffademo directory.
This illustrates how the static library produced by libffa’s builder is put to use:

ffa_demo.gpr:

with "../libffa/ffa.gpr";
 
project FFA_Demo is
 
  for Object_Dir use "obj";
 
  type Mode_Type is ("debug", "release");
  Mode : Mode_Type := external ("mode", "release");
 
  for Languages   use ("Ada");
  for Source_Dirs use (".");
  for Exec_Dir    use "bin";
  for Main        use ("ffa_demo.adb");
 
  package Compiler is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ("-g");
        when "release" =>
           for Switches ("Ada")
             use ("-O2", "-fdump-scos", "-gnata", "-fstack-check",
                  "-fdata-sections", "-ffunction-sections");
     end case;
  end Compiler;
 
  package Binder is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ();
        when "release" =>
           for Switches ("Ada")
             use ("-static");
     end case;
  end Binder;
 
  package Linker is
     case Mode is
        when "debug" =>
           for Switches ("Ada")
             use ();
        when "release" =>
           for Switches ("Ada")
             use ("-Wl,--gc-sections",
                  "-static");
     end case;
  end Linker;
 
end FFA_Demo;

At this point, the reader should be reasonably confident that the Chapter 1 demo will not format his HDD; and perhaps even has a reasonable supposition as to what the output will be.

Let’s run it:

./bin/ffa_demo

And this is what you should see:

X         =
0000000000000000000000000000000000000000000000000000000000000000
Y         =
0000000000000000000000000000000000000000000000000000000000005555
X + Y     =
0000000000000000000000000000000000000000000000000000000000005555
C         =  0
X - Y     =
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAAAB
C         =  1

Now it should be clear to the reader, that we are simply emulating a “stretched” version of something resembling a typical CPU’s ALU: a mechanism which arithmetizes on registers of fixed physical size.

~To be continued!~

The Peculiarly Quiet Decline and Fall of the KVM

Yes, that familiar little PC peripheral. The one that toggles a single keyboard/mouse/display ensemble between two or more connected machines.

They’ve quietly vanished from the market. And the fact appears to be discussed nowhere on the Net, at all. So I have seen it fit to make a note of it here.

What’s this, you say, they are still available? I’ll bite: who and where sells a KVM box that reliably works with recent ( 3840 × 2160 x 60 Hz ) display, and costs less than simply adding another such display would cost?

Every single claimed counterexample I’ve turned up, either has a kilometer of “this is rubbish, doesn’t work at all at the claimed 4K res” reviews hanging from it; or no reviews at all, on account of no one being stupid enough to purchase a KVM that costs more than adding two or three additional high-quality displays to the workstation.

I have no idea re: just why the KVM has gone into extinction. But it seems to have! And said demise is rather difficult to explain in purely technical terms, because signal switches with arbitrary bandwidth remain available: analogue relays (as found in, e.g., inexpensive oscilloscopes and elsewhere.) So what gives?


Edit: To all of the “switching an analogue 500MHz takes exotic and expensive FETs” people: somehow a mid-range — 300-400 $ — LCD quite routinely takes input from 2 (sometimes more) selectable Displayport jacks. But to get exactly the same functionality in an external KVM, is 700-1000 ? Why?

Posted in: Chumpatronics, Hardware, NonLoper by Stanislav 74 Comments

Sage SmartProbe FAQ


What’s a Sage SmartProbe?

See previous articles,
A Complete Pill for the Sage SmartProbe.
The Care and Feeding of the Sage SmartProbe.


Where do I get a Sage SmartProbe?

Possibly here — while supplies last. (Warning: I have no relation with this vendor and have no idea whether they still include the probe with “Gizmo 1″ kit. I do know for a fact that their latest “Gizmo 2″ product does not include it.)

SmartProbe is no longer made and at some point will only be available secondhand.


Who is selling devices comparable to SmartProbe ?

AFAIK nobody! Several vendors offer similar products but they do not support plain old GDB interface, and instead require you to buy multi-thousand-$ MS Windows closed shitware — sometimes with a mandatory maintenance contract !


Can anyone help me with sage_pill.py? When I run it, it gets through programming the FPGA, but then dies

This is the correct behaviour. The probe resets itself after it is cured by the pill and your serial connection to it will break.


How do I verify that the pill worked?

screen /dev/ttyACM0
(Or wherever your probe is connected) and hit return key a few times. You will see the string SmartProbe: 3.2_3743 (the last firmware revision released before the manufacturer went out of business.) And your probe will no longer die after N hours of use.


Do you have a pill for the Sage EDK software?

The EDK is rubbish, Java shitware, forget about it; use plain old GDB.


I got GDB working with the probe, but I can’t seem to get the probe to work with the board. If I plug in the probe to the board, the board hangs. If I try to continue I just get a bunch of ffffffs. If I do “monitor run” it says the probe is not connected ?

You plugged the cable into your motherboard backwards!


Help! GDB does strange things instead of working !

You must use version 7.8 of GDB or above, for the probe to work. Earlier versions had problems with the switches between 16, 32, and 64-bit modes of the AMD CPU.


Will Sage SmartProbe work with a PcEngines APU1 computer ?

Yes. Though you will have to solder the debug connector on yourself. (It is a 5 minute job. The header is a metric-spaced one, however, make sure to buy the right kind.)


Will Sage SmartProbe work with the recently-released PcEngines APU2 computer ?

No! on account of AMD’s PSP NSA-mandated Fritz Chip.


Does the Sage SmartProbe work with Intel CPU and motherboard?

No.


Will you help me with Intel’s XDP debugger for Intel CPU and motherboard?

No.


Why did Sage go out of business?

I don’t know. But AMD’s successful campaign of docs-withholding, stonewalling, and Fritz-chipping, to strangle Coreboot — probably did not help Sage’s life expectancy. Sage was a consultancy mainly specializing in Coreboot and custom third-party BIOS work.


Were you affiliated with Sage? Can you put me in touch with ex-Sage people ?

No. And no.


I have questions!

Try asking — politely — here.


Practical Lead to Gold Conversion; or an Intro to Opteron Resurrection.

It so happens that AMD resisted the informal NSA ban on the manufacture of LinuxBIOS-capable x86 CPUs slightly longer than Intel did. Or perhaps they were simply slower to succumb to Microsoft’s Fritz chip decree. Whichever may be the case, vintage (pre-2011, with some caveats, inquire within) AMD Opteron motherboards are a precious and increasingly scarce non-renewable resource. Is it possible to prolong the life of these devices? And when they finally die, to bring them back to life? Turns out, in some cases, the answer is yes!

Our patient today is a Tyan S2915 with nearly 11 years of continuous service. A true champ — but the sad day came: it would not come back up after a scheduled reboot; then, a week later, already staring at the dumpster, it deigned to switch on if “asked nicely”, sporadically. The culprit, in this case, was the same as in nearly every other “smoked” mobo: aluminum electrolytic caps. But which?

One useful fact to remember: wet-electrolytic capacitors age in direct proportion to temperature.

So we take a cheap thermal camera:

And, unsurprisingly, we find a bulging and slightly-leaky electrolytic near the hot (~80C, if running on a desk, 70C or so in well-ventilated case) northbridge:

And another, similarly-sad one near the PCI-E slot where the video card was mounted:

Now, this board is a vintage one, but unfortunately the lead-free solder scam was already in full swing at the time it was made:

And so what would otherwise have been a kindergarten-level repair job — desoldering and replacing two large capacitors — turns into a somewhat tricky and quite time-consuming exercise.

Even setting the iron to its maximum temperature (~250C):

… I found that the solder on the leads would not visibly begin to melt at all !

The melting point of Pb-free solder is considerably higher (how much higher — varies, by particular composition) than of traditional alloys, and a commonplace adjustable soldering iron simply will not work. (And if you were to use a specially-made, hotter iron, you would quite possibly damage the surrounding tracks on the PCB.)

So now let’s try a cheap but effective pill against the Eurocrat idiocy. We take a tube of good, old-fashioned pre-ban leaded solder, some wick, new capacitors:

Now add leaded solder, repeatedly, then wick it away. Again, and again. We end up slowly adding lead to the alloy until the melting point is manageable. It took me four shots of this, per capacitor leg, before either of the two sad caps would budge.

Now it is possible to install the new ones:

And we’re back in business! Here’s to another 11 years, old friend.

Still Tenser, Said the Censor.

Stay classy, YCombinator.