"The Harper."


"The Harper."
by Vlas Mikhailovich Doroshevich (1864--1922).


The Emperor Jin-La-O, may his memory be sacred to the whole braid-adorned world, was a wise and just Emperor.
One day he summoned his entourage and said to them:
- I would like to learn the name of the greatest villain in all of Beijing - in order to punish him appropriately, frighten the wicked and encourage the virtuous.
The courtiers bowed at his feet and set forth. For three days and three nights they walked around Beijing, visiting bazaars, teahouses, opium dens, temples and generally places where people crowd. They listened attentively.
And on the fourth day they came to the Emperor, bowed at his feet and said:
- We did all that our humble strengths could do to fulfill your heavenly will. And indeed they had.
- Do you now know who is the greatest villain in Beijing? - asked the Emperor.
- Yes, Master of the Universe. We know him. -- His name? - Jian-Fu.
- What has this scoundrel been up to? - exclaimed, boiling over with noble indignation, the Emperor.
- He plays the harp! - the emissaries answered.
- What crimes does this harper commit? Does he kill people? - asked the Emperor.
-- No.
- Does he rob?
-- No.
- Does he steal?
-- No.
- But what, at last, are the incredible doings of this man? - exclaimed the Emperor, lost in conjecture.
- Exactly nothing! - the emissaries answered. - He only plays the harp. And he plays splendidly, I must confess. You yourself, the Lord of the Sun and the Lord of the Universe, have repeatedly deigned to listen to his playing and even approved of it.
-- Yes Yes! Now I recall! Harper Jian-Fu! I recall. An excellent harper! But why do you consider him to be the greatest villain in Beijing?
The courtiers bowed and answered:
- Because all of Beijing scolds him. "Scoundrel Jian-Fu"! "Swindler Jian-Fu"! "Villain Jian-Fu"! - is all you hear at every step you take. We went around all the temples, all the bazaars, all the teahouses, all the places where people crowd - and everywhere everyone spoke only of Jian-Fu. And when they spoke of him, they did nothing but to scold him.
-- Strange! - exclaimed the Emperor. - No, there is surely something amiss here!
And he decided to investigate the peculiar matter himself. He disguised himself as a commoner and, accompanied by two similarly-disguised bodyguards, set off to wander the streets of Beijing. He came to the bazaar.
The morning market was over, the merchants were folding their baskets and chatting among themselves.
- This wretch Jian-Fu! - shouted one of the traders. - He played a sad song again last night at the new moon holiday. If only he would play something merry!
- Don't hold your breath! the other laughed viciously. - Can this scoundrel even play merry songs?! Making merry is for those whose souls are as white as a tea-tree flower. But this swindler has a soul as black as ink. That's why he plays only sad songs.
- How does such a villain cheat the gallows! someone in the crowd exclaimed.
- He ought to be cut in half with a blunt saw, and certainly the long way through! - corrected a neighbor.
- No, tie him to two horses by the arms and legs and tear him apart!
- Put him in a sack with cats that have not been fed for a long time!
And everyone shouted:
- Villain Jian-Fu! - Scoundrel Jian-Fu! - How does the earth stand him!
The Emperor went to the tea house.
Visitors sat on mats and drank tea from tiny cups.
- Good afternoon, good people! Let the souls of your ancestors whisper good advice to your souls! - greeted the Emperor, entering and bowing. - What's new in Beijing?
- Why, before you got here, we were just talking about the villain Jian-Fu! - said one of those present. - Has he done something? - asked the Emperor. -- What? Didn't you hear? The whole city is talking about it! - exclaimed all around. - Yesterday he accidentally hooked his fingernail on the wrong string and hit the wrong note! Scoundrel!
- What a horror it was! one of them exclaimed, pretending to writhe.
- And yet he hasn't been hanged yet! - Nor torn to pieces!
And all, indignant to the depths of their souls, exclaimed:
- Scoundrel Jian-Fu! - Swindler Jian-Fu! - Villain Jian-Fu!
The Emperor went to the opium den. There was a terrible din there.
-- What happened? - asked the Emperor.
-- A! As always! They are arguing about Jian-Fu! - the owner waved his hand.
The smokers, lying on their cots, scolded Jian-Fu through and through.
- He played five songs yesterday! one shouted. - As if two weren't enough! - He played five songs yesterday! - grumbled another. - As if he couldn't have played seven or eight!
And they scolded Jian-Fu until they fell asleep with their eyes open.
And even then, they muttered in their sleep: - That villain Jian-Fu! - Scoundrel Jian-Fu!
- Swindler to rule all swindlers, Jian-Fu!
The Emperor went to the temple.
People prayed to the gods, but when they got tired of praying, they began to exchange remarks and whispered to each other:
- And Jian-Fu, it so happens, is a scoundrel!
In short, until nightfall the Emperor walked around the whole city and everywhere heard only: - Jian-Fu! Jian-Fu! Jian-Fu! The villain! Scoundrel! Swindler! Finally, in the evening, returning home, he stopped on his way at the house of a poor coolie and, wishing the hosts a good supper, asked:
- Have you heard the harper Jian-Fu?
- How could we have! - answered the poor coolie. - Do you think we have time for entertainment or money to pay to hear harp music! We haven't even enough for rice! But we still know that Jian-Fu is a scoundrel! All of Beijing is talking about it.
And the whole family set off nitpicking the playing of a man whom they had never seen or heard, and said: - That villain Jian-Fu! - Scoundrel Jian-Fu! - Swindler Jian-Fu!
The Emperor, returning to the palace, was beside himself with amazement.
- What could all of this mean?
And, despite the late hour, he ordered that Jian-Fu be found right away and brought to him.
The harper was found and immediately brought to the Emperor.
- Hello, Jian-Fu! - said the Emperor. - Do you know that all of Beijing is scolding no one but you?
- I know, Heavenly Wisdom! - replied Jian-Fu as he bowed low.
- All they do is nitpick your playing! They pick on such trifles that it's simply awful. And they scold you for such little things through and through!
- I know, Heavenly Wisdom! - babbled Jian-Fu.
- So why is this happening?
- This happens, it turns out, for a very simple reason! - answered Jian-Fu. They are not allowed to discuss anything except for my harp-playing. So they singled me out for nitpicking and scolding.
The Emperor put his finger to his forehead and said:
-- A!
And he banned all discussion of Jian-Fu's playing as well. The Emperor Jin-La-O was a just Emperor.

"The Story of One Wetnurse."

Lately, I found myself unable to resist the temptation to translate this very pertinent classic to English. If you, reader, know of a better translation, do not hesitate to write in. And so, here goes:


"The Story of One Wetnurse."
by Vlas Mikhailovich Doroshevich (1864--1922).


Emperor Jing-Li-O, nicknamed Hao-Tu-Li-Chi-San-He-Nun, which means Justice Itself, once woke up feeling unwell.

- The Emperor has taken ill!

Rumours spread through the palace. Many stopped bowing to the first minister. The court poet wrote a welcoming ode to the crown prince.

The best doctors, pale with fear, bowed and apologetic, examined the Emperor, whispered in horror, and the chief doctor, falling at his feet, exclaimed:

- Permit me to tell the whole truth, oh consolation of humanity?

- Speak! - the Emperor allowed.

- Of course, you are the Son of Heaven! - said the chief doctor. - But by your unspeakable mercy, you sometimes come down to the people and allow yourself to contract such diseases as ordinary mortals may suffer. Today is the day of your greatest indulgence: your stomach is simply upset.

The Emperor was rather surprised:

- From what? Before bed, I drank nothing but my wetnurse's milk. Three hundred and sixty months have I reigned, and I nourish myself, as befits me, with the milk of a wetnurse. I have changed three hundred and sixty wetnurses, and nothing like this has ever happened to me. Who overfed my wetnurse, and with what?

The strictest investigation was immediately carried out - but it turned out that the wetnurse had eaten only the finest dishes, and that, moreover, they had been given to her in moderation.

- It could be that she fell ill naturally. What were those who chose her for me thinking? - the Emperor was angry. - Execute the culprits.

The culprits were executed, but according to the most thorough investigation, it turned out that they were blameless: the wetnurse was completely healthy.

Then the Emperor ordered his wetnurse to be brought to him.

- Why did your milk spoil? He asked sternly.

- Son of Heaven, Benefactor of the Universe. Justice Itself, - answered the trembling wetnurse, - you are seeking the truth not where it is hidden. Nobody overfed me and I myself did not overeat. Likewise, I have never been ill in my life. My milk has spoiled because I think about everything that is going on at home.

- What is going on at your home? - asked the Emperor.

- I come from the province of Pe-Chi-Li, which you saw it fit to to entrust to the mandarin Ki-Ni. He does terrible things, oh Joy of the Universe. He sold our house and took the money for himself, because we could not give him the bribe he demanded. He took my sister as his concubine, and beheaded her husband so that he would not complain. In addition, he executed my father and imprisoned my mother. In general, he treated us the way he treats everyone. When I remember all of this, I cry - and that's why my milk spoils.

The Emperor was terribly angry.

- Summon all of my advisers to me!

And when they appeared, he strictly ordered:

- Find me an honest man now.

They found one.

And the Emperor said to him:

- Mandarin Ki-Ni, to whom I entrusted governing the province of Pe-Chi-Li, does such things that even my wetnurse's milk has spoiled. Go there now, carry out the most severe investigation in my name and report to me. And there is to be no concealment, you will neither add nor take away - so that the truth reflects in your words, like the moon appears in a calm, sleeping lake. You know, on a quiet night, when you look and cannot make out: where is the real moon and where is the reflection - in the lake or in the sky? Go forth now.

The honest man set out at once with a hundred of the most skilled investigators.

The mandarin, terrified to death, seeing that his position was unenviable, offered the messenger a hefty bribe.

But the honest man, having been sent by the Emperor himself, did not dare to take it.

The moon in heaven went through three changes, and the honest man and his hundred investigators were still unraveling the matter. Finally, when the fourth moon was nearing its end, the honest man came to the Emperor, fell at his feet and asked:

- Is it the whole truth that I shall tell, Justice Itself?

- All of it! - ordered the Emperor.

- If there is in the whole world, which belongs to you and to no one else, a corner worthy of tears, then it is the province of Pe-Chi-Li, the Son of Heaven. Truly, it could bring tears to the eyes of the most malevolent dragon. All over the province, everyone begs for alms, and there is no one to give alms, because everyone is begging for them. Houses are ravaged, rice fields lie fallow. And all of this is not because the inhabitants are uncommonly lazy, but because the Ki-Ni mandarin takes everything from them, no matter what they earn. There is no justice in the courts, and only he who gives the most to the mandarin is in the right. As for good morals, they have forgotten about them entirely. Just as soon as Ki-Ni lays his eyes on a pretty girl, he takes her for himself and away from her father and mother. And not only girls - he takes even married women.

- How could this be! - exclaimed the Emperor.

- Not only the moon, but the sun could vouch for the truth of my words! - replied the honest man. - Everything I have spoken is true. The ornament of your power, the cream of your provinces - the province of Pe-Chi-Li is dying!

The Emperor took his head in his hands, showing deep sorrow.

- I'll have to think about what to do! I'll have to think!

He ordered all of his courtiers to wait in the great hall, while he himself, retiring to his chamber, walked from corner to corner in contemplation. The whole day went by. Before nightfall, the Emperor appeared before his courtiers, solemnly sat down under the canopy and, when all of them fell face down to the ground, announced:

- The province of Pe-Chi-Li is in a terrible state, and therefore We proclaim: never again shall wetnurses be taken from there for the Emperor.

And ever since then, they have never once taken a wetnurse from the province of Pe-Chi-Li for the Emperor.

"Pest" v. 0xFE.

This is a continuation of "Pest" v. 0xFF.

Edit: A note for the innocent -- this series of articles concerns an algorithm! It is not, at the time of writing, fully implemented anywhere! There is now a prototype! (Thank you, thimbronion!) Or even entirely complete.

Edit: There is a working draft of v. 0xFD. (Note: it may change without warning!)

Edit: Various problems and discussions.

Edit: billymg made a mirror of this page.

The document (very much a work in progress!) is available as a vtree. You will need:

Add the above vpatches and seals to your V-set, and press to pest_spec_FE.kv.vpatch.

To "compile" the document to HTML, run make (this requires the "markdown2" utility.)

Please submit any proposed changes to this spec in vpatch form.

The full text is reproduced below, and any reader able to spare the time for a careful reading is invited to comment!
This version is obsolete! Please read v. 0xFD !

Click here for a printer-friendly version of this text.


"Pest" v. 0xFF.

What follows is a long-promised detailed specification of a DDOS-proof, CPU-inexpensive, ciphered-authenticated peer-to-peer communication protocol.

I would like to thank Thimbronion for implementing a large subset of this spec already despite not having seen it yet!

The document (very much a work in progress!) is available as a vtree. You will need:

Add the above vpatches and seals to your V-set, and press to pest_spec.kv.vpatch.

To "compile" the document to HTML, run make (this requires the "markdown2" utility.)

Please submit any proposed changes to this spec in vpatch form.

The full text is reproduced below, and any reader able to spare the time for a careful reading is invited to comment!
This version is obsolete! Please read v. 0xFE !

Edit: click here for a printer-friendly version of this text.

Repair of sabotaged mathematical glyphs in certain Linux distros.

Recently, I received a tip that some Linux users saw rubbish in place of certain Unicode glyphs appearing in the FFA series (in particular, in Ch. 14.)

If the following two rows of symbols do not appear recognizably-identical in your WWW browser:


← → ⇠ ⇢ ⌊ ⌋

utf test

... the following fix is likely to cure:


... on Gentoo-derived Linuxen:

emerge media-fonts/dejavu


... on Arch-derived Linuxen:

pacman -S ttf-dejavu


Anyone who has witnessed the above dysfunction, or knows the details of when and why this astonishing breakage was introduced, the systems affected, the identities of the perpetrators, or anything else pertinent -- is invited to write in.

Defeating Vendor Lock-Ins: "Liebert GXT4."

Certain double-conversion UPS units by Liebert (in my case -- a GXT4-2000RT120) appear to contain a boobytrap whereby off-the-shelf (needs two 12v/80mm) high-efficiency/quiet fans will not operate -- the boot will abort with a Fan Out of Order error.

The pill: install a 100 ohm wirewound resistor in parallel with the front fan connector (on each side, will need to snip off the original plug and solder to a standard PC 4-pin extender.) This defeats the current sensor which expects to see the original (~0.4A max) fans; but not to the extent where the death of one or both of the new fans will fail to trigger the alarm.

The 2021 Rack.

Thank you, 2020 Rack Service subscribers! I would like to wish all of you another great year of hygienic Linux on high-quality non-Fritzed iron!

The 2021 service agreement is identical to the 2020 one. With the unfortunate exception that Deedbot Wallet is closing down on Dec. 25, 2020, and hence will no longer be accepted as payment starting at Dec. 23, 2020. On that day and henceforth, all subscribers will need to use traditional Bitcoin.

Any and all communication concerning the details of a payment, is to take place strictly via PGP. And please remember to clearsign each message prior to encrypting; and to verify the signature on any reply of mine after decryption.

Additionally, the upstream monthly cost has increased by 20 $, and this increase is reflected in the 2021 Price Calculator.

Since the start of the service in November 2019, there were exactly four days during which outages exceeding five minutes in length were reported. All four were attributable to malfunctions in upstream equipment.

All affected subscribers have been compensated, per the agreed-upon scheme (one day of pro-rated service per each such day; under the condition that a subscriber must publicly report the outage.) This is reflected in the table below.

Expiration dates of all current paid subscriptions are listed below (in strictly chronological order, and without identifying the subscriber) :

Subscriber Iron Effective Through
A Dulap-128-4TB 22 Dec 2020
B RK-256 10 Feb 2021
C Dulap-128-4TB 13 Feb 2021
D Colo (1U; 100W; 1 IP) 03 Apr 2021
E APU3-2TB 16 Aug 2021
F RK-128 21 Nov 2021

If you are a subscriber, you should be able to easily find yourself in this list.

"Finite Field Arithmetic." Chapter 21A-Ter: Fix for a False Alarm in Ch.14; "Litmus" Errata.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.


You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch21a_ter_ch14_ch20_errata.kv.vpatch.

You should end up with the same directory structure as previously.

As of Chapter 21A-Ter, the versions of Peh and FFA are 250 and 199, respectively.

Now compile Peh:

cd ffacalc
gprbuild

But do not run it quite yet.


This Chapter concerns fixes for several flaws recently reported by a careful Finnish reader known only as cgra. Thank you, cgra!

Let's begin with his first find: a false alarm bug in Chapter 14B's implementation of Barrett's Modular Reduction. (Note that the proofs given in Ch.14A and Ch.14A-Bis presently stand; the bug exists strictly in the Ada program.)

Recall Step 5 of the algorithm given in Ch.14A :

For each new input X, to compute the reduction R := X mod M:

  1. Xs := X >> JM
  2. Z  := Xs × BM
  3. Zs := Z >> SM
  4. Q  := Zs × M
  5. R  := X - Q
  6. R  := R - M, C := Borrow
  7. R  := R + (M × C)
  8. R  := R - M, C := Borrow
  9. R  := R + (M × C)
  10. R  := R - (R × DM)
  11. R is now equal to X mod M.

... and its optimization, as suggested by the physical bounds proof of Ch.14A-Bis :

2WM
Ignore X
WM - L WM + L
2WM
- Ignore Q
WM - L WM + L
= R
WM + L

... and finally, its implementation in Chapter 14B :

fz_barr.adb:

   -- Reduce X using the given precomputed Barrettoid.
   procedure FZ_Barrett_Reduce(X          : in     FZ;
                               Bar        : in     Barretoid;
                               XReduced   : in out FZ) is
 
   ..............................
 
      -- R is made one Word longer than Modulus (see proof re: why)
      Rl      : constant Indices := Ml + 1;
 
   ..............................
 
      -- Barring cosmic ray, no underflow can take place in (4) and (5)
      NoCarry : WZeroOrDie := 0;
 
   begin
 
   ..............................
 
      -- (5) R  := X - Q (we only need Rl-sized segments of X and Q here)
      FZ_Sub(X => X(1 .. Rl), Y => Q(1 .. Rl),
             Difference => R, Underflow => NoCarry);
 
   ..............................

Even though we had demonstrated that Q ≤ X, the prohibition of a nonzero subtraction borrow in (5) is fallacious.

To illustrate: this Tape, on a 256-bit run of Peh :

  .1 .FF LS .1 .3 MX # QY

... will not print the expected answer to the given modular exponentiation, i.e.:

  0000000000000000000000000000000000000000000000000000000000000002

... with a Verdict of Yes; but instead will print nothing, and yield a Verdict of EGGOG. Specifically, Peh will halt at (5) via a Constraint_Error (range check failed), when the range of NoCarry's WZeroOrDie type is violated by an assignment of 1.

This is because -- early in this modular exponentiation's sequence of Barrett reductions -- and immediately prior to (5) :

X == 0x40000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 
Q == 0x3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

... but what will be actually computed in (5) is X(1 .. Rl) - Q(1 .. Rl), i.e.:

  0x00000000000000000000000000000000000000000000000000000000000000000000000000000000 -
  0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
 
=
 
  1 (Underflow == 1)

... that is, the borrow bit is legitimately 1, in this and in a number of other readily-constructed cases. The constraints we have demonstrated for X, Q, and R do not imply that a borrow will never occur in the subtraction at (5). Therefore, the intended cosmic ray detector is strictly a source of false alarms, and we will remove it:

fz_barr.adb:

   -- Reduce X using the given precomputed Barrettoid.
   procedure FZ_Barrett_Reduce(X          : in     FZ;
                               Bar        : in     Barretoid;
                               XReduced   : in out FZ) is
 
   ..............................
 
      -- Borrow from Subtraction in (5) is meaningless, and is discarded
      IgnoreC : WBool;
      pragma Unreferenced(IgnoreC);
 
   begin
 
   ..............................
 
      -- (5) R  := X - Q (we only need Rl-sized segments of X and Q here)
      FZ_Sub(X => X(1 .. Rl), Y => Q(1 .. Rl),
             Difference => R, Underflow => IgnoreC); -- Borrow is discarded
 
   ..............................

... and that's it.


Cgra's second find concerned the Ch.20 demo script, Litmus. He had discovered that two mutually-canceling bugs exist in the program. Specifically, in :

litmus.sh:

 
..................
 
# Hashed Section Length
get_sig_bytes 2
turd+=$r
hex_to_int
sig_hashed_len=$r
 
# Hashed Section (typically: timestamp)
get_sig_bytes $sig_hashed_len
turd+=$r
sig_hashed=$r
 
# Unhashed Section Length
get_sig_bytes 1
hex_to_int
sig_unhashed_len=$r
 
# Unhashed Section (discard)
get_sig_bytes $sig_unhashed_len
 
..................
..................
 
# RSA Packet Length (how many bytes to read)
get_sig_bytes 1
hex_to_int
rsa_packet_len=$r
 
# The RSA Packet itself
get_sig_bytes $rsa_packet_len
rsa_packet=$r
 
# Digest Prefix (2 bytes)
get_sig_bytes 2
digest_prefix=$r
 
..................

... the Unhashed Section Length is erroneously treated as a 1-byte field, whereas in reality the GPG format gives 2 bytes. The script only worked (on all inputs tested to date) on account of the presence of the superfluous routine (RSA Packet reader, which remained from an early version of the demo!); in all of the test cases to date, the second byte of the Unhashed Section Length (and the unhashed section in its entirety, for so long as it does not exceed 255 bytes in length -- which it appears to never do) are consumed by get_sig_bytes $rsa_packet_len.

I am almost pleased that I had made this mistake; it is in fact a better illustration of programs which operate correctly despite erroneous logic -- as well as the unsuitability of shell script as a language for nontrivial tasks -- than anything I could readily unearth in the open literature.

And the fix is readily obvious :

 
..................
 
# Hashed Section Length
get_sig_bytes 2
turd+=$r
hex_to_int
sig_hashed_len=$r
 
# Hashed Section (typically: timestamp)
get_sig_bytes $sig_hashed_len
turd+=$r
sig_hashed=$r
 
# Unhashed Section Length
get_sig_bytes 2
hex_to_int
sig_unhashed_len=$r
 
# Unhashed Section (discard)
get_sig_bytes $sig_unhashed_len
 
..................
..................
 
# Digest Prefix (2 bytes)
get_sig_bytes 2
digest_prefix=$r
 
..................

I also incorporated cgra's earlier suggestion regarding error checking. Thank you again, cgra!

And that's it for Litmus, presently.


~ The next Chapter, 21B, will (yes!) continue the Extended-GCD sequence of Chapter 21A. ~

"Cryostat" Genesis.

Cryostat is a Fits-in-Head minimal (~700 LOC, including comments) static library for adding safe and reliable persistent storage to Ada data structures. It makes use of memory-mapped disk I/O via the MMap() system call, present in Linux (kernel 2.4 and newer) and all compatible operating systems. This mechanism permits efficient work with persistent data structures substantially larger than a machine's physical RAM.

AdaCore offers their own implementation of MMap() support in GNATColl. However, IMHO their item is an atrocity, in many ways very similarly to their GNAT Sockets pile of garbage (the near-unusability of the latter is what prompted me to write Ada-UDP in 2018.) AdaCore's MMap library is not only a behemoth replete with e.g. special cases for MS-Win support, but its use is entirely incompatible with safety-restricted compilation profiles.

Cryostat, on the other hand, does NOT require enabling the use of pointerism, unchecked conversions, the secondary stack, heap allocators, or other bulky and objectionable GNAT features, in the calling program. It does however require finalization to be enabled. This is used to guarantee the safe sync-to-disk and closing of the backing MMap when the data structure it contains goes out of scope.

Let's proceed to building Cryostat and its included demo program.

You will need:

Add the above vpatch and seal to your V-set, and press to cryostat_genesis.kv.vpatch.

Now compile the included CryoDemo:

cd demo
gprbuild

... this will build both the demo and the library.

But do not run it quite yet.


First, let's see what this demo consists of :

cryodemo.adb:

with Interfaces;  use Interfaces;
with ada.text_io; use  ada.text_io;
 
with Cryostat;
 
 
procedure CryoDemo is
 
   -- Path on disk for the example Cryostat backing file :
   File_Path : constant String := "cryotest.bin";
 
   -- Now, let's define an example data structure to place in a Cryostat :
 
   -- Example payload array's element type: byte.
   subtype ADatum is Unsigned_8;
 
   -- Let's make it 512MB - far bigger than a typical stack, to demonstrate
   -- that it will in fact reside in the Cryostat, rather than on the stack :
   A_MBytes : constant Unsigned_32 := 512;
 
   -- Example payload: an array.
   subtype ARange is Unsigned_32 range 0 .. (A_MBytes * 1024 * 1024) - 1;
 
   -- Complete the definition of the payload data structure :
   type TestArray is array(ARange) of ADatum;
 
   -- Declare a Cryostat which stores a TestArray :
   package Cryo is new Cryostat(Form     => TestArray,
                                Path     => File_Path,
                                Writable => True,  -- Permit writing
                                Create   => True); -- Create file if not exists
 
   -- Handy reference to the payload; no pointerisms needed !
   T : TestArray renames Cryo.Item;
 
   -- T can now be treated as if it lived on the stack :
 
begin
 
   Put_Line("T(0)    before :  " & ADatum'Image(T(0)));
   Put_Line("T(Last) before :  " & ADatum'Image(T(T'Last)));
 
   -- Increment each of the elements of T :
   for i in T'Range loop
      T(i) := T(i) + 1;
   end loop;
 
   Put_Line("T(0)    after  :  " & ADatum'Image(T(0)));
   Put_Line("T(Last) after  :  " & ADatum'Image(T(T'Last)));
 
   --- Optional, finalizer always syncs in this example
   --  Cryo.Sync;
 
   --- Test of Zap -- uncomment and get zeroized payload every time :
   --  Cryo.Zap;
 
   Put_Line("OK.");
 
end CryoDemo;

In the demo, we define TestArray -- a data structure consisting of a 512 megabyte array, and invoke Cryostat to create a persistent disk store for it. (When the program is first run, the array -- instantiated as T -- will contain only zeros.) After this, we increment each byte in T, and terminate. When, in the end, T goes out of scope, the finalizer kicks in and properly syncs the payload to disk. Thus, T behaves exactly like a stack-allocated variable, with the exception of the fact that its contents are loaded from disk upon its creation (on the second and subsequent runs of the program) and synced to disk upon its destruction (or if Sync were to be invoked.)

Observe that the calling code is not required to perform any file-related manipulations, or to juggle memory; all of the necessary mechanisms (including error handling) are contained in the Cryostat static library.

When we first execute the demo:

./bin/cryodemo

The following output will appear:

T(0)    before :   0
T(Last) before :   0
T(0)    after  :   1
T(Last) after  :   1
OK.

If we run it again, will see the following:

T(0)    before :   1
T(Last) before :   1
T(0)    after  :   2
T(Last) after  :   2
OK.

... and so forth. cryotest.bin, the backing file used by the Cryostat in the demo, will consist of 512 megabytes of byte value N, where N is the number of times the demo has executed. For example, after the first run, a hex dump:

hexdump -C cryotest.bin

... will yield:

00000000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01  |................|
*
20000000

Let's use the traditional strace tool to confirm that the demo behaves as specified:

strace ./bin/cryodemo

The following output will appear:

execve("./bin/cryodemo", ["./bin/cryodemo"], [/* 84 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x644798)       = 0
set_tid_address(0x6447d0)               = 3660
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
rt_sigaction(SIGABRT, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
rt_sigaction(SIGFPE, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
rt_sigaction(SIGILL, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
rt_sigaction(SIGBUS, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
sigaltstack({ss_sp=0x644a80, ss_flags=0, ss_size=16384}, NULL) = 0
rt_sigaction(SIGSEGV, {0x41c360, [], SA_RESTORER|SA_STACK|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
open("cryotest.bin", O_RDWR|O_CREAT, 0666) = 3
ftruncate(3, 536870912)                 = 0
mmap(NULL, 536870912, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f3bcc575000
writev(1, [{"", 0}, {"T(0)    before :   0\n", 21}], 2) = 21
writev(1, [{"", 0}, {"T(Last) before :   0\n", 21}], 2) = 21
writev(1, [{"", 0}, {"T(0)    after  :   1\n", 21}], 2) = 21
writev(1, [{"", 0}, {"T(Last) after  :   1\n", 21}], 2) = 21
writev(1, [{"", 0}, {"OK.\n", 4}], 2)   = 4
msync(0x7f3bcc575000, 536870912, MS_SYNC) = 0
munmap(0x7f3bcc575000, 536870912)       = 0
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

There are a few minor knobs that still ought to be added to Cryostat (See README.TXT) but even as it presently stands, it is already sufficient for basic experimentation with clean and compact databases implemented wholly in Ada.


~ To Be Continued. ~


"Finite Field Arithmetic." Chapter 21A-Bis: Fix for Lethal Flaw in Ch.15's Greatest Common Divisor.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical "Open Sores" abomination, in that -- rather than trusting the author blindly with their lives -- prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatches and seals to your V-set, and press to ffa_ch21a_bis_fix_ch15_gcd.kv.vpatch.

You should end up with the same directory structure as previously.

As of Chapter 21A-Bis, the versions of Peh and FFA are 250 and 200, respectively.

Now compile Peh:

cd ffacalc
gprbuild

But do not run it quite yet.


1. The Boojum.


strap it down

This Chapter repairs a lethal bug in Chapter 15's "Constant-Time Iterative GCD" algorithm and its implementation.

While working on Ch. 21B, I found that the algorithm used in Ch. 15 is subtly broken on a very small but readily-constructible subset of its input domain.

My original (unpublished -- it was the nominal answer to Ch. 15 Exercise 2...) proof for this algo was fallacious: the calculation as formulated there is superficially correct, but it is not guaranteed to converge in Bitness(A) + Bitness(B) iterations! It is possible to construct pairs of inputs where we end up with an incorrect GCD result, and we will discuss several examples of these in this Chapter.

Cut corners are to be paid for, and I will admit that I would rather pay for this one now, with a Vpatch, a new proof, and an apology to readers, than later -- when I put FFA to use in anger, and offer BTC bounties for reporting defects.

Several supposedly-attentive FFA students at various times reported that they have eaten Ch. 15; and at least one of last year's readers even signed it. However, no one reported the defect, which I ended up uncovering while testing the final version of the Ch. 21 Extended-GCD against Ch. 15's conventional one. The breakage is triggered by a small subset of the possible input space where one or both input FZs to GCD consist almost entirely of ones (i.e. most of the bits are set.) No such case turned up in Ch. 15's randomly-generated test battery, reinforcing Dijkstra's famous observation that testing can reveal the presence of bugs, but never their absence.

Only proof can be relied on, and the proof had better be correct.


Let's proceed with two concrete examples of inputs which break the Ch. 15 GCD:

Ch. 15 GCD Counter-Example No. 1:

This Tape:

  .FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBB
  .FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB
  G
  #

... produces 4, when fed into a 256-bit-FZ run of Peh. But the two numbers are in fact co-prime, i.e. their actual GCD is 1.

Click here to see what happens when this Tape runs.


And this Tape:

Ch. 15 GCD Counter-Example No. 2:

  .FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEB
  .FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB
  G
  #

... produces 0. Which is not only not the correct answer (again 1; the given numbers are yet again co-prime) but is in fact an impossible output of any GCD invocation apart from GCD(0,0) -- and is true there only by convention.

Click here to see what happens when this Tape runs.

It is possible to generate further counter-examples, but these are quite enough to demonstrate that the Ch. 15 GCD algorithm does not work as specified.


Now let's review the broken algo:


Chapter 15 Algorithm 3: Broken Constant-Time Iterative GCD.


For FZ integers A and B:

  1. Twos := 0
  2. Iterate Bitness(A) + Bitness(B) times:
  3.    Ae   := 1 - (A AND 1)
  4.    Be   := 1 - (B AND 1)
  5.    A    := A >> Ae
  6.    B    := B >> Be
  7.    Twos := Twos + (Ae AND Be)
  8.    D    := |A - B|, C ← Borrow
  9.    B    := {C=0 → B, C=1 → A}
  10.    A    := D
  11. A := A << Twos
  12. A contains the GCD.


put it in only with tongues

To those who have read Ch. 21A, the defect may be already apparent.

Specifically, when we introduce the Iterate Bitness(A) + Bitness(B)... condition in place of the original non-constant-time algo's Iterate until B = 0..., steps 8...10 become erroneous, in two respects.

First: in a run which requires exactly Bitness(A) + Bitness(B) - 1 iterations to reach the point where A = B, A will end up equal to zero, while the GCD ends up permanently trapped in B.

The root of the problem is that D := |A - B| becomes the new value of A even when D = 0 (i.e. A and B were equal.) For virtually the whole input space of the algo, this state is temporary: the final GCD will move into A in the very next iteration, and stays there. But if there is no remaining iteration, we end up with the invalid output 0. This is illustrated by Ch. 15 GCD Counter-Example No. 2.

Second: the logic of steps 8...10 permits a condition where one or more rounds of the iteration execute without reducing the bitness of either A or B. Enough of these, and the algorithm in fact will not converge. This is illustrated by Ch. 15 GCD Counter-Example No. 1.

This time, the problem is that the subtractive step is performed without demanding (as we did starting with Algorithm 2 of Ch. 21A) that both A and B be odd. This can result in an iteration where we in fact get no closer to convergence than we were at the preceding one.

To understand exactly how the Ch.15 algo is subtly broken, let's pretend that we had a 8-bit FZ (for didactic purposes strictly; the minimal FZ bitness in FFA is 256); and then construct and illustrate a simple example, where two 8-bit quantities -- which, theoretically, would be expected to converge to their correct GCD (1) in 16 iterations -- fail to do so.


For clarity, this and all subsequent worked examples will show binary representations of A and B.

Sad_Ch15_GCD(0xfb, 0xdb) :

i Ai Bi AEven : BEven : Twos A ← |A - B| B ←?← A Ai+1 Bi+1
A ← A/2 B ← B/2
1 11111011 11011011 0 100000 11011011 100000 11011011
2 100000 11011011 10000 0 11001011 10000 11001011 10000
3 11001011 10000 1000 0 11000011 1000 11000011 1000
4 11000011 1000 100 0 10111111 100 10111111 100
5 10111111 100 10 0 10111101 10 10111101 10
6 10111101 10 1 0 10111100 1 10111100 1
7 10111100 1 1011110 0 1011101 1 1011101 1
8 1011101 1 0 1011100 1 1011100 1
9 1011100 1 101110 0 101101 1 101101 1
10 101101 1 0 101100 1 101100 1
11 101100 1 10110 0 10101 1 10101 1
12 10101 1 0 10100 1 10100 1
13 10100 1 1010 0 1001 1 1001 1
14 1001 1 0 1000 1 1000 1
15 1000 1 100 0 11 1 11 1
16 11 1 0 10 1 10 1
17 10 1 1 0 0 1 0 1
18 0 1 0 0 1 0 1 0
GCD (Incorrect) 10

The iterations marked in red, would have been necessary for successful convergence, but never execute; those marked in yellow, do not reduce the bitness of A or B.


The mine is in the fact that, if, at step 8 of the algo:

  1.    D    := |A - B|, C ← Borrow
  2.    B    := {C=0 → B, C=1 → A}
  3.    A    := D

... it so happens that A is even and B is odd, then D (and consequently the value of A in the next iteration) will be odd. And, if A ≥ B, then B will remain odd in the next iteration; and we will get an iteration cycle where the total bitness of A and B will not decrease, unless A and B happen to be close enough in value for the subtraction in step 8 to decrease it.

Thus we have seen how to construct a condition where carrying out an iteration of Ch. 15's GCD does not reduce the bitness of either A or B.


The interesting bit is that in virtually all possible inputs to GCD, this flaw does not lead to an ultimately incorrect output, because -- given sufficient iterations -- the correct answer is obtained. But in a very small number of possible input pairs, convergence will not be reached inside 2 × FZ_Bitness iterations.

It appears to be virtually impossible to arrive at the fatal condition by chance.

This kind of thing could be an ideal cryptological boobytrap, if GCD were in fact a key element in any known cryptosystem.

But AFAIK, there is no such cryptosystem. On top of this, from a cryptological point of view, the broken GCD "fails safe", i.e. it can be coaxed into reporting two co-prime inputs as being non-coprime, but not the opposite.

And, fortunately (or unfortunately, from the POV of quickly turning up possible defects) FFA does not currently use conventional-GCD inside any other internal algorithm. But let's consider where we have thus far made use of GCD.

Apart from the original Ch.15 tests, the only two places in the series where we have used GCD were the primorial generator demo depicted in Ch. 18C and the prime generator demo which used it.

The primorial generator was unaffected -- it apparently produces correct primorials for all valid FZ Widths from 256 to -- at least -- 16384.

The prime generator was also unaffected. In principle, a defective GCD would result, there, in a... slightly slower prime generator, which would attempt a larger number of doomed Miller-Rabin tests; GCD was used there strictly as speedup pre-filter for the intake candidates. But there was no detectable decrease in the number of M-R-failed runs after I put the corrected GCD into service.

The Ch. 21A material, interestingly, is also unaffected: the initially non-constant-time algo from the middle of Ch. 15 is given there as a starting point. And that algo (and all of the subsequent extensions offered) was... correct. Only the constant-time version which was used to actually write the GCD routine in Ch. 15, was not...

The bullet, it could be said, went through the hat, but not the head. Nevertheless, it is a serious defect, and will be corrected in this Chapter. And this time, the full proof of convergence will be given.


2. The Cure.


Let's rewrite the GCD algo, so that a reduction of the bitness of A, B, or both, is in fact guaranteed at every iteration.

First, we'll write a readability-optimized schematic version (expressed with branching logic) of the correct algorithm :


Chapter 21A-Bis Algorithm 1: Schematic Version of Corrected Constant-Time Iterative GCD :


For FZ integers A and B:

  1. Twos := 0
  2. Iterate Bitness(A) + Bitness(B) - 1 times:
  3.    D := |A - B|
  4.    If Odd(A) and Odd(B) :
  5.       If A < B :
  6.          B := A
  7.       A := D
  8.    If Even(A) and Even(B) :
  9.       Twos := Twos + 1
  10.    If Even(A):
  11.       A := A / 2
  12.    If Even(B):
  13.       B := B / 2
  14. A := Bitwise-OR(A, B)
  15. A := A * 2Twos
  16. A contains the GCD.


Second, and equivalently, a properly constant-time formulation of the above :

Chapter 21A-Bis Algorithm 2: Corrected Constant-Time Iterative GCD :


For FZ integers A and B:

  1. Twos := 0
  2. Iterate Bitness(A) + Bitness(B) - 1 times:
  3.    OO   := (A AND 1) AND (B AND 1)
  4.    D    := |A - B|, C ← Borrow
  5.    B    := {(OO AND C)=0 → B, (OO AND C)=1 → A}
  6.    A    := {OO=0 → A, OO=1 → D}
  7.    Ae   := 1 - (A AND 1)
  8.    Be   := 1 - (B AND 1)
  9.    A    := A >> Ae
  10.    B    := B >> Be
  11.    Twos := Twos + (Ae AND Be)
  12. A := A OR B
  13. A := A << Twos
  14. A contains the GCD.


Now, let's show, step by step, that Algorithm 1 and Algorithm 2 are arithmetically-equivalent.

Algorithm 1 Operation Description Algorithm 2 Operation
  1. Twos := 0
Initialize the common-power-of-two counter to 0.
  1. Twos := 0
  1. Iterate Bitness(A) + Bitness(B) - 1 times:
Begin a loop with exactly Bitness(A) + Bitness(B) - 1 iterations. In FFA, this is definitionally equivalent to 2×FZ_Bitness(N) - 1, where N is any of the inputs A, B or the output GCD.
  1. Iterate Bitness(A) + Bitness(B) - 1 times:
Start of Iteration
  1. D := |A - B|
Compute the absolute value of the difference A - B. In the constant-time Algorithm 2, we save the carry bit to C, which will trigger subsequent operations predicated on the condition A < B.
  1. D := |A - B|, C ← Borrow
  1. If Odd(A) and Odd(B) :
Determine whether both A and B are presently odd.
  1. OO := (A AND 1) AND (B AND 1)
  1.     If A < B :
  2.         B := A
Assign A to B if A and B were both odd, and A < B.
  1. B := {(OO AND C)=0 → B, (OO AND C)=1 → A}
  1.     A := D
Assign D to A if A and B were both odd.
  1. A := {OO=0 → A, OO=1 → D}
  1. If Even(A) and Even(B) :
  2.     Twos := Twos + 1
If both A and B are presently even, increment the common power-of-two counter.
  1. Ae := 1 - (A AND 1)
  2. Be := 1 - (B AND 1)
  1. Twos := Twos + (Ae AND Be)
  1. If Even(A):
  2.     A := A / 2
If A was found to be even, divide it by two. In Algorithm 2, this is accomplished via a Right Shift by 1.
  1. A := A >> Ae
  1. If Even(B):
  2.     B := B / 2
If B was found to be even, divide it by two. In Algorithm 2, this is accomplished via a Right Shift by 1.
  1. B := B >> Be
End of Iteration
  1. A := Bitwise-OR(A, B)
Normally, B will contain the intermediate result after convergence. However, when calculating GCD(N, 0), where N > 0, it will be found in A. The other variable will always equal 0. Hence, we obtain the final result by performing a bitwise-OR of A, B. It is assigned to A.
  1. A := A OR B
  1. A := A * 2Twos
Reintroduce the common-power-of-two factor into the intermediate GCD result. If there was none (i.e. one or both inputs were odd to begin with) this will be 20, i.e. 1, and then has no effect. In Algorithm 2, the multiplication is accomplished via a constant-time Left Shift.
  1. A := A << Twos
  1. A contains the GCD.
A now contains the actual GCD of the two input integers.
  1. A contains the GCD.
GCD is in A.

If you are entirely satisfied that Algorithm 1 and Algorithm 2 are equivalent in their effects, proceed to the proof below. For clarity, we will base it on Algorithm 1.

First, let's demonstrate that each iteration of the loop preserves the GCD of A, B :

Algorithm 1 Operation Identity
  1. D := |A - B|
  2. If Odd(A) and Odd(B) :
  3.     If A < B :
  4.         B := A
  5.     A := D

A restatement of Euclid's original :

GCD(A, B) = GCD(|A - B|, Min(A, B))

Observe that |A - B|, the new value of A, is necessarily even, while
Min(A, B), the new value of B, remains odd.

  1. If Even(A) and Even(B) :
  2.     Twos := Twos + 1

A common factor of 2 will be removed (in steps 11 and 13) from A and B, respectively, and the Twos counter is incremented, so we can multiply 2Twos back in at step 15.

GCD(2 × K, 2 × L) = 2 × GCD(K, L)

  1. If Even(A):
  2.     A := A / 2

A factor of 2 is removed from A.

GCD(2 × K, B) = GCD(K, B)

  1. If Even(B):
  2.     B := B / 2

A factor of 2 is removed from B.

GCD(A, 2 × L) = GCD(A, L)

Now, let's confirm that Algorithm 1 is in fact guaranteed to converge within at most Bitness(A) + Bitness(B) - 1 iterations. Let's exhaustively describe the effects of an iteration across the four possible combinations of parities of A and B. (Steps 8 and 9 do not affect the bitness of A or B and will be omitted.)

A is Odd, B is Odd :
Algorithm 1 Operation Effect on Bitness of A Effect on Bitness of B
  1. D := |A - B|
  2. If Odd(A) and Odd(B) :
  3.     If A < B :
  4.         B := A
  5.     A := D
No guaranteed effect (but may decrease by 1 or more bits). However, A is now necessarily even. No guaranteed effect (but may decrease by 1 or more bits). B remains odd.
  1. If Even(A):
  2.     A := A / 2
As A is guaranteed to have become even in step 7: decreases by 1 bit. None
  1. If Even(B):
  2.     B := B / 2
None None
Net Effect on Bitnesses: Decreased by at least 1 bit. May become zero if A and B had been equal. None; may decrease by 1 or more bits.

Note that in the convergence case where A=1 B=1, the above parity combination will yield A=0 B=1.

A is Odd, B is Even :
Algorithm 1 Operation Effect on Bitness of A Effect on Bitness of B
  1. D := |A - B|
  2. If Odd(A) and Odd(B) :
  3.     If A < B :
  4.         B := A
  5.     A := D
None None
  1. If Even(A):
  2.     A := A / 2
None None
  1. If Even(B):
  2.     B := B / 2
None Decreases by 1 bit, unless B = 0.
Net Effect on Bitnesses: None Decreased by 1 bit, unless B = 0.
A is Even, B is Odd :
Algorithm 1 Operation Effect on Bitness of A Effect on Bitness of B
  1. D := |A - B|
  2. If Odd(A) and Odd(B) :
  3.     If A < B :
  4.         B := A
  5.     A := D
None None
  1. If Even(A):
  2.     A := A / 2
Decreases by 1 bit, unless A = 0. None
  1. If Even(B):
  2.     B := B / 2
None None
Net Effect on Bitnesses: Decreased by 1 bit, unless A = 0. None
A is Even, B is Even :
Algorithm 1 Operation Effect on Bitness of A Effect on Bitness of B
  1. D := |A - B|
  2. If Odd(A) and Odd(B) :
  3.     If A < B :
  4.         B := A
  5.     A := D
None None
  1. If Even(A):
  2.     A := A / 2
Decreases by 1 bit, unless A = 0. None
  1. If Even(B):
  2.     B := B / 2
None Decreases by 1 bit, unless B = 0.
Net Effect on Bitnesses: Decreased by 1 bit, unless A = 0. Decreased by 1 bit, unless B = 0.

We have shown that each iteration of Algorithm 1 (and, as we have demonstrated their equivalence -- Algorithm 2) is guaranteed to reduce the bitness of A, B, or of both A and B, by at least 1 bit -- supposing we have not yet reached convergence (i.e. when nothing can be reduced because one of A or B is equal to zero, while the other is odd.)

Therefore, to compute the GCD of A, B where each is of bitness W, we in fact need at most (2 × W) - 1 iterations (i.e. to arrive at a GCD of 1, which is of bitness 1.)

Now, let's illustrate two concrete symmetric cases where the maximum permitted number of iterations is in fact required. Each of these pairs of 8-bit inputs demands 15 (i.e. ( 2 × 8 ) - 1) shots for convergence.

GCD(0x80, 0xff) :

i Ai Bi AOdd and BOdd : AEven : BEven : Twos Ai+1 Bi+1
A ← |A - B| B ← A A ← A/2 B ← B/2
1 10000000 11111111 1000000 0 1000000 11111111
2 1000000 11111111 100000 0 100000 11111111
3 100000 11111111 10000 0 10000 11111111
4 10000 11111111 1000 0 1000 11111111
5 1000 11111111 100 0 100 11111111
6 100 11111111 10 0 10 11111111
7 10 11111111 1 0 1 11111111
8 1 11111111 11111110 1 1111111 0 1111111 1
9 1111111 1 1111110 1 111111 0 111111 1
10 111111 1 111110 1 11111 0 11111 1
11 11111 1 11110 1 1111 0 1111 1
12 1111 1 1110 1 111 0 111 1
13 111 1 110 1 11 0 11 1
14 11 1 10 1 1 0 1 1
15 1 1 0 1 0 0 0 1
GCD 1

GCD(0xff, 0x80) :

i Ai Bi AOdd and BOdd : AEven : BEven : Twos Ai+1 Bi+1
A ← |A - B| B ← A A ← A/2 B ← B/2
1 11111111 10000000 1000000 0 11111111 1000000
2 11111111 1000000 100000 0 11111111 100000
3 11111111 100000 10000 0 11111111 10000
4 11111111 10000 1000 0 11111111 1000
5 11111111 1000 100 0 11111111 100
6 11111111 100 10 0 11111111 10
7 11111111 10 1 0 11111111 1
8 11111111 1 11111110 1 1111111 0 1111111 1
9 1111111 1 1111110 1 111111 0 111111 1
10 111111 1 111110 1 11111 0 11111 1
11 11111 1 11110 1 1111 0 1111 1
12 1111 1 1110 1 111 0 111 1
13 111 1 110 1 11 0 11 1
14 11 1 10 1 1 0 1 1
15 1 1 0 1 0 0 0 1
GCD 1

In both of these examples, Bitness(A)+Bitness(B) is reduced by exactly one in each iteration. And we have already demonstrated that it is impossible to construct a case -- aside from the convergence case -- where an iteration will reduce this quantity by 0. So in fact these are instances of the "worst case" number of required iterations. (Recall that we are writing a constant-time algo, and the "worst case" number of iterations will always execute.)


Now, let's illustrate what happens in the "degenerate" cases. Starting with GCD(0,0) :

GCD(0, 0) :

i Ai Bi AOdd and BOdd : AEven : BEven : Twos Ai+1 Bi+1
A ← |A - B| B ← A A ← A/2 B ← B/2
1 0 0 0 0 1 0 0
2 0 0 0 0 2 0 0
3 0 0 0 0 3 0 0
... 0 0 0 0 ... 0 0
iLast 0 0 0 0 i 0 0
GCD 0

For illustrative purposes, three iterations are shown. But the end result, without regard to FZ Width, will always be the same: 0. The Twos counter increases with each additional iteration, as both A and B remain even, but this has no effect on the final output.


Now, let's examine the case GCD(0, N) where N > 0 :

GCD(0, 3) :

i Ai Bi AOdd and BOdd : AEven : BEven : Twos Ai+1 Bi+1
A ← |A - B| B ← A A ← A/2 B ← B/2
1 0 11 0 0 0 11
2 0 11 0 0 0 11
3 0 11 0 0 0 11
... 0 11 0 0 0 11
iLast 0 11 0 0 0 11
GCD 11

It should be clear at this point that once A has become equal to 0, and B is odd, further iterations have no effect.


Lastly, let's illustrate the "interesting", from the POV of the new algo, degenerate case: GCD(N, 0) where N > 0 :

GCD(3, 0) :

i Ai Bi AOdd and BOdd : AEven : BEven : Twos Ai+1 Bi+1
A ← |A - B| B ← A A ← A/2 B ← B/2
1 11 0 0 0 11 0
2 11 0 0 0 11 0
3 11 0 0 0 11 0
... 11 0 0 0 11 0
iLast 11 0 0 0 11 0
GCD 11

In this and only this case, the intermediate result will wind up in A rather than in B, given as the subtractive step demands an odd A and odd B, and this never happens in the (N, 0) case.

This is why we need step 14: A := Bitwise-OR(A, B).

Let's conclude with a second arbitrary example of the above :

GCD(0x80, 0x0) :

i Ai Bi AOdd and BOdd : AEven : BEven : Twos Ai+1 Bi+1
A ← |A - B| B ← A A ← A/2 B ← B/2
1 10000000 0 1000000 0 1 1000000 0
2 1000000 0 100000 0 2 100000 0
3 100000 0 10000 0 3 10000 0
4 10000 0 1000 0 4 1000 0
5 1000 0 100 0 5 100 0
6 100 0 10 0 6 10 0
7 10 0 1 0 7 1 0
8 1 0 0 7 1 0
... 1 0 0 7 1 0
iLast 1 0 0 7 1 0
GCD 10000000

At this point, there is not much else left to say about the algorithm per se.


And now, let's see the implementation of the new GCD:

fz_gcd.adb:

package body FZ_GCD is
 
   -- Find Greatest Common Divisor (GCD) of X and Y.
   -- Note that by convention, GCD(0, 0) = 0.
   procedure FZ_Greatest_Common_Divisor(X      : in  FZ;
                                        Y      : in  FZ;
                                        Result : out FZ) is
 
      -- Widths of X, Y, and Result are equal
      subtype Width is Word_Index range X'Range;
 
      -- Working buffers for GCD computation, initially equal to the inputs
      A      : FZ(Width) := X;
      B      : FZ(Width) := Y;
 
      -- Evenness (negation of lowest bit) of A and B respectively
      Ae, Be : WBool;
 
      -- Common power-of-2 factor: incremented when Ae and Be are both 1
      Twos   : Word := 0;
 
      -- This flag is set when A and B are BOTH ODD
      OO     : WBool;
 
      -- |A - B|
      D      : FZ(Width);
 
      -- This flag is set iff A < B
      A_lt_B : WBool;
 
   begin
 
      -- To converge, requires number of shots equal to (2 * FZ_Bitness) - 1:
      for i in 1 .. (2 * FZ_Bitness(X)) - 1 loop
 
         -- Whether A and B are currently BOTH ODD :
         OO := FZ_OddP(A) and FZ_OddP(B);
 
         -- D := |A - B|
         FZ_Sub_Abs(X => A, Y => B, Difference => D, Underflow => A_lt_B);
 
         -- IFF A,B both ODD, and A < B : B' := A ; otherwise no change :
         FZ_Mux(X => B, Y => A, Result => B, Sel => OO and A_lt_B);
 
         -- IFF A,B both ODD: A' := |A - B| ; otherwise no change :
         FZ_Mux(X => A, Y => D, Result => A, Sel => OO);
 
         -- If A is now EVEN: A := A >> 1; otherwise no change
         Ae := 1 - FZ_OddP(A);
         FZ_ShiftRight(A, A, WBit_Index(Ae));
 
         -- If B is now EVEN: B := B >> 1; otherwise no change
         Be := 1 - FZ_OddP(B);
         FZ_ShiftRight(B, B, WBit_Index(Be));
 
         -- If both A and B were even, increment the common power-of-two
         Twos := Twos + (Ae and Be);
 
      end loop;
 
      -- Normally, B will contain the GCD, but in the (N,0) N > 0 case -- A.
      -- The other variable will always equal 0. Hence, take Bitwise-OR(A,B):
      FZ_Or(X => A, Y => B, Result => A);
 
      -- Reintroduce the common power-of-2 factor stored in 'Twos'
      FZ_Quiet_ShiftLeft(N => A, ShiftedN => A, Count => Indices(Twos));
 
      -- Output final result -- the GCD.
      Result := A;
 
   end FZ_Greatest_Common_Divisor;
 
end FZ_GCD;


It is very slightly more expensive than the Ch.15 routine: by one MUX (in the loop) and one FZ_Or (outside of the loop.) But this is not a tragedy. Overall it comes out to cost just about the same, after taking into account the reduced (by one) number of iterations.


Unsurprisingly, this corrected variant of GCD passes the Ch.15 test battery; properly swallows both of the Ch.15 counter-examples given earlier; and, of course, the...

Ch. 21A-Bis GCD "Worst Case" Tape for 256-bit FZ :

  .8000000000000000000000000000000000000000000000000000000000000000
  .FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  G
  #

... correctly produces the output 1, converging -- as predicted -- in the 511th and final iteration.

To generate this test for any FZ Width, all you need is:

Ch. 21A-Bis GCD "Worst Case" Tape for arbitrary FZ Width :

  .1 .0~W.1- LS
  .0~
  G
  #


~ The next Chapter, 21B, will continue the Extended-GCD sequence of Chapter 21A. ~