The Google H1 Fritz Chip.


Edit: Step-by-step replication instructions for the skeptical experimenter.


This article is a review of what I have been able to discover regarding the Google H1, aka Cr50, aka the “G Chip”, found in all Chromebooks of recent manufacture, including the Asus C101PA, my current candidate for a full delousing attempt.

To my knowledge, there has been no detailed public discussion of this NSA-imposed atrocity anywhere on the Net, aside from The Logs. This article is intended as a reference point for the aficionado, the casual explorer of pseudo-”open” hardware, and the merely curious. The Chromebooks are probably the closest thing to be had on the current market to an inexpensive, portable, and “cleanable” computer with reasonable performance.

However, the Cr50 — a recent addition to the product line — is explicitly designed to get in the way of a full “liberation”. It is an item quite similar, in purpose and scope, to Intel’s “glued on with broken glass, For Your Own Good!” on-die “ME” boobytrap.

Yet, unlike Intel, Google — in its fetishistic pseudo-openness — appears to have published at least a portion of the source for the device’s firmware, making it a somewhat more promising target for attack and demolition than Intel’s equivalent CPU-integrated turd. But we will dig deeper into this, further below. First, let’s review the “big picture”:

The Cr50 device is a classic “Fritz chip” — i.e. a hardware “policeman”, built into a computing device (typically, though not always, a consumer product), so as to specifically and deliberately act against the purchaser’s interests, by subverting the Laws of Sane Computing in these three ways:

  1. Prevention of the full control of the machine by its physical owner, typically by inhibiting attempts to install modified firmware. This practice is also known as “Tivoization”, and is often justified to the gullible under the banner of “security”.
  2. Enablement of one or more types of “NOBUS” back door (Official NSA terminology! “No One But US“, a kind of NATO Reich “Gott Mit Uns” motto.) In practice this means that the folks with the magic keys, can trivially circumvent any and all protection mechanisms on the box, locally and often remotely; while the uninitiated — including the person who bought and paid for the hardware — typically remain unaware of the backdoor’s very existence.) A Fritzed machine serves its true master — USG — first, and only secondarily serves the hapless purchaser.
  3. Last but not least: prevention of a clueful hardware owner’s attempts to “jailbreak” — to disable, remove, or circumvent the Fritz chip itself. Often there is an attempt to conceal the very existence of the mechanism. (Google is peculiar in that it is fond of deliberately, if subtly, taunting the purchaser of its pseudo-open devices with the existence of its Fritz chip.) Perpetrators will often deliberately litter the Net with disinformation regarding the nature and purpose of the Fritz chip, in an effort to discourage circumvention and spur on sales; the chumps will “buy their own cross” while there is still a semblance of choice on the market; afterwards, the choice evaporates, and only Fritz-laden products remain available. The latter process has already run its course in the x86 PC world; and is verging on completion in the low-power ARM portable market.

So, back to the Cr50: this device appears to be present in all of the currently-produced Chromebooks, and is — per the vendor’s published source — able to rewrite all firmware, under the control of an external debug snake, or other, yet-undiscovered triggers; start and stop the CPU; master the I2C bus, on which, among other things, are to be found the sound card’s microphone; upgrade its own firmware; and other interesting things that may or may not align with the machine owner’s wishes at a particular moment. Possible usage scenarios include, but are not limited to, enablement of “lol enforcement” surreptitious searches and buggings of “borrowed” machines (and this is merely one obvious scenario.)


b0

In re “glue with broken glass”, the machine owner cannot simply remove or cut the traces to the Cr50: it has been placed in control of power supply bringup; the valuable debug interface; and other essentials.

But it is the upgrade process in particular that interests me, as it is the locked gate to potentially neutering the boobytrap. But can the end user rewrite the Cr50 firmware?

Let’s begin with the disinfo! A Google employee informed me that “nobody has the cr50 key”. Is this actually true?

How about No?

From the horse’s mouth:


static const uint32_t LOADERKEY_A[RSA_NUM_WORDS + 1] = {
0xea54076f, 0xe986c871, 0x8cffffb4, 0xd7c50bda, 0x30700ee0, 0xc023a878,
0x30e7fdf8, 0x5bb0c06f, 0x1d25d80f, 0x18e181f7, 0xfbf7a8b0, 0x331c16d4,
0xeb042379, 0x4cef13ec, 0x5b2072e7, 0xc807b01d, 0x443fb117, 0xd2e04e5b,
0xcb984393, 0x85d90d9d, 0x0332dcb8, 0xd42ccacf, 0x787e3947, 0x1975095c,
0x2d523b0b, 0xf815be95, 0x00db9a2c, 0x6c08442b, 0x57a022bb, 0x9d5c84ed,
0x46a6d275, 0x4392dcf8, 0xfa6812e3, 0xe0f3a3e6, 0xc8ff3f61, 0xd518dbac,
0xbba7376a, 0x767a219a, 0x9d153119, 0x980b16f8, 0x79eb5078, 0xb869924d,
0x2e392cc2, 0x76c04f32, 0xe35ea788, 0xcb67fa62, 0x30efec79, 0x36f04ae0,
0x2212a5fc, 0x51c41de8, 0x2b0b84db, 0x6803ca1c, 0x39a248fd, 0xa0c31ee2,
0xb1ca22b6, 0x16e54056, 0x086f6591, 0x3825208d, 0x079c157b, 0xe51c15a6,
0x0dd1c66f, 0x8267b8ae, 0xf06b4f85, 0xc68b27ab, 0x31bcd5fc, 0x34d563b7,
0xc4d2212e, 0x1e770199, 0xaf797061, 0x824d4853, 0x526e18cd, 0x4bb8a0dc,
0xeb9377fe, 0x04fda73c, 0x2933f8a6, 0xe16c0432, 0x40ea1bd5, 0x9efcd77e,
0x92be9e55, 0x003c1128, 0x48442cf9, 0x80b4fb31, 0xfe1e3df3, 0x1d28e14d,
0xe99c0f9d, 0x521d38c2, 0x0082c4f1, 0xcff25d56, 0x0d3e7186, 0xe72b98f0,
0xefaa5689, 0x74051ed5, 0x6b7e7fff, 0x822b1944, 0x77a94732, 0x8d0b9aaf,
0x7a8ee958
};

const uint32_t LOADERKEY_B[RSA_NUM_WORDS + 1] = {
0xeea8b39f, 0xdfa457a1, 0x8b81fdc3, 0xb0204c84, 0x297b9db2, 0xaa70318d,
0x8cd41a68, 0x4aa0f9bb, 0xf63f9d69, 0xf0fe64b0, 0x96e42e2d, 0x5e494b1d,
0x066cefd0, 0xde949c16, 0xc92499ed, 0x92229990, 0x48ac3b1a, 0x1dfc2388,
0xda71d258, 0x826ddedf, 0xd0220e70, 0x6140dedf, 0x92bcdec7, 0xcdf91c22,
0xaa110aed, 0xc371c2f9, 0xa3fedf2a, 0xfd2c6a07, 0xe71aabce, 0x7f426484,
0x0ac51128, 0x4bab4ca2, 0x0162d0b9, 0x49fef7e3, 0xeda8664e, 0x14b92b7a,
0x0397dbb7, 0x5b9eb94a, 0x069b5059, 0x3851f46b, 0x45bbcaba, 0x0b812652,
0x7cd8b10b, 0xcaeccc32, 0x0ffd08e3, 0xfe6f0306, 0x8c02d5f7, 0xafdc4595,
0xe0edda47, 0x0cc821db, 0x50beeae5, 0xb9868c18, 0xefd2de11, 0xdfecd15c,
0xa8937a70, 0x223d9d95, 0x1b70848b, 0x54fa9176, 0x8bf012ef, 0xd37c1446,
0xf9a7ebeb, 0xbf2dfa9a, 0xdc6b8ea0, 0xe5f8bc4d, 0x539222b5, 0x192521e4,
0xe7088628, 0x2646bb56, 0x6fcc5d70, 0x3f1cd8e9, 0xae9cec24, 0xf53b6559,
0x6f091891, 0x5342fa61, 0xbfee50e9, 0x211ad58a, 0xd1c5aa17, 0x252dfa56,
0x17131164, 0x4630a459, 0x2f681f51, 0x3fb9ab3c, 0x6c8e0a70, 0xa34a868b,
0xe960e702, 0xa470d241, 0x00647369, 0xa4c25391, 0xd1926cf9, 0x5fce5488,
0xd171cb2e, 0x8a7c982e, 0xc89cbe39, 0xc0e019d8, 0x82cd1ebe, 0x68918fce,
0x5ec138fd
};

Anyone with the private factors to either RSA modulus, can reflash the Cr50 firmware trivially, via the debug cable. The vendor’s flash update utility accepts any candidate update that passes the board revision and version increment check; however, the update will be written to a temporary buffer, and RSA-signature-tested prior to being copied into the “read only” (i.e. active) partition of the flash. Got the key? reflash to your heart’s delight. No key? no update. Just like in other “Tivos”, e.g., the Apple iPhone, but in this case with an extra helping of Open Sores artificial flavouring!

But this is not even the only backdoor: there are at least two. The second one known to me thus far is the “RMA unlocker”. Anyone with access to a certain elliptic key can reset the Cr50 into a manufacturing test mode, and do whatever he likes.

Google even seems to offer an accidentally-”public” API for requesting this type of reset. Let’s try it and see what happens:


$ python cr50_rma_open.py -g -i "BOB E25-A6A-A7I-E9Q-A4R"
Running cr50_rma_open version 2
SERIALNAME: 02034029-90EBD060
DEVID: 0x02034029 0x90ebd060
testing /dev/ttyUSB0
sysinfo
Reset flags: 0x00000800 (hard)
Reset count: 0
Chip: g cr50 B2-C
RO keyid: 0xaa66150f(prod)
RW keyid: 0xde88588d(prod)
DEV_ID: 0x02034029 0x90ebd060
Rollback: 2/2/2

found device /dev/ttyUSB0
DEVICE: /dev/ttyUSB0
RMA support added in: 0.3.3
Running Cr50 Version: 0.3.4
RLZ: ASUO
Cr50 setup ok
ccd: Restricted
wp: enabled
rma_auth output:
rma_auth

ABXFG CMDAD UJFPQ 7J8MQ
UUSTG XGTRT VJ6Z5 48PWC
8AGMG T2QJ4 BT3TW 4HJVU
4XLPA SB4GE 78RSB KYEHC
RLZ: ASUO
CHALLENGE: ABXFGCMDADUJFPQ7J8MQUUSTGXGTRTVJ6Z548PWC8AGMGT2QJ4BT3TW4HJVU4XLPASB4GE78RSBKYEHC
HWID:BOB E25-A6A-A7I-E9Q-A4R
GOTO:
https://www.google.com/chromeos/partner/console/cr50reset?challenge=ABXFGCMDADUJFPQ7J8MQUUSTGXGTRTVJ6Z548PWC8AGMGT2QJ4BT3TW4HJVU4XLPASB4GE78RSBKYEHC&hwid=BOB
E25-A6A-A7I-E9Q-A4R
If the server fails to debug the challenge make sure the RLZ is
whitelisted

Sadly, the result of loading this URL was… a GMail login prompt. So I log in with a GMail account, and get:

Failed to start authentication.

Quod licet Iovi, non licet bovi! or what exactly did you expect?

And so, dear reader, if you know how to disable this landmine — or are merely interested in advancing the state of the art in vermin removal — join us on #trilema! (Ask one of the speaking folks, for voice.)

To be continued.

The secret of the “Debug Accessory Mode” Adapter.

The exact internals of Google’s proprietary “Suzy-Q” debugging device are, at the time of this writing, unknown.

However, I have found how to make an apparently-compatible device:

suzyq

We connect the USB-C “business end” into a Asus C101PA machine; the USB-B end into a reasonable Linux PC, where we then:


echo 18d1 5014 > /sys/bus/usb-serial/drivers/generic/new_id

…and /dev/ttyUSB0 … 5 , the UARTs of the RK3399 chip, appear.

Theoretically, there are also Google-particular “vendor” endpoints. But we will look at these later.

Example spew on boot.

The unfortunate bit is that the output is, evidently, molested between leaving the RK3399 and emerging from the USB-C debug controller, by the machine’s embedded controller Cr50 chip: observe, the typical reset output of Rockchip (e.g., DDR init info) is not seen in the spew.

Therefore the “debug accessory” cable can be used for kernel debugging, but not for bootloader debugging. Unless we diddle the EC controller firmware, to force it to relay UART output immediately from power-on.


Edit: Read here re: the USB endpoints.


(to be continued…)

Posted in: Cold Air, Hardware, NonLoper, SoftwareSucks by Stanislav No Comments

Open Problem: “Debug Accessory Mode” on the Asus C101PA


Edit #2: Aaaand it’s solved:


echo 18d1 5014 > /sys/bus/usb-serial/drivers/generic/new_id

triggers creation of /dev/ttyUSB0 … 5 , several of which spew console log…

Example spew on boot. (Looks like RK’s UART..?)


Edit: apparently they’re USB lines! When connected as D-/D+ through a USB B-connector, to a Linux box, we get a device that enumerates with this descriptor.


The Asus C101PA is supposedly able to bring out its UART via a specially-made USB-C cable.

Google’s WWW offers a “WinAmp playlist”-style taunting mention of this magical cable, but no schematics or even any info whatsoever regarding pinout.

A few folks on the Net hint at the possible use in this device of something called “debug accessory mode”, a feature of the USB-C standard.

So, looking at the standard:

b0

b0

It would appear that we have, at least, the “strapping” with which to trigger the desired “debug accessory mode.”

And so, tying contacts “CC” and “VCON” (i.e. CC1 and CC2, per the standard, but go and figure this out…) to ground via 5.1K resistors:

b0

Theoretically a UART should appear on SBU1 and SBU2 “side band” contacts given by the USB-C expander board.

And we plug this barbaric device into the C101PA (FWIW — in “dev mode”, though this should not make any difference whatsoever if the UART is indeed connected to the RK3399 CPU directly.)

But in actual practice — no dice. Logic analyzer shows a burst of something vaguely UART-like on SBU1, but only when the plug is inserted (and not, e.g., on boot, which is where one actually needs the debug UART!) And this vaguely UART-like signal, is not Rockchip’s 1500000-baud TTY (logic analyzer shows missing stop bits regardless of what baud rate is picked.) And throwing bytes into /dev/ttyS2 (Rockchip UART under the sad Chromebook linux) has no effect on SBU1/SBU2.

Nothing interesting happens on reset, either.

Dear reader, do you know the secret of Google’s “Suzy-Q” (aka “Debug Accessory Mode”) magic cable? If you do — tell me!
(it’s a USB UART! see top of page.)

Open Problem: Forcing MaskROM Mode on the Asus C101PA

The Asus C101PA is based on a Rockchip RK3399. These have a “maskrom mode”, where if the SPI EEPROM is disabled, the chip will attempt to boot from other devices: first, NAND flash, then microSD, and then finally a USB debug mode where you can attach a A-A cable and use the rkflashtool utility to manipulate the firmware.

So, quickly going from theory to practice, let’s make a SPI ROM cutoff button:

b0
The SPI EEPROM, labeled with pinout.

b1
Let’s make a toggle:

b2
#CS is an active-low chip select. Button will short it to the chip’s supply rail when pressed.

b3
Button.

b4
The button, again.

Now, according to the theory, booting up the box with the toggle held down, will send the RK3399 to look for the boot signature on the eMMC NAND; given as it will not find it there, it will then search the SD card, if inserted, and go into USB debug mode if this fails.

However, turns out: “мазать уже можно, есть пока нельзя”: the box indeed boots with black screen and running light active, if booted with the switch held down. But — it ignores a valid firmware image (copied from SPI ROM) on the SD card, and none of the 3 USB ports (either of the two USB-C ports, or the conventional USB3 A-connector, when A-A cable with PC is used) switches into debug mode. The box can be reset with the magic two-finger salute (powerswitch plus “refresh” key) and reboots normally, if switch is released. But nothing useful happened, evidently, from disabling the SPI EEPROM…

So, what gives? If you, dear reader, know — please write in!

Edit: somebody seems to know something

Wanted: Write-Once MicroSD Card !

Allegedly these exist! — though I have only been able to find them offered for sale by the railroad car.

For certain applications, nothing else will really suffice.

If any of my readers know of (or wish to become) a vendor offering, in (for starters) mid-three-digit quantities:

  • a) One Time Programmable MicroSD card
  • b) MicroSD card with a true hardware write-protect feature (specifically not the idiot floppy-style software-read tab

…please write in!

Open Problem: Delousing the Asus C101PA.

Apr. 26 update: This article is obsolete, the pill — was found; if you have this machine, scroll to the end.


The Asus C101PA Chromebook is a very interesting device: it contains a Rockchip CPU, for which we already have a working deloused Gentoo; it also contains such marvels as a non-blobulous Marvell 802.11 card, a reasonable IPS screen, a 9-hour battery, etc.

However it also contains Google’s Fritz chip boobytrap, the firmware write-protect.

And no, it does not have the screw in the same place as the visually-identical rubbish-LCD-crippled C100P. Or, apparently, anywhere at all:

Nope1

Looks like the typical Chromebook defusing screw, neh?

Well, removal does precisely nothing:

Nope1

Not even after hard-reset.

And again nothing:

Nope1

Let’s take the thing apart entirely, strip off the speakers, pull the mainboard, peel off the heat sink:

Nope1

None of the screw pads (aside from the bottom of the first “nope”) even have anything resembling multiple contact pads that could indicate a defusing screw.

No place to put crocodiles and rewrite the flash ROM, either! All BGA.

And the complete silence on, as far as I can tell, the entire Net, re: this beast, is IMHO most peculiar.

And yes I am aware of “crouton” and various other chumpatronic rubbish offered as an alternative to flensing the Google liquishit from this device (which appears to be the only laptop on the market with 9 hour battery, IPS, and a total absence of x86 crapolade).

And yes I am well aware that it is still possible to install a new OS on this box, if one is willing to tolerate the nagware screen and nag siren on boot. Which I am not.

Verdict: the C101P is a paperweight until and unless somebody reveals the secret of removing WP# from it. Do you know how? Tell me!

Update: A query in the heathen pits yielded up the pill: pulling the battery disables the WP#.

A Clean Gentoo for ‘Rockchip’.

This recipe will re-create the hygienic gentoo installed on the RK pilot plant
at Pizarro. The resulting install will also contain the full set of
distfiles used in the rebuilding of the 'world', in /usr/portage/distfiles.

No promises are made of fitness for a particular use, or of provenance,
OTHER THAN as described above.

---------
Materials
---------

0 ) My PGP key

1 ) A rk3328-roc-cc machine

2 ) microSD card of at least 256MB capacity

3 ) USB3 drive of at least 8GB capacity

4 ) The prepared gentoo root fs: rk-gentoo.tar.gz

^ This is a CONVENTIONAL (i.e. not musltronic ) gentoo for arm64 architecture.

Only the SD card will contain machine-specific bootloaderisms:

5 ) The bootloaderisms: loaders.tar.gz

6 ) OPTIONAL: vendor-patched kernel source and .config:

asciilifeform-kernel.tar.gz

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
7 ) Verify the signature and hashes of above in MANIFEST.TXT !!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

----------------
SD Card for Boot
----------------

The rk3328-roc-cc can boot from either eMMC (not discussed, I do not
use it ) or microSD card (we will use the latter. ) To prepare a u-SD
card for booting, obtain a card of at least 256MB capacity, and place
into a suitable reader. The instructions below assume that this device
shows up on your system as /dev/sdb, and is writeable to by the user
as which you are working. Adjust as necessary.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
WARNING: if you are careless, you WILL nuke YOUR workstation's disk!!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

First, create the partitions:

1 ) parted -s /dev/sdb mklabel gpt

2 ) parted -s -a optimal /dev/sdb unit s mkpart l1 64 8063

3 ) parted -s -a optimal /dev/sdb unit s mkpart l2 16384 24575

4 ) parted -s -a optimal /dev/sdb unit s mkpart t 24576 32767

5 ) parted -s -a optimal /dev/sdb unit s mkpart boot fat16 32768 262143

6 ) parted -s -a optimal /dev/sdb set 4 boot on

7 ) sync

Now extract the loader images:

8 ) tar xvfz loaders.tar.gz

9 ) dd if=l1.bin of=/dev/sdb1

^ this is the vendor's RAM initializer. Recent uboot obsoletes it,
but I have not tested this yet. For now we use the original.

10 ) dd if=l2.bin of=/dev/sdb2

^ this is more or less ordinary u-boot, Rockchip's www has the
source.

11 ) dd if=t.bin of=/dev/sdb3

^ this is the ARM hypervisor turd; ARM's www has the source.
AFAIK this thing does not actually do anything without OS support.

12 ) dd if=kern.bin of=/dev/sdb4

^ this is a FAT16 partition containing my kernel ( see further
below. ) It will get mounted as /boot later .

13 ) sync

14 ) Disconnect the SD reader from your workstation !

----------------------------
USB3 Stick for '/' Partition
----------------------------

Obtain a suitable USB3 stick. I recommend Samsung's.

It is also possible to use a mechanical (or SSD) SATA drive,
via a suitable adapter cable.

Do not use USB2 disks, gentooation will be painfully slow on these.

The instructions again assume that it is seen by the machine as
/dev/sdb. If this is not the case on your system, adjust the recipe.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
WARNING: if you are careless, you WILL nuke YOUR workstation's disk!!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1 ) parted -s -a optimal /dev/sdb mklabel gpt -- mkpart primary ext4 1 -1

2 ) mkfs.ext4 /dev/sdb1

3 ) sync

4 ) mount /dev/sdb1 /mnt/usb

5 ) tar pxvfz rk-gentoo.tar.gz -C /mnt/usb

6 ) edit, with your favourite text editor, /mnt/usb/conf.d/net :

comment out, if required, the line with dhcp, and uncomment and
properly adjust the ones pertaining to static ip; or leave this
config alone for a dhcp on boot.

7 ) put your hostname in /mnt/usb/conf.d/hostname

8 ) sync

9 ) umount /mnt/usb

10 ) Insert the SD card and USB3 drive into the RK machine;
connect the network cable, but see below first:

11 ) WARNING -- DO NOT EXPOSE THE BOX ON PUBLIC NET YET:

%%% the root password is null! %%%
%%% the ssh host privkey is unsecret! %%%

To fix these, boot up on an isolated LAN first, and, via ssh,

11-1 ) set the root password using the passwd command.

11-2 ) generate a NEW SSH HOSTKEY :

ssh-keygen -t rsa -b 4096 -f /etc/ssh/ssh_host_rsa_key -N ""

12 ) The kit is now ready for use.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Be aware that my kernel does not support video, mice, or keyboards,
it is meant purely for headless machines. If this is not what you
want, unpack asciilifeform-kernel.tar.gz , edit the config, and
build a new kernel; then install it into /boot (i.e. your SD card.)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The Return of Phuctor!

I have the pleasure of informing my readers that…

Phuctor is back!

It — exactly as it was, but with a few minor fix-ups for browsing speed — now lives on a very spiffy 32-core Opteron at Pizarro, the ISP.

The WWW UI is already up; the factoring proper will resume later tonight.

“Finite Field Arithmetic.” Chapter 10: Introducing Karatsuba’s Multiplication.

This article is part of a series of hands-on tutorials introducing FFA, or the Finite Field Arithmetic library. FFA differs from the typical “Open Sores” abomination, in that — rather than trusting the author blindly with their lives — prospective users are expected to read and fully understand every single line. In exactly the same manner that you would understand and pack your own parachute. The reader will assemble and test a working FFA with his own hands, and at the same time grasp the purpose of each moving part therein.

You will need:

Add the above vpatch and seal to your V-set, and press to ffa_ch10_karatsuba.vpatch.

You should end up with the same directory structure as previously.

Now compile ffacalc:

cd ffacalc
gprbuild

But do not run it quite yet.


First things first:

Manul Inspection.

  • The mail bag from Chapter 9 will be dealt with in Chapter 11.
  • This Chapter may require a cup of coffee.
  • This Chapter is dedicated to my most dedicated readers, such that I did not even know that I had, who publicly raged when it did not appear on schedule.


Recall the algebraic identities in Mul_Word of Chapter 9, where a single Word × Word multiplication is turned into four HalfWord × HalfWord ones. Let’s consider the general case.

Suppose that b is a positive integer, and we would like to multiply a 2b-bit-wide X by a 2b-bit-wide Y. We can express the multiplicands algebraically as follows:


X = XLo + 2bXHi
Y = YLo + 2bYHi

…where NLo consists of any 2b-wide integer N’s least-significant b binary digits, while NHi consists of its b most-significant binary digits. The multiplication itself can then be written as:


XY = (XLo + 2bXHi)(YLo + 2bYHi)
.. = XLoYLo + 2bXLoYHi + 2bXHiYLo + 2bXHi2bYHi
.. = XLoYLo + 2b(XLoYHi + XHiYLo) + 22bXHiYHi

…or pictured “physically”, drawn to scale (suppose that the most junior bit position in a given register is at the left edge of the rectangle representing said register; the most senior — at the right edge) :

XLo XHi
× YLo XHi
=
XLoYLo
+ XLoYHi
+ XHiYLo
+ XHiYHi
= XY

Performing this “schoolbook” 2b × 2b multiplication costs four b × b-wide multiplications (multiplications by powers of two are performed using inexpensive position shifting and do not figure towards this count) and three 2b-wide additions. This cost, as we learned in previous chapters, is proportional to the square of b. But in fact it is possible to do better — using a subquadratic multiplication algorithm.

We will make use of the oldest, best-known, and simplest such algorithm: the classic one derived from A. A. Karatsuba’s (1937 – 2008) “Multiplication of many-digital numbers by automatic computers”, Dokl. Akad. Nauk SSSR, 145:2 (1962), 293–294.

After working through this Chapter, the reader will possess a fits-in-head, constant-spacetime and provably-correct computerized implementation of Karatsuba’s algorithm.

First, let’s specify that our multiplicands X and Y are FZ-integers, and each of them occupies exactly L Words.

Recall that in Chapter 4, we have mandated that no FZ must come into existence with a bitwise length that is not a positive power of two. Therefore, at all times, a FZ will break cleanly into two halves of equal bitnesses, as many times as desired until halves each consisting of a FZ one Word in length are formed.

And so, at every recursive (see further below) invocation of our procedure, we are able to begin by finding a positive integer k (where L = 2k) i.e. the length of a multiplicand cut in half.

Thereby we can take b = MachineBitness × k, where k = L / 2.

Let’s also define, for brevity, the following recurring terms:


LL = XLoYLo
HH = XHiYHi
MM = XLoYHi + XHiYLo

And trivially obtain the equation:


XY = LL + 2bMM + 22bHH

Karatsuba’s discovery is the fact that it is possible to compute the MM middle term using only one b × b multiplication in place of two. In the most common implementations of the algorithm, this appears as the equivalence:


MM = XLoYHi + XHiYLo
.. = (XLo + XHi)(YLo + YHi) - HH - LL

This algebraic identity replaces an expensive operation (multiplication) with three cheap (addition/subtraction) arithmetical operations. However, the possibility of generating a carry during addition when computing the terms of the binomial makes the mechanism slightly slower and uglier than it strictly has to be. So we will instead use another variant of the algorithm, sometimes called subtractive Karatsuba. It makes use of the following trivial equivalence:


MM = LL + HH - (XLo - XHi)(YLo - YHi)

In FFA’s realization, this will take the following — at first, seemingly gnarlier — form:


Dx = |XLo - XHi|
Dy = |YLo - YHi|
DD = Dx × Dy
MM = LL + HH - (-1CX)(-1CY)DD

…where CX is the sign (in the convention where it equals 1 if XLo is less than XHi, and 0 otherwise) of XLo – XHi and CY is the sign (similarly) of YLo – YHi.

And it is not difficult to show that, if we take


DDSub = CX XNOR CY

then:


MM = LL + HH + (-1DDSub)DD

Or, for kindergarten:

Or, for the thick, for the impatient,
CX CY DDSub (-1DDSub)DD MM =
0 0 1 -DD LL + HH - DD
0 1 0 DD LL + HH + DD
1 0 0 DD LL + HH + DD
1 1 1 -DD LL + HH - DD

…i.e. the DD term ends up subtracted if (XLo – XHi)(YLo – YHi) is positive; otherwise it gets added. Which is as it should be. And this gives us the formula:


XY = LL + 2b(LL + HH + (-1DDSub)DD) + 22bHH

And now let’s draw a possible shape for this mechanism “electrically”. First, let’s do the obvious Right Thing and begin by stuffing LL and HH into the lower and upper, respectively, halves of the register in which we are computing the sum. And after that, we will add the “middle” terms, at the requisite offset (i.e. at the b-th binary place) :

LL HH
+ LL 0
+ HH 0
+ (-1DDSub)DD 0
= XY

The above would work as a physical realization, but our 2b-sized additions sadly became 3b-sized, because of the need to ripple out the carries (recall that the addition of any two s-sized integers is physically s+1-sized, as there is the possibility of overflow.) But can we do anything about this, or do we have to live with it?

Turns out: there is a pill. One economical way to make the additions stay 2b-sized, is to accumulate the carries. We will put them in a Word-sized register called TC (i.e. tail carry). After the three terms of the original equation have been added into the result XY, we will make the result correct by rippling out (at the cost of walking over a b-sized space) the accumulated carry in TC into the tail (i.e. most senior quarter segment) of the result XY.

The new scheme looks like this:

LL HH TC := 0
+ LL TC += Carry
+ HH TC += Carry
+ (-1DDSub)DD TC += Carry – DDSub
+ TC
= XY

The reason why, after the third addition, we must…


TC += Carry - DDSub

…is not uninteresting, and I am quite tempted to leave it as an exercise. And in fact I think I will:


Chapter 10, Exercise 1: Why is it necessary to subtract DDSub from TC ?


Observe that we can simplify the program we are about to write, by proving the lemma:


TC >= 0

The fact that TC is greater than or equal to zero by the time it gets rippled, trivially follows from the fact that:


MM = XLoYHi + XHiYLo

There is no way for the value of above expression to become negative, given that no subtraction happens in it, and that both of the multiplicands in both of the terms, are positive! From which it necessarily follows that, given as…


MM = LL + HH + (-1DDSub)DD

…it must then at all times be true that:


LL + HH >= (-1DDSub)DD

And therefore it will never become necessary to propagate a “borrow” when rippling TC into the tail, and we can use a strictly “additive” formulation in the pertinent code.

But what is the maximum possible value of TC ? It so happens that the answer is: two. And we can show… but wait. Why not leave this as an exercise ! So :


Chapter 10, Exercise 2: Can the value of TC ever become greater than 2 ? Give an example of a Karatsubaization where it does; or if it cannot, prove that it cannot.


And now let’s put all of this together into a mechanism:

FZ_Mul.adb:

   procedure Mul_Karatsuba(X  : in  FZ;
                           Y  : in  FZ;
                           XY : out FZ) is
 
      -- L is the wordness of a multiplicand. Guaranteed to be a power of two.
      L : constant Word_Count := X'Length;
 
      -- An 'LSeg' is the same length as either multiplicand.
      subtype LSeg is FZ(1 .. L);
 
      -- K is HALF of the length of a multiplicand.
      K : constant Word_Index := L / 2;
 
      -- A 'KSeg' is the same length as HALF of a multiplicand.
      subtype KSeg is FZ(1 .. K);
 
      -- The three L-sized variables of the product equation, i.e.:
      -- XY = LL + 2^(K*Bitness)(LL + HH + (-1^DD_Sub)*DD) + 2^(2*K*Bitness)HH
      LL, DD, HH : LSeg;
 
      -- K-sized terms of Dx * Dy = DD
      Dx, Dy     : KSeg;  -- Dx = abs(XLo - XHi) , Dy = abs(YLo - YHi)
 
      -- Subtraction borrows, signs of (XL - XH) and (YL - YH),
      Cx, Cy     : WBool; -- so that we can calculate (-1^DD_Sub)
 
      -- Bottom and Top K-sized halves of the multiplicand X.
      XLo        : KSeg  renames    X(  X'First       ..   X'Last - K );
      XHi        : KSeg  renames    X(  X'First + K   ..   X'Last     );
 
      -- Bottom and Top K-sized halves of the multiplicand Y.
      YLo        : KSeg  renames    Y(  Y'First       ..   Y'Last - K );
      YHi        : KSeg  renames    Y(  Y'First + K   ..   Y'Last     );
 
      -- L-sized middle segment of the product XY (+/- K from the midpoint).
      XYMid      : LSeg  renames   XY( XY'First + K   ..  XY'Last - K );
 
      -- Bottom and Top L-sized halves of the product XY.
      XYLo       : LSeg  renames   XY( XY'First       ..  XY'Last - L );
      XYHi       : LSeg  renames   XY( XY'First + L   ..  XY'Last     );
 
      -- Topmost K-sized quarter segment of the product XY, or 'tail'
      XYHiHi     : KSeg  renames XYHi( XYHi'First + K .. XYHi'Last    );
 
      -- Whether the DD term is being subtracted.
      DD_Sub     : WBool;
 
      -- Carry from individual term additions.
      C          : WBool;
 
      -- Tail-Carry accumulator, for the final ripple
      TC         : Word;
 
   begin
 
      -- Recurse: LL := XL * YL
      FZ_Multiply(XLo, YLo, LL);
 
      -- Recurse: HH := XH * YH
      FZ_Multiply(XHi, YHi, HH);
 
      -- Dx := |XL - XH| , Cx := Borrow (i.e. 1 iff XL < XH)
      FZ_Sub_Abs(X => XLo, Y => XHi, Difference => Dx, Underflow => Cx);
 
      -- Dy := |YL - YH| , Cy := Borrow (i.e. 1 iff YL < YH)
      FZ_Sub_Abs(X => YLo, Y => YHi, Difference => Dy, Underflow => Cy);
 
      -- Recurse: DD := Dx * Dy
      FZ_Multiply(Dx, Dy, DD);
 
      -- Whether (XL - XH)(YL - YH) is positive, and so DD must be subtracted:
      DD_Sub := 1 - (Cx xor Cy);
 
      -- XY := LL + 2^(2 * K * Bitness) * HH
      XYLo := LL;
      XYHi := HH;
 
      -- XY += 2^(K * Bitness) * HH, but carry goes in Tail Carry accum.
      FZ_Add_D(X => XYMid, Y => HH, Overflow => TC);
 
      -- XY += 2^(K * Bitness) * LL, ...
      FZ_Add_D(X => XYMid, Y => LL, Overflow => C);
 
      -- ... but the carry goes into the Tail Carry accumulator.
      TC := TC + C;
 
      -- XY += 2^(K * Bitness) * (-1^DD_Sub) * DD
      FZ_Not_Cond_D(N => DD, Cond => DD_Sub); -- invert DD if 2s-complementing
      FZ_Add_D(OF_In    => DD_Sub,            -- ... and then must increment
               X        => XYMid,
               Y        => DD,
               Overflow => C); -- carry will go in Tail Carry accumulator
 
      -- Compute the final Tail Carry for the ripple
      TC := TC + C - DD_Sub;
 
      -- Barring a cosmic ray, 0 < = TC <= 2 .
      pragma Assert(TC <= 2);
 
      -- Ripple the Tail Carry into the tail.
      FZ_Add_D_W(X => XYHiHi, W => TC, Overflow => C);
 
      -- Barring a cosmic ray, the tail ripple will NOT overflow.
      pragma Assert(C = 0);
 
   end Mul_Karatsuba;

But… Does this routine work? How fast does it run? What do the new procedures like FZ_Not_Cond_D do? Where and when do we recurse ? Why did Comba’s Multiplier have to change at all !? And, most importantly, is the proof “proofy” enough ?? Find out in Chapter 11!


~To be continued!~

Dicţionar Român-Englez.

This post exists to give a permanent home and linkable reference point for certain materials. Specifically, a — to my knowledge, currently the only one found anywhere on the entire Netgrep-able plain-text English-Romanian dictionary.

The original source was an ancient piece of MS-Windows nagware. The perpetrator of this item (I hesitate to dignify his actions with the word “author”) took the — very clearly pre-1989, public domain — material, and embedded it in a crypted EXE “viewer” turd. Why he did this, can only be guessed at, but a certain observation by Mircea Popescu may shed some light on this particular type of psychopathology:


So : you have found somewhere something exceptional. A treasure trove. Looky, you were just walking down the road and found a chestful of wonder. What to do ? Well, confronted with such an occurence, some people proceed in the following manner : they run off to town, gather the folk and explain : “looky, I’ve found this chestful of wonder, so and so and back and forth, let’s go pick it up, stick it in a museum, so it may be inspected, admired, discussed, preserved for future generations so they may also gain by the said wonders”. Other… let’s call them people by virtue of bipedalism, proceed in a different manner : they cloink a coupla with the sledgehammer so as to break down the find into shards the size they can fit in a pocket, after which they stick it on their oxcart. Nevermind that those things aren’t made to go with oxcarts, nor do they help the oxcart in any way, the peon sits proudly on his pile of dung. And if anyone asks where he found the things, for they don’t really go with his general appearance, he answers just as proudly : he raised them since they were but pistols. (This is an ancient joke, saying that a gypsy is picked up for stealing a rifle. Before the judge, he sees a farmer brought under charges of having stolen a cow. “Why’ve you stolen the cow ?” asks the judge. “I’ve not stolen it your Honor, I raised it myself since it were a calf”. He’s eventually released, and when the gypsy’s turn comes, “Why’ve you stolen the rifle ?” asks the judge. “I’ve not stolen it your honor, I raised it myself since it were a pistol”.)

“What happens when you add a drop of sewage to a bottle of fine wine ?”

And so, with a big, fat, Fuck You to the oxcart rider, all of it (decrypted, deduplicated, and minus some obvious rubbish) is out in the open, permanently:

File Description SHA256
ro_eng_ascii.txt Romanian < -> English Dictionary. b10ac91292746466433b2cdb4ff67268b556463fe144e38cd20bde4057c0d5e3
ro_eng_tech.txt Romanian < -> English “Technical” Dictionary. 82f44453fe55847d5ee0e1b80697cf8a1e6fa088bf2d823ab79bb7b2a8ce8fc3
ro_eng_expr.txt Romanian < -> English “Expressions” Dictionary. c085054bf9f51d17d8d7c6a610b496b5086647391114c07cfb28a264fcb4dee0

The first of these items (the general-purpose dictionary) is also available as a V-Genesis:

ro_eng_genesis.vpatch

ro_eng_genesis.vpatch.asciilifeform.sig

The diacritics are conspicuous in their absence.
I suspect that the sledgehammer-wielder had removed them, for whatever reason of his own.

You can also flip any of these Romanian -> English dictionaries into an English -> Romanian one, e.g.:

paste -d'`' < (cut -f2 -d'`' ro_eng_ascii.txt) <(cut -f1 -d'`' ro_eng_ascii.txt) | sort -n > eng_ro_ascii.txt

In all three files, the backtick (`) is used to separate each line into the two respective languages; parsing and searching is quite trivial.

Edit: files now mirrored here.