"Cryostat" Genesis.

Cryostat is a Fits-in-Head minimal (~700 LOC, including comments) static library for adding safe and reliable persistent storage to Ada data structures. It makes use of memory-mapped disk I/O via the MMap() system call, present in Linux (kernel 2.4 and newer) and all compatible operating systems. This mechanism permits efficient work with persistent data structures substantially larger than a machine's physical RAM.

AdaCore offers their own implementation of MMap() support in GNATColl. However, IMHO their item is an atrocity, in many ways very similar to their GNAT Sockets pile of garbage (the near-unusability of the latter is what prompted me to write Ada-UDP in 2018). AdaCore's MMap library is not only a behemoth replete with e.g. special cases for MS-Win support, but its use is also entirely incompatible with safety-restricted compilation profiles.

Cryostat, on the other hand, does NOT require enabling the use of pointerism, unchecked conversions, the secondary stack, heap allocators, or other bulky and objectionable GNAT features in the calling program. It does, however, require finalization to be enabled. This is used to guarantee that the backing MMap is safely synced to disk and closed when the data structure it contains goes out of scope.
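
For readers unfamiliar with Ada finalization, the scope-exit guarantee can be illustrated with a minimal, self-contained sketch. This is not Cryostat's actual source: the names Scope_Demo, Guards and Guard are made up for illustration, and the Put_Line calls merely mark the points where a library of this kind would map, and later sync and unmap, its backing file:

```ada
with Ada.Finalization;
with Ada.Text_IO; use Ada.Text_IO;

procedure Scope_Demo is

   -- Hypothetical stand-in for a resource that must be released on scope exit :
   package Guards is
      type Guard is new Ada.Finalization.Limited_Controlled with null record;
      overriding procedure Initialize(G : in out Guard);
      overriding procedure Finalize(G : in out Guard);
   end Guards;

   package body Guards is

      -- Called automatically when a Guard object is created :
      overriding procedure Initialize(G : in out Guard) is
      begin
         Put_Line("Initialize : a backing file would be opened and mapped here");
      end Initialize;

      -- Called automatically when a Guard object goes out of scope :
      overriding procedure Finalize(G : in out Guard) is
      begin
         Put_Line("Finalize   : the map would be synced and closed here");
      end Finalize;

   end Guards;

begin

   declare
      G : Guards.Guard;
   begin
      Put_Line("Working with the payload...");
   end; -- G goes out of scope here, and Finalize runs

   Put_Line("Left the scope.");

end Scope_Demo;
```

The program prints the Initialize line, then "Working with the payload...", then the Finalize line, and only then "Left the scope."; the caller never invokes the cleanup explicitly.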

Let's proceed to building Cryostat and its included demo program.

You will need:

Add the above vpatch and seal to your V-set, and press to cryostat_genesis.kv.vpatch.

Now compile the included CryoDemo:

cd demo
gprbuild

... this will build both the demo and the library.

But do not run it quite yet.


First, let's see what this demo consists of :

cryodemo.adb:

with Interfaces;  use Interfaces;
with Ada.Text_IO; use Ada.Text_IO;
 
with Cryostat;
 
 
procedure CryoDemo is
 
   -- Path on disk for the example Cryostat backing file :
   File_Path : constant String := "cryotest.bin";
 
   -- Now, let's define an example data structure to place in a Cryostat :
 
   -- Example payload array's element type: byte.
   subtype ADatum is Unsigned_8;
 
   -- Let's make it 512MB - far bigger than a typical stack, to demonstrate
   -- that it will in fact reside in the Cryostat, rather than on the stack :
   A_MBytes : constant Unsigned_32 := 512;
 
   -- Example payload: an array.
   subtype ARange is Unsigned_32 range 0 .. (A_MBytes * 1024 * 1024) - 1;
 
   -- Complete the definition of the payload data structure :
   type TestArray is array(ARange) of ADatum;
 
   -- Declare a Cryostat which stores a TestArray :
   package Cryo is new Cryostat(Form     => TestArray,
                                Path     => File_Path,
                                Writable => True,  -- Permit writing
                                Create   => True); -- Create file if not exists
 
   -- Handy reference to the payload; no pointerisms needed !
   T : TestArray renames Cryo.Item;
 
   -- T can now be treated as if it lived on the stack :
 
begin
 
   Put_Line("T(0)    before :  " & ADatum'Image(T(0)));
   Put_Line("T(Last) before :  " & ADatum'Image(T(T'Last)));
 
   -- Increment each of the elements of T :
   for i in T'Range loop
      T(i) := T(i) + 1;
   end loop;
 
   Put_Line("T(0)    after  :  " & ADatum'Image(T(0)));
   Put_Line("T(Last) after  :  " & ADatum'Image(T(T'Last)));
 
   --- Optional, finalizer always syncs in this example
   --  Cryo.Sync;
 
   --- Test of Zap -- uncomment and get zeroized payload every time :
   --  Cryo.Zap;
 
   Put_Line("OK.");
 
end CryoDemo;

In the demo, we define TestArray -- a data structure consisting of a 512 megabyte array -- and invoke Cryostat to create a persistent disk store for it. (When the program is first run, the array -- instantiated as T -- will contain only zeros.) After this, we increment each byte in T, and terminate. When T goes out of scope at the end of the program, the finalizer kicks in and properly syncs the payload to disk. Thus, T behaves exactly like a stack-allocated variable, except that its contents are loaded from disk upon its creation (on the second and subsequent runs of the program) and synced to disk upon its destruction (or whenever Sync is invoked).

Observe that the calling code is not required to perform any file-related manipulations, or to juggle memory; all of the necessary mechanisms (including error handling) are contained in the Cryostat static library.

When we first execute the demo:

./bin/cryodemo

The following output will appear:

T(0)    before :   0
T(Last) before :   0
T(0)    after  :   1
T(Last) after  :   1
OK.

If we run it again, we will see the following:

T(0)    before :   1
T(Last) before :   1
T(0)    after  :   2
T(Last) after  :   2
OK.

... and so forth. cryotest.bin, the backing file used by the Cryostat in the demo, will consist of 512 megabytes of byte value N, where N is the number of times the demo has executed (modulo 256, since ADatum is a modular 8-bit type and the increment wraps around). For example, after the first run, a hex dump:

hexdump -C cryotest.bin

... will yield:

00000000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01  |................|
*
20000000
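
The figures in the dump can be cross-checked with a few lines of Ada. The sketch below (Size_Check is a made-up name, not part of the Cryostat distribution) recomputes the payload size from the demo's A_MBytes constant, and shows how an Unsigned_8 increment wraps around, which is why the byte value in the file returns to zero after 256 runs:

```ada
with Interfaces;  use Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

procedure Size_Check is

   -- Same size constant as in cryodemo.adb :
   A_MBytes : constant Unsigned_32 := 512;

   -- Payload size in bytes :
   Bytes    : constant Unsigned_32 := A_MBytes * 1024 * 1024;

   -- Start from the largest value an Unsigned_8 can hold :
   B        : Unsigned_8 := Unsigned_8'Last;

begin

   -- Size of the payload, in decimal :
   Put_Line("Payload bytes         : " & Unsigned_32'Image(Bytes));

   -- Compare against the final offset printed by hexdump :
   Put_Line("Equals 16#2000_0000#  : " & Boolean'Image(Bytes = 16#2000_0000#));

   -- Modular arithmetic: 255 + 1 wraps to 0 without raising an exception :
   B := B + 1;
   Put_Line("255 + 1 wraps to      : " & Unsigned_8'Image(B));

end Size_Check;
```

It reports 536870912 bytes, TRUE for the comparison with 16#2000_0000# (the 20000000 offset on the last line of the hex dump, and the length seen in the ftruncate and mmap calls), and 0 for the wrapped increment.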

Let's use the traditional strace tool to confirm that the demo behaves as specified:

strace ./bin/cryodemo

The following output will appear:

execve("./bin/cryodemo", ["./bin/cryodemo"], [/* 84 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x644798)       = 0
set_tid_address(0x6447d0)               = 3660
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
rt_sigaction(SIGABRT, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
rt_sigaction(SIGFPE, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
rt_sigaction(SIGILL, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
rt_sigaction(SIGBUS, {0x41c360, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
sigaltstack({ss_sp=0x644a80, ss_flags=0, ss_size=16384}, NULL) = 0
rt_sigaction(SIGSEGV, {0x41c360, [], SA_RESTORER|SA_STACK|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x42498c}, NULL, 8) = 0
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
open("cryotest.bin", O_RDWR|O_CREAT, 0666) = 3
ftruncate(3, 536870912)                 = 0
mmap(NULL, 536870912, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f3bcc575000
writev(1, [{"", 0}, {"T(0)    before :   0\n", 21}], 2) = 21
writev(1, [{"", 0}, {"T(Last) before :   0\n", 21}], 2) = 21
writev(1, [{"", 0}, {"T(0)    after  :   1\n", 21}], 2) = 21
writev(1, [{"", 0}, {"T(Last) after  :   1\n", 21}], 2) = 21
writev(1, [{"", 0}, {"OK.\n", 4}], 2)   = 4
msync(0x7f3bcc575000, 536870912, MS_SYNC) = 0
munmap(0x7f3bcc575000, 536870912)       = 0
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

There are a few minor knobs that still ought to be added to Cryostat (see README.TXT), but even as it presently stands, it is already sufficient for basic experimentation with clean and compact databases implemented wholly in Ada.
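
As one small experiment in that direction, a record can stand in for the demo's array. The sketch below is an assumption-laden variation on the demo, not something taken from the Cryostat distribution: CryoCount, Counter, Runs and cryocount.bin are made-up names, and it presumes that the Cryostat generic accepts a plain record type as Form just as it accepts TestArray, and that a freshly created backing file reads as zeros (as it does in the demo):

```ada
with Interfaces;  use Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

with Cryostat;

procedure CryoCount is

   -- Hypothetical path on disk for the backing file :
   File_Path : constant String := "cryocount.bin";

   -- Hypothetical payload: a single persistent counter.
   -- No default initializer is given, so that nothing overwrites the stored value.
   type Counter is record
      Runs : Unsigned_64;
   end record;

   -- Same generic parameters as in cryodemo.adb :
   package Cryo is new Cryostat(Form     => Counter,
                                Path     => File_Path,
                                Writable => True,
                                Create   => True);

   -- Handy reference to the payload :
   C : Counter renames Cryo.Item;

begin

   -- The increment lands in the mapped file; the finalizer syncs it on exit :
   C.Runs := C.Runs + 1;
   Put_Line("Runs so far : " & Unsigned_64'Image(C.Runs));

end CryoCount;
```

If those assumptions hold, each run should print a count one higher than the previous run's, with no file handling in the calling code.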


~ To Be Continued. ~


This entry was written by Stanislav, posted on Thursday June 04 2020, filed under Ada, Bitcoin, Cold Air, Computation, SoftwareArchaeology, SoftwareSucks.

8 Responses to “"Cryostat" Genesis.”

  • St Gregory of Nyssa says:

    I am glad to see you writing again about orthogonal persistence. If you have time perhaps another conceptual essay would be in order, in the vein of Don't Blame the Mice, which discusses the contrast between Ada's old-school runtime-checking and C's false economy of performance. Unlike with LISP, these are two languages which belong to the same family. The decay of programming therefore seems to have happened in (at least) two stages: first, the sacrificing of homoiconism and macros, and second, the death of the "managed languages" if we may use that term, such as Ada, Pascal, COBOL, and Simula, which continued to impose an environment of predictability.

    • Stanislav says:

      Dear St Gregory of Nyssa,

      > I am glad to see you writing again about orthogonal persistence.

      Unfortunately there can be no true orthogonal persistence on a standard Linux box -- the way I understand the term, it refers to a complete cure, from the userland point of view, for the disease of storage volatility.

      Such a cure requires iron support (at the very least -- battery-backed preservation of RAM to disk, and the removal of direct blockwise access to said disk from the expected set of userland functionality.)

      What you see here is instead simply the addition of a long-ubiquitous POSIX knob to a place where it was, peculiarly, missing.

      > the contrast between Ada's old-school runtime-checking and C's false economy of performance...

      IMHO, "Naggum's bathtub" has this reasonably covered (albeit with CL as the example of sane behaviour.)

      > the death of the "managed languages" if we may use that term...

      Much worse than a proper death -- "managed languages" are ubiquitous today, but implemented with the maximum conceivable brain damage (e.g. Python and related half-working scripting langs; the various MS atrocities; and similar.)

      Fortunately, Ada still works (though it required considerable effort, and not only mine but other people's, to wake it up in the cave in which it slept).

      Yours,
      -S

      • St Gregory of Nyssa says:

        > Fortunately, Ada still works (though it required considerable effort, and not only mine but other people's, to wake it up in the cave in which it slept).

        The variety of the "pragma" settings and their respective nuances has to be one of Ada's greatest drawbacks, since they complexify the mental model of how the language internally operates. That said, the presence of mixed run-time and compile-time safety features attests to the existence of a certain engineering mindset predating the ubiquity of C and C++. As C. A. R. Hoare once stated:

        Every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to - they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law.

        I think modern programmers and incoming students need to realize that the problems of C (and by extension C++) are not a straightforward consequence of the language simply being "low level" or "static" or "imperative." Rather, the problem lies in the fact that the C community popularized the social acceptability of leaking abstractions. It would seem that languages such as Java and Python subsequently evolved out of this new worldview.

        • Stanislav says:

          Dear St Gregory of Nyssa,

          > The variety of the "pragma" settings and their respective nuances has to be one of Ada's greatest drawbacks, since they complexify the mental model of how the language internally operates.

          How on earth is this a drawback?! IMHO, the ability to shut down support for unnecessary (in a given program unit) language features is one of the most attractive aspects of the language.

          Not all programs require support for heap allocation, or the "secondary stack" (used for returning variable-length data structures from function calls), or "tasks" (threads, in Ada world), etc. And the presence of these features substantially bulks up the binary and complicates analysis.

          > Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs

          Ada in fact permits the programmer to switch off runtime checks if he is (somehow) certain that a program is correct across all possible inputs. When this is done, you get roughly the same binary as you would have if you had written the program in C.

          > Rather, the problem lies in the fact that the C community popularized the social acceptability of leaking abstractions.

          Indeed, culture of shitware. "The New Jersey Philosophy."

          > It would seem that languages such as Java and Python subsequently evolved out of this new worldview.

          And ever-continuing attempts at pushing "job-creation" and "job-security" shitlangs as "the new thing".

          Languages which actually work as specified and permit the construction of actually correct (i.e. ones which permanently and completely solve a specified problem) programs are anathema to the modern "software industry."

          Yours,
          -S

  • anon says:

    This guy complains that I/O errors with mmap() throw signals - particularly SIGBUS when things like network mounts go away. Does Ada provide reasonable facilities for managing POSIX signals?

    • Stanislav says:

      Dear anon,

      Network mounts (under all operating systems known to me) are an infamously leaky abstraction -- all kinds of file system knobs are known to misbehave when they are in use.

      That being said, a vanished network mount is analogous to yanking your HDD cable -- what would be the "graceful" way to handle the resulting barf in software (other than simply terminating the process) ?

      FWIW you can define POSIX signal handlers in standard GNAT; though as I understand it, if you are using Ada Tasks backed by native Linux threads, you will have to override the built-in ones.

      Yours,
      -S
