Shards of Lost Technology, and the Need for High-Level Architectures.

The modern high-level-language programmer thinks (if he is of the thinking kind) of low-level system architecture as a stubborn enemy, or, at best, a harsh and indifferent force of nature. Anyone who suggests that everyday desktop apps ought to be written directly in a CPU’s native instruction set is viewed as much the same kind of lunatic as someone who brings up swimming as a practical means of crossing the Atlantic on a tourist vacation. Yet, unlike the Atlantic, the ocean of low-level machine ugliness which we perilously cross in our HLL boats is one of our own creation. Un-creating it is far from impossible. It is not even a particularly deep problem.

There are viable alternatives to the present way of building computers. Those in the know sometimes say that today’s dominant architectures are “built to run C.” In order to fully appreciate the truth of this statement, one must put on an archaeologist’s hat and unearth some which were not. There are many interesting lessons we could learn from the ruins of computer architecture’s Age of Exploration. Let’s examine the Scheme-79 chip: the only architecture I know of which was truly elegant inside and out. It eschewed the compromises of its better-known contemporary, the MIT Lisp Machine (and its later incarnations at LMI and Symbolics) – internally microcoded stack machines, whose foundational abstractions differed minimally from those found in today’s CPUs and VMs. The experimental S79 fetched and executed CONS cells directly – and was coupled to a continuously-operating hardware garbage collector. I will not describe the details of this timeless beauty here – the linked paper is eminently readable, and includes enough detail to replicate the project in its entirety. Anyone who truly wishes to understand what we have lost is highly encouraged to study the masterpiece.

Here is one noteworthy tidbit:

“A more speculative approach for improving the performance of our interpreter is to optimize the use of the stack by exploiting the observation that the stack discipline has regularities which make many of the stack operations redundant. In the caller-saves convention (which is what the SCHEME-79 chip implements) the only reason why a register is pushed onto the stack is to protect its contents from being destroyed by the unpredictable uses of the register during the recursive evaluation of a subexpression. Therefore one source of redundant stack operations is that a register is saved even though the evaluation of the subexpression may not affect the contents of that register. If we could look ahead in time we could determine whether or not the register will retain its contents through the unknown evaluation. This is one standard kind of optimization done by compilers, but even a compiler cannot optimize all cases because the execution path of a program depends in general on the data being processed. However, instead of looking ahead, we can try to make the stack mechanism lazy in that it postpones pushing a register until its contents are about to be destroyed. The key idea is that each register has a state which indicates whether its contents are valuable. If such a valuable register is about to be assigned, it is at that moment pushed. In order to make this system work, each register which may be pushed has its own stack so that we can decouple the stack disciplines for each of the registers. Each register-stack combination can be thought of as having a state which encodes some of the history of previous operations. It is organized as a finite-state automaton which mediates between operation requests and the internal registers and stack. This automaton serves as an on-the-fly peephole optimizer, which recognizes certain patterns of operations within a small window in time and transforms them so as to reduce the actual number of stack operations performed.”

“The SCHEME-79 Chip” (G. J. Sussman, J. Holloway, G. L. Steele, A. Bell)
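
To make the quoted idea concrete, here is a minimal software sketch of a lazily-saved register. It is not the chip's actual automaton: the class name `LazyRegister`, its methods, and the use of a simple pending-save counter in place of the paper's finite-state machine are all my own assumptions, chosen only to illustrate the principle that a push is postponed until the register is about to be overwritten.

```python
class LazyRegister:
    """Hypothetical register with a private stack and deferred pushes."""

    def __init__(self, value=None):
        self.value = value
        self._stack = []        # this register's own stack
        self._pending = 0       # logical saves not yet physically pushed
        self.physical_ops = 0   # real pushes and pops actually performed

    def save(self):
        # Logical save: only record that the current contents are valuable.
        self._pending += 1

    def assign(self, new_value):
        # Contents are about to be destroyed, so perform the deferred pushes now.
        while self._pending:
            self._stack.append(self.value)
            self.physical_ops += 1
            self._pending -= 1
        self.value = new_value

    def restore(self):
        if self._pending:
            # Register was never overwritten since the matching save:
            # the save/restore pair cancels without touching the stack.
            self._pending -= 1
        else:
            self.value = self._stack.pop()
            self.physical_ops += 1


if __name__ == "__main__":
    env = LazyRegister("env0")

    # Subexpression that never touches the register: no stack traffic at all.
    env.save()
    env.restore()
    print(env.value, env.physical_ops)

    # Subexpression that clobbers the register: one push, one pop.
    env.save()
    env.assign("env1")
    env.restore()
    print(env.value, env.physical_ops)
```

An eager caller-saves discipline would perform a push and a pop in both cases; in this sketch the first save/restore pair costs nothing, which is the redundancy the quoted passage proposes to eliminate.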

What we are looking at is a trivial (in retrospect) method for entirely relieving compilers of the burden of stack discipline: a necessary first step towards relieving programmers of the burden of compilers. A systems programmer or electrical engineer educated in the present Dark Age might ask why we ought to demand relief from CPUs which force machine code to “drive stick” in register allocation and stack discipline. After all, have we not correctly entrusted these tasks to optimizing compilers? Should we not continue even further in this direction? This is precisely the notion I wish to attack. Relegating the task of optimization to a compiler permanently confines us to the dreary and bug-ridden world of static languages – or at the very least, makes liberation from the latter nontrivial. So long as most optimization takes place at compile time, builders of dynamic environments will be forced to choose between hobbled performance and the Byzantine hack of JIT compilation.

The instruction set of a properly designed computer must be isomorphic to a minimal, elegant high-level programming language. This will eliminate the need for a complex compiler, enabling true reflectivity and introspection at every level. Once every bit of code running on the machine is subject to runtime inspection and modification by the operator, the rotting refuse heaps of accidental complexity we are accustomed to dealing with in software development will melt away. Self-modification will take its rightful place as a mainstream programming technique, rather than being confined to malware and Turing Tarpit sideshows. Just imagine what kind of things one could do with a computing system unpolluted by mutually-hostile black box code; one which could be understood in its entirety, the way you understand arithmetic. Today’s CPU designers have mind-boggling swaths of silicon real estate at their disposal. Yet they are shackled by braindead architectural dogmas and the market’s demand for backwards-compatibility with a 1970s traffic light controller. This scenario could have been lifted straight from a 1950s science fiction comedy.

The foundations of the computing systems we use are built of ossified crud, and this is a genuine crime against the human mind. How much effort (of highly ingenious people, at that) is wasted, simply because one cannot press a Halt switch and display/modify the source code of everything currently running (or otherwise present) on a machine? How many creative people – ones who might otherwise bring the future to life – are employed as what amounts to human compilers? Neither programmers nor users are able to purchase a modern computer which behaves sanely - at any price. We have allowed what could have once become the most unbridled creative endeavor known to man short of pure mathematics to become a largely janitorial trade; what could have been the greatest amplification of human intellect in all of history – comparable only to the advent of written language – is now confined to imitating and trivially improving on the major technological breakthroughs of the 19th century – the telegraph, telephone, phonograph, and typewriter.

Brokenness and dysfunction of a magnitude largely unknown for centuries in more traditional engineering trades have become the norm in computer programming. Dijkstra believed that this state of affairs is the result of allowing people who fall short of top-notch in conventional mathematical ability into the profession. I disagree entirely. Electronics was once a field which demanded mathematical competence on the level of a world-class experimental physicist. Fortunately, a handful of brilliant minds gave us some very effective abstractions for simplifying electrical work, enabling those who had not devoted their lives to the study of physics to conceive ground-breaking electronic inventions. Nothing of the kind has happened in computing. Most of what passes for widely-applicable abstractions in the field serves only to hamstring language expressiveness and thus to straitjacket cube farm laborers into galley-slave fungibility, rather than to empower the mind by compartmentalizing detail. (OOP is the most obvious example of such treachery.) As for invention, almost everyone has forgotten what genuine creativity in software development even looks like. Witness, for instance, the widespread belief that Linux exemplifies anything original.

I predict that software complexity will eventually cross over the border into the truly unmanageable, and we will begin to see absurdities worthy of Idiocracy. Perhaps this time has already come. I realize that my claim to competence at re-inventing computing from scratch is tenuous at best; yet thus far almost no one else is willing to even contemplate the possibility that we are faced with systemic problems which cannot be solved in any other way, and will continue to worsen.

This entry was written by Stanislav, posted on Monday August 03 2009, filed under Hardware, Hot Air, LispMachine, MIT, Memory, Papers, ShouldersGiants, SoftwareArchaeology, SoftwareSucks.

22 Responses to “Shards of Lost Technology, and the Need for High-Level Architectures.”

  • I hope your code reads better than your prose:-)

    You ripped-off my name, asshole. LoseThos. http://www.losethos.com

  • You just sound a little stilted. Nice vocabulary, but tone-down the arrogance.

    I can read fancy words. God makes riddles when he talks to me.

    God says…
    disagreements Victor Ergo talkers truly stopped improperly
    stretching banished reasoning restored disalloweth peacefully
    silly willingly required write bosses ourself reformed
    tastes narrowness inevitably ken current bowing begun
    folded desires profess dates amazement ornamentedst hereditary
    pronounce trial Apostles All unpleasantly front fantastic
    drunken stated detain strikes cleansed clothed Employee
    Whence C judgment clave pay blow exalted involved cannot
    hadst institution faults wait wearing earnest invest foundation
    Bishop sue livest tower divine unwholesome incurable holies
    page wrench established ensample despised vowed collect
    ibiblio stretching twenty doubtless flagitiousness seed
    May path direction unthankful prated wearisome unhappily
    ear condition flies regardest shoes Glad Doubt Him hook
    meditated assigned conversation riddle infuse collectively
    lives ALL words different Intelligences pressure trusting
    END displacing large servants

    Your writing sounds like a rant of a person more crazy than I am.

  • Raoul Duke says:

    kudos. keep it up. seriously.

  • passing through says:

    Your heart is in the right place but you do not come across as a practical person.

    The most obvious path to your apparent goal is to:

    STEP 1: start working today, in clojure, to craft a toy implementation of one of loper os’s desired components (if I were you I’d pick out orthogonal persistence). The goal here is to settle on a clean, comprehensible api and semantics for loper os orthogonal persistence. Write this as simply and generically as you can — avoid clojurisms you don’t intend to pull into loper os. Where you find yourself dipping into Java-land take copious notes — such ‘dips’ are like tips of icebergs, indicating functionality loper os must eventually supply — and write abstraction layers such that your persistence code doesn’t directly touch the java underbelly. (These abstraction layers, of course are another api and semantics pairing to get right: since you’ve just written v.01 of your persistence layer the persistence abstractions are v.01 of your memory hardware interface).

    You will walk away from this exercise with:

    - a working kernel of an orthogonal persistence layer
    - a draft api / semantics for said orthogonal persistence layer
    - a draft memory-hardware abstraction layer
    - copious notes about what demands this orthogonal persistence layer places upon a hypothetical loper os machine (including functionality the os needs to supply and specific hardware support that’d be most beneficial)

    STEP 2: From there you can repeat, moving on to say a process manager or an object inspector (coding against your persistence api from step 1.)…rinse and repeat.

    Eventually you will arrive at an almost-loper: its implementation will not (yet) be elegant and it will not (yet) be truly self-hosting, but the core of the system will work (if slowly, and without any real hardware assistance); the important things are the semantics and the available apis, and you’ll have those, and semi-independent implementations thereof.

    From there you can take stock:

    - has custom silicon become more easily attainable? Take your (now *very* copious notes) and get to work on figuring out what you want your custom silicon to do and how you can graft your codebase onto said silicon.

    - still haven’t excised high-level java dependencies (like strings)? start whittling them down one-by-one, replacing the code behind the abstraction layer with simple-lisp implementations coded against a suitable lower-level abstraction layer (eg one that abstracts java arrays and presents to you as raw memory)

    - missing important functionality? write that in simple-lisp against the existing apis.

    This is the beauty of a simple, comprehensible system — you can start in the middle and grow it out into a nucleus that’s very portable (even if porting to a particular platform requires some *very* inelegant code for grafting it on).

    Since lisp source is particularly rewritable (thanks mostly to macros) it shouldn’t matter much which particular platform you bootstrap off of; I suggest picking clojure (and crafting yourself a style guide of what clojurisms you want and don’t want, then sticking to said style guide in the “kernel” code) as it gets you a lot of modern infrastructure for free (pervasive unicode shaves man-years off your eventual labor, here) and seems likely to be the most commonly-spoken lisp dialect in another 10-12 years if present trends continue…but any lisp will do, really.

    If you don’t like my proposed strategy — start in the middle, get the semantics right, then grow it down to hardware (and up to userland applications) from there — there’s always the bottom-up approach, but I’d not give you good odds with that approach; we’re decades away from when it’s likely to be sufficiently economical to really get good custom silicon spat out of a print-on-demand foundry, and in honesty getting the set of special hardware features you’re going to want right is just not so easy if you don’t already have a software implementation at least sketched out.

    You could, right now, be making substantial progress towards your ostensible goals, using the tools of today to bootstrap your way to the tools of tomorrow (really, of yesterday); doing so might take time away from what is apparently your primary hobby (scouring the internet for contrarian articles, looking for ego validation), but it’s an option at your disposal.
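
The layering proposed in STEP 1 of the comment above (a persistence API written against a thin "memory hardware" abstraction, so that only one layer touches the host platform) might be sketched roughly as follows. This is a toy under loud assumptions: it is written in Python rather than the Clojure the commenter recommends, the names `MemoryBackend`, `FileBackend` and `PersistentHeap` are hypothetical, values must be JSON-serializable, and the whole image is rewritten on every update.

```python
import json
import os
from abc import ABC, abstractmethod


class MemoryBackend(ABC):
    """Hypothetical 'memory hardware' interface; the only layer that touches the host."""

    @abstractmethod
    def load(self) -> bytes:
        """Return the stored image, or b"" if none exists yet."""

    @abstractmethod
    def store(self, image: bytes) -> None:
        """Durably replace the stored image."""


class FileBackend(MemoryBackend):
    """Host-specific implementation: a plain file stands in for real hardware."""

    def __init__(self, path):
        self.path = path

    def load(self) -> bytes:
        if not os.path.exists(self.path):
            return b""
        with open(self.path, "rb") as f:
            return f.read()

    def store(self, image: bytes) -> None:
        # Write to a temporary file, then rename, so a crash mid-write
        # does not leave a half-written image behind.
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(image)
        os.replace(tmp, self.path)


class PersistentHeap:
    """Toy persistence layer: bindings survive restarts with no explicit save call."""

    def __init__(self, backend: MemoryBackend):
        self._backend = backend
        image = backend.load()
        self._objects = json.loads(image) if image else {}

    def __getitem__(self, name):
        return self._objects[name]

    def __setitem__(self, name, value):
        self._objects[name] = value
        # Write-through: every update is pushed down to the backend immediately.
        self._backend.store(json.dumps(self._objects).encode("utf-8"))

    def __contains__(self, name):
        return name in self._objects
```

A second `PersistentHeap` opened on the same `FileBackend` path sees the bindings made by the first. Swapping `FileBackend` for another `MemoryBackend` leaves `PersistentHeap` untouched, which is the point of the abstraction layer the commenter describes.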

  • A nony mous says:

    Ahh, Terry A Davis has finally gone off the deep end, along with his amazingly braindead OS design.

    Don’t you feel silly for fawning over it now…?

  • Simon Hawkin says:

    Who is this Terry Davis anyway?

  • Jouni Osmala says:

    If you have equal amount of power and hardware resources, jit with hardware designed as compiler target should be faster. The microcoded approach would be like writing a jit without the ability to upgrade to newer versions, and with a parser consuming power all the time for work that in normal systems would be done once.
    Repeat after me: executing operations is free; decoding instructions and choosing which instruction to execute is expensive. If you complexify the COMPLEX portion of hardware that takes majority of the 6 year development cycle you might even get CPU that costs 10 years of work for 1000 engineers. And performs 10x slower than anything else on market because they need to do more work per executed instruction in runtime than anything else.

    • Stanislav says:

      > If you have equal amount of power and hardware resources, jit with hardware designed as compiler target should be faster.

      Theoretically faster, yet pervaded with brokenness and bloat which mostly cancel out the speed gain.

      http://www.loper-os.org/?p=55
      http://www.loper-os.org/?p=37

      > If you complexify the COMPLEX portion of hardware that takes majority of the 6 year development cycle you might even get CPU that costs 10 years of work for 1000 engineers. And performs 10x slower than anything else on market because they need to do more work per executed instruction in runtime than anything else.

      Have you read *any* of the historical literature on the subject? The C Machine is not provably the ultimate speed demon. It too contains parts which perform work that would be rendered irrelevant on a more intelligent architecture. The MMU, for instance. That silicon real estate could hold cache instead.

      In any case, the obsession with speed is moronic. Speed does not equal functionality.

  • The war is over. Forth lost. Well … maybe not:

    http://www.intellasys.net/

  • Julian Morrison says:

    The Java JVM illustrates both what’s right and wrong with this idea. It is possible to stack dump every running process in a JVM. It’s even possible in theory to decompile it into completely equivalent Java code, edit it, recompile and replace it (although I don’t know of any attempt to do this). People have made pure-JVM CPUs, although they weren’t very impressive. The JVM usually hides the conventional CPU – all code inside it might as well be executing over the metal, and you get most of the advantages.

    But, the JVM locks in an OO design, and a language, Java, that implements it about as tightly as C implements x86 machine code. And so other languages that run on the JVM have a compilation step at least as involved as Scheme on an x86. Following compilation, they look like Java – and they decompile to Java. The “one language on one CPU that implements it” model breaks down, in Java as it would in Scheme. Ultimately, languages move faster than hardware, even simulated hardware, and there are lots more of them each with their own ideas as to the minimal set of operations. Thus there’s no point trying to make a perfect CPU. (There may be some point in trying to make a good and very language agnostic VM that can smooth out some of the repetitive work of compilation; LLVM is trying this.)

  • [...] Loper OS » Shards of Lost Technology, and the Need for High-Level Architectures. (tags: architecture history hardware lisp scheme self-modification) [...]

  • Pete says:

    Very late my time here, but a bitter, well-argued piece. Don’t listen to the know-nothing idiots that think your prose is too purple: it’s precise, detailed and coherent.

    It’s a practical call-to-arms for people who care that they’re producing, if not rubbish, then certainly code and ideas for the scrap-heap. Seize the day, people, and think how we can do it better.

  • [...] of the 19th century – the telegraph, telephone, phonograph, and typewriter.” – Stanislav Datskovskiy Category: Uncategorized  |  Comment (RSS) [...]

  • [...] turning Loper OS into an ab initio CPU architecture project, I have been using Xilinx development boards for prototyping.  For the past [...]

  • Matt Campbell says:

    To fully appreciate the degree to which machine code has to “drive stick” in register allocation and stack discipline, I suggest reading the Wikipedia article on x86 calling conventions.

    I think it’s also instructive to compare the disassembly of the Java bytecode for a simple program with the disassembly of the same program compiled to x86-64 machine code (by GCJ). I’ve posted an example of this kind here.

    A few things I noticed about the two disassemblies:

    Java bytecode reflects the abstractions of the Java language, e.g. a method whose argument count and argument types are known. In contrast, though we sometimes refer to our microprocessors as C machines, the x86 and x86-64 processors don’t even know the number of arguments that a function has, let alone their types. The compiler knows these things, but that information isn’t preserved in the executable program, unless perhaps the program is compiled with debugging info included, and with some optimizations enabled. (On x86-64 in particular, I see a lot of “” when looking at GDB stack traces.)

    The machine code produced by GCJ has to explicitly check for null pointer exceptions at a few points. No need for this in the Java bytecode, since the Java bytecode instruction set is designed for a machine that has this ability built-in.

    The Java bytecode is simply a lot easier to read, since it’s a lot closer to the abstraction level of a high-level language.

    These days, the tasks that truly require maximum efficiency should be delegated to GPUs. I agree with Stanislav that CPU instruction sets should be designed for ease of debugging. The sheer amount of software written in C/C++ means that this is unlikely to happen soon. But I suppose the amount of software written in Java and C#, or languages that run on the Java and .NET platforms, is cause for some hope. A Java machine might not be as elegant as the Scheme-79 chip, but it would surely be better than our current processors that barely even understand the abstractions of C.
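
The first observation in the comment above (that a high-level bytecode keeps a method's argument count and types, while native machine code does not) can be illustrated by analogy. The sketch below uses CPython bytecode rather than Java bytecode, which is a swapped-in stand-in and not what the commenter disassembled; the function `add` is a made-up example.

```python
import dis


def add(a: int, b: int) -> int:
    """Tiny function whose compiled form we inspect."""
    return a + b


code = add.__code__
arg_count = code.co_argcount              # number of positional parameters
arg_names = code.co_varnames[:arg_count]  # parameter names kept in the code object
arg_types = add.__annotations__           # declared types, where annotations were given

if __name__ == "__main__":
    print(arg_count, arg_names, arg_types)
    # Human-readable bytecode listing; exact opcodes vary by Python version.
    dis.dis(add)
```

Everything queried here is recovered from the compiled function object at run time, without debugging symbols, which is roughly the property the comment attributes to Java bytecode.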

  • Matt Campbell says:

    I wonder if a microprocessor designed to run JavaScript code (or at least a bytecode representation of JavaScript semantics) would be close enough to what you want. After all, JS is dynamically typed and garbage collected, so at least at run time, it’s much more Lisp-like than what our current CPUs run. I bring this up because currently, JavaScript seems to be the one high-level language that microprocessor vendors would be most likely to design for, if they decide to try designing a CPU to run a high-level language directly.

  • John says:

    Dear Stanislav,

    How is progress on your FPGA project Jupiter? I really believe that x86 and modern operating systems are (as in your analogy) the kazoos of the computing world. I am ready for a violin. Is there anything your readers can do to help, like contributing source code?

    Best,
    -John

    PS Do you have a Twitter or somewhere that you write more often? I am a big fan.

    • Stanislav says:

      Dear John,

      Minor nitpick: Jupiter is the (working title of) the Linux-based emulator, rather than the FPGA-based system.

      And I am quite amused that you wonder if I have a Twitter account. What, exactly, ought I be posting there? The mundane minutiae of my two day jobs?

      On the very rare occasions that I come to think I might have something worth saying, I say it here. Without a character limit or an idiot chorus.

      Yours,
      -Stanislav

  • [...] this http://www.loper-os.org/?p=46 makes me wonder. Surely there are some enthusiasts somewhere, plotting a Kickstarter project or [...]
