The following is a conversation I recently had with an alert and inquisitive reader. It has been redacted for brevity; handles/names/irrelevant things have been removed. Some text has been turned into links to relevant material.
Anyone still reading this page for reasons related to its original purpose is encouraged to read this; it may help to answer some questions about the long-term direction of my work and how it differs from that of other OS crackpots.
reader: I have some questions about _Unfeature V_ and macros in Loper Lisp. dunno if you have a minute?
me: fire away
reader: okay. so _Law IV_ states that “Compilation is to be considered a form of caching, and thus shall happen solely behind the scenes, like all other forms of caching”, but traditional Lisps do macro expansion at compile-time. So I’m wondering when and how macros are expanded in Loper Lisp, and how they relate to self-modifying code in the manner of Dr Massalin’s Synthesis OS.
me: spot-on! i favour macro as 1st class data type, as in ‘f-expr.’
reader: ah, like in Kernel! So according to Shutt’s thesis, fexprs are a bitch to compile if they have access to the dynamically scoped environment. how do you plan to get around that? if at all?
me: Quite easily, actually. Are you familiar with the concept of ‘dataflow CPU’ ?
reader: I thought about this quite a lot, and it amazes me that so few people seem to realise that call-by-name can be decoupled from the macro mechanism. Yes, but only minimally. treat me as an idiot, if it doesn’t waste your time
reader: unfortunately it’s very difficult to find resources about non-Von Neumann archs
me: the closest analogy familiar to most computer users is: a spreadsheet!
reader: mmm. I noticed that there was some parallel with reactive programming
me: when something changes, everything it depends on is updated automagically. so, in a dataflow machine, memory is split into cells forming dependency graphs. the cpus, if you can still call them that, merely walk the graphs.
reader: oh weird, I hadn’t even managed to piece that together. so how does that relate to fexprs? I can’t make the cognitive leap
me: well. the difficulties of compiling run-time anything vanish if you switch to a dataflow machine. the concept of ‘compile’ can also be made to vanish. as does the von neumann bottleneck.
reader: oh, because everything at runtime is (in effect) concurrent?
reader: gah, thanks. I’ve been struggling with this for weeks
me: a dataflow machine should be thought of as a collection of circuits rather than procedures. think ‘fpga’ but of arbitrary size. (in practice, fpga with swappable bitstream pages.)
reader: yeah. now I’m thinking S40 (Moore) but for lisp instead of forth
me: in order to picture this concept correctly, you really have to let go of the concept of cpu in the usual sense.
me: so, for instance, think of the terminal you are now typing into.
on a true dataflow box, it would simply be the end of a circuit (screen memory) directly linked to a circuit with the keyboard matrix decoder on the other end of it.
you thus also lose the concept of ‘interrupts’ or ‘processes’.
a dataflow box doesn’t need a scheduler,
or an interrupt controller.
reader: huh. how do you manage when multiple inputs are trying to write to the same output?
me: inputs cannot try anything
reader: also, I can see already how this helps with _Law I_!
me: outputs can depend on one or more inputs.
with the logic function (e.g. AND, OR) specified.
inputs can only sit there and be depended on (or not.)
reader: this is going to take a while for me to get used to, I apologise. do you have any references on this? I apologise if I missed some on your site—I downloaded the whole site and consolidated a single page version for repeated reading, so it’s not for a lack of effort…
me: there are not terribly many references.
look for the post on my site labaled ‘the book’.
reader: oh, Kogge has some stuff about this?
me: quite a bit.
reader: blast. it’s so hard to pick it up. going to have to hit up my library by the looks of it
me: and some bibliographic paths to follow.
i must warn you, you won’t find any satisfactory references.
because it never really went anywhere.
i ended up having to derive mostly from first principles.
reader: right. I read a little about Muller C-gates at least
me: afaik nobody’s built a dataflow machine on C-gates. but it strikes me as the loudly, abundantly obvious thing to do.
reader: do you mind if I pass on your comment about fexprs in Loper to interested friends? it’ll be comments in public though.
I’ve been trying to convince them about fexprs for a while now. one of them is on the Scheme R7RS committee.
I tried to get him to add fexprs to Scheme. he told me to go fuck myself, haha
me: do what you like, but let it be known that i hate R7 (and R6) with a passion.
reader: hehe. will do!
me: and can’t stand the systems where ‘f-expr’ is presently found. i’m not particularly interested in implementing lisps on von neumann cpus these days.
reader: yeah, very much understood. I tried to write several alternatives myself…
me: and don’t particularly care how people do it.
reader: but I never really understood the important of dataflow until now.
if you do write any Loper-oriented posts in future, I’d be interested to hear more about the Loper Lisp. you said in 2010 that “I still work on my own Lisp dialect and underlying system, but in the meantime I have been productive in Common Lisp”
me: it is almost entirely absent from the literature, unless you have access to a large university library
reader: in the old days you said Common Lisp is an abomination
me: it’s an abomination, but less so than the supposed replacements.
reader: and then after imbibing more Dr Naggum and following his prescription, I think, you ended up saying you wanted to be a Common Lisp virtuoso and that CL is a Stradivarius compared to the usual C-machine kazoo based shit
me: more or less. it’s a poorly kept stradivarius, in very marginal condition
reader: but what I’m wondering is what the divergence is in your view of a (dataflow) lisp compared to Common Lisp
me: but remains a stradivarius.
me: if you want to see what dataflow feels like on your own skin, pick up an fpga and program in verilog for a bit
reader: I think, though, a huge missing piece of my understanding was just how radical are the changes that dataflow causes
me: it isn’t a general-purpose language by any means (the only data type is: the bit) but will give you a sense of the flavour.
reader: fpga and verilog: sounds better than writing spreadsheets at least… thanks for the tip
me: a terrible language, but will give you a feel for what it’s like to write a program where every line ‘executes’ simultaneously.
me: (because it is a circuit rather than a procedure)
me: the proprietary scripting language ‘mathematica’ supports dataflow. but, unsurprisingly, dog-slow.
reader: I always sort of thought that about CPU designs in general. whilst software designers say “how many cycles is this going to take?”, a hardware designer knows how many cycles something is going to take on a chip: one. or else
me: dataflow never caught on because you absolutely need dedicated hardware.
if you want real efficiency, you can’t even use standard RAM. so it gets dismissed as nuttery, on the rare occasions it comes up.
reader: not even DDR4?
me: nope. the memory bus becomes the bottleneck.
reader: ah, the Von Neumann bottleneck
me: because you can only address one machine word (or, if you pay through the nose for ‘dual port’, two) at a time.
reader: right, I see. so it’s like computing a new generation of Conway’s Life
reader: this is a little depressing, I must admit. I’ve been thinking obessively about how a Loper device (which I define as adhering to the Laws + the Lisp Unfeatures) cutting as many corners as possible, to get something to ship; to get people to understand; to get people to use it. I know you’ve said it will take astronomical time, and that you only expect it to have a ripple effect at best.
I thought if you were going to do it the proper way, I might be able to hack something together the ghetto way.
reader: but increasingly, I think that any equivocation over the bedrock is wrong, wrong, wrong.
reader: yeah, I’ve been through Urbit… and TempleOS for that matter. love ‘em. but yeah…
me: the latter is far closer to a decent example than urbit.
reader: I’ve run TempleOS on QEMU. can’t say the same for Urbit of course
me: mr. m gave me a ‘dukedom’ (’DYS’) and i still think urbit is fundamentally retarded. because it builds on a turd.
reader: I managed to get Movitz, the bare metal Common Lisp, working too. tried emulating every Lisp machine that I know of, too. success rate: bloody terrible. (only got one working. you own a real one)
me: x86 is simply not a very habitable place.
reader: went through the T3 Explorer code, at least. some funny stuff in there
me: for ab-initio os work
reader: yeah. just coercing an x86-64 chip into a 64-bit friendly state is… hell
me: http://herpolhode.com/rob/utah2000.pdf (Rob Pike on Systems Research)
reader: it’s a fucking 64-bit chip! why does everything have to be backwards compatible with the pyramids!
me: see ’standards’ section. do you know just what is involved with getting… a usb mouse working?
me: not even talking about a thumb drive. the standards document is book-length.
reader: yeah, I looked at USB drivers. hell. absolute hell.
me: just for usb mass storage. now do video (pick a chip… then come back in a year or two with a working 2d raster display. maybe.)
reader: one of the ramshackle approaches I was thinking of was to rip drivers out of an existing OS, just use those, bury the C-machine below the Loper tide mark. I am slowly unlearning my idiocy
me: been there, done that.
reader: circa 2000, there used to be an OSKit, I think it was developed by the University of Utah, that was modularised for this. that died, nobody maintained it. people are leaving the bedrock behind. they don’t even care these days, just a decade later
me: the pieces into which something breaks, when it breaks, are inescapably relevant.
reader: yeah, the LEGO blocks
reader: instruction set isomorphic to a high level programming language. that’s why I asked you today about the Loper Lisp
reader: very interested in the LEGO blocks you have in mind
me: if an interrupt could stop your cpu between, say, an ADD instruction executing, and the results being copied from the ALU to memory, there would be chaos.
me: (there were machines where this could actually happen.)
reader: one of my meme-mottos: “NOTHING IS THREAD SAFE”
reader: (I bring it out when somebody writes thread safe code and then experience the natural thread bug that takes them by surprise)
me: this is then papered over by locking.
reader: or STM
me: resulting in a single-cpu machine that wedges routinely.
me: (see python’s global interpreter lock, etc)
reader: (as you know, there is no such thing as broken software. it either works, or it doesn’t work. nice parallel to the two speeds, now that I think about it)
reader: ah yes, the GIL…
reader: I also love when a dual core machine runs everything on one core because its kernel is too stupid to realise what can be shunted off to the other core
reader: (and this is “just works” Appleware)
me: the entire procedural paradigm is incurably stupid
me: i’m not particularly interested in attempts to improve it.
me: the correct number of cpus is… zero. instead, dataflow fabric that one could buy ‘by the kilo.’
me: so, in a dataflow machine, memory is split into cells forming dependency graphs.
me: performing the functions of both storage (nonvolatile) and computation.
reader: gold dust like this is what my mind is going to be running on until I can pick up Kogge
me: pick up a book on fpga and see the diagram of how the macrocells and block RAM are connected.
reader: er… CPU merges into RAM, essentially?
me: you end up with LUTs for the logic, programmable crossbar interconnects for the routing, and small giblets of memory scattered throughout.
me: one could use conventional DRAM, and even magnetic disk, but these would have to be used as transparent swaps
reader: have you fabbed any working fpga dataflow designs yet? or are they all in verilog?
me: i have quite a bit of experimental code.
me: presently stuck on reverse-engineering the xilinx toolchain.
me: you can’t actually do what i have in mind unless you have access to either 1) silicon fab 2) fpga with entirely known design
me: lacking the $1B for (1), i went with (2)