No Formats, no Format Wars.

Computer users are forever being misled, successfully lied to, sold “old wine in new bottles,” bamboozled in a myriad ways large and small.  Why?  Simply because we are, to use the technical term, suckers.  Not always as individuals, but certainly collectively.  The defining attribute of the sucker is, of course, an inability to learn from experience.  And it seems that meaningfully learning from our mistakes is a foreign concept to us.  Nay, it is anathema.  The darkest heresy imaginable.  Something no one would bring up in polite company.  Something only spoken of by rabid crackpots, on their lunatic-fringe blogs, during full moon.

We will happily savor the same snake oils again and again, every time the same non-solutions to the same non-problems – because we refuse to learn from the past.  And much of the history of personal computing can only be understood in light of this fact.  For instance, we appear to have learned nothing from the GIF debacle.  Unisys tried to use software patents to impose a tax on all Internet users, and everyone jumped ship from GIF to other graphics formats – ones supposedly out of the reach of patent trolls.  As though anything could be safe from the well-funded and litigious while software patents remain legal.  So nearly everyone switched to PNG and the like, and the storm died down.  And no one learned the real lesson, which is that the whole notion of a Web “format” was a fundamental mistake.

And now format wars rage once more – this time over video codecs.  Patent trolls smell the blood and fear of lucrative, juicy prey: YouTube et al.  Web users and content providers live in terror, dreading the day when they will have to switch video codecs.  As we all know, this is an exceedingly unpleasant process.  First, the web browser or server must be lifted on hydraulic jacks.  Then, its hood is opened, and greasy mechanics will grimly crank the codec hoist, lifting the old video engine out from its moorings.  The vacant compartment must be scrubbed clean of black, sooty HTTP residue before the new codec can be winched into place.

Wait, this isn’t how your WWW stack works?  What do you mean, it’s a piece of software?  Surely this doesn’t mean that it is a magical artifact with functionality which can be altered in arbitrary ways at any time?  Turing-completeness?  What’s that? “This room stinks of mathematics!  Go out and get a disinfectant spray.” There’s simply no such thing as a machine which can be rewired on a whim while it runs! Everybody knows that!  If you want altered functionality, someone must physically replace the shafts and gears!

If this isn’t how our computers work, why do we act as if it were?  The core idiocy of all web format wars lies in the assumption that there must necessarily be a pre-determined, limited set of formats permanently built into a web browser.  Or, if not permanent, then alterable only through the use of clunky, buggy “plug-ins.”  Of course, this is pure nonsense.  And the fact that it is nonsense should have been obvious from the beginning, because the idiocy of laboriously-standardized data formats was obvious half a century ago – long before interactive personal computing:

“So here’s a couple of knocks on the head I had over the years. I just want to tell them to you quickly. This one I think you’ll find interesting because it is the earliest known form of what we call data abstraction. I was in the Air Force in 1961, and I saw it in 1961, and it probably goes back one year before. Back then, they really didn’t have operating systems. Air training command had to send tapes of many kinds of records around from Air Force base to Air Force base. There was a question on how can you deal with all of these things that used to be card images, because tape had come in, [there] were starting to be more and more complicated formats, and somebody—almost certainly an enlisted man, because officers didn’t program back then—came up with the following idea. This person said, on the third part of the record on this tape we’ll put all of the records of this particular type. On the second part—the middle part—we’ll put all of the procedures that know how to deal with the formats on this third part of the tape. In the first part we’ll put pointers into the procedures, and in fact, let’s make the first ten or so pointers standard, like reading and writing fields, and trying to print; let’s have a standard vocabulary for the first ten of these, and then we can have idiosyncratic ones later on. All you had to do [to] read a tape back in 1961, was to read the front part of a record—one of these big records—into core storage, and start jumping indirect through the pointers, and the procedures were there.

I really would like you to contrast that with what you have to do with HTML on the Internet. Think about it. HTML on the Internet has gone back to the dark ages because it presupposes that there should be a browser that should understand its formats. This has to be one of the worst ideas since MS-DOS. [Laughter] This is really a shame. It’s maybe what happens when physicists decide to play with computers, I’m not sure. [Laughter] In fact, we can see what’s happened to the Internet now, is that it is gradually getting—There are two wars going on. There’s a set of browser wars which are 100 percent irrelevant. They’re basically an attempt, either at demonstrating a non-understanding of how to build complex systems, or an even cruder attempt simply to gather territory. I suspect Microsoft is in the latter camp here. You don’t need a browser, if you followed what this Staff Sergeant in the Air Force knew how to do in 1961. You just read it in. It should travel with all the things that it needs, and you don’t need anything more complex than something like X Windows. Hopefully better. But basically, you want to be able to distribute all of the knowledge of all the things that are there, and in fact, the Internet is starting to move in that direction as people discover ever more complex HTML formats, ever more intractable. This is one of these mistakes that has been recapitulated every generation. It’s just simply not the way to do it.”

Alan C. Kay: “The Computer Revolution Hasn’t Happened Yet”
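
Kay's tape layout can be sketched in a few lines. What follows is only an illustration, not a reconstruction of the Air Force system: the function names, the two slot numbers `READ_FIELD` and `PRINT`, and the sample records are all assumptions invented for this example. Each record carries its data, the procedures that understand that data, and a pointer table whose first slots have agreed meanings. The reader knows nothing but the slot numbers.

```python
# Sketch of a self-describing record in the spirit of Kay's 1961 tape layout.
# All names and slot numbers here are hypothetical.

# Standard vocabulary: slot numbers every reader agrees on in advance.
READ_FIELD = 0
PRINT = 1


def make_record(rows, field_names):
    """Bundle data (part 3), procedures (part 2), and pointers (part 1)."""

    def read_field(row_index, name):
        # Only this procedure knows how fields are laid out in the data part.
        return rows[row_index][field_names.index(name)]

    def render():
        return "\n".join(
            ", ".join(f"{n}={v}" for n, v in zip(field_names, row))
            for row in rows
        )

    procedures = [read_field, render]      # part 2: code travelling with the data
    pointers = {READ_FIELD: 0, PRINT: 1}   # part 1: standard slots -> procedures
    return {"pointers": pointers, "procedures": procedures, "data": rows}


def call(record, slot, *args):
    """The whole 'reader': jump indirect through the pointer table."""
    return record["procedures"][record["pointers"][slot]](*args)


# Two records with unrelated layouts, handled by the same reader.
personnel = make_record([("Smith", "SSgt"), ("Jones", "A1C")], ["name", "rank"])
supplies = make_record([(7, "boots", 40)], ["bin", "item", "count"])

print(call(personnel, READ_FIELD, 1, "rank"))
print(call(supplies, PRINT))
```

Note that `call` never inspects the data part. Adding a new kind of record means shipping new procedures with it, not upgrading the reader.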


Why exactly does a browser need to ship with any preconceived notions of how to decode video and graphics?  Or audio, or text, for that matter?  It is, after all, running on something called a programmable computer.  Oh, that’s right, because running code which came in from the network in real time is a dirty and immoral act, one which endangers your computer’s immortal soul.  Which is why it is never, ever done!

In all seriousness, modern hardware provides more-than-sufficient horsepower to make the idea of replacing all media formats with a “meta format” at least thinkable.  Such a thing would consist of a standardized “sandbox,” perhaps one somewhat specialized for media processing.  Something not unlike a competently written, non-user-hostile incarnation of Adobe Flash.  It goes without saying that this would be a far easier sell were we using a non-braindead CPU architecture – one where buffer overflows and the like are physically impossible.  There is, however, no reason why it could not be built on top of existing systems by competent hands.

As for the question of hardware accelerators:  FPGAs have become so cheap that there is simply no reason to ship a non-reprogrammable video or audio decoder ever again.  Why pay royalties and fatten patent trolls?  Let the act of loading the decoder algorithm – whether a sequence of instructions for a conventional CPU, or an FPGA bitstream – be co-incident with the act of loading the media file to be played.  The latter will contain the codec (or a hash thereof, for cache lookup) as a header. [1]  Media player vendors will then cease to be responsible for paying codec royalties – the player hardware or software will have become a “common carrier.”  Let the trolls try to collect danegeld from a hundred million consumers!
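
To make the mechanism concrete, here is a minimal sketch of a media file whose header carries either the decoder itself or only a hash of it, with a local cache consulted by hash. Everything in it is an assumption made for illustration: the dictionary layout, the names `pack`, `play` and `decoder_cache`, and the toy run-length "codec." Most importantly, Python's `exec()` stands in for the sandbox and is emphatically not one.

```python
import hashlib

# Hypothetical local cache: SHA-256 hex digest -> decoder source text.
decoder_cache = {}


def pack(payload, decoder_source=None, decoder_hash=None):
    """Build a media 'file' whose header carries a decoder or just its hash."""
    if decoder_source is not None:
        decoder_hash = hashlib.sha256(decoder_source.encode()).hexdigest()
    return {"decoder": decoder_source, "decoder_hash": decoder_hash, "payload": payload}


def play(media):
    """Obtain the decoder named in the header, then run it on the payload."""
    source = media["decoder"]
    if source is None:
        # Header held only a hash: look the decoder up in the cache.
        # (On a miss this raises KeyError; a real player would fetch it.)
        source = decoder_cache[media["decoder_hash"]]
    elif hashlib.sha256(source.encode()).hexdigest() != media["decoder_hash"]:
        raise ValueError("decoder does not match its hash")
    decoder_cache[media["decoder_hash"]] = source
    namespace = {}
    # WARNING: exec() is NOT a sandbox. It only stands in for one here.
    exec(source, namespace)
    return namespace["decode"](media["payload"])


# A toy run-length "codec", shipped as source text inside the first file.
RLE = (
    "def decode(payload):\n"
    "    return ''.join(ch * n for n, ch in payload)\n"
)

first = pack([(3, "a"), (1, "b")], decoder_source=RLE)
second = pack([(2, "x")], decoder_hash=first["decoder_hash"])

print(play(first))   # decoder arrives with the file and is cached
print(play(second))  # decoder is found in the cache by its hash
```

The player itself contains no codec at all. It knows only how to find a decoder by hash and hand it a payload.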

At present, working around a software patent is difficult only because switching formats takes considerable work and requires some unusual action on the part of variably-literate users.  An end to this situation may very well mean a decisive victory over patent trolls – not only because software and hardware makers will be able to skirt accusations of patent infringement by out-pacing their attackers [2], but also because it will undermine the main source of income sustaining the patent trolls’ day-to-day corporate existence: royalties from proprietary decoders shipped with consumer equipment such as DVRs and MP3 players.

Wipe out the patent parasites, and at the same time fulfill the original promise of the Web by liberating us from the mind-bogglingly idiotic notion of the “browser” and its “formats.” Sounds like a good deal to me.


[1]  Of course, it is not necessary to include a given decoder blob with every corresponding media file.  Caching can be used to conserve bandwidth.  I need not spell out the details of how to do this – it should be obvious to the alert reader.  However, such clever tricks are not as necessary as one might imagine.  Just compare the bitwise footprint of a typical media codec (implemented on existing systems) with, say, that of a typical YouTube transfer session!

[2]  It may even be possible to automate the process of making minor-yet-legally-significant alterations to decoder and encoder algorithms, faster than patent trolls could search for new angles of attack.

This entry was written by Stanislav, posted on Tuesday January 18 2011, filed under Distractions, Idea, ModestProposal, NonLoper, SoftwareSucks.

30 Responses to “No Formats, no Format Wars.”

  • Brent says:

    How do you distinguish between what you’re proposing and (the as-hyped-in-1995 potential of) the web version of the jvm, which has been around at least since the old long-obsolete hotjava browser?

    • Stanislav says:

      Dear Brent,

      Ideally: download and execute native code, in real time. On top of an operating system which isn’t a piece of shit, and doesn’t open you up to a world of hurt as a result.

      Think of it this way: a web page as they now exist is merely a program for a rather-inefficient virtual machine, which causes your computer to display a document (from a rather restricted configuration space, at that.)

      I should add that on a correctly-designed computer, the distinction between loading a web site and downloading/running a native executable need not exist.

      As for security: do not confuse the limitations of braindead operating systems with the laws of physics.

      Yours,
      -Stanislav

      • dmbarbour says:

        Ideally: download and execute native code, in real time.

        How is this ideal compatible with your earlier assertions about computer insecurity: the option of not allowing opaque, deliberately incomprehensible, potentially hostile blobs to visit our computers is simply not on the table. Or your Laws of Sane Computing – VI Reveals purpose: All of the information contained inside the machine’s storage array (see the Third Law), whether executable or not, shall be accessible at all times for inspection and modification by the operator, in the form preferred for modification.

        Anyhow, I think Curl language is close to your proposed ideal… except that the implementation is proprietary.

        • Stanislav says:

          Dear dmbarbour,

          >How is this ideal compatible with your earlier assertions…

          Excellent question.

          On the PC, it certainly isn’t. Using a “C machine” costs you laws “IV”, “VI” – and arguably “III” and “V” – right off the bat.

          The only way to implement this idea in a fully “Seven-Laws Compliant” way is to have exclusively non-opaque, non-incomprehensible blobs, which are subject to inspection/modification at all times. So you would have to part with the PC architecture, replacing it with something rather different.

          >Anyhow, I think Curl language is close to your proposed ideal… except that the implementation is proprietary.

          If proprietary plug-ins for existing browsers are your cup of tea, why not use Adobe Flash?

          Yours,
          -Stanislav

          • And how exactly do you propose to prevent someone from packaging their software as an incomprehensible blob? Have you discovered a programming language in which it is impossible to write obfuscated code?

            • Stanislav says:

              Dear Bassett Disaster,

              Of course anyone can choose to write obfuscated code. Just as anyone can choose to slash my tires in the night, and probably won’t be made to answer for it.

              It is an ethical problem, just like certain others I have discussed. There is no strictly technological answer to an ethical problem, and I won’t pretend otherwise.

              My long-term hope is that once it becomes easy to supply users with “comprehensible blobs”, it will at some point become socially expected, in the same way that refraining from slashing tires is socially expected. There is some precedent after all: consider Linux users’ disdain for closed-source kernel modules.

              Yours,
              -Stanislav

              • That Bassett Disaster says:

                Sorry — I didn’t mean to imply that anyone would be doing deliberate obfuscation. My experience is that all non-trivial software is essentially incomprehensible to essentially all people. It takes far more effort than most people are willing to invest to improve on that.

                The fact that you frame Turing-completeness as something that makes a piece of software simpler to maintain than a physical machine betrays either some kind of confusion about fundamental concepts in the theory of computability, or ignorance of the complexity of production software, or both. I mean, I just love the fact that the vast majority of the behavior of my app is undecidable — it makes it a snap to ensure that my fix for one bug didn’t cause another bug somewhere else!

                (What’s the difference between a mechanic and a software developer? A mechanic won’t let the car out of the garage before they’re sure it’s safe; a software developer is just never sure the program is safe.)

                Let’s grant you a “correctly” designed computer that runs Lisp (a fairly good one) natively, and a world filled with open-source software available immediately from the network, complete with comments and identifiers — enough that you have no need for closed-source. Now, find the bug in this code you just downloaded:

                https://github.com/franzinc/aserve/blob/master/main.cl

                (This is just an example. The actual code you downloaded was probably a different 3000 lines written by someone else to do something else.)

                So, how long do you think it will take to find? Or, if you think there isn’t one — how long do you think it will take to write a proof of that? Or should we simply hold that there is no reason to believe there is a bug in it, until someone notices it, and the damage is already done? That sounds a lot like where we are now.

                Detecting the bug (assuming it’s the kind that causes a fault that can be detected by a computer) and dropping into a debugger when the user does encounter the bug would, of course, help them find and fix it — if they are a programmer. Sadly, the vast majority of computer users aren’t programmers, just like the vast majority of drivers aren’t mechanics. Good luck convincing them that they should use their time to exert political and economic pressure towards making large-scale changes to the computer industry that would finally put the tools that they have no use for into their hands.

          • dmbarbour says:

            If proprietary plug-ins for existing browsers are your cup of tea, why not use Adobe Flash?

            It is not my cup of tea. I mention Curl because it is closer to your vision. It isn’t just a plugin. You can also use a Curl app, that is simply a box that downloads, compiles, and executes Curl code, which seems closer to your vision above.

            Curl has other properties that make it worth considering. It provides a gentle gradient between HTML and RIA (sound, animation, 2D and 3D, large data set manipulation, etc.) without requiring polyglot programming. It supports limited composition – e.g. Curl app inside a Curl app, via sandboxing. Flash doesn’t provide these advantages, though it is better for video at the moment.

  • _mind says:

    You’re only thinking about one dimension of the problem. In reality, you will run up against the “expression problem”, in that your immediate use will be more apparent, but orthogonal un-thought-of reuses will become harder or impossible. A java/flash applet executes instructions and is a step towards the kind of universal object format you talk about, yet they are harder or impossible to index, style, and adapt to local UI conventions.

    The primary feature of declarative protocols/languages is to convey a common meaning of data without specifying *how* it needs to be interpreted. Try to do this with imperative code and you’ll run up against the halting problem.

    (programming logic as a standard PC feature would be great, but there will still be a market for video ASICs for low power (mobile) and high-quality realtime encoding. also how would you propose to set a standard format for FPGA bitstreams when topology/block capabilities/timing is ever-changing. Once again we need a declarative format).

    • Stanislav says:

      Dear _mind,

      Re: software: this is why we need to dispense with the whole idea of a “web browser.” Unify machine architecture and the VM in question, stop treating the network as special and separate from local storage.

      Re: indexing: Have the CPU execute S-expressions directly, as discussed in my other posts. These are nicely searchable.

      Re: FPGA bitstreams: Here’s one attempt at standardization in spite of the hostility of chip vendors: the “MPGA.”

      Yours,
      -Stanislav

  • Lambda says:

    - Chrome’s NaCl is somewhat going in the direction you suggest, but using LLVM bytecodes as substrate.
    - I’d prefer that we used lambda calculus instead. It’s 100% safe, and 100% equivalent to a Turing machine (e.g. Haskell compiles to it)… i.e. it can’t hijack your machine, which is the main objection to this idea.

  • Andrew Wahbe says:

    And how exactly does search indexing work in your hypothetical utopia? Do I have to run the program and make sure I somehow hit every possible output state?

    An HTML page is a program — written in a declarative language. That allows it to also be used as a data format — which gives us things like search and simple authoring. See: http://www.w3.org/DesignIssues/Principles.html#PLP

    I’m all for making the web “better” — as long as we realize what the trade offs are.

  • Lambda says:

    @Andrew. For static content TEX & Postscript do both. For dynamic content, the problem you mention applies to Javascript too.

  • Jason Treit says:

    This is the best piece of persuasive technical writing I’ve come across in a long time.

    One aspect I’d be keen to see pressed further is how third parties would get these unformatted sandboxes talking to each other. Not at hour zero, but down the line, when someone thinks of a clever algorithm to run across millions of videos, or wants to build a topology of data from different augmented reality games, or god knows what. Not to imply that browsers and formats necessarily hasten rather than delay such progress. Consider, though, Alex Russell’s defense of the “textual, interpreted nature of HTML” and its now-ingrained crop of externalities.

    While I share Kay’s bias for bootstrapped interpretation, it’s important to reconcile that desire with historical awareness of a single, simple, lax markup construct that resulted in a complex system unlike anything before or since. No other solution wove together the web.

  • dmbarbour says:

    Passing around native code is analogous to sending a monad, except using the same opaque type for everything. video :: IO (), text :: IO (), image :: IO (). There is some anticipated environment that the IO () can manipulate (e.g. by calling a set of common functions). Presumably, the environment could be virtualized for security, same as we currently run OS-inside-OS.

    While this allows a lot of flexibility for content, it also makes the content opaque to further modification, transclusion, stylization (e.g. with CSS), composition. It would be extra-difficult to add subtitles to a video, modify for a mobile phone, translate a page of text presented as a 2D canvas, annotate a video stream with meta-content such as time and geographic information and named content (e.g. frame 2103 contains “vase”, “flower”, “John”). If we ever wanted a zoomable user interface, we’d need to trust that the video’s code is well designed to consume fewer resources while running in a smaller window.

    I have similar objections to JavaScript. Overuse of JavaScript makes accessibility difficult – e.g. for blind readers or language translation or search indexing or adapting to mobile phones. And there is a reason for JavaScript’s same origin policy.

    The use of ‘formats’ for information exchange – or some other form of higher level, more declarative language – seems to me a better option for many purposes. But I do favor something that can be efficiently reduced to native speeds.

    • Stanislav says:

      Dear dmbarbour,

      Perhaps one solution to the problems of search, translation, annotation, etc. would be to take runtime snapshots of the execution state, particularly display-writes. I won’t pretend this is trivial. I have spent much time thinking about these and other issues, and cannot yet offer a good answer to all of them.

      Yours,
      -Stanislav

    • jhuni says:

      I have objections to any use of JavaScript at all. This language is horrible, and to a large extent JavaScript is to blame for the sad state of the web. If we had a more powerful programming language, some of these things, like emulating HTML, would be practical.

      But it is not just JavaScript that is the problem; file systems are a huge part of the problem. People send files over the network rather than sending just what is needed, which is the cause of an enormous amount of waste.

      What we really need is, as you pointed out, a declarative language (e.g. Prolog), or at least a language that reduces side effects to a bare minimum. Then when our language uses any side effects we will store them in a log so that we can achieve universal undo. That way if we run a program in this language, it can’t do any lasting damage because we can undo everything that it does.

      So a side effects logging system which obeys the second law of sane computing is what we should strive for.

      In order to make this practical we will need efficient persistent data structures. The language out there which has the richest set of persistent data structures I know of is Clojure. Furthermore, since Clojure is capable of using any libraries on the JVM it is an immediately practical language, so if we truly want to develop a sane computing environment, we should start by developing a desktop environment in Clojure. The JVM will only be used for the side effects component, so later that component may be replaced with something more Lisp-like if we can build a superior Lisp VM.

      • jhuni says:

        Proof carrying code in conjunction with certified compilation offer an efficient means of safety verification based upon mathematical logic:

        http://www.cs.berkeley.edu/~necula/Papers/thesis.pdf

        • Stanislav says:

          Proof-carrying code is a cargo-cult scam. See here.

          • jhuni says:

            Your link explains only that PCC is a useful security verification technique in exceptional cases, when you must use low-level binary formats. That is certainly the case; however, I wouldn’t go so far as to say that it is a “scam.” It is just one technique we should consider using, regardless of what standards bodies say about it.

            But as I mentioned before, what we should strive for is declarative protocols that automate most side effects, such as memory management, persistence, versioning, compilation, etc. Lisp has been the best example of this ever since it introduced one of the most important forms of automation to the world: garbage collection.

  • Leo Richard Comerford says:

    I have responded (indirectly) to this post on Hacker News (response, response, response).

  • Roger says:

    Ultimately, I am mostly glad Google took the stand they did with H264: although it serves to prop up another proprietary format, Flash, it does not lock us into an only temporarily free format. However, I feel they should have done it from the beginning, rather than snub Firefox, only to recant later. As the saying goes, better late than never, but it makes it much more difficult for me to view their actions as being made for the right reason, as opposed to a snub against Apple.

  • Your proposed solution creates as many problems as it solves. Forget the “reusing video objects in mobile hardware” crap the user above talked about. Think simply of reading vs writing. The simple fact is that each object naturally has multiple representations. One representation is commonly tailored to display the object, the second representation to edit the object. There may or may not be multiple display representations too. Consider also the problem of upgrading.

    Yes you’re making it easier to upgrade display representations by decoupling them from the rest of the OS. But guess what? You’re coupling them to the objects, which means they’re MORE difficult to upgrade in other senses. You went from “it’s impossible to upgrade the representation without an act on the part of the user” to “it’s impossible to upgrade the representation WITH an act on the part of the user”. Which means, you’ve disempowered users, bravo.

    I suppose you could design a bastard hybrid so that you distribute objects with code but you create code-modifying modules that upgrade an object’s obsolete methods to newer methods whenever they land in your computer. (This way representations can evolve, and they can evolve in direct proportion to their popularity.) Assuming you’re able to sandbox this properly, and assuming your meta-software (“browser”) has decent economic models to prevent DoS attacks in all of that code people would spread around, then this would be an elegant solution to obsolete formats / code hanging around. But it still wouldn’t work.

    You see, even if you’ve got code that modifies code, you’re creating two tiers of code (code vs code that modifies code) just like you have currently (data vs code). More problematically, you’re forcing programmers and users to think on the meta-level, and we both know programmers are stupid. And of course you have to deal with dual representations (reading vs writing) somehow. However all of that’s on the level, and to really understand the pros and cons of your solution, you have to understand it on the meta-level.

    What you’re actually doing is changing code-update strategy from a restrictive (only with user’s prior approval) past the emerging authoritarian strategy (only with DOM authority’s prior approval) to a permissive strategy (try first on originator’s authority). I’m not opposed to this and as I said, it is an incredibly elegant solution to some problems. However, it absolutely requires an object capability system and an utterly solid economic model under the user’s control, and perhaps much else. Basically, you need excellent sandboxes. And now I wish I’d actually paid attention to that technology rather than dismissing it as too mainstream and overhyped.

  • Houser says:

    A high-level operating system should be able to detect the resources a program can access by the kernel libraries it requires. Even if I can’t prove by inspection that a program does what it says it does, I can prove that it doesn’t do things it can’t, and even better this can be done automatically by the kernel. The danger of running an external program is greatly mitigated if you know that it only has access to the graphic services the kernel provides, and doesn’t have access to e.g. data destruction services, settings, or peripheral services like your missile launcher. Users who are programming illiterate still benefit from open code and an OS that understands it and can perform typechecking to report and enforce security invariants.

    Additionally, in a world where this type of architecture and design were commonplace, help services connecting to tech savvy support people could integrate into systems, providing precise instructions on how to deal with any problems that arise. Reproducibility is easy when you can beam the state of your debugger over the wire.
