October 17, 2003

Om Mani Padme Net

If I were younger, I would take Clay Shirky to be my internet guru. I would dress in a bright plain robe, and follow him around, chanting:

Shirky: In Praise of Evolvable Systems: Why something as poorly designed as the Web became The Next Big Thing, and what that means for the future.

If it were April Fool's Day, the Net's only official holiday, and you wanted to design a 'Novelty Protocol' to slip by the Internet Engineering Task Force as a joke, it might look something like the Web:

  • The server would use neither a persistent connection nor a store-and-forward model, thus giving it all the worst features of both telnet and e-mail.

  • The server's primary method of extensibility would require spawning external processes, thus ensuring both security risks and unpredictable load.

  • The server would have no built-in mechanism for gracefully apportioning resources, refusing or delaying heavy traffic, or load-balancing. It would, however, be relatively easy to crash.

  • Multiple files traveling together from one server to one client would each incur the entire overhead of a new session call (the sketch after this list illustrates the per-file cost).

  • The hypertext model would ignore all serious theoretical work on hypertext to date. In particular, all hypertext links would be one-directional, thus making it impossible to move or delete a piece of data without ensuring that some unknown number of pointers around the world would silently fail.

  • The tag set would be absurdly polluted and user-extensible with no central coordination and no consistency in implementation. As a bonus, many elements would perform conflicting functions as logical and visual layout elements.
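
    A minimal sketch of the session-overhead point above (my illustration, not Shirky's; the host and paths are placeholders): fetching three files from the same host under the early, non-persistent model means three separate TCP connections, each paying the full setup cost.

```python
# A sketch, not part of the original essay: every resource fetched from the
# same server costs a brand-new TCP connection, because early HTTP had no
# persistent connections. The host and paths below are placeholders.
import socket

HOST = "example.com"          # hypothetical server
PATHS = ["/index.html", "/logo.gif", "/style.css"]

for path in PATHS:
    # New connection -- and a full TCP handshake -- for every single file.
    with socket.create_connection((HOST, 80)) as conn:
        request = (
            f"GET {path} HTTP/1.0\r\n"
            f"Host: {HOST}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(request.encode("ascii"))
        response = b""
        while chunk := conn.recv(4096):
            response += chunk
    print(path, len(response), "bytes; connection closed, overhead repeats")
```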

    HTTP and HTML are the Whoopee Cushion and Joy Buzzer of Internet protocols, only comprehensible as elaborate practical jokes. For anyone who has tried to accomplish anything serious on the Web, it's pretty obvious that of the various implementations of a worldwide hypertext protocol, we have the worst one possible.

    Except, of course, for all the others.

    MAMMALS VS. DINOSAURS

    The problem with that list of deficiencies is that it is also a list of necessities -- the Web has flourished in a way that no other networking protocol has except e-mail, not despite many of these qualities but because of them. The very weaknesses that make the Web so infuriating to serious practitioners also make it possible in the first place. In fact, had the Web been a strong and well-designed entity from its inception, it would have gone nowhere. As it enters its adolescence, showing both flashes of maturity and infuriating unreliability, it is worth recalling what the network was like before the Web.

    In the early '90s, Internet population was doubling annually, and the most serious work on new protocols was being done to solve the biggest problem of the day, the growth of available information resources at a rate that outstripped anyone's ability to catalog or index them. The two big meta-indexing efforts of the time were Gopher, the anonymous ftp index; and the heavy-hitter, Thinking Machines' Wide Area Information Server (WAIS). Each of these protocols was strong -- carefully thought-out, painstakingly implemented, self-consistent and centrally designed. Each had the backing of serious academic research, and each was rapidly gaining adherents.

    The electronic world in other quarters was filled with similar visions of strong, well-designed protocols -- CD-ROMs, interactive TV, online services. Like Gopher and WAIS, each of these had the backing of significant industry players, including computer manufacturers, media powerhouses and outside investors, as well as a growing user base that seemed to presage a future of different protocols for different functions, particularly when it came to multimedia.

    These various protocols and services shared two important characteristics: Each was pursuing a design that was internally cohesive, and each operated in a kind of hermetically sealed environment where it interacted not at all with its neighbors. These characteristics are really flip sides of the same coin -- the strong internal cohesion of their design contributed directly to their lack of interoperability. CompuServe and AOL, two of the top online services, couldn't even share resources with one another, much less somehow interoperate with interactive TV or CD-ROMs.

    THE STRENGTH OF WEAKNESS AND EVOLVABILITY

    In other words, every contender for becoming an "industry standard" for handling information was too strong and too well-designed to succeed outside its own narrow confines. So how did the Web manage to damage and, in some cases, destroy those contenders for the title of The Next Big Thing? Weakness, coupled with an ability to improve exponentially.

    The Web, in its earliest conception, was nothing more than a series of pointers. It grew not out of a desire to be an electronic encyclopedia so much as an electronic Post-it note. The idea of keeping pointers to ftp sites, Gopher indices, Veronica search engines and so forth all in one place doesn't seem so remarkable now, but in fact it was the one thing missing from the growing welter of different protocols, each of which was too strong to interoperate well with the others.

    Considered in this light, the Web's poorer engineering qualities seem not merely desirable but essential. Despite all strong theoretical models of hypertext requiring bi-directional links, in any heterogeneous system links have to be one-directional, because bi-directional links would require massive coordination in a way that would limit the Web's scope. Despite the obvious advantages of persistent connections in terms of state-tracking and lowering overhead, a server designed to connect to various types of network resources can't require persistent connections, because that would limit the protocols that could be pointed to by the Web. The server must accommodate external processes, or its extensibility would be limited to whatever the designers of the server could put into any given release, and so on.
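
    To make the one-directional trade-off concrete, here is a hedged sketch (mine, not part of the original essay; the start URL is a placeholder): because a link carries no back-pointer, the only way to discover that its target has moved or died is to fetch it and see, which is the silent failure the Web accepts in exchange for requiring no coordination at all.

```python
# Illustrative only: since links are one-directional, link rot is invisible
# until someone crawls the page and tests each target. The start page URL
# is a placeholder.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

class LinkCollector(HTMLParser):
    """Collects href targets from anchor tags; the targets know nothing of us."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

START_PAGE = "http://example.com/index.html"   # hypothetical page

collector = LinkCollector()
with urlopen(START_PAGE) as page:
    collector.feed(page.read().decode("utf-8", errors="replace"))

for href in collector.links:
    target = urljoin(START_PAGE, href)
    try:
        urlopen(target).close()
        print("ok     ", target)
    except (HTTPError, URLError) as err:
        # The broken end never told us; we only find out by asking.
        print("broken ", target, "--", err)
```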

    Furthermore, the Web's almost babyish SGML syntax, so far from any serious computational framework (Where are the conditionals? Why is the Document Type Definition so inconsistent? Why is the browsers' enforcement of conformity so lax?), made it possible for anyone wanting a Web page to write one. The effects of this ease of implementation, as opposed to the difficulties of launching a Gopher index or making a CD-ROM, are twofold: a huge increase in truly pointless and stupid content soaking up bandwidth; and, as a direct result, a rush to find ways to compete with all the noise through the creation of interesting work. The quality of the best work on the Web today has not happened in spite of the mass of garbage out there, but in part because of it.
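
    As a rough illustration of how low the bar was (my sketch, not Shirky's), the following writes a sloppy, hand-typed page and serves it with nothing but a standard-library file server: no validation, no registration, no permission from anyone.

```python
# A sketch, not part of the original post: a complete, browsable "page" is a
# forgiving scrap of markup plus any file server. Everything here is
# illustrative; the link target is a placeholder.
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path

PAGE = """<html>
<head><title>My first page</title></head>
<body>
<h1>Hello, Web</h1>
<p>A pointer to somewhere else: <a href="http://example.com/">a link</a>
</body>
</html>
"""

# Browsers render this even though the <p> is never closed -- lax enforcement
# is exactly what lets a first-time author get away with it.
Path("index.html").write_text(PAGE)
HTTPServer(("", 8000), SimpleHTTPRequestHandler).serve_forever()  # http://localhost:8000/
```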

    In the space of a few years, the Web took over indexing from Gopher, rendered CompuServe irrelevant, undermined CD-ROMs, and now seems poised to take on the features of interactive TV, not because of its initial excellence but because of its consistent evolvability. It's easy for central planning to outperform weak but evolvable systems in the short run, but in the long run evolution always has the edge. The Web, jujitsu-like, initially took on the power of other network protocols by simply acting as pointers to them, and then slowly subsumed their functions.

    Despite the Web's ability to usurp the advantages of existing services, this is a story of inevitability, not of perfection. Yahoo and Lycos have taken over from Gopher and WAIS as our meta-indices, but the search engines themselves, as has been widely noted, are pretty lousy ways to find things. The problem that Gopher and WAIS set out to solve has not only not been solved by the Web, it has been made worse. Furthermore, this kind of problem is intractable because of the nature of evolvable systems.

    THREE RULES FOR EVOLVABLE SYSTEMS

    Evolvable systems -- those that proceed not under the sole direction of one centralized design authority but by being adapted and extended in a thousand small ways in a thousand places at once -- have three main characteristics that are germane to their eventual victories over strong, centrally designed protocols.

  • Only solutions that produce partial results when partially implemented can succeed. The network is littered with ideas that would have worked had everybody adopted them. Evolvable systems begin partially working right away and then grow, rather than needing to be perfected and frozen. Think VMS vs. Unix, cc:Mail vs. RFC-822, Token Ring vs. Ethernet.

  • What is, is wrong. Because evolvable systems have always been adapted to earlier conditions and are always being further adapted to present conditions, they are always behind the times. No evolving protocol is ever perfectly in sync with the challenges it faces.

  • Finally, Orgel's Rule, named for the evolutionary biologist Leslie Orgel -- "Evolution is cleverer than you are". As with the list of the Web's obvious deficiencies above, it is easy to point out what is wrong with any evolvable system at any point in its life. No one seeing Lotus Notes and the NCSA server side-by-side in 1994 could doubt that Lotus had the superior technology; ditto ActiveX vs. Java or Marimba vs. HTTP. However, the ability to understand what is missing at any given moment does not mean that one person or a small central group can design a better system in the long haul.

    Centrally designed protocols start out strong and improve logarithmically. Evolvable protocols start out weak and improve exponentially. It's dinosaurs vs. mammals, and the mammals win every time. The Web is not the perfect hypertext protocol, just the best one that's also currently practical. Infrastructure built on evolvable protocols will always be partially incomplete, partially wrong and ultimately better designed than its competition.

    LESSONS FOR THE FUTURE

    And the Web is just a dress rehearsal. In the next five years, three enormous media -- telephone, television and movies -- are migrating to digital formats: Voice Over IP, High-Definition TV and Digital Video Disc, respectively. As with the Internet of the early '90s, there is little coordination between these efforts, and a great deal of effort on the part of some of the companies involved to intentionally build in incompatibilities to maintain a cartel-like ability to avoid competition, such as DVD's mutually incompatible standards for different continents.

    And, like the early '90s, there isn't going to be any strong meta-protocol that pushes Voice Over IP, HDTV and DVD together. Instead, there will almost certainly be some weak 'glue' or 'scaffold' protocol, perhaps SMIL (Synchronized Multimedia Integration Language) or another XML variant, to allow anyone to put multimedia elements together and synch them up without asking anyone else's permission. Think of a Web page with South Park in one window and a chat session in another, or The Horse Whisperer running on top with a simultaneous translation into Serbo-Croatian underneath, or clickable pictures of merchandise integrated with a salesperson using a Voice Over IP connection, ready to offer explanations or take orders.
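
    To see what such weak glue might look like, here is a rough, SMIL-flavoured sketch (my illustration; the element names follow my reading of SMIL 1.0 and the media URLs are invented): three resources that know nothing about each other, composed and synchronized only by the markup that points at them.

```python
# Illustrative only -- a SMIL-style "glue" document assembled with the
# standard library. Element names reflect SMIL 1.0 as I understand it;
# the media URLs are placeholders.
import xml.etree.ElementTree as ET

smil = ET.Element("smil")
body = ET.SubElement(smil, "body")

# <par> plays its children in parallel: video in one "window",
# a chat/text stream in another, audio from a third source.
par = ET.SubElement(body, "par")
ET.SubElement(par, "video", src="http://example.com/south-park-clip.mpg")
ET.SubElement(par, "text",  src="http://example.com/chat-session.txt")
ET.SubElement(par, "audio", src="http://example.com/translation-sr.ra")

print(ET.tostring(smil, encoding="unicode"))
```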

    In those cases, the creator of such a page hasn't really done anything 'new', as all the contents of those pages exist as separate protocols. As with the early Web, the 'glue' protocol subsumes the other protocols and produces a kind of weak integration, but weak integration is better than no integration at all, and it is far easier to move from weak integration to strong integration than from none to some. In 5 years, DVD, HDTV, voice-over-IP, and Java will all be able to interoperate because of some new set of protocols which, like HTTP and HTML, is going to be weak, relatively unco-ordinated, imperfectly implemented and, in the end, invincible.

  • Posted by DeLong at October 17, 2003 05:31 PM | TrackBack

    Comments

    "the mammals win every time"

    Perhaps I'm behind the current thinking (again), but isn't the current theory that the mammals only won due to a convenient asteroid?

    Posted by: Barry on October 17, 2003 07:42 PM

    I think you are seeing causality in the wrong places. HTTP/HTML did not become successful because they were imperfect. They succeeded because they were at the right place at the right time, and they were made available for free, and they found commercial and social purpose.

    Posted by: Alan on October 17, 2003 07:44 PM

    Here's a fairly well known article from the Lisp programming world, http://www.jwz.org/doc/worse-is-better.html, which bears a striking similarity to Shirky's post (betcha Shirky read it). It was written in the early '90s about the more esoteric competition between the Lisp and C languages. The main point is that the language that gets an early advantage ends up winning, because of network effects. Whichever language is first to be available, and good enough to do the job, wins.

    It's the same way with everything in computing. Popularity feeds popularity. That's why Bill Gates is so rich.

    Posted by: rps on October 17, 2003 11:28 PM

    I think that there are two points that are important - first, things which work well when partially implemented on a small scale have an edge. They generate immediate payback, develop a pool of skill, and enable rapid versioning.

    But one which is missed ~50% of the time is openness. HTML was an open set of standards, which could be implemented off of a one-page set of instructions. Anybody with access to a computer could start writing; anybody who thought that they knew what they were doing could put out manuals and tutorials, and templates.

    But perhaps the most important feature was that there was little intellectual property protection - I've learned a lot of HTML through the 'view page source' school. This leads to extremely rapid dissemination of knowledge.

    Anything which is harder to copy and tinker with will move more slowly.

    An example from the 1980's would be Lotus and Excel. If you received a spreadsheet, you could generally look at how it was made, and copy any useful techniques.

    The trick is that it's harder to make a profit in the old-fashioned way, through proprietary techniques.

    Posted by: Barry on October 18, 2003 06:01 AM

    Barry,

    A case could be made that the "openness" of HTML et al. is equivalent to its "ability to evolve."

    I think the original thesis is elegant, but too elegant. So I distrust it. The desire for elegant theories leads to drawing connections and conclusions where they shouldn't be.

    Look at the article rps posted. It is full of this kind of error. There is simply no way that Lisp, an interpreted language, would supplant horrible C for tasks that assembly language was used for prior to its invention. Yet the author can't help but lament the Way Things Ought To Be.

    Posted by: Alan on October 18, 2003 08:48 AM

    Alan: Lisp is normally _compiled_.

    Posted by: Walt Pohl on October 18, 2003 09:01 AM

    It wasn't then.

    Posted by: Alan on October 18, 2003 03:52 PM

    According to the article, Lisp had pretty well caught up to C in terms of speed by 1987 (and yes, it was compiled).

    Anyway, Lisp is certainly much faster than Perl, Python, and Java, which started gaining popularity not long after the article was written. I would think that these languages won, and Lisp didn't, because they are more similar to familiar languages like C and shell scripting. Once again, people seek local maxima.

    Posted by: rps on October 18, 2003 06:41 PM

    One of the few underappreciated things about web and Internet protocols is their adherence to what is termed the end-to-end argument in system design: http://people.qualcomm.com/karn/library.html It was the first paper I had to read in my distributed systems class, and I wish everybody in charge of designing protocols would internalize its principles. In the context of end-to-end it is not hard to see why http succeeded. http was concerned with one thing and one thing only: delivering a page of data from one location to another. How either end treated the data (html? pdf? doc?) was none of its business. How any authentication tokens were passed was none of its business either. Needless to say, all of those things were ultimately implemented on top of http: this allowed the implementation of login forms, the use of certificates for authentication, etc.
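
    A small illustrative sketch of that layering (mine, not the commenter's; host, path, and session value are made up): the transport-level request looks the same no matter what the payload is or how the two ends authenticate, because those agreements ride on top as headers the protocol itself never interprets.

```python
# Illustrative only -- the host, path, and cookie value are placeholders.
# HTTP moves the bytes; what they mean (Accept/Content-Type) and who is
# asking (a cookie set by a login form) are agreements between the two ends.
from http.client import HTTPConnection

conn = HTTPConnection("example.com")
conn.request(
    "GET",
    "/private/report.pdf",
    headers={
        "Accept": "application/pdf",             # how this end wants the bytes interpreted
        "Cookie": "session=not-a-real-session",  # end-to-end token, opaque to the transport
    },
)
response = conn.getresponse()
print(response.status, response.getheader("Content-Type"))
conn.close()
```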

    Posted by: washingtonirving on October 18, 2003 08:16 PM

    Lisp has been compiled since the 70s.

    Posted by: Walt Pohl on October 18, 2003 11:31 PM

    An interesting thing about messy (but momentarily advantageous) products gaining acceptance over their cleaner counterparts is effect over the long term... I'll use an analogy from programming to illustrate.

    Where rapid changes are needed, the messy programs have a definite edge. If your program is not beautifully designed according to some grand philosophy but, rather, a hacked-together piece of junk, it's much easier to tweak the appropriate places and get it to do something new, which in a neat modular program often requires a complete overhaul.

    The long term consequences are bloated, unreadable, and ultimately unmaintainable code. This isn't evolution in nature, where nature doesn't need to understand the genome to shift it: this is evolution driven by human beings, and there is a limit to the complexity of a piece of software at which a human being will throw up his hands and say "forget this" (my programming friends would probably use a stronger word starting with f). I wonder if there's something similar for protocols and standards?

    Posted by: Ray on October 19, 2003 09:37 AM

    It was noted above:

    Whichever language is first to be available, and good enough to do the job wins.

    This is SO true. It is, in fact, how COBOL was loosed on the world. IBM was pushing PL/I, which for all its (many) faults is an order of magnitude better than COBOL. However, it wasn't ready until about 6 months after the introduction of the 360 line. So, COBOL was used instead. Six months turned out to be too long. Forty years hasn't been long enough to clean up the resulting mess.

    FWIW, LISP was routinely referred to and thought of as an interpreted language until well into the 80's. I always find this locution irritating; interpretation vs. compilation is an issue of the implementation, not of the language definition. I've seen C interpreters. But it's true that most implementations conformed to the stereotype.

    One more comment; this post:

    Where rapid changes are needed, the messy programs have a definite edge. If your program is not beautifully designed according to some grand philosophy but, rather, a hacked-together piece of junk, it's much easier to tweak the appropriate places and get it to do something new, which in a neat modular program often requires a complete overhaul.

    The long term consequences are bloated, unreadable, and ultimately unmaintainable code.

    is a little confused and somewhat self-contradictory. First it says that messy code is easy to change, then it says that messy code is unchangeable. The second of these is correct, as I know from painful experience.

    However, the point I think the poster is trying to make (apologies if my mind reading is off) is an important one: in exploratory situations, where requirements can't be defined in advance, rigor in software design creates paralysis, or at least very slow advance. An easy-to-hack system is what's needed, and LISP fits the bill well. This is not the same as a well-defined case where you're trying to make fast but well defined changes. In cases where stability and good change control are needed the exploratory, quick hack approach leads to disaster. Having the judgement to know which is which is much harder than it sounds.

    Posted by: Jonathan Goldberg on October 19, 2003 03:49 PM

    "The long term consequences are bloated, unreadable, and ultimately unmaintainable code. This isn't evolution in nature, where nature doesn't need to understand the genome to shift it: "

    I'm missing something. DNA "codes" are also bloated, unreadable, and only lately have we been able to attempt "maintenance" tweaks. Nature DOES, it seems to me, hack junk together quickly to partially exploit temporary and ill-defined niches. A flu virus optimized for a goose gets randomly "hacked" to infect pig-systems, in which additional mods take place allowing the code to jump into human-systems. Happens nearly every year, starting in environments where pig-goose sh^t exchanges are common, (mostly in Asia), and spreading into environments where humans customarily have very little exchange with either pigs or geese. (Toronto?)

    But now, uhm. Somebody square for me the preference/admiration of sloppy-platform freely-kludged gradually-evolving systems over tightly-integrated elegant and quantumly leaping systems with local-personal distaste for kludgy Microsoft WIN-DOS as opposed to wonderful Apple Mac-OS?

    Posted by: Pouncer on October 20, 2003 06:18 AM