January 13, 2003
Consequences of Linkrot

The first lesson is that linkrot is incredibly rapid. The second lesson is that it thus becomes critically important not just to link but to quote--and to quote extensively. The third lesson is that not even fear, surprise, and ruthless efficiency can defeat linkrot. If you want your links to be worth anything in two, three, or five years, save a copy of *every* page you link to on your own hard disk.
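
For anyone who wants to act on that last piece of advice, here is a minimal sketch of saving a local copy of a linked page. It is one possible approach, not a description of how any particular weblog tool works: the function names (`snapshot_name`, `fetch_bytes`, `save_snapshot`) and the filename scheme are made up for illustration, and it grabs only the raw HTML, not images or stylesheets.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def snapshot_name(url: str, when: datetime) -> str:
    """Build a local filename from the host, a hash of the URL, and the capture time."""
    # Hypothetical naming scheme: host + short URL digest + UTC timestamp.
    host = (urlparse(url).netloc or "unknown-host").replace(":", "_")
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{host}-{digest}-{when:%Y%m%dT%H%M%SZ}.html"


def fetch_bytes(url: str) -> bytes:
    """Download the raw bytes of a page (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def save_snapshot(url, directory, fetch=fetch_bytes, when=None):
    """Save a copy of `url` under `directory` and return the path written."""
    when = when or datetime.now(timezone.utc)
    target = Path(directory) / snapshot_name(url, when)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch(url))
    return target
```

Calling `save_snapshot("http://example.com/some-post", "linked-pages")` at the moment you publish a link would leave a dated copy in a `linked-pages` folder. The `fetch` parameter is there only so the download step can be swapped out, for instance when testing without a network connection.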


Idle Words: I've been working with some of the many Movable Type weblogs I got this week, seeing how my search code works and scarfing down the content. I purposefully picked weblogs that had been running for years, and left the dates out of the search display. I'd heard people go on and on about the chains and shackles of reverse chronological order, and I thought I'd experiment with just reading things by topic. Well, it doesn't work. I mean, the search itself works -- you search on dog and get back results on dog -- but what doesn't work is the links. By far the majority of weblog posts are short one-liners with a link in them. The next category after that is the tossed salad variety format -- a paragraph full of loosely connected ideas built around pointers to interesting sites. Of course this is the whole point -- we're supposed to be making a reasonable stab at hypertext -- but it turns out the links are terribly brittle. Reading these grizzled posts is like looking through an old scrapbook, where the writing is clear but the pictures have all bleached to white. The further back you go in the past, the fewer working links you can find. 'Permalinks' to other bloggers get broken as people change ISPs, domain names, or software. Links to novelty sites and flavors of the month dry up; links to bubble-era dot coms have gone down with the ship. 'Permanent' links to news sites get retired to a polite 404 every time the software changes.

The irony here is that most of this content still exists. More things get moved around than disappear, and much of what is really gone still lives on in the Internet Archive. But the cost of finding that information skyrockets once a link goes down. Something as simple as a tabbed interface made a difference to thousands of web users because it became easier to open new links. By the same token, any rotted link throws up a wall to the user. Even a custom 404 with a good search box on it, guaranteed to find the content you are after, is no match for a working link. And very often the link is an integral part of the content. Just think of dear old Suck, itself now defunct, where the links were their own commentary. Try reading a few of their back issues from 1998 and see if you can find anything in that link graveyard. The sad part is that these old sites and old posts aren't old by any meaningful standard. The oldest blog entry I've looked at dates from 1998, and the blogger who wrote it is still in his twenties. I have book reports from the fourth grade in a paper bag in my closet, but I can't find a silly Jakob Nielsen parody done two years ago.

We're so caught up in keeping track of who is linking to what just at the moment that we've neglected to think about what is going to remain of the "blogosphere" ten years from now. *Two* years from now, for many sites. The average half-life of a link on an education site is fifty-five months -- less than five years. What do you think the figure is for weblogs? What do you think it will be for trackbacks, or site comments? I keep thinking of the museum up in St. Johnsbury, where they have case after case of stuffed tiny birds, meticulously catalogued, with their feet glued to the branches and their feathers all falling out. And in the corner, a gigantic piebald moose. We need some better way of capturing the web for posterity than just a bunch of screenshot grabs, essential as they are. There's got to be a way to make our links less brittle.
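
To put the fifty-five-month half-life in concrete terms, here is a rough sketch of the arithmetic. It assumes links die off at a steady exponential rate, which is what "half-life" implies but which the passage above does not claim to have measured; the function name `surviving_fraction` is made up for illustration.

```python
HALF_LIFE_MONTHS = 55.0  # the figure quoted above for links on education sites


def surviving_fraction(months: float, half_life: float = HALF_LIFE_MONTHS) -> float:
    """Fraction of links still working after `months`, assuming exponential decay."""
    return 0.5 ** (months / half_life)


if __name__ == "__main__":
    for years in (2, 5, 10):
        share = surviving_fraction(12 * years)
        print(f"after {years:2d} years: about {share:.0%} of links still work")
```

Under that assumption, roughly 74 percent of links survive two years and only about 22 percent survive ten. A shorter half-life for weblogs would make both figures worse.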

Posted by DeLong at January 13, 2003 09:57 PM

Comments

Ted Nelson's original idea for hypertext was for two-way links. The complexity and overhead of this is why he never got anywhere with it. Berners-Lee's one-way links allowed the web to happen, but they have led to linkrot. Maybe it's the price we pay for convenience...

Posted by: jimbo on January 14, 2003 06:16 AM

The world wide web is fascinating to peruse precisely because it has such low barriers to entry for anyone who wishes to publish. For a nominal cost, paid in either your cash or someone else's cash, it is possible to post articles and opinions very quickly. This leads to the problem that people can publish opinions on the internet without any meaningful way of cataloging that publication. I was thinking recently that organizations involved in producing new common file formats using XML would be well advised to devise a standard for inserting the ISBN of an electronically published document. It would be even better if there were a scheme for confirming the work was unaltered based on an ISBN or something similar.

One could argue that it is the very lack of barriers to entry in this medium that has created the problem. With no good way to gather money there is no funding available for developing common online publishing rules and guidelines. It is also possible that the cost of cataloging the majority of online material simply is not worth it. As online publishing grows there will be a greater amount of material that people do think is worth the cost of cataloging and tracking. At that point there will either be a cataloging system in place cheap enough for everyone to use or a great deal of worthwhile content may be lost.

Posted by: Iain Babeu on January 14, 2003 06:52 AM

don't knock the Fairbanks Museum! those are originals from Mr. Fairbanks' own collection. they suck in certain ways, but it's interesting to see a museum of what a museum was like back in the day.

and do you know how hard it is to run a museum in an impoverished rural area that's not even really near any ski areas?

Posted by: vtvt on January 14, 2003 09:22 AM

I try to copy the contents of things I am interested in personally, but don't bloggers need to worry about copyright? CAN you just copy Paul Krugman's column twice a week so you can say you agree with it?

Posted by: John E on January 14, 2003 09:39 AM

There is indeed a copyright problem. I'm going to try to get some sort of blog entry on it in the next few hours.

Posted by: Tim Hadley on January 14, 2003 03:09 PM

Maybe not everything needs to be saved for posterity. Most weblog posts are useful for a day or a week to sort the news and perhaps add a comment, but after that, does anyone really need them?

Posted by: John on January 15, 2003 09:02 AM

i guess you can either let alexa do it for you or do it yourself, it's $1/gigabyte now! it's like that study you quoted on the cost of light :)

Posted by: kenny on January 16, 2003 06:39 AM