October 18, 2003

Linkrot!

Kevin Drum writes:

CALPUNDIT BACK UP....Yeah, I'm back. What a mess. A combination of host problems and Movable Type fragility wiped out my site for the entire day. I'd bore you with all the details, but I'm too pissed off right now to write about it. Anyway, all the posts and archives are back online, although all comments have been lost for the previous dozen or so posts.

Also, all my permalinks changed during the reconstruction process, so if you have ever linked to anything at all here, your links now point to the wrong post. It's not quite the classic version of linkrot, but it's close.

I find that Google reports "about 185" references to Kevin Drum or Calpundit on my site, all of which now point to the wrong place. I clearly need either to (a) resign myself to nearly tenscore dead or misleading links, or (b) learn enough Perl to write a filter that rewrites all those links to point to an intermediate "apology" page, which then links forward to http://www.calpundit.com/.
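Option (b) need not be much code. Here is a minimal sketch, written in Python rather than Perl, that assumes the old permalinks all lived under `/archives/` on calpundit.com and that the apology page sits at a hypothetical local path, `/apology/calpundit.html`. Both are assumptions to adjust, not the real layout:

```python
import re
import sys
from urllib.parse import quote

# Assumption: old Calpundit permalinks looked like
# http://www.calpundit.com/archives/<something>. Adjust to the real layout.
OLD_PERMALINK = re.compile(r'href="(http://(?:www\.)?calpundit\.com/archives/[^"]+)"')

# Hypothetical local page that apologizes and links forward to
# http://www.calpundit.com/.
APOLOGY_PAGE = "/apology/calpundit.html"


def rewrite_links(html: str) -> str:
    """Send every old Calpundit permalink in `html` to the apology page,
    carrying the original URL along in the query string."""
    def to_apology(match: re.Match) -> str:
        encoded = quote(match.group(1), safe="")  # percent-encode the old URL
        return f'href="{APOLOGY_PAGE}?was={encoded}"'
    return OLD_PERMALINK.sub(to_apology, html)


if __name__ == "__main__":
    # Rewrite each HTML file named on the command line, in place.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        with open(path, "w", encoding="utf-8") as f:
            f.write(rewrite_links(text))
```

Links to the site's front page are left alone, since they still work; only double-quoted `href` attributes are matched in this sketch.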

In any event, thank God I have been good at quoting enough context to keep much of the discussion and commentary intelligible even with the hyperlinks broken beyond repair.

But this will keep happening, as individual websites pop up and die off, rogue programs trash databases, and large corporations decide to reorganize and break every link. So I need to get much better at stuffing more of the context of everything into the "extended entry" page.

That, or we need something like the Wayback Machine to establish a truly persistent and permanent Internet archive that we can link to with confidence.
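The Internet Archive's Wayback Machine already offers something close to this, because its URLs embed a capture date. Here is a minimal sketch of building such links. It assumes the `http://web.archive.org/web/<timestamp>/<url>` scheme, where the timestamp is a 14-digit YYYYMMDDhhmmss value that, as far as I know, resolves to the nearest capture the archive holds. The helper names are hypothetical, and nothing here guarantees that a given page was ever captured:

```python
from datetime import datetime

WAYBACK = "http://web.archive.org/web"


def wayback_link(url: str, when: datetime) -> str:
    """Link to the archived copy of `url` closest to the moment `when`."""
    return f"{WAYBACK}/{when:%Y%m%d%H%M%S}/{url}"


def wayback_index(url: str) -> str:
    """Link to the Wayback Machine's list of all captures of `url`."""
    return f"{WAYBACK}/*/{url}"
```

Linking to `wayback_link(url, date_of_post)` instead of the live URL freezes the reference at the version you actually read.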

And perhaps a version of Movable Type that makes it trivially easy to export backups with post ID numbers attached, so that the exact file structure can be recovered after a database collapse?
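The idea is simply that a backup records each entry's ID, and a restore reuses that ID rather than letting the database hand out fresh numbers. Here is a minimal sketch. It assumes a hypothetical JSON-lines backup format (not Movable Type's actual export format) and an assumed archive layout that derives each file name from a zero-padded entry ID:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Entry:
    entry_id: int  # ID the blog assigned; the permalink is built from it
    title: str
    body: str


def export_entries(entries) -> str:
    """Serialize entries as JSON lines, keeping each entry's original ID."""
    return "".join(json.dumps(asdict(e)) + "\n" for e in entries)


def import_entries(text: str) -> list:
    """Restore entries with the IDs they had before, not new ones."""
    return [Entry(**json.loads(line)) for line in text.splitlines() if line.strip()]


def archive_path(entry: Entry) -> str:
    # Assumed layout: archives/ + six-digit zero-padded ID + .html
    return f"archives/{entry.entry_id:06d}.html"
```

Because `archive_path` depends only on the stored ID, rebuilding from the backup regenerates the same file names, so inbound links keep working.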

Posted by DeLong at October 18, 2003 12:48 AM

Comments

You must be referring to the Internet Archive Wayback Machine.

http://www.archive.org/web/web.php

Very useful tool. Not perfect, but good for nostalgia, anyway.

Posted by: Harold on October 18, 2003 02:49 AM

The Shifted Librarian has had a lot of posts on this subject, which is not surprising for a librarian's weblog.

One thing that's humorous to note about your last post is that this is precisely the problem that a lot of the systems preceding the web were trying to remedy.

On some level, this is always going to happen. You can't link to a page in a book, because the media are incompatible. There's a wonderful project that keeps building virtual machines for the machines of the past (PDPs, VAXen, etc.) so that their software can still be run.

So it's way worse than links. It's content rot. You may be able to find it, but you won't be able to interpret/run/hear/see/interact with it.

Posted by: JC on October 18, 2003 11:01 AM

I had to reconstruct my MT blogs twice within two weeks. It was a pain, but I didn't run into the linkrot problem. Here's why:

Each of my individual archive file names is based on the title of the entry, not on the entry ID number. Since the title does not change when you move (or reinstall) the blog, the file names stay the same, and all inbound links continue to work perfectly.

To set up your MT blog this way, use the DirifyPlus plugin (which is what I used) or, just as easily, the built-in dirify function:

1) Click Weblog Config settings and choose Archiving.

2) For "Individual", select the "Archive File Template" field.

3) Enter .html

4) Click Save.

The above will save each individual entry like this:

title-of-this-entry.html

The DirifyPlus plugin also provides other options, such as underscores instead of hyphens.
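To see what such a filter does to a title, here is a rough Python approximation of a dirify-style transformation. It folds accented characters to plain ASCII, lowercases, strips HTML tags and entities, drops punctuation, and joins the words with a separator. It is loosely modeled on the behavior described above, not on the actual code of Movable Type or DirifyPlus, and the function names are hypothetical:

```python
import re
import unicodedata


def dirify(title: str, sep: str = "-") -> str:
    """Turn an entry title into a filename-safe slug (approximation)."""
    # Fold accented characters to plain ASCII, e.g. "é" -> "e".
    s = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    s = s.lower()
    s = re.sub(r"<[^>]*>", "", s)      # remove HTML tags
    s = re.sub(r"&[^;\s]+;", "", s)    # remove HTML entities such as &amp;
    s = re.sub(r"[^\w\s]", "", s)      # drop punctuation
    return re.sub(r"\s+", sep, s.strip())  # join words with the separator


def archive_filename(title: str, sep: str = "-") -> str:
    """Hypothetical helper: individual-archive file name for an entry."""
    return dirify(title, sep) + ".html"


print(archive_filename("Title of This Entry"))  # title-of-this-entry.html
```

One caveat of title-based names in general: under this scheme, two entries with the same title would map to the same file, and retitling an entry would change its URL.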

Posted by: Mark Carey on October 18, 2003 04:04 PM

You may be interested in NewsBruiser, which lets you import entries from a different journal system (e.g. MT) via the RSS feed, and which has TrackBack and (soon) comments. Both TrackBack and comments have, or will have, Bayesian spam filtering. The author and I will help you make the switch if you like. Everyone who has used it has found it really robust, I believe, and it is simple to use.

http://newsbruiser.tigris.org/

http://www.crummy.com/2003/10/17/3
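As a rough illustration of what importing through a feed involves, here is a short sketch using Python's standard library. It pulls each entry out of a feed along with its old permalink. This is not NewsBruiser's code, and it assumes a plain, un-namespaced RSS 2.0 feed. A feed usually carries only recent entries, so on its own it would not recover a full archive:

```python
import xml.etree.ElementTree as ET


def entries_from_rss(rss_xml: str) -> list:
    """Extract title, permalink, date, and text from each RSS 2.0 <item>."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),  # the old permalink
            "date": item.findtext("pubDate", default=""),
            "body": item.findtext("description", default=""),
        })
    return entries
```

Keeping the `link` field lets an importer map each old permalink to its new location.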

Posted by: Sumana Harihareswara on October 19, 2003 12:42 AM