July 05, 2002

MIT Thinks About How to Build the Universal Library

One of the problems of the information age has been to figure out how to build the digital library of everything. It looks as though we are evolving a distributed system for indexing and evaluating the quality of information in the universal digital library: it's called Google. But how do we build the tools needed so that everything gets into the digital library? That's still a big question. MIT is trying to solve it.

The Chronicle: 7/5/2002: 'Superarchives' Could Hold All Scholarly Output

The most ambitious and most closely watched superarchive is being developed at the Massachusetts Institute of Technology. It is called DSpace, and its goal is to collect research material from nearly every professor at the institute, although participation will be voluntary. "We want to give faculty the infrastructure that supports alternative forms of publishing," says MacKenzie Smith, associate director of technology for MIT's libraries. Over the past two years, officials at MIT have been building a set of software tools to support the repository, and to make it easy for professors to submit material. Those tools are nearly ready, and four departments and programs at MIT will be testing them this summer. Beginning this fall, MIT plans to open the archive to all of its professors. "We don't know how quickly it's going to catch on," says Ms. Smith, though she adds that professors have been enthusiastic about the concept...

i think if it is as brewster kahle suspects:

"Having the capital cost of equipment drop to effectively zero allows you to think bigger. You start thinking about the whole thing. For instance, the gutsy maneuver of saying "let's index it all," which was the breakthrough of Altavista. Altavista in 1995 was an astonishing achievement, not because of the hardware -- yes, that was interesting and important from a technical perspective -- but because of the mindset. "Let's go index every document in the world." And once you have that sort of mindset, you can get really far.

"So if all books are 20 TBs, and 20 TBs are $80,000, that's the Library of Congress. Then something big has changed. All music? It's tiny. It looks like there're only one million records that have been produced over the last century. That's tiny. All movies? All theatrical releases have been estimated at 100,000, and most of those from India. If you take all the rest of ephemeral films, that's on the order of a couple hundred thousand. It's just not that big. It allows you to start thinking about the whole thing."

or jorn barger postulates on personal storage:

"You certainly, immediately, want it to start archiving everything you read on the Net, for future reference. You want this all to be word-indexed, like a generic search-engine but entirely 'local'. You want all the good stuff to be sorted by topic, into your own personal Yahoo/DMoz, reflecting your priorities... and you want it to watch and alert you whenever a good webpage you've archived/mirrored is updated.

"When you think a whole website looks good, you probably want to mirror the whole site, so that your future local searches will find whatever it offers on that topic. When you start doing this, you can quickly fill your 40gigs... but you can plan to double that capacity every year via inexpensive upgrades. So downloading the equivalent of the Encyclopedia Britannica-- about 200 megs-- every day, forever, is already a perfectly reasonable strategy."

then it's up to a combination of webbot crawlers [like google, the web archive and blogdex], user-contributed information [like gooja, h2g2 and everything2, or like stanford's plato] and perhaps new custom search facilities [such as textanalysis (ECM), ontopia and recommind! (located in berkeley :)] to aggregate, integrate and disseminate it all. think of asimov's foundation in every pocket! it'll be like implementing indra's net :)

Posted by: kenny on July 6, 2002 10:12 AM

Raj Reddy at Carnegie Mellon has been seriously working at creating a true Universal Library for the last couple of decades. It would contain "all the significant literary, artistic and scientific works of mankind".

The basics of Raj's vision are at The Universal Library and are well worth a look.

Posted by: Lance on July 8, 2002 09:23 AM
