December 13, 2003

Another Milestone Is Reached

James Surowiecki writes in the Chaucer comment thread, apropos of downloading 10,000 books from Project Gutenberg onto my hard disk:

Apologies in advance: this has nothing to do with Chaucer, but instead is about the pace of technological change. Anyway, I was reading George Gilder's "Life After Television" today (don't ask why), and came across a passage where he was talking about the challenge that the PC would pose to the cost structure of centralized databases (which at the time charged thousands of dollars for searches). Gilder writes of this one guy who was trying to come up with an alternative pricing method for information (this is in 1988): "[His computer's] hard-disk memory could hold dozens of megabytes of information, or the amount of data contained in more than one hundred books . . ."

So at a price that, inflation-adjusted, was two or three times as expensive as a good PC today, you could hold the text of "more than a hundred books" on your computer. Now Brad's got 10,000, and it takes up what, a tenth or a fifteenth of a laptop's hard disk (one-seventy-fifth of the storage on my desktop)? We're living through this, and I still think it's hard fully to comprehend it.

More interesting, perhaps: yesterday, when I finished downloading the Project Gutenberg 10,000, I for the first time had more locally-accessible books in cyberspace than in physical space. We have perhaps 4,000 books here in the house, and I have perhaps 3,000 more at the office. But now I have 10,000 inside the computer.

Posted by DeLong at December 13, 2003 02:54 PM | TrackBack

Comments

Have you noticed that the "F" word is not used? "Swive" was the verb. To connect The Rings and Chaucer: Tolkien published a book, "Sir Gawain and the Green Knight". That poem was contemporaneous with Chaucer but from a different part of England; he gave a few passages of the original, and they were less intelligible than Chaucer.

Posted by: big al on December 13, 2003 06:16 PM

____

Brad DeLong writes:
> More interesting, perhaps: yesterday, when I finished
> downloading the Project Gutenberg 10,000, I for the first
> time had more locally-accessible books in cyberspace
> than in physical space. We have perhaps 4,000 books
> here in the house, and I have perhaps 3,000 more at the
> office. But now I have 10,000 inside the computer.

Interesting. I *think* we have you beat at home, although I'd have to do the overdue Doomsday Book Book to know for sure, but you've got me waxed on the office front. 3000 books at 1 inch per book (this *is* academia, right?) gives me 250 linear feet of books, or 50 5-foot shelves, or 10 full shelf-units of the kind I'm most familiar with here. So I have to fold.

But then there's Project Gutenberg. Now, not every "book" they have is truly book-length, but I will concede 7000 solid book units in that collection. At 1 inch per book-unit, that's just about *80 wall feet of 7-foot-high bookshelves*. That would be all 3 bedrooms in our house done floor-to-almost-ceiling in books except where there are windows, where I would have window seats installed. I would weep to have so many books on shelves. And now that Project Gutenberg has gotten this far, I expect them to grow exponentially for years.
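For readers who want to check the shelf arithmetic in the two paragraphs above, here is a small Python sketch. It uses the commenter's assumptions of 1 inch per book and 5-foot office shelves; the figure of 7 shelves per 7-foot bookcase (about one shelf per vertical foot) is an assumption added here, and it is what turns 7,000 books into roughly 80 wall feet.

```python
# Back-of-envelope bookshelf arithmetic.
# Assumptions: 1 inch per book (from the comment); a 7-foot bookcase holds
# 7 shelves, i.e. roughly one shelf per vertical foot (added for illustration).

INCHES_PER_FOOT = 12


def linear_feet(books: int, inches_per_book: float = 1.0) -> float:
    """Total length of shelf needed, in feet."""
    return books * inches_per_book / INCHES_PER_FOOT


def wall_feet(books: int, shelves_high: int, inches_per_book: float = 1.0) -> float:
    """Width of wall covered by bookcases that are `shelves_high` shelves tall."""
    return linear_feet(books, inches_per_book) / shelves_high


office = linear_feet(3000)
print(office, office / 5)  # linear feet, and the number of 5-foot shelves
print(round(wall_feet(7000, shelves_high=7), 1))  # wall feet for 7,000 books
```

Running it gives 250 linear feet (50 five-foot shelves) for the office and about 83 wall feet for the Gutenberg collection, consistent with the rounded figures in the comment.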

And yet... And yet, however wonderful this is, the real problem is that you could have one million books to read for free, and I am not sure people would read them. In our small city, our library has (I think) 300,000 books on the shelves, and it is a very busy library, but more than half the people in the town will read but one book a year. I have *good* students who gripe about their upcoming GREs, and how horrible the Verbal part is because the vocabulary required is so obscure. But it's easy to see that if they had read even a minimal amount (1 hour per day), they would have read at least 50 million words over the past 10 years, and would have encountered every word they could see on such a test so many times that the test should not be worth worrying about at all. But it is, and these are the better students at a state university.
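The 50-million-word figure depends on reading speed, which the comment doesn't state. A quick check in Python, with the reading speeds below as assumed values and leap days ignored:

```python
# Words read at one hour per day for ten years, for a few assumed reading speeds.
DAYS = 365 * 10        # ten years, ignoring leap days
HOURS_PER_DAY = 1


def words_read(wpm: int, days: int = DAYS, hours_per_day: int = HOURS_PER_DAY) -> int:
    """Total words read at `wpm` words per minute."""
    return wpm * 60 * hours_per_day * days


for wpm in (200, 230, 250):  # assumed reading speeds, words per minute
    print(wpm, f"{words_read(wpm):,}")

# Reading speed needed to reach 50 million words in that time.
print(round(50_000_000 / (60 * HOURS_PER_DAY * DAYS)))
```

At roughly 230 words per minute or faster the claim holds (228 wpm is the break-even); at 200 wpm the total comes to about 44 million words.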

So Project Gutenberg could fill 100 DVDs with books, and I'm not sure this will mean that any more reading will get done, and this depresses me a very great deal.

Posted by: Jonathan King on December 13, 2003 08:51 PM

____

And the Christian Science Monitor complains about the loss of retail jobs, saying something about the "last frontier of jobs not requiring a college education!" Irons links to it at ArgMax.

That kind of thinking managed to keep Gutenberg's machine off of Ottoman territory for three hundred years, and that was about the time when the Empire's strength was either at its zenith or they could afford to think so.

The Bush team and their supporters sometimes look to me like the equivalents of those Ottoman anti-Gutenberg circles.

Posted by: Bulent Sayin on December 13, 2003 10:27 PM

____

Aside from the amusement factor of being able to download 10,000 books to your laptop, why bother?

1) With that amount of data, the most important thing becomes the ability to index and search through it to find what you're looking for. (Project Gutenberg's files are notoriously cryptically named.) Did you also download their search engine?

2) They're adding 100 or so eBooks a week. Are you going to assiduously keep updating the "library" on your laptop with new acquisitions?

3) While I greatly admire the ability to browse through Chaucer while proctoring an exam, every classroom where I might find myself proctoring an exam has WiFi, so one can access that copy of Chaucer directly from Project Gutenberg. No particular need to download it in advance.
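On point 1: even without a dedicated search engine, a brute-force scan works on plain-text files. A minimal sketch, assuming the books sit as `.txt` files somewhere under a single local folder (the folder and function names are hypothetical). It reads files as Latin-1 so that any byte sequence decodes, which is an assumption about the files rather than something Project Gutenberg guarantees:

```python
from pathlib import Path


def grep_library(root, phrase):
    """Yield (file name, line number, line) for every line containing `phrase`
    (case-insensitive) in any .txt file under the directory `root`."""
    needle = phrase.lower()
    for path in sorted(Path(root).rglob("*.txt")):
        # Latin-1 maps every byte to a character, so decoding never fails
        # (assumed encoding; non-Latin-1 files would show garbled characters).
        with path.open(encoding="latin-1") as f:
            for n, line in enumerate(f, start=1):
                if needle in line.lower():
                    yield path.name, n, line.rstrip("\n")
```

For example, `for hit in grep_library("gutenberg", "swive"): print(hit)`. A scan like this rereads every file on every query, which is why indexing matters once a collection reaches thousands of books.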

Put differently, the advances in network connectivity are every bit as startling as the advances in hard drive capacity. On the one hand, we have an increasing ability to store vast amounts of data locally. On the other hand, we increasingly cease to care where the data is stored, because we can access it nearly instantaneously from nearly anywhere.

Posted by: Jacques Distler on December 13, 2003 10:58 PM

____

What volume of data does the human eye transmit to the brain every second and how does the brain deal with it?

And now connectivity brings data from beyond what the eye can see, from around the other side of the earth, from beyond earth, from the past (and from the future too?).

How to adapt to that?

And the Christian Science Monitor worries about keeping "retail jobs that don't require a college education"!

(About 10 years ago, Jacques Chirac, the French President, was worried about protecting small grocery shops from competition from supermarkets, and considered measures like keeping supermarkets outside city limits. He woke up pretty soon, though, to the fact that France would be better off with an army of retail professionals than with mom-and-pop shop owners. And now Americans need to wake up to the fact that America would be better off if the entire retail business worked with virtually zero labor, with all college graduates wherever any staffing was needed at all.)

Posted by: Bulent Sayin on December 14, 2003 03:46 AM

____

"Put differently, the advances in network connectivity are every bit as startling as the advances in hard drive capacity. On the one hand, we have an increasing ability to store vast amounts of data locally. On the other hand, we increasingly cease to care where the data is stored, because we can access it nearly instantaneously from nearly anywhere."

Posted by: Jacques Distler on December 13, 2003 10:58 PM

Actually, it does matter where the data is stored, because, generally, one can't access it from just anywhere. The really nice thing about Project Gutenberg is that it is publicly available. For the 90-odd percent of us (in the US, at least) who aren't on academic networks, a huge amount of information is not available except by paying a lot of money. Lexis/Nexis is still expensive.

We've just got a taste of a possible future.


Posted by: Barry on December 14, 2003 05:47 AM

____

Barry --

I completely agree that too much information remains locked up in expensive databases. On the other hand, I'm not sure where you live, but in at least some cities public libraries have excellent free information databases, many of them accessible via the Web. In New York, for instance, you can use JSTOR (and a host of other academic/popular databases) in the library, while remotely you can access databases that provide full-text search and full-text access for many, many popular and academic journals. I think, though, that the NY Times is not included in that, although you can obviously read it on microfilm in most libraries.

Posted by: James Surowiecki on December 14, 2003 08:50 AM

____

Re "swive," I believe Germaine Greer was pushing it a few years back as somehow less sexist than, ahem, plow.

Re Chaucer and Gawain, I don't want to overblow my knowledge, but I think there are two reasons why Chaucer may be more intelligible: one, he writes in the "London/Southeast" dialect, which was the winner in the competition among contenders; but, two, Chaucer not only received but also shaped the dialect. Not that Chaucer picked a winner, but that Chaucer created a winner. Note also that Chaucer uses a "European" verse form familiar to us, unlike Gawain's alliterative Olde Englyshe. Another example is Langland's "Piers Plowman," again contemporaneous with Chaucer, but to me seemingly even more remote. I find that my 57-year-old College English reader prints Gawain and Piers Plowman in translation, Chaucer in the original (but they are all more or less readable with a bit of practice and imagination).

Posted by: Buce on December 14, 2003 11:05 AM

____

Agreed that having multiple "Bryants" of e-books locally available is fine; I do wish there was a Google-like tool for both indexing the texts AND keeping track of how I tend to search for strings within the texts.

Bush (Vandevar -- if I have that spelling correct) predicted this problem back when he thought such libraries would be implemented in micro-film... I would not be surprised to learn that algorithms exist to conduct such searches.
Hyper-Boolean-GREP or something.
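Such algorithms do exist; the standard structure is an inverted index, which maps each word to the set of documents containing it, so a Boolean AND query becomes a set intersection. A minimal in-memory sketch (the tokenizer and function names are illustrative, and it ignores ranking, phrase queries, and the search-history tracking asked for above):

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(docs):
    """docs maps a document name to its text; returns word -> set of names."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search_all(index, query):
    """Boolean AND: names of the documents that contain every word in `query`."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Usage: `search_all(build_index({"a": "The knight rode out", "b": "A green knight"}), "green knight")` returns `{"b"}`. The index is built once, and each query then touches only the sets for its own words instead of rescanning every text.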

Posted by: Pouncer on December 14, 2003 11:50 AM

____

Vannevar Bush (1890-1974). Byte mentioned him and his work some years ago in an issue that focused on information representation, IIRC.

DSW

Posted by: Antoni Jaume on December 14, 2003 12:43 PM

____

Hey Brad! How did you get all ten thousand at once? I'd really like to know. I've downloaded plenty of PG books (uploaded a couple too) but I never found any way to do it except one book at a time.

Posted by: W. Kiernan on December 14, 2003 03:42 PM

____

I repeat my Luddite point that it still takes you exactly the same time to read each one ...

Posted by: dsquared on December 15, 2003 01:22 AM

____

Buce and Big Al,

For all the difficulties of translation, the story of "Gawain" is so straightforward that I have used it to entertain my daughters on the subway since they were 4 and 5. It is a bit of a trick to get past the hunting parts (I don't tell them the animals Gawain receives in exchange for kisses are dead), but they really like the story. It is among their most requested stories.

Chaucer is rich and complex, but I'm glad scholars have had a taste for other virtues as well, and have preserved the writings of linguistic "losers".

Posted by: K Harris on December 15, 2003 09:16 AM

____

Are e-books really for reading? I think they are for looking things up, like the OED.
The Puritan "plain vanilla" doctrine of Project Gutenberg is its weakness as well as its strength. It harks back to the ASCII days before Knuth reinvented typography for desktop computers. If you really like an author, nowadays you put a typographically elegant version on a fan website. Is there a more direct way of finding these than Google?

Posted by: James on December 16, 2003 05:24 AM

____

W. Kiernan:
"Hey Brad! How did you get all ten thousand at once? I'd really like to know. I've downloaded plenty of PG books (uploaded a couple too) but I never found any way to do it except one book at a time."

http://www.gutenberg.net/events/dec03.shtml
PROJECT GUTENBERG DECEMBER 2003 EVENTS

To celebrate Project Gutenberg's 10,000th eBook, the Magna Carta, there will be a series of events in the San Francisco Bay Area in December 2003. Some events are RSVP, or by invitation, while others are open to the public.

Help to celebrate this milestone, wherever you are, by giving away eBooks. Grab just a few of your favorites, or download an entire CD image. If you have the bandwidth and a DVD burner, grab the DVD image (4.13GB) with over 9500 eBooks! (It doesn't have all 10,000 because some are too big to include, and others are under copyright protection in the US.)

Posted by: Bill Woods on December 16, 2003 01:26 PM

____
