October 24, 2003

And Amazon Has Changed the World Again

Gary Wolf writes in Wired about how Amazon has once again changed the world:

Wired News: The Great Library of Amazonia: ...An ingenious attempt to illuminate the dark region of books is under way at Amazon.com. Over the past spring and summer, the company created an unrivaled digital archive of more than 120,000 books. The goal is to quickly add most of Amazon's multimillion-title catalog. The entire collection, which went live Oct. 23, is searchable, and every page is viewable.

To build the archive, Amazon CEO Jeff Bezos has had to unravel a tangle of technological and copyright problems. His solution promises to remake the publishing business and give Amazon a powerful new weapon in its battle against online competitors such as Yahoo, Google, and eBay. But the most interesting thing about the archive is the way it resolves the paradox of the book, respecting its physical form while transcending its limits.

I recently drove to a home in Silicon Valley and spent a few hours digitally searching the text of books. My host was Udi Manber, an Israeli-born computer scientist and author of a popular textbook, Introduction to Algorithms: A Creative Approach. Ten years ago, while developing a seminal piece of Unix search software called agrep, Manber came up with a concept for an information tool he has yet to build. It was supposed to search the mess of papers on his desk. The idea that you could perform a digital search of physical objects has long fascinated him. "Why not have users take pictures of their bookshelf?" Manber asked when we first met. "We could scan the images, extract the titles, and then let them search the entire text of the books they own."

The notion of Amazon scanning all of its books but allowing users to search only those they own is a clever way around the central barrier to creating a digital archive: Copyrights are distributed among tens of thousands of publishers and authors. But when Manber told Bezos his idea, he found the Amazon founder ready to work on a grander scale. Bezos wanted his customers to be able to search everything.

In his small, ranch-style Palo Alto house, Manber and I sit side by side at a table near the kitchen as he begins typing my queries into his laptop. The computer is connected to a prototype of the archive, which at the time of my visit is scheduled to go live in a few weeks. Within seconds, I am captivated. The experience reminds me of how I felt a decade ago, when I first began browsing the Web...

Posted by DeLong at October 24, 2003 11:50 AM

Comments

Here is where information becomes knowledge. And the law of unintended consequences will be sure to make it more interesting still. There just aren't enough hours in the day for learning all the interesting things that are out there. And cheap, too.

Posted by: pt martin on October 24, 2003 12:24 PM

This is nifty, but a Google IPO on Google is niftier.

Posted by: Roland on October 24, 2003 12:39 PM

The Memex, eh?

I spend enough time correcting scannos at Distributed Proofreading to be dubious of the searchability of photographed text, but as more and more published work is created digitally, Amazon's approach is nifty.

http://www.pgdp.net

Posted by: clew on October 24, 2003 04:18 PM

I give them an "A-" for effort, but the results are a bit shoddy. They don't seem to have enough proofreaders and the text recognition software isn't quite up to par.

Here are some results from a search in Peskin and Schroeder's book:

". . . 16 Chapter 2 The hleirr-Gordon Field and end of this region. If we restrict our consideration . . . of this book. since it will make the transition to quantum inechanics easier. Recall that for a discrete system one can define a conjugate inolnentum Er - BLwy (where (~ = . . ."

". . . 74 Chapter 3 The Dirac Field (c) Let us write a 4-component Dirac field as - . . . parts (a) and (b). That is, promote \(.r) to a quantum field satisfying the canonical anticoiuinutation relation {i~(X)~ k, (y)} = bahbl3l(X-y), construct a Hernnitian Hanniltonian, and find a representation of . . "

Klein-Gordon, mechanics, momentum, anticommutation, Hermitian, Hamiltonian - all came out wrong.

Or do they expect the customers and publishers to do the proofreading?

Posted by: ETC on October 24, 2003 08:16 PM

"The notion of Amazon scanning all of its books but allowing users to search only those they own is a clever way around the central barrier to creating a digital archive"

Or then again, it's the way to make Amazon.com shareholders several hundred million dollars poorer, since it would be hard to find a closer analogy to MP3.com's disastrous attempt at pushing the boundaries of wilful infringement. (I wouldn't be surprised if Amazon had the political clout to make a court decide in its favor, or the economic clout to make a consortium of publishers think twice about annoying the crap out of their largest single conduit to the public, but to see this kind of massive change in copyright rules accomplished by corporate fiat is odd, to say the least.) Note also that, according to some reports, you can currently view all the pages of any Amazon-archived book whether you own it or not.

Posted by: paul on October 25, 2003 08:17 PM

This is a way for Amazon to make money off its search engine. With several thousand hits resulting from any lookup, what will be important - as in a Yahoo search - is placement on the first user screen; and soon Amazon will be charging publishers for that.

Posted by: Jon Meltzer on October 27, 2003 05:13 AM