July 21, 2003

Amazon Wants to Help You Look Inside the Book

Amazon's big problem has always been that you cannot browse. You cannot open up a book and read ten pages in its middle if you are shopping at Amazon.

Now Amazon is trying to think of ways to get around this problem. And this "look inside the book II" program could be really useful, and really effective. It not only lets you browse, it tells you which of the many books is the one you want to look inside.

Amazon Plan Would Allow Searching Texts of Many Books: Executives at Amazon.com are negotiating with several of the largest book publishers about an ambitious and expensive plan to assemble a searchable online archive with the texts of tens of thousands of books of nonfiction, according to several publishing executives involved.

Amazon plans to limit how much of any given book a user can read, and it is telling publishers that the plan will help sell more books while better serving its own online customers.

Together with little-publicized additions to Amazon's Web site, like listings of restaurants and movie showings, the plan appears to be part of a strategy to compete with online search services like Google and Yahoo for consumers' time and attention. Providing a searchable online database of the contents of books could make Amazon a more authoritative source of information, drawing additional traffic to its online retail store.

A spokeswoman for Amazon declined to comment and would not confirm any of the details of the plan. The publishing executives said Amazon had asked them to keep the plan confidential until the start of the service, which was scheduled for the fall.

Amazon is calling its program Look Inside the Book II, the publishers said...

There is one bizarre thing in this New York Times piece: the claim that this is not an expansion and improvement of the book core of Amazon's business but is instead "part of a strategy to compete with online search services like Google and Yahoo... [by] mak[ing] Amazon a more authoritative source of information." Huh? This is about letting people figure out what books would be useful to them, and giving them a taste. This is not about creating an online encyclopedia.

Posted by DeLong at July 21, 2003 08:05 AM

Comments

Now if they could only add comfortable seating, and a coffee bar...

Posted by: Jeremy Osner on July 21, 2003 08:15 AM

I'm also pretty sure that Amazon already signed an agreement to license Google's technology for searching the Amazon site. Not that they couldn't try to compete later, but I think it's unlikely.

Posted by: Bharath on July 21, 2003 10:01 AM

I'd like a personal version, so you could search the text of all the books you actually own...

Posted by: Jon H on July 21, 2003 11:16 AM

Actually, it's not like the publishers haven't already figured this one out -- e.g., the sample chapters that have been up for more than five years now at Baen Books (www.baen.com), though admittedly they were one of the first...

Posted by: Tony Zbaraschuk on July 21, 2003 11:35 AM

Way OT:

Hey, vote in today's CNN poll, if you haven't already. Get your friends and family to vote too. Looks like the right is trying frantically to arm-wrestle this one down.

www.cnn.com

Poll: Is President Bush doing a good job?

(scroll down & to the right)

Posted by: jim on July 21, 2003 11:48 AM

Tony Zbaraschuk writes: "Actually, it's not like the publishers haven't already figured this one out -- e.g., the sample chapters that have been up for more than five years now at Baen Books (www.baen.com), though admittedly they were one of the first"

It's not the same. It's not about putting the full text of books online as readable documents, or even about putting significant chunks online so that readers can sample the book.

The idea is just that users would be able to search for a word, and get back a listing of all the books in which that word appears. Not just a listing of books for which that word is a keyword, or subject, or is in the title, which is the case now.

In order for that to be useful, you need to see the context in which the term is used in a book. You don't want to buy a 30 dollar book and find that the term was used only once in a caption of an illustration, and was used as a proper name, not as the noun you were looking for.

So if "Columbus" appears on page 315 of a book, you need to be able to see page 315, and maybe a page before and a page after; maybe a little more. Then you can tell if the reference was to driving through Columbus, Ohio, or if the reference was part of a page about Christopher Columbus.
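To make the idea concrete, here is a minimal sketch of word search with page context. Nothing here is Amazon's actual design: the two-book corpus, the function names `build_index` and `search_with_context`, and the one-page context window are all assumptions made up for illustration.

```python
from collections import defaultdict

def build_index(books):
    """Map each lowercased word to the set of (title, page_number) where it occurs."""
    index = defaultdict(set)
    for title, pages in books.items():
        for page_no, text in pages.items():
            for word in text.lower().split():
                # Strip common punctuation so "Columbus," matches "columbus".
                index[word.strip('.,;:"()')].add((title, page_no))
    return index

def search_with_context(index, books, word, context=1):
    """Return {title: sorted pages to show}: each hit page plus `context` pages either side."""
    results = defaultdict(set)
    for title, page_no in index.get(word.lower(), ()):
        for p in range(page_no - context, page_no + context + 1):
            if p in books[title]:  # skip pages that do not exist in this book
                results[title].add(p)
    return {title: sorted(pages) for title, pages in results.items()}

# Hypothetical corpus: title -> {page number: page text}.
books = {
    "Road Trips": {
        314: "We left Dayton at dawn.",
        315: "Driving through Columbus, Ohio.",
        316: "Then on to Cleveland.",
    },
    "Age of Discovery": {
        10: "Spain in 1492.",
        11: "Christopher Columbus sailed west.",
        12: "Landfall came in October.",
    },
}

index = build_index(books)
hits = search_with_context(index, books, "Columbus")
for title, pages in sorted(hits.items()):
    print(title, pages)
```

Both books match the query, and only the surrounding pages tell the reader which one is about the explorer and which one is about the city in Ohio.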

What you get to see depends on what you searched for. It's not like Baen providing a fixed subset of a novel's chapters.

Presumably, if you knew a word that appears on every page of a book, you could (slowly) view the entire content of the book by searching for each word in sequence and viewing the results in context. (But if you could do that, you probably have the book open in front of you.)

Posted by: Jon H on July 21, 2003 01:11 PM

Jon H wrote:

"So if "Columbus" appears on page 315 of a book, you need to be able to see page 315, and maybe a page before and a page after; maybe a little more. Then you can tell if the reference was to driving through Columbus, Ohio, or if the reference was part of a page about Christopher Columbus."

and

"Presumably, if you knew a word that appears on every page of a book, you could (slowly) view the entire content of the book by searching for each word in sequence and viewing the results in context. (But if you could do that, you probably have the book open in front of you.)"

In fact, you could easily view the entire book without having a copy open in front of you if you can see the context. Search for, say, "intolerance", and get a hit on page 42, which you'll be shown together with pages 41 and 43. On page 43, you see the word "pedestrian"; search for it, and you'll be shown pages 42-44, and so on. Of course, it shouldn't be too difficult to limit searches to, say, five per book.
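A rough simulation of that chaining trick, and of what a five-search cap does to it. This is only a sketch under stated assumptions: every page is assumed to contain some word distinctive enough to land a hit on it, each hit shows one page on either side, and the names `pages_shown` and `chain_read` are invented for the example.

```python
def pages_shown(hit_page, last_page, context=1):
    """Pages displayed for a hit: the hit page plus `context` pages on each side."""
    return range(max(1, hit_page - context), min(last_page, hit_page + context) + 1)

def chain_read(start_page, last_page, max_searches=None):
    """Walk forward by always searching for a word from the furthest page seen so far.

    Assumes (hypothetically) that every page has a word that produces a hit on it.
    Returns the set of pages revealed before the search cap, if any, is reached.
    """
    seen, hit, searches = set(), start_page, 0
    while max_searches is None or searches < max_searches:
        searches += 1
        seen.update(pages_shown(hit, last_page))
        if max(seen) == last_page:
            break  # reached the end of the book
        hit = max(seen)  # next search targets a word on the furthest page seen
    return seen

# A 300-page book, first hit on page 42 as in the example above.
unlimited = chain_read(start_page=42, last_page=300)
capped = chain_read(start_page=42, last_page=300, max_searches=5)
print(len(unlimited), "pages without a cap;", len(capped), "pages with five searches")
```

With no cap the walk reveals everything from page 41 to the end of the book, one new page per search. With five searches per book it stops at seven pages.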

I agree that something like this would be a great idea. Particularly so for a lazy researcher, who'd just need a quote from one page to beef up his references...

Posted by: Antti on July 21, 2003 02:03 PM

Brilliant.

What it suggests to me is that they are trying to make the content of every book from every publisher part of the draw - i.e. they make that corpus searchable, then you are allowed a glimpse into the books that match.

So imagine: you want to know something about, oh, mushrooms and their use by refugees in the First World War. Do you go to Google first, or do you go to the Amazon index?

Lots of options for what to do with the index once it's compiled.

Brilliant

Posted by: Ben on July 21, 2003 06:39 PM

Antti - good point.

Of course, if they show only one page of context, it doesn't work.

If they do it intelligently, I suppose they could use index entries that cover a page range, and provide the range. If no range is given, give one page of context.

I expect pages themselves will still be provided as images, rather than text.

They'd have to limit the number of pages viewed from a particular book somehow. Otherwise, it might be easy to write a program to submit search requests, grab the page images, and OCR them.
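One way such a limit might look, as a sketch only: a counter of distinct pages served per user per book. The class name `PageViewLimiter`, the in-memory storage, and the cap of three pages are all hypothetical, and this says nothing about how Amazon would actually do it.

```python
from collections import defaultdict

class PageViewLimiter:
    """Caps how many distinct pages of one book a given user may be shown."""

    def __init__(self, max_pages_per_book):
        self.max_pages = max_pages_per_book
        self.viewed = defaultdict(set)  # (user, book) -> pages already served

    def allow(self, user, book, page):
        pages = self.viewed[(user, book)]
        if page in pages:
            return True   # re-viewing an already served page costs nothing
        if len(pages) >= self.max_pages:
            return False  # cap reached: refuse any new page of this book
        pages.add(page)
        return True

limiter = PageViewLimiter(max_pages_per_book=3)
# A scraper asks for pages 41-44 of one book, then re-requests page 42.
served = [limiter.allow("scraper", "Some Book", p) for p in (41, 42, 43, 44, 42)]
print(served)
```

The fourth request is refused because it would be a fourth distinct page, while the repeat request for page 42 still goes through. A real service would also have to decide how to identify a "user", which a scraping program could try to dodge with multiple accounts.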

Posted by: Jon H on July 21, 2003 06:49 PM
