December 04, 2002
William Powers Is Clueless About Google

Writing in the National Journal, William Powers marvels at Google:


D.C. Dispatch | 2002.11.26 | Powers: ...What's intriguing about all of this--and the reason media folk are watching it somewhat nervously--is that Google gathers and presents the news automatically, without any input from people, not even journalist people. As the site itself explains, the headlines "are selected entirely by computer algorithms, based on how and where the stories appear elsewhere on the Web." The idea is that if you take humans out of the selection process, you get a clean read on the news, uncorrupted by bias...


I read this, and I think, "My God! How can any human being be so totally clueless?" Google doesn't take humans out of the selection process; it puts humans into the selection process. Google looks at "how and where the stories appear elsewhere on the Web." People put the stories up on the web. In looking at patterns of diffusion across the web, Google News is adding up and averaging the news-related decisions of tens of thousands of different people.

The real difference between Google--when it is working, that is--and other, more conventional news sources is that other news sources are the results of one person's or a few people's news judgment. Google's rankings, by contrast, are the results of the aggregated judgments of every person and organization whose websites it scans--thus making the mob smart, and (or so Google hopes) producing a collective vote for the important news that is of higher quality than any one individual can produce.

But taking humans out of the process? Bah, humbug! Google's entire success has been built on its ability to find a better way to use the structure of webposts and weblinks to figure out what the people who post on the web are thinking, and to add up their votes as to what is interesting and important.
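To make the "adding up votes" idea concrete, here is a minimal sketch that treats each link from one site to a story as one human vote and ranks stories by vote count. This is an illustration only, not Google's actual method; the function name count_link_votes and the example site and story names are made up for the example.

```python
from collections import Counter

def count_link_votes(links):
    """Count inbound links per page; each (source, target) pair is one vote.

    Duplicate links from the same source to the same target count once,
    and self-links are ignored.
    """
    votes = Counter()
    for source, target in set(links):
        if source != target:
            votes[target] += 1
    return votes

# Hypothetical sites and stories, invented for illustration.
links = [
    ("alice.example", "story-a"),
    ("bob.example", "story-a"),
    ("carol.example", "story-b"),
    ("carol.example", "story-a"),
    ("bob.example", "story-a"),  # duplicate link, counted once
]

votes = count_link_votes(links)
# Stories ordered from most-linked to least-linked.
ranking = [page for page, _ in votes.most_common()]
```

Every count in the result traces back to a decision some person made to post a link, which is the sense in which the ranking aggregates human judgments rather than replacing them.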

Google does not work by being an icy, bias-free computer making inhuman judgments. Google works by maximizing human input into the search-and-evaluation process, because two thousand (human) heads are better than one.

Posted by DeLong at December 04, 2002 08:07 AM

Comments

Well said, Brad.

Posted by: Mike on December 4, 2002 09:28 AM

Ah ... no?

Google.com ranks pages using linking as a proxy for a page's quality. That works reasonably well because some page makers build links to pages they value. While this is a proxy for value, at least it reflects the voice of a large sample.

News.google.com can't do that because it takes time for links that reflect careful evaluation of a page's value to accumulate. News.google.com has to move faster than that. It is news :-) At most the voice it listens to is, in the end, that of the news outlet editors - which is only slightly different than the voice of the people that manage their news feeds.

News.google.com must use some other proxy for page-value to guide its selections. The people who engineer that algorithm are its editors the same way that an automobile's sheet metal has a designer in spite of its being stamped out by a machine.

A lack of clarity about all this could be quite dangerous. What if the folks at Google engineered their news page ranking to benefit a region, religion, or even their personal interests? Would we notice? Would they?

It is certainly interesting that, as news editors begin to treat Google and news.google as one of their primary news feeds, this will undercut the existing news distribution giants.

Posted by: Ben Hyde on December 4, 2002 09:53 AM

Google's own FAQ page mostly talks about their article grouping technology. My bet is that grouping is done on a purely lexical basis - which isn't very hard (and happens to be my field of expertise.) They make no claim that the choice of news sources is automatic, just that the articles available on those sites are automatically sorted. They say they use "more than 4,000 news sources from around the world", and I'll bet that list of news sources is not automatically constructed and every site on it has been added by a real person.

As for what order articles appear in, I'll bet they also use lexical means to determine which articles have the content that most describes the story. They admit to taking into account "how often and on what sites a story appears elsewhere on the web." This means - I suspect - that articles chosen to appear first are the ones that have the content most frequently associated with the story on all the websites. These are not necessarily the most informative articles. In-depth articles will have content that is not in everyone else's coverage. For example, the top headline on news.google.com right now is "Israeli missiles 'fired in Gaza'", linking to a BBC article that is only a few paragraphs long. The number 2 article is a much longer piece from Ha'aretz covering several different related stories. Why is the short article ahead of the more comprehensive article? Because it is looking for the most average content first.

If it works the way I suspect it does, it is biased towards the most mainstream reporting - particularly wire articles that are copied unchanged on many different websites. A search for "LTTE peace talks" tends to confirm this. The response starts with syndicated and wire service articles, reaching - for example - World Socialist Website's extensive coverage only on page 4 and dribbling their articles over the next several pages.

This is fine when you want to know what everybody knows in a short time without visiting many different websites. But it will tend to discriminate against in-depth coverage.
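The suspicion in this comment can be sketched in a few lines: if articles are ordered by how closely their vocabulary matches the pooled vocabulary of all coverage, wire copy repeated across sites rises and in-depth pieces sink. This is only an illustration of that guess, not a description of how news.google.com actually works; the function names and the four sample articles are invented for the example.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts for one article."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_typicality(articles):
    """Order article names by similarity to the pooled vocabulary."""
    bags = {name: bag_of_words(text) for name, text in articles.items()}
    centroid = Counter()
    for bag in bags.values():
        centroid.update(bag)  # pooled word counts across all coverage
    return sorted(bags, key=lambda name: cosine(bags[name], centroid),
                  reverse=True)

# Invented sample coverage: two identical wire copies, one short report,
# and one longer piece with content nobody else has.
articles = {
    "wire_copy_1": "missiles fired in gaza officials say",
    "wire_copy_2": "missiles fired in gaza officials say",
    "short_report": "missiles fired in gaza",
    "in_depth": ("missiles fired in gaza after weeks of stalled talks and "
                 "a disputed border closure analysts trace the history"),
}

ranking = rank_by_typicality(articles)
```

With this sample data the two wire copies come first and the in-depth piece comes last, because its extra words pull it away from the average.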

Posted by: Scott Martens on December 4, 2002 10:22 AM

I'm wondering if there is a danger of some kind of undesirable feedback effect: If a significant number of the pages that Google uses in its analysis are themselves produced by Google-like programs, rather than humans, then what?

Posted by: Daryl McCullough on December 4, 2002 08:42 PM

...like having a significant amount of investment be through index funds?

Posted by: clew on December 5, 2002 10:59 PM

>>I'm wondering if there is a danger of some kind of undesirable feedback effect: If a significant number of the pages that Google uses in its analysis are themselves produced by Google-like programs, rather than humans, then what?<<


Yes, a positive-feedback explosion seems possible: a situation in which websites and news sources are well-known and popular not because they are good but just for their well-knownness and popularity.


Brad DeLong

Posted by: Brad DeLong on December 6, 2002 12:04 PM

Snip
<
A lack of clarity about all this could be quite dangerous. What if the folks at google engineered their news page ranking to benefit a region, religion, or even their personal interests. Would we notice, would they?
>

How long before information retrieval on the web is nationalised? How long before the web leads to world government?

Posted by: Keith Robinson on January 8, 2003 03:40 AM