June 07, 2004
It Seems Like Smoke and Mirrors
When Microsoft's new search engine (which is, I am told, already crawling the web at high speed) debuts, it will be very good: it will be very good because everyone trying to artificially inflate their search relevance is targeting Google, and Google has to dissipate a huge amount of resources trying (semi-successfully) to undo the efforts of those who are gaming it, while nobody is targeting Microsoft.
Should Microsoft's new search engine garner significant market share, however, its quality will start to degrade as the game-players start paying attention to it.
Here John Battelle reports on listening to Bill Gates on search:
John Battelle's Searchblog: Gates at D: MSFT Will Wear White Hat In Search: [Gates] repeated that Microsoft will clean up its search practices,* but he seemed to hint things would go a bit further than that. "They have a way of formatting things that has had some appeal," Gates said. "It will be matched. Web search is a incredible business," he continued. "(But) If you want to find things that are local...it's terrible today. If you want to find things that are of particular interest to you, it is quite terrible today."
Gates blamed search's shortcomings on its keyword-based approach, and argued that natural language and contextual semantic approaches will be the next leap forward.
The computer scientists I know are overwhelmingly of the opinion that Gates either (a) is using mirrors and smoke to confuse his audience, or (b) genuinely has no clue how hard the problems of "natural language and contextual semantic approaches" are. Amazon has a lot of information about my taste in books. It has proven unable to use it effectively even in its own limited scope.
*I.e., as John Battelle wrote last March: "MSN will announce that beginning July 1, MSN Search will clearly delineate paid ads from organic search results, with the result being that organic (or algorithmic) results will be above the fold (the top half of the page) for the first time since...well since recent memory.... I've complained over and over about how crappy MSN search is, mainly due to the fact that you can't see the organic forest for the commercialized trees..."
Posted by DeLong at June 7, 2004 09:28 AM
To be clear, anyone trying to game Google is almost certainly going to game Microsoft. There is no magic bullet that will automatically cause a site to shoot to the top of Google's ranks - people who game Google put a lot of work into setting up dummy sites, playing around with server settings, and hiding content. Microsoft will run into these same problems, unless Gates has come up with a magical algorithm that can disregard all the sites currently trying to fool the search engines.
I'd even go so far as to say Google has an advantage here, since they are a "pure" search company and so have spent a lot of time and money figuring out ways of filtering out the b.s. Microsoft, relatively new to the scene, will have a lot of learning to do.
Hey, this is the real game of life...learn the rules and by the time you've learnt them they've changed. That's what's going to happen to most of us when we retire. (Welcome to WalMart)
Now Google and the advertisers can try to play this game intelligently. I'm not sure exactly how to model it or whether there is some equilibrium state. Maybe the model we need here is of an adversary advertiser as a probabilistic polynomial-time Turing machine.
But it's really no different from the typical investor zero-sum game ("psst, here is a hot tip," or "this is an easy way to make a hundred million bucks").
Brad needs to provide some kind of pithy formulation of the search engine relationship he describes here -- to economists, it's conceptually equivalent to Goodhart's Law or the Lucas Critique, so future generations will likewise speak in hushed tones of "DeLong's Law."
If Gates meant that Latent Semantic Analysis algorithms can be used effectively and can be tuned to an individual's recent behavior then I do not think it is entirely "smoke and mirrors". Some of these tools are almost off the shelf and everyone wants to know whether their performance will compare with Google's for local searching.
I discussed this with Gates when he visited the MIT AI lab many years ago. Gates definitely understands the nature of the problem; he has also been hiring the best computer science researchers around for a long time.
I have no doubt that there are computer people who discredit any initiative from Microsoft. It has always been thus. The Cobol/Mainframe gang screamed as their trade skills were rendered irrelevant by microcomputers. Today the Unix artisans are doing the same thing.
I have written system level code for many operating systems, including Unix, VMS and Windows NT. I didn't think much of UNIX ten years ago and it hasn't improved since. I have no doubt that at some point in the future the Windows artisans will be complaining bitterly about some new competitor.
Computer skills take many years to learn and are unfortunately not very transportable. The more obscure and arcane the system in which the artisan has expertise, the more important it becomes to protect that investment. You will find no fiercer advocates of their systems than the users of Genera Lisp Machines, an O/S so convoluted that it took a gifted MIT doctoral student to achieve a moderate level of proficiency.
Not to mention Yahoo! - paid ads on the top, left, middle and in what looks like the organic results too (where it says Web Results). You have to pay to be included and then pay per click (well, you don't have to - but our site was dropped and that seems to be the only way back in). Yahoo search is a total sham - they went overboard on it and the "organic" results are nearly useless...
For an example, search Yahoo for "Caribbean": http://search.yahoo.com/search?fr=fp-pull-web-t&p=caribbean
NLP is a very hard problem, but NLP techniques can still be useful. In fact, NLP techniques and some semantic analysis are currently being used in information retrieval--check out work being done at Carnegie Mellon and the University of Massachusetts. Methods built on an NLP framework have been shown to work at least as well as traditional methods. The question is whether they will work for web retrieval, which is a bit different from the ad hoc retrieval researchers usually focus on.
"The Cobol/Mainframe gang"
What gang is that? I'm more worried by the "If it's not PPT (= Probabilistic Polytime Turing) computable it ain't interesting" gang.
Assiduus usus uni rei deditus et ingenium et artem saepe vincit - Constant practice devoted to one subject often outdoes both intelligence and skill. (Cicero)
Felix qui potuit rerum cognoscere causas - Happy is he who has been able to learn the causes of things. (Vergil)
Ultima ratio regum - The final argument of kings. (Inscription on French cannons in the time of Louis XIV.)
Solitudinem faciunt, pacem appellant - They make a desert and call it peace. (Tacitus)
Verbum sapienti sat est - A word to the wise is sufficient