## March 16, 2003

### On the Impact on Web Quality of Positive-Feedback Linking Practices

Are the best websites--the most interesting, the most informative, the most authoritative--the easiest to find? We have a world wide web in which we use the link structure to find things. But because we ourselves add what we find to the web's link structure, the number of links to a site depends not just on its quality but also on how easy it is to find. To the extent that services like Google, whose rankings are in part functions of the web's link structure, have become key search tools, these potential positive-feedback mechanisms have been strengthened.

Is there a danger that we are drifting toward a web of celebrity rather than of information--one in which well-known sites are well-known and prominent because of their well-knownness rather than their quality?

This is an interesting problem to try to think about...

Let's start with the simplest possible useful model of how the links to a website evolve over time. At any moment, the number of links L to a website is:

• increasing at a rate $b_1 L$ as relatively clueless links are added by people whose web surfing is guided by the existing link structure, or by things like Google that aggregate the existing link structure.
• decreasing at a rate $b_2 L$ through linkrot.
• increasing at a rate $Q$, where $Q$ is an index of the quality of the website, as the clued-in link to websites that are useful, informative, and authoritative.

This means that the dynamics of links L follow the simple equation:

$$\frac{dL}{dt} = b_1 L - b_2 L + Q \qquad (1)$$

And our questions are two: First, will the number of links to a website converge to be proportional to the quality Q of the website? Second, how long will this convergence take?

If the website starts at some time 0 with $L_0$ links (derived from past history or celebrity or whatever), then the solution to the differential equation (1) above is:

$$L = L_0 e^{-(b_2-b_1)t} + \left(1 - e^{-(b_2-b_1)t}\right)\frac{Q}{b_2-b_1} \qquad (2)$$

where $t$ is the current time.
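As a quick sanity check on (2), here is a minimal Python sketch that integrates (1) numerically with small forward-Euler steps and compares the result with the closed form. The function names and parameter values are hypothetical, chosen only for illustration; nothing here is fitted to real link data.

```python
import math


def closed_form(t, L0, Q, b1, b2):
    """Equation (2): number of links at time t (requires b1 != b2)."""
    k = b2 - b1  # net rate at which links decay absent quality-driven linking
    decay = math.exp(-k * t)
    return L0 * decay + (1.0 - decay) * (Q / k)


def euler(t_end, L0, Q, b1, b2, steps=100_000):
    """Integrate equation (1), dL/dt = b1*L - b2*L + Q, with forward Euler steps."""
    dt = t_end / steps
    L = L0
    for _ in range(steps):
        L += (b1 * L - b2 * L + Q) * dt
    return L


# Hypothetical parameters: linkrot (b2) outpaces clueless linking (b1).
L0, Q, b1, b2 = 500.0, 10.0, 0.05, 0.10
for t in (0.0, 10.0, 50.0):
    exact = closed_form(t, L0, Q, b1, b2)
    approx = euler(t, L0, Q, b1, b2)
    print(f"t={t:5.1f}  closed form={exact:8.2f}  Euler={approx:8.2f}")
```

Swapping in values with b1 > b2 exercises the positive-feedback branch as well; the same closed form covers both cases, since (4) is just (2) rearranged.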

If $b_2$ is greater than $b_1$--if (independent of quality) having a lot of links tends to put downward pressure on the number of links to a website, as linkrot removes links faster than the clueless who are just surfing the web's link structure add them--then this equation is well behaved. As $t$ grows larger, $e^{-(b_2-b_1)t}$ shrinks to zero: the impact of the initial link number $L_0$ on the current link number $L$ vanishes. And as $t$ grows larger, $\left(1 - e^{-(b_2-b_1)t}\right)$ grows to equal one: the number of links converges to an amount proportional to the site's quality:

$$L = \frac{Q}{b_2-b_1} \qquad (3)$$

The closer $b_1$ is to $b_2$, the less relevant this long-run result is: the influence of $L_0$ shrinks only by a factor of $e$ every $1/(b_2-b_1)$ units of time, so it might take eons for convergence to occur...
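To put a number on "eons": in (2) the weight on the initial condition is $e^{-(b_2-b_1)t}$, so its half-life is $\ln 2/(b_2-b_1)$. The sketch below assumes hypothetical annual rates with linkrot $b_2$ fixed at 0.10; with $b_1 = 0.09$, for instance, the half-life of the initial condition's influence is roughly 69 years. The helper names are illustrative, not part of the original model.

```python
import math


def half_life(b1, b2):
    """Time for the weight on L0 in equation (2), exp(-(b2-b1)t), to halve."""
    if b2 <= b1:
        raise ValueError("convergence requires b2 > b1")
    return math.log(2) / (b2 - b1)


def time_to_fraction(b1, b2, remaining=0.01):
    """Time until exp(-(b2-b1)t) has fallen to `remaining` of its starting value."""
    if b2 <= b1:
        raise ValueError("convergence requires b2 > b1")
    return math.log(1.0 / remaining) / (b2 - b1)


# Hypothetical annual rates: linkrot fixed at 10%, clueless linking creeping toward it.
b2 = 0.10
for b1 in (0.05, 0.09, 0.099):
    print(f"b1={b1}: half-life {half_life(b1, b2):7.1f} years, "
          f"99% converged after {time_to_fraction(b1, b2):8.1f} years")
```

Both times scale with $1/(b_2-b_1)$, so shrinking the gap between the two rates tenfold makes convergence ten times slower.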

If $b_2$ is less than $b_1$--if (independent of quality) having a lot of links tends to put upward pressure on the number of links to a website, as the clueless who are just surfing the web's link structure add links faster than linkrot removes them--then this equation is not so well behaved. It is most illuminating to rewrite (2) as:

$$L = e^{(b_1-b_2)t}\left(L_0 + \frac{Q}{b_1-b_2}\right) - \frac{Q}{b_1-b_2} \qquad (4)$$

Over time, $e^{(b_1-b_2)t}$ grows without bound: positive feedback produces rapid exponential growth, after all. Looking across websites, as long as $(b_1-b_2)t$ is relatively large, different sites' relative link numbers are proportional not to their qualities $Q$ but to $L_0 + Q/(b_1-b_2)$. If $Q$ is large relative to $L_0(b_1-b_2)$, then there is little long-run impact: relative link numbers are nearly proportional to website quality. But if $Q$ is not large relative to $L_0(b_1-b_2)$, then initial conditions--early start, web celebrity, whatever--have a powerful influence on relative link numbers, and thus on effective web footprint, even in the longest of runs.
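A small numerical illustration of (4), using made-up numbers: an early, well-linked but low-quality site ($L_0 = 1000$, $Q = 5$) and a late, high-quality newcomer ($L_0 = 10$, $Q = 50$), with $b_1 = 0.10$ and $b_2 = 0.05$. The site names and values are hypothetical. Even though the newcomer's quality is ten times higher, the celebrity's lead shrinks but never disappears: the link ratio settles at $(1000 + 100)/(10 + 1000)$, about 1.09, rather than at the quality ratio of 0.1.

```python
import math


def links(t, L0, Q, b1, b2):
    """Equation (4): links at time t when b1 > b2 (positive feedback)."""
    g = b1 - b2  # net growth rate: clueless linking minus linkrot
    return math.exp(g * t) * (L0 + Q / g) - Q / g


# Hypothetical sites, each given as (L0, Q).
b1, b2 = 0.10, 0.05
sites = {"celebrity": (1000.0, 5.0), "newcomer": (10.0, 50.0)}

for t in (10.0, 100.0, 300.0):
    counts = {name: links(t, L0, Q, b1, b2) for name, (L0, Q) in sites.items()}
    print(f"t={t:5.0f}  celebrity/newcomer = {counts['celebrity'] / counts['newcomer']:.3f}")

# Long-run ratio implied by (4), versus the ratio of qualities.
g = b1 - b2
limit = (1000.0 + 5.0 / g) / (10.0 + 50.0 / g)
print(f"limit {limit:.3f} vs quality ratio {5.0 / 50.0:.3f}")
```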

So how relevant is this simple model? I don't know. I'm thinking about it...

Well, the phenomenology of link growth is clearly non-linear: e.g., spontaneous formation of 'domains' in which links travel easily but don't get out, saturation effects, etc. One could go on. Don't forget, linear equations have only a -very- limited number of kinds of asymptotic behavior.

Matt

Posted by: Matt on March 16, 2003 11:03 AM

>>Don't forget, linear equations have only a -very- limited number of kinds of asymptotic behavior.<<

Yes! That's why we economists use them! Once you write down a linear differential equation, it's easy to figure out what it will do!

It's our hammer, and to us--as to everybody with a hammer and no other tools--everything looks like a nail!

Posted by: Brad DeLong on March 16, 2003 11:38 AM

When I started paying attention to blogs, I observed that BoingBoing was the best blog, by some set of poorly defined personal criteria of mine (a nice slant towards interesting neat stuff, regularly updated, a group blog, not just links but not great wafty essays either, that sort of thing). At that time, BoingBoing was a moderately popular blog but nowhere near the first rank. Now, it's right in the top few pretty much regardless of counting method. I've observed this going on in a lesser way with other blogs.

Which is a way of saying that the model's wrong, or at least trivial; that almost nobody cluelessly links to famous blogs, and almost everybody links to specific blogs that they personally think are high quality; but that the chances of each linker seeing a particular blog are determined by its current fame. So a little-read excellent blog might acquire relatively many links from its few readers; a once-famous but now off the boil blog will pick up relatively few links from its many readers.

My theory is that cream rises in blogging; better blogs and more interesting bloggers are linked to more often, both in blogrolls and in specific links, and new, interesting blogs become heavily linked quite quickly.

But you know, IANAE, and nor do I play one on TV.

Posted by: Alison Scott on March 16, 2003 03:51 PM

Is this a question of "how important is path dependence"?

The model doesn't seem (to me) to capture the you-scratch-my-back etc. aspect of the blogosphere, where various bloggers pay close attention to various other bloggers, as either positives or negatives. Brad (for example) will link to bloggers he either explicitly wants to argue with, or he explicitly thinks say things he agrees with and wants to highlight.

And there will be bloggers who like what Brad says who will then pay attention and link back, and the more he links to them, the more inclined they are to return the favour, and so on.

Meantime, popular and "loud" (opinionated) bloggers like Andrew Sullivan get attention of that sort -- linked to by people who like them, while often being used by the likes of Brad as representative examples of what they DON'T like ...

So, one, I'm not sure quite how to interpret his b variables yet.

And I'm also not sure, Alison, whether blog quality rises to the surface. In the world of talkback radio etc., opinionatedness and outspokenness become virtues in and of themselves. Blogging is more interactive, but I suspect some of the same dynamics may be at work.

But I am posting on the fly here and may not have grasped enough of what's in the model yet to do it justice.

Posted by: Michael Harris on March 16, 2003 05:48 PM

Actually, the mystery variable in all of this is the Q. What does "quality" mean in this model? How is it measured and interpreted?

*ponders*

Posted by: Michael Harris on March 16, 2003 06:33 PM

>>Is there a danger that we are drifting toward a web of celebrity rather than of information--one in which well-known sites are well-known and prominent because of their well-knownness rather than their quality?<<

I made this point at the XCom 'Take It Outside' seminars last October, actually. My exact words were: "Google rewards ubiquity over veracity." It's sad but true. And while the two coincide in certain cases, usually in matters of incontrovertible fact (the Google spellcheck technique, for instance), in the dissemination of opinion the difference is staggering.

I suspect that Brad has the same sense of 'Quality' as I do, to some extent, because we both come out of (different) academic communities, with networks of peer reference that we know from experience tend to bring the cream to the top. But the nature of political-weblog citation is different to that in academia, because it's driven by snarky sentences and brazen backslapping rather than any sense of people reading and absorbing nuanced argument.

Posted by: nick sweeney on March 16, 2003 07:26 PM

Isn't the solution to (1)

$$L = L_0 e^{(b_1 - b_2)t} + Qt\;?$$

which reflects the fact that in this model, as long as b2 > b1, growth is determined primarily by clueless links (exponential) as opposed to growth from quality (linear).

Posted by: Tom Slee on March 16, 2003 07:55 PM

- No it isn't. I was wrong, Brad was right. I wish you could delete your own posts on this thing.

Posted by: Tom Slee on March 16, 2003 07:57 PM

Despite my rusty calculus, I will venture another opinion.

The question you are addressing would be: are the two basic mechanisms of link growth (quality or celebrity) distinguishable on the basis of observed link distributions? I don't think your initial pass at a model leads to a link distribution, and therefore it doesn't get to the heart of the issue.

If I understand right, the observed distribution, being power law in nature, is driven by something exponential in link growth. But whether Q figures in the exponent or not is another issue, and I don't see how to distinguish your model from Alison Scott's, which would give something like

$$\frac{dL}{dt} = (b_1 Q - b_2)L$$

or

$$L = L_0 e^{(b_1 Q - b_2)t}$$
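Tom Slee's alternative, in which quality multiplies the feedback term, behaves very differently from (4). A minimal sketch, writing his solution with the time variable explicit as $L_0 e^{(b_1 Q - b_2)t}$: because $\log L = \log L_0 + (b_1 Q - b_2)t$, a higher-quality site always overtakes eventually, at a time that grows with the log of the initial link gap. Here $b_1$ is Tom's coefficient (it turns quality into a growth rate), not the $b_1$ of equation (1), and the numbers and function names are hypothetical. For the values below the overtaking time is $\ln(100)/0.05$, about 92 time units.

```python
import math


def links_multiplicative(t, L0, Q, b1, b2):
    """Solution of dL/dt = (b1*Q - b2)*L: exponential growth at a quality-dependent rate."""
    return L0 * math.exp((b1 * Q - b2) * t)


def overtake_time(L0_a, Q_a, L0_b, Q_b, b1):
    """Time at which site b (higher quality, fewer initial links) catches site a.

    Solves log(L0_a) + (b1*Q_a - b2)t = log(L0_b) + (b1*Q_b - b2)t; b2 cancels out.
    """
    if Q_b <= Q_a:
        raise ValueError("site b must have higher quality to catch up")
    return math.log(L0_a / L0_b) / (b1 * (Q_b - Q_a))


# Hypothetical values: an early, well-linked site a and a better but newer site b.
b1, b2 = 0.01, 0.05
t_star = overtake_time(1000.0, 5.0, 10.0, 10.0, b1)
print(f"site b catches site a at t = {t_star:.1f}")
```

The contrast is the point: in this multiplicative model initial conditions only delay the triumph of quality, while in (4) they can shape relative link counts permanently.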

Not that I'm any closer to it, of course....

Posted by: Tom Slee on March 16, 2003 08:12 PM

Of course, by making a mistake and correcting it I have increased the number of comments in this thread without improving the quality. If you read this item because "that item has a lot of comments -- it must be an interesting one" then I guess that's one more piece of evidence against quality.

(Nothing against Brad's original posting of course)

Posted by: Tom Slee on March 16, 2003 08:19 PM

Hehehe, Tom.

*bumps the comments section with another gratuitous addition to make it look like a profound discussion*

[Actually Tom, half of the crowded comments sections on this particular board have an epsilon factor, where epsilon is non-trivial, made up of multiple identical posts by people who didn't know that their first post had worked. When a post has 50 comments, epsilon can be 10 or more of those.]

Posted by: Michael Harris on March 16, 2003 08:31 PM

http://www.google-watch.org/

A look at how Google's monopoly, algorithms, and privacy policies are undermining the Web.

Posted by: perdita durango on March 17, 2003 01:55 AM

All a bit like life really.

To him that hath shall be given . . .

Posted by: Nigel Hawthorne on March 17, 2003 02:01 AM

Note that b2 might actually be a lot higher than Brad is implicitly assuming; nobody really cares about links in a blogroll; it's the links to particular stories in particular posts that matter, and they tend to fall off the front page really quite quickly.

Posted by: dsquared on March 17, 2003 03:32 AM

Speaking of hammers... Using static links as a proxy for quality is convenient, and one certainly would be remiss not to use whatever data is lying about as best one can, but it is flawed in assorted ways. It would be very nice to have actual transaction counts rather than relationship counts.

Everybody has a fax machine, but these days they are used less and less. Most companies own a typewriter; almost none of them use it. I have links on my blog, but I personally never use them, since my RSS feed aggregator has displaced the blogroll; as a result, I fail to maintain it. It is, of course, a sin not to vote!

Posted by: Ben Hyde on March 17, 2003 06:47 AM

You asked: "Are the best websites--the most interesting, the most informative, the most authoritative--the easiest to find?"

It depends on who is doing the searching: that is, on how skilled the person seeking information is at finding content online. So no, the best Web sites are not necessarily the easiest to find.

Skill is important not only because of the amount of material out there but also because some services try to channel users toward particular types of content, content whose producers pay for placement. Pay-for-placement content is often more visible than content ranked by relevance, especially if the searching is fairly generic, which it is in many cases (taking your average user). Studies have shown that the majority of queries on the Web are single-term queries.

Users differ in their ability to use the Web. I base this on observations with a random sample of 100 Internet users. More to come on this as I continue analyses on my current project.

Posted by: Eszter on March 17, 2003 01:43 PM
