January 30, 2003
Any Text. Anytime. Anywhere. (Any Volunteers?)
Wired 11.02: VIEW
The mechanics of a universal library are simple. The tricky part: harnessing the free labor.
By J. Bradford DeLong
It's a bad day in the stacks.
I go three for seven: three books found, one that should be there but isn't, one recorded lost, and one checked out that will have to be recalled. The seventh is the one I really want: QB54.C661. There's no copy in UC Berkeley's main Doe Library stacks - it's shelved in the Math Library. The Math copy is not where it's supposed to be, but the catalog claims there are two copies on the shelves in Moffitt. Then time's up. It will be 64 hours before I get another crack at tracking down Appendix D of Carl Sagan, ed., Communication With Extraterrestrial Intelligence.
And that's after two hours doing very low grade work: looking up call numbers, wandering down corridors, waiting for the crowds to clear out between the compact movable stacks so I can get at the shelves without crushing grad students in the next aisle. Knowledge revolution, indeed.
So why can't I just call up the text on my computer screen? Where is my universal online library?
In 1971, Michael Hart asked himself the same question, and started Project Gutenberg with the goal of making every single text ever written freely accessible. From a worldwide cost-benefit standpoint, Project Gutenberg is a no-brainer. Sure, it'll cost $750 million in volunteer time (if the typists are in the United States - just $50 million if they're in India), but some 1 million book titles could be called up anytime anywhere from anything with an Internet connection. We're talking 10 cents per person, or less, to create the universal online library. Click - whatever you want to read next is there. The hard task is mobilizing the resources to build it.
In the past there have been three means of setting large collective projects in motion. Governments have commanded people to work. Markets have used carrots rather than sticks by offering the possibility of profit. And philanthropists have spent their wealth in places where governments and markets have failed. More recently, we have seen the emergence of a fourth way: open source. A charismatic and technically adept organizer can mobilize, say, 5 percent of the work time of each of thousands of contributors motivated solely by the intangible rewards of solving a problem. The result can be popular, elegant, and groundbreaking. Like Linux.
Linus Torvalds' open source operating system project got its start two decades after Michael Hart's open source library project, but has long since surpassed it. Linux scales well and flourishes because contributors eat up the intellectual problems of programming and gain status by pitching in. Of course, Bill Gates' closed source software project, which postdates Project Gutenberg by only four years, also scales well and flourishes, because the overwhelming majority of users would rather pay for the reliability of the leading brand.
Project Gutenberg, however, has failed to achieve any form of critical mass. It's not a high priority for governments. It hasn't attracted large donations from foundations. Since the whole point is to create a free universal online library, it won't be driven by markets. And as an open source project, the positive-feedback loops are not strong enough. The work is time-consuming and boring.
Thus Project Gutenberg has inched ahead at a snail's pace. In its 32nd year of existence, the collection has only 6,267 etexts. Now, Gutenberg is not the only source of freely or cheaply available electronic texts. I can see pieces of the universal online library taking shape in the JSTOR journal-articles archive and Allison Druin and Benjamin Bederson's International Children's Digital Library, which is being built in the philanthropist mode. But much of what exists elsewhere is far from free. Or easily accessible. And most texts still aren't digitized. So I continue to spend large parts of my waking life wandering around the bleeping stacks looking for things that aren't there.
Technology isn't the problem. The past half-century has seen vast improvements in scanning, storage, search, and data transmission. But we still have only a crude set of tools for harnessing the public spirit on a mass scale.
We can do it, however, with a hybrid solution. Get government to play a key role, from limiting copyright to sponsoring projects. Build an open source core around which profit opportunities emerge, and fill in the holes with acts of philanthropy. This type of collective effort could produce public projects that don't rely on the involuntary servitude of humble researchers. (Ahem.)
So here's a shout-out to blogger Michael McNeil, who (with the kind permission of its author) has added Appendix D to our nascent universal online library. Freeman Dyson's manifesto-commentary on J. D. Bernal's The World, the Flesh, and the Devil is finally at my fingertips, no thanks to call number QB54.C661. Thank you: http://impearls.blogspot.com/2002_11_10_impearls_archive.html#84429829.
Contact J. Bradford DeLong at www.j-bradford-delong.net.
Posted by DeLong at January 30, 2003 11:40 AM
"And that's after two hours doing very low grade work: looking up call numbers, wandering down corridors, waiting for the crowds to clear out..."
Isn't this what graduate research assistants are for? Scribble down a bunch of LoC numbers, hand them to an RA (with vague mumblings about maybe giving them co-writer credit) and wait for the books to materialize.
Do you have co-borrowing privileges at any other Bay Area university libraries? One of the first tricks we learned at UCLA was, if URL didn't have the book you wanted on the shelf, you drove to UCI or UCSB (or wangled a stack pass to USC or CSULB) and got your book there.
>Click - whatever you want to read next is there.
Do you think authors, publishers and booksellers might have a valid grouch and stop putting out? After all, the pharmaceutical companies expect to be able to earn returns on the R&D invested in making new drug discoveries commensurate with the risks involved.
Of course, I am willing to be convinced otherwise. Just for starters, why is access to the academic papers listed in Brad's estimable recent reading list for Economic History restricted to those with academic affiliations subscribing to JSTOR? In this new world, will the rest of us be permitted equal access?
You missed Brewster Kahle, the wonder librarian of digital existence.
danny (of oblomovka) has a good start guide to what brewster's up to here:
>>Do you think authors, publishers and booksellers might have a valid grouch and stop putting out? After all, the pharmaceutical companies expect to be able to earn returns on the R&D invested in making new drug discoveries commensurate with the risks involved.<<
Copyrights (used to) expire... and that has never discouraged booksellers from putting things out. Even patented drugs eventually fall into the public domain. I am glad I don't have to pay IPRs on my aspirin... Even if letting IPRs expire creates a small disincentive - if there is one at all, given compounded discounting over the reasonably long lifetime of IPRs, and the fact that the commercial value of creations a few decades on is typically strongly skewed towards 0 - it may well be worth it for the increase in social welfare from making these products widely available at little cost. Did someone get the impression that most publishers' staple business plan is centered around making money off discovering the 21st century Shakespeare?
Well, a universal online library of some sort would still be good, independent of the copyright status. I would pay for access to such a library, and even pay per copy (minimally, anyway). This would tend to be an incentive for authors, particularly those whose works are out of print.
Also, a really interesting reference and discussion can be found surrounding the Los Alamos preprint server--http://arxiv.org/--for math and physics documents.
Indeed, most journal articles printed in the past decade or two have online versions, and publishers pay for it. This model will apply for books, as well--eventually.
The bigger trick will be fully searchable text. Right now, finding documents is often difficult as phrasing changes, words get popularized (nanotechnology), and authors decide which keywords they will submit.
As a tangent to Bob Briant's post -- it is important to distinguish among the interests of writers, publishers and booksellers. They are certainly not the same when it comes to alternative publishing models.
As many areas of publishing have condensed into a winner-take-all market, all but the most successful writers are in a weak position when it comes to bargaining with publishers (if you're JK Rowling you're happy, otherwise not). And as book chains have become successful, so publishers have been manoeuvred into a weak position vis-a-vis distributors and the big book outlets.
Of these, it is publishers and booksellers who have most to lose from an alternative model, as most authors don't make much anyway. So just as even successful recording artists are threatened much less than music publishers by downloading (witness Robbie Williams' recent endorsement), so many authors may well not mind a different publishing model.
One more observation along these lines: publishers of scientific journals are doing their best to keep "their" material proprietary, and it is active scientists who are working around it via mechanisms such as e-print archives (http://www.arxiv.org).
So perhaps there is some kind of open-source solution, although we should not forget Larry Summers, who I believe said "no one, in the history of the world, has ever written code for free".
Few people have ever written books without any sort of profit in mind, but many have written wonderful things, without which our civilization would not be what it is, for profits that are not in the traditionally economic realm (fame - the feeling of contributing to art, science, etc.). For most artists and authors the first and most realistic goal is to be published somehow and be read / seen / heard.
The winner-takes-all nature of this kind of market (nowadays) has to do with the costs that branding entails. It's a bit of a value judgement, but I don't think branding has much intrinsic value; rather, I think of it as an attempt at pointless product differentiation with market power as a goal. This kind of differentiation actually hurts welfare, as it draws me to go watch crappy movies like a robot instead of taking a few minutes to inform myself online or otherwise. And in the meantime, all smaller studio artists are left starving. I can hardly see how that spurs genuine creativity. But that's me.
Actually many of the greatest books written were written with little expectation of monetary reward. The history of philosophy, economics, sociology and so on is full of examples where the initial print run was only 50 or a hundred books sold only to a few people and paid for by the author. Self publishing for vanity has given us some of the greatest books ever written.
Raj Reddy at Carnegie Mellon has been working on this for the past two decades.
~'Larry Summers, who I believe said "no one, in the history of the world, has ever written code for free".'~
I believe the quote was to the effect that no one in the world had ever TWEAKED someone ELSE's code.
The "for free" part was merely implied.
Melcher -- thanks for "tweaking" my post, without any payment.
Actually, you are all missing a point. It's called "tenure."
Publishers have a lock on the publishing of academic literature because in academic settings publishing for tenure must be done in peer-reviewed academic journals. There's a handful of them in each discipline. Usually they are controlled by large publishers: Elsevier, for example, has a near-stranglehold on scientific publication. We librarians have often been frustrated by the ability of publishers to raise prices almost without limit--and to deny access unless through subscription. The publishers know they can do it with impunity, because there are certain titles that MUST be in a library's collection or the faculty would raise hell. Usually those titles are closely linked to publication for tenure.
As an added complication, most academics do not retain copyright of an article when publishing in one of these journals; it is transferred to the publisher. The author has no say on whether his work is placed in public domain or not.
J-Stor is growing fast, and might be the best model, because it does not compete with current publishing. Their contract specifically states that they will stop digitizing at a certain pre-set time prior to the present date. Publishers digitize and sell their own current materials--which is, of course, the most important in many disciplines.
Elsevier has nothing like a stranglehold on scientific publications--at least the fields I am familiar with (physics, chemistry, mechanics, electrical engineering). They have particular journals which are best of class, but the best are quite often the society publications. IEEE, IOP, and AIP have pretty significant presence. I thought that it was medical journals where the real profits were made.
The debate reaches into libraries, and a couple of years ago, there was a pretty acrimonious debate at Stanford, where they considered dropping all subscriptions to Elsevier journals.
I do hope JSTOR succeeds. It would be a real boon to the pursuit of research. I would also hope for a change in tenure policies, which seem a potentially corrupting influence.
Wasn't it Feynman (or perhaps Gamow) who said that if you stacked every new paper published in physics on top of each other, the speed of the front of the stack would exceed the speed of light---but that breaks no rules because no information is transmitted.
Elsevier is the company everyone looks to to set the prices. Science and medical journals have increased in price anywhere between 3% and 15% every year since I entered the profession. And I have yet (23 years later) to get one cancelled unless whoever held the purse strings ordered the library to cut something...and many times they have cut staff rather than journals.
Actually, I was not addressing the association-type journals, which are a whole other kettle of fish altogether.
There's a company called elibron.com which scans in rare public-domain books and then sells them either as e-books or as printed books very cheaply. The printed books are very well made and attractively bound. For me the best things have been old travel books and memoirs and old translations which have never been retranslated, but I have also seen facsimiles of rare scientific and philosophical classics. I got a study of old Turkish history for $25 which cost $300 on the rare book market.
I would commend anyone who would like to contribute to the Distributed Proofing Website
and make a contribution to expanding the books in the library.