June 20, 2004

Hard Disk Costs

Abiola Lapite and Ian Montgomerie both agree that mass-storage costs are falling even faster than microprocessor costs--or bandwidth costs:

Tech Notes: On the Economics of GMail: A commenter by the name of Ian Montgomerie says some very insightful things on Brad DeLong's Webjournal:

On another note, though, don't underestimate the cheapness of Google's storage. It's obviously very cheap relative to the competition, but in absolute terms they still can't afford to give everyone a gigabyte of email storage now. They know darn well that most people will use maybe 10% of that any time soon, and mostly in text (they would be stupid not to be compressing that text in the background down to a level where the typical GMail user's 100 megs or less of email are 10 megs or less of actual hard disc space, so a typical modern hard disc could serve 10,000 users). EVENTUALLY people will accumulate enough email, especially with attachments, so that many are actually using the bulk of their space. But that will take years by which time hard discs will be much cheaper. Google can reliably bet that from this point onward, hard disc prices will probably drop at least as fast as email volume rises.

Indeed, the speed at which storage costs per GB continues to drop is truly astounding to behold: Hitachi and Seagate both already have 400GB (i.e., 0.4 terabyte) hard drives for sale, and this is for the low-end consumer market.

At a current (guessed-at) cost to Google of $0.50 per GB, the 10 MB required for the average Gmail account would cost it half a cent in hardware.

Truly these are marvelous times we live in.
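
The half-cent figure in the quoted passage can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are illustrative, the $0.50 per GB cost is the post's own guess, and decimal units (1 GB = 1000 MB) are assumed.

```python
# Figures quoted in the post; the $0.50/GB cost is the post's own guess.
COST_PER_GB_USD = 0.50
DISK_PER_ACCOUNT_MB = 10    # compressed footprint of an average account
DRIVE_SIZE_GB = 400         # the large consumer drive mentioned above
MB_PER_GB = 1000            # assumption: decimal units, as drive vendors count


def hardware_cost_per_account(cost_per_gb, mb_per_account):
    """Raw disk cost in dollars for one account."""
    return cost_per_gb * mb_per_account / MB_PER_GB


def accounts_per_drive(drive_gb, mb_per_account):
    """How many accounts of the given size fit on one drive."""
    return drive_gb * MB_PER_GB // mb_per_account


cost = hardware_cost_per_account(COST_PER_GB_USD, DISK_PER_ACCOUNT_MB)
accounts = accounts_per_drive(DRIVE_SIZE_GB, DISK_PER_ACCOUNT_MB)
print(f"${cost:.3f} per account, {accounts} accounts per drive")
```

On these numbers a single 400 GB drive would hold the compressed mail of tens of thousands of average accounts, before any allowance for redundancy.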

Posted by DeLong at June 20, 2004 08:00 PM
Comments

Yahoo has just upped the ante by increasing its email account size to 100MB and individual email message size to 10MB. Two gigs are available for $19.99/yr. This leap is something that all email providers have noticed, I'm sure.

Posted by: Dubblblind on June 20, 2004 08:30 PM

____

The RAW cost may be $.50/GB, the real cost is more like $1-2/GB because of the need for reliability, which in Google terms requires triple-redundancy.

Still, the Google cost for RELIABLE storage is very low. And frankly, I'd gladly pay $10/GB per year for reliable, network-accessible, encrypted storage...

Posted by: Nicholas Weaver on June 20, 2004 08:30 PM

____

So you need access

Posted by: Lee A. on June 20, 2004 08:32 PM

____

Your previous comments on the problems of finding digital things are right on the money. The trend in mass storage means that for some of us it is already feasible to put an entire career on a hard disk the size of a paperback book. Every paper I ever wrote. Every slide deck I ever put together for my corporate masters. The audio track for every presentation I ever gave (not quite there for video yet). Those of you more prolific than myself will have to wait another hardware generation.

I have not decided if this is a good thing or not.

Posted by: Michael Cain on June 20, 2004 08:36 PM

____

The idea of taking an IDE HDD that you buy off the shelf and sticking it into a system as huge and complex as gmail, or google the search engine, is a pipe dream. Nicholas hits the nail on the head with "reliability," but it would be hard to really quantify our estimates of their cost of storage without getting a peek behind the curtain.

Solving large, complex technical problems is surely Google's strong suit. But I guess my point, if I must make one, is that storage scaling and backup are still "hard," and by hard I mean expensive. Oh, to be a fly on the wall of the Google data center...

Posted by: Patrick Berry on June 20, 2004 08:44 PM

____

"Indeed, the speed at which storage costs per GB continues to drop is truly astounding to behold"

It has worked out to a very convenient 2x more for the money per year for a long time now. That also works out to a nice round 1000x every decade.
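
The step from "2x per year" to "1000x per decade" is just compounding. A minimal sketch, with an illustrative function name and the steady doubling rate taken as given from the comment:

```python
def growth_factor(years, yearly_multiplier=2.0):
    """Total improvement after compounding a yearly multiplier."""
    return yearly_multiplier ** years


decade = growth_factor(10)
print(f"{decade:.0f}x more storage per dollar after 10 years")
```

Ten doublings give 1024x, which is where the round 1000x figure comes from.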

But unless a home user wants to store video, there is not much use for the latest capacities. I wouldn't know what to do with a 400GB drive other than restrict my usage of it to the outer edge in order to make it faster. I look forward instead to 1" drives in the 100GB range which should hopefully be about 5 years away.

Posted by: snsterling on June 20, 2004 08:53 PM

____

It will cost them more than $3/GB. Even with older hard drives, it was possible to fit thousands of accounts on a drive. With the bigger drives today, at least an order of magnitude fewer accounts can be hosted per drive. Each of these drives will need to be in some form of enclosure, require power to run, and infrastructure to connect it to the next larger network.

Google's starting from scratch, and its costs may be sustainable, but they're going to be a lot higher than Yahoo's or Microsoft's.

Posted by: David Yaseen on June 20, 2004 09:02 PM

____

What I hate, though, is how quickly these new high-speed hard drives burn out. Right now we're entering a weird sort of era where, if something happened to disrupt our civilization, our information would "evaporate" in a matter of years. CD-ROMs are only rated for 1-2 decades, hard drives for a few years, and I go through floppies like paper. Where is the long-term memory of our civilization? It's still in books, hopefully low-acid books.

Posted by: Oldman on June 20, 2004 09:44 PM

____

Nicholas Weaver writes:
>
> The RAW cost may be $.50/GB, the real cost is more like
> $1-2/GB because of the need for reliability, which in Google
> terms requires triple-redundancy.

Oh, let's try to put a few more real numbers on this. Note that my numbers will be wrong, but they are concrete. A knowledgeable expert could fire away at these, but I'm guessing the overall conclusion will hold. To cut to the chase, the cost of the hard drive itself is almost irrelevant these days.

So the sweet spot for EIDE drives today seems to be at the 200 GB size, which can be had for about $110 apiece. Let's pretend it's next month, and the cost is $100 apiece. That gives us a figure of $0.50 per GB for raw diskspace. But what we need is for this to be part of a system, not just an HD plopped on your desk. We also need a system that has redundancy, but which of course can take massive advantage of compression. With a lot of hand-waving, I shall argue that Google can achieve system integration and redundancy for an initial installation cost of $1 per GB.

Groovy, but the real problem is that I have to keep that diskspace spinning. Electricity costs about 10 cents per kilowatt-hour; to keep one 1GB worth of disk spinning takes maybe 40 watts of power. So that cost is about $35 per effective GB per year. *Which completely dominates the hard disk cost.* (Please, if you see anything wrong here, let me know; I get .04 kW * 24 hours / day * 365 days / year * 10 cents / kW-hour = $35.04.)
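
The electricity figure is a product of three factors, so it is easy to recompute. A minimal sketch with illustrative names; the 40 watts and 10 cents per kilowatt-hour are the comment's own rough inputs, and a 365-day year is assumed. The product works out to roughly $35 per year.

```python
DRAW_WATTS = 40            # power draw assumed in the comment
PRICE_PER_KWH_USD = 0.10   # electricity price assumed in the comment
HOURS_PER_YEAR = 24 * 365


def yearly_electricity_cost(watts, price_per_kwh):
    """Dollars per year to run a constant load of the given wattage."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * price_per_kwh


cost = yearly_electricity_cost(DRAW_WATTS, PRICE_PER_KWH_USD)
print(f"${cost:.2f} per year")
```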

Figuring bandwidth costs is even a bit trickier. I have reason to believe that keeping 1 GB worth of disk flying to and fro will not cost Google over $20 per year. So all told, I project a yearly cost per GB of user email space of about $60.

Now for the fun parts... I'm willing to bet that 50 MB accounts will be the average across users, certainly in the first couple of years. So the yearly gross cost of providing email service to *one* user is $60 / 20 = $3 per year. Maybe I'm off by a factor of 10, and the gross cost is $30 per year. $10 per year is a nice round number, though, and close to the geometric mean of my estimate and a wildly pessimistic guess.
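
The per-user figure and the "geometric mean" remark can be reproduced directly. A sketch only: the names are illustrative, the inputs are the comment's own ($60 per GB per year, 50 MB average accounts, a pessimistic case ten times worse), and 1 GB = 1000 MB is assumed.

```python
import math

YEARLY_COST_PER_GB_USD = 60.0   # the comment's all-in yearly estimate
AVERAGE_ACCOUNT_MB = 50         # the comment's guess at average usage
MB_PER_GB = 1000                # assumption: decimal units

users_per_gb = MB_PER_GB // AVERAGE_ACCOUNT_MB
estimate_per_user = YEARLY_COST_PER_GB_USD / users_per_gb
pessimistic_per_user = 10 * estimate_per_user

# The geometric mean sits midway between two guesses on a log scale.
geometric_mean = math.sqrt(estimate_per_user * pessimistic_per_user)

print(f"{users_per_gb} users per GB")
print(f"${estimate_per_user:.2f} to ${pessimistic_per_user:.2f} per user per year")
print(f"geometric mean: ${geometric_mean:.2f}")
```

The geometric mean of $3 and $30 is a little under $9.50, which is why $10 serves as the round number.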

So the final question then is: can Google make $10 per year off each user in advertising income? I'm guessing they can. Not only do they get to put targeted ads next to email messages, but by "converting" more people to use Gmail, those people will probably increase their usage of other Google services, and their exposure to Google advertising.

But maybe I'm wrong; what to do then? Well, I've had a gmail account for about 24 hours now (thanks to a reader of this very weblog!), and I know in my heart that I would pay more than $10 per year for it already. Two years from now, Google could offer me 5 GB of combined email and web space for $29.95 per year and I'd jump on it. That would just about cover their costs on my dime, and anything they get from advertising is gravy.

OK, so there's an obvious sanity check: if the field is this lucrative, why did all of the other free-service dot-coms go bust? I think there are many reasons, starting with the fact that most dot-coms really didn't have very many clients using their services, and had little to offer to advertisers paying in cash. In addition, 5 years ago was 1999; hard disk space was 30 times as expensive, and bandwidth costs were higher as well. Plus, all of those companies had atrocious overhead; Google is absolutely lean by comparison.

Now the truly scary part for some large software companies is the possibility that gmail ends up being good enough that whole corporations replace their own email systems with it. They would have to pay, of course, but I see them coming out ahead on the deal in almost every scenario. And then there is the whole corporate internet/intranet webserving business. And, hey, then think about the whole nasty business of file-serving in general...

I, for one, welcome our new diskspace overlords.

Posted by: Jonathan King on June 20, 2004 10:07 PM

____

"to keep one 1GB worth of disk spinning takes maybe 40 watts of power"

I think you mean it takes 40 watts to keep one (high performance) drive operating. At 3x redundancy and 200 Gig drives you would be off by a factor of 65. If they only needed 7200 rpm drives then you'd be off by over 100x in your electricity calculation.


Posted by: snsterling on June 20, 2004 10:35 PM

____

snsterling writes:
>
> "to keep one 1GB worth of disk spinning takes maybe 40 watts
> of power"
>
> I think you mean it takes 40 watts to keep one (high
> performance) drive operating.

Uh, sort of. I actually (mis)calculated this with 4x redundancy, so I was off by a factor of...200 GB/4 drives = 50. But then I just left this factor out. Oops; thanks for spotting that.

So the electricity cost is less than $1 for 1 GB of raw space. That means my cost estimate for keeping 1 GB up, spinning, and flying should have been more like $25 per year. So if you have 20 users on that 1 GB of space, your cost is down to $1.25 per user (and most of that is still not hardware). Being off by a factor of ten on the low side gives an estimate of $12.50, so my geometric mean estimate is about $4, down from $10. If I thought they could swing $10 per user per year, I'm obviously enthusiastic that they can do it with a cost of only $4.
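
The corrected numbers can be run through the same kind of back-of-the-envelope calculation. A sketch under the comment's own assumptions (a 40-watt, 200 GB drive, 4x redundancy, $25 per GB per year all-in, 20 users per GB); the variable names are illustrative.

```python
import math

DRIVE_WATTS = 40
PRICE_PER_KWH_USD = 0.10
DRIVE_SIZE_GB = 200
COPIES = 4                      # the 4x redundancy mentioned in the comment
REVISED_COST_PER_GB_USD = 25.0  # the comment's revised all-in yearly figure
USERS_PER_GB = 20

# Electricity is paid per drive, so spread it over the drive's effective GB.
drive_cost_per_year = DRIVE_WATTS / 1000 * 24 * 365 * PRICE_PER_KWH_USD
effective_gb_per_drive = DRIVE_SIZE_GB / COPIES
electricity_per_gb = drive_cost_per_year / effective_gb_per_drive

cost_per_user = REVISED_COST_PER_GB_USD / USERS_PER_GB
pessimistic_per_user = 10 * cost_per_user
midpoint = math.sqrt(cost_per_user * pessimistic_per_user)

print(f"electricity: ${electricity_per_gb:.2f} per effective GB per year")
print(f"per user: ${cost_per_user:.2f}, pessimistic ${pessimistic_per_user:.2f}")
print(f"geometric mean: ${midpoint:.2f}")
```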

And here's a new sanity check: Yahoo is offering free 100 MB accounts and $20 per year 2 GB accounts. Somebody who *buys* a 2 GB account is probably piling up a lot more email than somebody who goes for the freebie; I'm guessing they'll only be able to fit 2-4 users in per GB, so their cost is higher. They still make a profit, though, even without ads.

Posted by: Jonathan King on June 20, 2004 11:10 PM

____

www.nytimes.com/2004/03/31/technology/31CND-GOOGLE.html

One internal Google study put the operational cost of maintaining electronic mail storage at less than $2 per gigabyte.

Posted by: anon on June 21, 2004 01:35 AM

____

The power for the airconditioning costs more than the power for the drives and the servers/routers by a factor of five or so.

Posted by: walter willis on June 21, 2004 01:48 AM

____

"The idea of taking an IDE HDD that you buy off the shelf and sticking it into a system as huge and complex as gmail, or google the search engine, is a pipe dream."

Somebody apparently never heard of RAID.

Posted by: Felix Deutsch on June 21, 2004 05:02 AM

____

All the drives don't have to be spinning all the time. People a lot less competent than Google can surely shift low-use files to dark drives and spin them up when needed. Or they can play about with time zones; I'm not going to be consulting my e-mail archive at 3 a.m. It's not like their Web index that has to be available continuously.

Posted by: James on June 21, 2004 05:30 AM

____

I can't go through the numbers right now, but I think the ballpark figure is $1-2/GB in storage for HW cost and about a similar per-year in operations...

Google has actually told us a HELL of a lot about their system architecture:

They use off-the-shelf components, deliberately unreliable, to make a reliable SYSTEM. Cheap large IDE disks, custom (for cheapness/feature removal) motherboards, etc.

Also, the Google Filesystem Paper (Google for "Google Filesystem") gives a huge insight into how they turn a bucket of unreliable systems and unreliable disks into a RELIABLE file-storage system.

Posted by: Nicholas weaver on June 21, 2004 06:47 AM

____

A few more points worth mentioning

The 10x compression that the note suggests is very, very unlikely. 3- or 4-fold compression is more likely.

Redundancy will double or triple storage costs. Metadata costs (database space, file system management area, etc.) will probably add another 20-30% to storage requirements.
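
These factors combine multiplicatively, which a short sketch makes concrete. The function name is illustrative, and the 100 MB mailbox is taken from the "100 megs or less" figure in the quoted note; the ranges are the ones given in this comment.

```python
def disk_needed_mb(mail_mb, compression, redundancy, metadata_overhead):
    """Physical disk space for a mailbox after compression, replication,
    and metadata overhead (overhead given as a fraction, e.g. 0.2)."""
    return mail_mb * redundancy * (1 + metadata_overhead) / compression


MAILBOX_MB = 100  # the "100 megs or less" mailbox from the quoted note

best_case = disk_needed_mb(MAILBOX_MB, compression=4, redundancy=2,
                           metadata_overhead=0.20)
worst_case = disk_needed_mb(MAILBOX_MB, compression=3, redundancy=3,
                            metadata_overhead=0.30)

print(f"{best_case:.0f} to {worst_case:.0f} MB of disk per 100 MB mailbox")
```

On these assumptions the disk footprint lands somewhere between 60 and 130 MB per mailbox, rather than the 10 MB the note suggested.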

Posted by: Jon Juzlak on June 21, 2004 07:04 AM

____

Felix Deutsch writes:
>
> The idea of taking an IDE HDD that you buy off the shelf and
> sticking it into a system as huge and complex as gmail, or
> google the search engine, is a pipe dream.
>
> Somebody apparently never heard of RAID.

Good thing I finished my coffee ten minutes ago, or I'd be wiping off my screen right now.

I have heard of RAID. I even have one, or, rather two (guess what one of them does?). It is because I know from RAID that I am pretty confident that Google doesn't do things...anything like the way I do them. See other posts on this thread, but Google at least *was* known for taking the core notion of RAID to another level: they depended on RAIC (redundant array of inexpensive computers). I've heard whispers that they've moved a bit up the food chain since then (now it's more like a redundant array of cheap rackmount servers). But the whole point was to get to a point where the *system* behaved well enough so that the notion of just adding boxes/racks with RAID0 striped drives controlled by software was close enough to the truth so that you could at least figure the costs that way.

My cost for HD space was just: buy 4 drives to mirror the contents of 1 drive (in different places), expect something like a 4:1 compression ratio, and then double the cost of the drive to get a system cost. Yeah, that's crude, but 800 GB of disk these days is 4 drives, costs $400, and fits in a box that shouldn't cost very much more than that. But let's say I'm off by a factor of 4; then 1 GB of raw HD space costs Google $4 to buy, or maybe $.20 - $.40 for each Gmail user rather than $.05 - $.10. The point is that the other things (now) dominate the cost of the system, but that's still low enough to make the project potentially profitable.
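
A rough sketch of that hardware estimate follows. The function and parameter names are illustrative, and reading the per-user ranges as corresponding to 50 MB and 100 MB accounts is an assumption, not something the comment states. Note that 4 copies and 4:1 compression cancel, so the cost per GB of mail equals the cost per GB of raw disk.

```python
def cost_per_mail_gb(drive_cost, drive_gb, n_drives, copies,
                     compression, system_multiplier):
    """Hardware dollars per GB of uncompressed user mail stored."""
    total_cost = drive_cost * n_drives * system_multiplier
    mail_capacity_gb = drive_gb * n_drives / copies * compression
    return total_cost / mail_capacity_gb


base = cost_per_mail_gb(drive_cost=100, drive_gb=200, n_drives=4,
                        copies=4, compression=4, system_multiplier=2)
pessimistic = 4 * base  # "let's say I'm off by a factor of 4"

for account_mb in (50, 100):  # assumed account sizes
    low = base * account_mb / 1000
    high = pessimistic * account_mb / 1000
    print(f"{account_mb} MB account: ${low:.2f} to ${high:.2f} in hardware")
```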

Posted by: Jonathan King on June 21, 2004 07:38 AM

____

Nicholas Weaver writes:
> Also, the Google Filesystem Paper (Google for "Google Filesystem")

Speaking of compression, the (now well-known) formula:

Google for "google filesystem"

returns as its first hit the following URL:

www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf

Using the literal search string to stand for the result URL
gives us compression of about 3:1 here, and it's human-readable. :-)


Posted by: Jonathan King on June 21, 2004 07:54 AM

____

Something people are missing is that, as I understand Google, the space they are using is not just cheap, it is FREE.

Specifically, as I understand Google, their indices are not spread over an entire drive but are limited to the first quarter, so that seeks are fast. This means the remaining 3/4s of the drive is free for other use. The difference between using it for GMail and using it for more index is that the assumption is that the GMail users are not especially aggressive, unlike the use of the index, so that, yeah, every so often the disk head will seek over to the slow 3/4s of the disk, but it'll then immediately veer back to the hot 1/4.

Posted by: Maynard Handley on June 21, 2004 08:02 AM

____

As far as compression goes, raw text compresses remarkably well. There are roughly 60-70 characters used in most text, which can be compressed down fairly tightly. In addition, the written language has many highly compressible artifacts - allowing it to compress down even further. A compression ratio for text of 1/4 - 1/10 is not unreasonable. Now, binary files tend to have a much more random distribution, and generally only compress down to 1/2-1/3 size.
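
The contrast between text and random-looking binary data is easy to see with a general-purpose compressor. The sketch below uses Python's zlib purely as a stand-in (nothing here is known about what Google actually uses), and the sample sentence is adapted from this thread. A sample this short compresses far less well than a large mail archive would, so the printed ratios are illustrative only.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))


sample_text = (
    "The trend in mass storage means that for some of us it is already "
    "feasible to put an entire career on a hard disk the size of a "
    "paperback book. Every paper I ever wrote. Every slide deck I ever "
    "put together for my corporate masters. The audio track for every "
    "presentation I ever gave. Those of you more prolific than myself "
    "will have to wait another hardware generation."
).encode("ascii")

# Random bytes stand in for already-compressed attachments (JPEG, ZIP, ...).
random_bytes = os.urandom(len(sample_text))

text_ratio = compression_ratio(sample_text)
random_ratio = compression_ratio(random_bytes)

print(f"text:   {text_ratio:.2f}x")
print(f"random: {random_ratio:.2f}x")
```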

Just some random info that might be useful.

Posted by: Thane Walkup on June 21, 2004 08:58 AM

____

As far as IDE hard drives go: I, a lowly consumer, bought two 250 GB drives on Memorial Day for $120 each, with all discounts at the cash register (no mail-in rebates). These were 7200 rpm drives with an 8 MB buffer.

I'm assuming that Google, with mass purchasing power and buying slower drives, can purchase their storage (if not maintain it) for probably below $.40 a gig.

Posted by: KevinNYC on June 21, 2004 09:33 AM

____

The real economics is how it scales: if you use a gigabyte, you will most likely be accessing your mail more often, which means more text ads are being shown, which means more profit for Google.

Posted by: inst on June 21, 2004 11:23 AM

____

OK, I'm curious - don't know if it can be answered, but have to ask.

If I use gmail, can I use the google engine - or rather a reasonable modification (cued to access after entering my username and password for example) - to search my mail?

If the answer is yes, then Jonathan King is (I suspect) wildly understating what the value of a Google offer down the road of 5G mail/web for $29.95 a year would be. Look back at the past few threads on Googling for retrieval. Now consider: Public website / blog. Private website and blog. Email. All searchable with the Google engine. Sure, it's not metadata and other nifty information retrieval theories. On the other hand, it'd work. The only significant weakness, in fact, would be access from locations with a slow or no internet connection.

Would I pay $30/year for 5G this accessible? heh - yes. Would I pay $10 per month for this? Probably, but I'm not a big blog/mail user. And it also depends no little bit on what my access costs me as well. But still... probably.

Posted by: Kirk_Spencer on June 21, 2004 11:38 AM
