October 23, 2002
There's Something I Don't Understand

Intel expects to make microprocessors with a clock speed of 15 GHz--fifteen billion clock cycles per second--by the end of this decade:


PCWorld.com - When Will Desktop Chips Hit 15 GHz? ...Users can expect to see the processing speed of Intel's desktop processors hit 15 GHz and that of wireless device and PDA processors hit 5 GHz by 2010, the chip maker's chief technology officer said in Tokyo on Wednesday. The 15-GHz desktop chip, some five times as fast as the company's soon-to-be-launched 3-GHz Pentium 4 chip, will pack one billion transistors, said Pat Gelsinger, vice president and chief technology officer of Intel as he delivered a keynote address to the company's Intel Developer Forum Japan conference in Tokyo. Gelsinger would not disclose whether he expected these speeds to be seen in Pentium 4 processors or those based around a new architecture. Intel has said previously that the current Pentium 4 architecture is good up to around 10 GHz.


Clearly I don't understand what "clock speed" is. In one fifteen-billionth of a second, light is only able to travel 0.787 inches--barely across the face of the chip. How can it run so fast and still be a single chip?
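The 0.787-inch figure can be checked with a short calculation. This is only a sketch of the arithmetic: the function name is made up for illustration, and it uses the speed of light in vacuum, which on-chip signals do not reach.

```python
# Distance light travels in vacuum during one clock cycle.
C_M_PER_S = 299_792_458    # speed of light in vacuum, meters per second
METERS_PER_INCH = 0.0254   # exact by definition of the inch


def light_distance_per_cycle_inches(clock_hz: float) -> float:
    """Inches covered by light (in vacuum) in one clock period at clock_hz."""
    period_s = 1.0 / clock_hz
    return C_M_PER_S * period_s / METERS_PER_INCH


# Clock rates mentioned in the quoted article: 3 GHz, 10 GHz, and 15 GHz.
for ghz in (3, 10, 15):
    inches = light_distance_per_cycle_inches(ghz * 1e9)
    print(f"{ghz:>2} GHz: {inches:.3f} inches per cycle")
```

At 15 GHz this gives roughly 0.787 inches, matching the number above.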

Posted by DeLong at October 23, 2002 08:10 PM

Comments

I think the actual important part of the chip is much, much smaller than 3/4 of an inch.

Posted by: andy on October 23, 2002 09:55 PM

I don't understand this either. But the interesting point, surely, is that there is no immediate end in sight to the Moore's law phenomenon, and hence the hedonics-based drop in computer prices. Equally interestingly, it seems that hard disk storage capacity is likely to double every six months during the same period, which means we'll all have terabyte-plus hard disks in the not-very-distant future. To put this in perspective, Brewster Kahle, of Internet Archive fame, claims to have the best part of ten years of global internet content stored on 100 terabytes.

Why is this interesting? Well, current ISP economics is based on an internet architecture of many clients to one server. The P2P phenomenon, coupled with web-based services (like Blogger and Movable Type), means that all this souped-up processing power and storage capacity could take things in a very different direction.

To get a glimpse of what this all means in another direction, check out the following:

"Thousands of desktop computers working together in their spare time have resolved a long-standing biological puzzle, in a breakthrough in data processing, the British journal Nature reported yesterday.

A team led by Vijay Pande of Stanford University in California put out a call two years ago for PC owners to make idle computers available to crack a monster of a problem: how the atoms of a protein cause it to fold into a 3D knot.

Understanding this could help drugs designers come up with molecules to attack Alzheimer’s and the human form of mad-cow disease, both of which are caused by misfolding, rogue proteins."
http://www.timesofoman.com/newsdetails.asp?newsid=17469

What does all this mean? Well good news for computer users, and good news for patients for one. Consumer welfare is going to rocket here. The tricky question is what all this does for corporate business models and profitability. But just remember, the cost of having ideas is coming down like never before.

For these (and, as they say, many other) reasons, I'm fully with you Brad on the tech optimism side, but much more skeptical about the new-economy productivity implications.

Posted by: Edward Hugh on October 23, 2002 09:59 PM

In a word: "pipelining". I was going to write something myself
explaining it. Instead, I Googled pipelining clock assembly line.
The first part of the second hit (before "Simple DLX operation ...")
gives a reasonable introduction to pipelining. Many of the other
hits look helpful too.

Let me know if your question remains after a quick look in this
direction.

Posted by: Mike Gunter on October 23, 2002 10:16 PM

Instructions are actually processed faster than the clock speed on most modern chips. The clock is a way of timing instructions because multiple parts of an instruction have to all be finished before the chip can start the next instruction. Basically, each part of an instruction finishes, and then waits until the next clock cycle starts. That way, clock speed does represent the time it takes to do one instruction, because it forces everything to take exactly one clock cycle, even if it can go faster.

Anyway, at least that was how it worked on the older chips I learned on. The new technologies like pipelining complicate this situation, and there are now chips that don't even have clocks. I learned about hardware on old 8086 and 68000 chips - 20-year-old technology - and I understand that things work very differently now. Clock speed is increasingly just a marketing ploy, and it's inflating faster than the Deutschmark after WWI.

Chips are a whole lot smaller than 3/4 of an inch nowadays. The biggest delays in processing tend to come from the memory bus anyway, which is not obeying Moore's law quite as well as processors themselves.

Posted by: Scott Martens on October 24, 2002 12:03 AM

One thing to bear in mind is that the delay of electromagnetic signals on an integrated chip due to the finite speed of light is small compared to delay due to finite carrier (electron, hole, what-have-you) velocity-- so if your design deals successfully with finite carrier velocity you've, willy-nilly, dealt with finite signal velocity as well.

Another thing to remember is the distinction between bandwidth and latency. The delay between input data and output data is a lot larger than a single cycle-- a lot of processing takes place between input and output.

Posted by: Matt on October 24, 2002 05:17 AM

I think the answer to the original question is "the important bits are less than 3/4" apart," but chip structure isn't my field.

I'd take issue with the claim that "there is no immediate end in sight to the Moore's law phenomenon, and hence the hedonics-based drop in computer prices," given that the traditional means of increasing chip speed (packing more transistors into a smaller area) is going to hit problems somewhere around 2015-2020, when the transistors would need to be the size of single atoms or molecules. But the key word there may be "immediate," which 2015 may not be.

Posted by: Chad Orzel on October 24, 2002 05:41 AM

Thanks for the links, Matt. The only other thing I could imagine they might be talking about is asynchronous processing. But in that case it clearly wouldn't be "clock speed", would it?

Odd to remember that processing cycles used to be slow enough for people to diagnose technical problems with an oscilloscope....

Posted by: .david on October 24, 2002 07:10 AM


The actual CPU part of the Athlon XP 2800 processor (which runs at 2.2 GHz) is smaller than 84 sq mm (according to Tom's Hardware Guide). Moore's Law predicts that by 2010 you could shrink that by a factor of 32, so if the size of the chip is the limiting factor, AMD could clock a shrunken Athlon chip in 2010 at 67.2 GHz.

Posted by: Walt on October 24, 2002 08:51 AM

"Clock speed" is not exactly what you expect.

Within a chip there is a tiny analogue to the CPU/memory split that you are probably familiar with. Thus a chip contains not only processing mechanisms, but also memory. That memory serves as a temporary holding area for data: instructions (add two things together, compare two things, et cetera), numerical values (which at this level could actually represent characters of text, image pixels, et cetera), counters for governing the chip, and so on.

Thus, the processing part that executes instructions occupies only a fraction of the chip. As you have guessed, propagation delay makes it imperative for this element to be as small as possible. "Clock speed" is the rate at which this portion of the chip works through instructions.

The size of the chip does govern a different, slower cycle called "bus speed," which is the rate at which information moves around on the chip as a whole. And yes, coordinating events on a chip where clock speed and bus speed differ is as hard as it sounds.

Posted by: Jonathan Korman on October 24, 2002 09:08 AM

A few clarifications as this hive mind that Brad has agglomerated homes in on the issue:

Pipelining is not the issue here -- it is a technique independent of clock speed. In a pipelined processor with three-step operations, in a given instruction cycle (usually one or two clock cycles), one part of the processor might be executing Step C of the first operation, another Step B of the second, and another Step A of the third. In the next instruction cycle, we would get execution of Step C of the second operation, Step B of the third, and Step A of the fourth.
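The overlap described above can be sketched as a small schedule generator. This is an illustrative toy, not a model of any real processor: the three step names come from the paragraph, while `pipeline_schedule` and its numbering scheme are hypothetical, and every step is assumed to take exactly one instruction cycle.

```python
STEPS = ("A", "B", "C")  # the three steps of each operation, in order


def pipeline_schedule(num_ops, steps=STEPS):
    """Return one dict per instruction cycle mapping step name -> operation number.

    Operations are numbered from 1. Operation n enters the first step in
    cycle n - 1 (cycles counted from 0) and advances one step per cycle.
    """
    depth = len(steps)
    total_cycles = num_ops + depth - 1  # cycles until the last operation drains
    schedule = []
    for cycle in range(total_cycles):
        busy = {}
        for stage_index, step in enumerate(steps):
            op = cycle - stage_index + 1  # operation currently occupying this step
            if 1 <= op <= num_ops:
                busy[step] = op
        schedule.append(busy)
    return schedule


for cycle, busy in enumerate(pipeline_schedule(4)):
    print(f"cycle {cycle}: {busy}")
```

In cycle 2 the schedule has operation 1 in Step C, operation 2 in Step B, and operation 3 in Step A; one cycle later each has moved along and operation 4 enters Step A.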

The real issue here, as others have stated, is the propagation of the clock signal that keeps everything synchronized. Long gone are the good old days (from a designer's perspective anyway) when you could place a crystal oscillator near the processor and route its clock signal all over the processor for each logic block to tap into.

Each time the clock frequency is bumped up, the chip designers have to become more and more careful about the routing of the clock signal and everything that depends on it, both due to the speed-of-light delays and the fact that the higher the frequency, the shorter the conductor length that makes a good radiating antenna.

The issues are very complex, and I only understand them in a general sense. There is a very small number of people that understand them well; you can be sure that Intel employs a good fraction of this coterie and pays them very, very well.

As others have noted here, clock speed is not the only thing that affects computational speed. It is easy to get no meaningful increase in real speed from bumping up the clock frequency if other factors are limiting you. However, there is continual progress being made on all of these other factors as well; it just isn't as easy to summarize in a single sexy number like 15 GHz.

Posted by: Curt Wilson on October 24, 2002 09:56 AM

This does not answer Brad's question about clock speed, but it does have some interesting implications for speed of computation, and it's pretty interesting. It also has relevance for Brad's earlier post about Moore's (speak of the devil) Law.

From Reuters today:

By Caroline Humer NEW YORK, Oct 24 (Reuters) - International Business
Machines Corp. scientists have built the tiniest computer circuit yet using individual molecules, a move they say advances their push toward smaller, faster electronics. IBM researchers at its Almaden Research Center in San Jose, California, have built and operated a computer circuit in which individual molecules of carbon monoxide move like toppling dominoes across a flat copper surface. One circuit is so small that 190 billion could fit on a standard pencil-top eraser, IBM said. IBM has been working on molecular computing for years as it tries to find an alternative to silicon-based semiconductors in modern computers.
Silicon has performed well over the past few decades, fulfilling a tenet from Intel Corp. founder Gordon Moore that the number of transistors on a chip would double every 18 months. But scientists expect its physical properties to limit further advancements in the next 10 to 15 years.
IBM said the new "molecule cascade" technique enabled it to make logic elements 260,000 times smaller than those used in silicon-based semiconductor chips.
IBM is still years from translating the nanotechnology and quantum computing work it has done in research labs into a setting where such transistors could be manufactured and then used in products like cell phones and personal computers.
"The exciting thing is not so much that we're not there yet. The exciting thing is where we've come from," said IBM fellow Don Eigler.
"We've come from, in about 12 or 13 years, from discovering that we had an instrument that was just barely capable of imaging atoms and then moving atoms to function logical circuitry," he said.
They are also smaller than the circuits that IBM has made in the laboratory out of carbon nanotubes, which are extremely strong because of the nature of the carbon bond, and which IBM considers to be a possible alternative to silicon.
The molecule cascade circuits were made by creating a pattern of carbon monoxide molecules on a copper surface. IBM moved one molecule to start a one-directional cascade of molecules, similar to the way dominoes interact. The circuits do not reset themselves.
IBM is publishing details of the advancement in Science Magazine.
((Caroline Humer, New York Technology Desk, 1 646 223-6180,
caroline.humer@reuters.com))

Posted by: K Harris on October 24, 2002 11:36 AM

It's absolutely wrong to say "Pipelining is not the issue here --
it is a technique independent of clock speed." The whole point of
pipelining is to improve throughput by allowing faster clock speeds!
Much of the CPU clock-speed increase we've seen is due to pipelining.

It seems I'm not going to be able to Google my way out of writing
something about this. The key here most certainly is pipelining.

[I generally eschew stating credentials but it's probably worthwhile here
because, I presume, most of the readers don't have any expertise in logic
design. I got 'em: I've done architecture and/or logic design for CPUs,
graphics chips, video compression chips, and wireless chips.]

Brad DeLong asked how a clock cycle could be shorter than the propagation
delay across the chip. Let's make the argument leading to this question
more explicit. Assume at least one element, let's call it the
synchronization point, of the chip is involved in every operation (which
has to be true for CPU instructions because they execute single streams
of instructions where instructions can depend on the previous one.)
Assume every part of the chip can be involved in some operation -- i.e.,
all parts are active (which isn't quite true -- but close enough.) Then,
if an operation executes in a single cycle it would have to involve
communication from the synchronization point to every part of the chip.
This means propagating a signal at least half way across the chip in a
single cycle.

Given these (very conservative) assumptions, if the clock speed is too
fast to allow propagation (and on-chip propagation is slower than the
speed of light) across half the chip, operations must use distinct
parts of the chip on different cycles. Breaking operations into more
than one cycle necessarily increases their latency (the circuit
elements to do the breaking have delay, etc.) So, it would hurt your
performance to do this unless you used the parts for different
operations simultaneously. The name for this technique is
pipelining.
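The trade-off in that argument (longer latency per operation, but more
operations finished per unit time) can be sketched numerically. The model
and all numbers below are assumptions for illustration only: the logic is
taken to split evenly across stages, and each stage pays a fixed overhead
for the circuit elements that do the breaking.

```python
def pipeline_timing(logic_delay_ns, stages, overhead_ns):
    """Return (cycle_time_ns, latency_ns, ops_per_ns) for an evenly split pipeline.

    logic_delay_ns: total delay of the unpipelined logic (hypothetical value)
    stages:         number of pipeline stages the logic is divided into
    overhead_ns:    assumed fixed per-stage cost of the storage elements
    """
    cycle = logic_delay_ns / stages + overhead_ns
    latency = stages * cycle   # time for one operation to pass through all stages
    throughput = 1.0 / cycle   # one operation completes per cycle once the pipe is full
    return cycle, latency, throughput


# Hypothetical numbers: 10 ns of logic, 0.5 ns overhead per stage.
for stages in (1, 5, 20):
    cycle, latency, throughput = pipeline_timing(10.0, stages, 0.5)
    print(f"{stages:>2} stages: cycle {cycle:.2f} ns, "
          f"latency {latency:.2f} ns, {throughput:.3f} ops/ns")
```

Under these assumptions, more stages always raise latency (the overhead is
paid once per stage) while the cycle time falls, so the clock can run
faster and throughput rises as long as the stages stay busy with different
operations.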

Note that it's not the case that pipelining has been primarily
motivated by propagation delay (though it's increasingly important.)
We pipeline chips because we want them to go fast and would do so even
if propagation delays were zero (as long as there was transistor
switching delay -- with neither type of delay we could go arbitrarily
fast!) Also, don't think pipelining is limited to CPUs. Pipelining
is a general, very old (in context), and very, very commonly used technique.
Every piece of hardware I've architected or designed uses it
extensively. I'd be very surprised if there is a single significant
chip which doesn't use it!

The issue of propagation of the clock is mostly a red herring for two
reasons. In the abstract, it's quite possible to synchronize clocks to
greater precision than the time for light to propagate between them. For
a chip, the important thing is that the transitions of the clock arrive
at the relevant points as simultaneously as possible. I.e., when
distributing a clock on chip you strive to make the delays to each
element as similar as possible, not to make them
small. (Since chip fabrication is far from perfect it's
impossible to make them identical.) Secondly, it's only the skew between
the clocks going to the storage elements which bound the work performed
in a single cycle that matters. Because of pipelining, that will be a
fraction of the chip which is designed to be physically compact. It's
the clock skew across the fraction of the chip involved in a single
cycle's work that comes out of your cycle-time budget. This skew isn't
much related to the speed of light.
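A simplified sketch of that budget follows. It assumes the time left for
logic in a cycle is the clock period minus the skew among the relevant
storage elements minus a fixed register overhead; the function names and
all arrival times are hypothetical.

```python
def clock_skew_ps(arrival_times_ps):
    """Skew: spread between earliest and latest clock-edge arrival, in picoseconds."""
    return max(arrival_times_ps) - min(arrival_times_ps)


def usable_logic_time_ps(clock_ghz, arrival_times_ps, register_overhead_ps=0.0):
    """Time left for logic in one cycle under the simplified budget described above."""
    period_ps = 1000.0 / clock_ghz
    return period_ps - clock_skew_ps(arrival_times_ps) - register_overhead_ps


# Hypothetical arrival times of the same clock edge at three storage elements.
arrivals = [300, 305, 298]
# The same elements with 200 ps of extra delay added to every path.
delayed = [t + 200 for t in arrivals]

print(clock_skew_ps(arrivals), clock_skew_ps(delayed))
print(usable_logic_time_ps(10, arrivals))
```

Adding the same delay to every path leaves the skew unchanged, which is
the sense in which matching the delays matters rather than making them
small.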

Since I've written this much, I'll take on a related comment I've heard:
"CPUs get faster due to Moore's Law." What's sometimes meant here is
that the lion's share of the increase in CPU performance is the result of
improvements in fabrication technology. I.e., increases in the number of
transistors one can fit on a chip (Moore's law) and the speed of those
transistors (mostly because smaller is faster) led directly to the
increases in CPU performance. This is wrong! The calculation I saw (in
1997, when I was doing CPU architecture) had transistor speed accounting
for half of the CPU performance increase. The hard work and cleverness
of the people who design CPUs was necessary for the rest. This comes in
the form of increased pipelining (which requires lots of cleverness and
work), putting more memory on the chip (in the form of caches, etc. --
this leverages Moore's Law relatively more directly), improved
circuit-design techniques (building more of the functions you want out of
transistors in ways that go faster -- again requiring lots of cleverness),
and clever high-level architectural techniques (e.g. executing multiple
instructions in parallel, etc.). CPU designers deserve a lot of credit.

Posted by: Mike Gunter on October 24, 2002 01:35 PM

Clock speed is, as I understand it, how long the processor has to wait to communicate with the bus while it is performing operations, that is to say, how much faster than the bus the processor is. As for distances involved, I believe the latest processors work on a .15 micron or smaller process.

Posted by: Dennis O'Dea on October 24, 2002 04:45 PM

Intel are working on .09 micron technology right now, as are other companies with an interest in pushing the limits of what chip architectures can do, such as Nvidia for their upcoming NV 30 GPU. Ars Technica has some very handy articles on CPU design. Here's one looking at the Pentium 4:

http://arstechnica.com/cpu/01q2/p4andg4e/p4andg4e-1.html

Posted by: Andy F on October 25, 2002 05:54 AM

Brad --

> Clearly I don't understand what "clock speed" is. In one-fifteen
> billionth of a second, light is only able to travel 0.787 inches
> -- barely across the face of the chip. How can it run so fast and
> still be a single chip?

I suspect your misunderstanding is about the "clock" part, not the "speed" part. The function of a processor clock is to provide a periodic signal to which each of the processor's subcomponents can independently synchronize its own operation. So what the subcomponents need is a sequence of well-defined time *intervals*. They don't necessarily need to know a well-defined *absolute* time, contrary to what one might intuitively expect based on the clocks of our macroscopic world.

So you're right about one thing: the non-zero size of chips constrains how precisely subcomponents can know absolute time. But it doesn't constrain how precisely they can know time intervals, and it's the time intervals they need to know precisely. This resolves the paradox underlying your question.
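The distinction between intervals and absolute time can be illustrated with a toy model. It assumes an ideal clock source and a fixed propagation delay to each subcomponent; the function names, the period, and the delays are all made up for illustration.

```python
def edge_arrivals(period_ps, delay_ps, num_edges):
    """Times at which successive clock edges reach a subcomponent delay_ps away."""
    return [n * period_ps + delay_ps for n in range(num_edges)]


def intervals(times):
    """Gaps between consecutive arrival times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical: a 100 ps clock period, one subcomponent close to the source
# and one whose propagation delay is longer than a whole clock period.
near = edge_arrivals(period_ps=100, delay_ps=20, num_edges=5)
far = edge_arrivals(period_ps=100, delay_ps=170, num_edges=5)

print("near arrivals:", near, "intervals:", intervals(near))
print("far arrivals: ", far, "intervals:", intervals(far))
```

The two subcomponents disagree about when any given edge arrives, but both see edges exactly one period apart, even though the far one sits more than a full cycle of propagation delay away.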

A complementary answer has already been given by Andy and Chad Orzel: Processor designs tend to be optimized in such a way that each subcomponent mostly communicates with subcomponents that are close to it. They rarely 'talk' to subcomponents on the opposite side of the chip. This greatly reduces the "effective chipsize" that each subcomponent cares about.

Greetings --

Posted by: Thomas Blankenhorn on October 25, 2002 07:11 AM

Mike:

I was coming at the issue of pipelining from a user's (programmer's) point of view, where pipelining has to do with multiple (whole-number) clock instruction cycles, and Brad's question really had to do with fractional clock-cycle propagation delays.

I can certainly see that from a chip designer's point of view, pipelining is a key strategy to be able to increase clock frequencies. If you want to increase the clock frequency more than you can reduce propagation delays (which, yes, have more to do with transistor switching delays than speed-of-light delays), you've got to reduce the number of things that happen in a single clock cycle by spreading out the operation over multiple clock cycles.

Since it doesn't do much good to, say, double the clock frequency, but double the number of clock cycles to finish the operation, with the next operation waiting until this operation is fully complete, yes, pipelining becomes a key strategy in high-frequency chip design. But as you yourself pointed out, pipelining is a very old (in electronics context) technique, used when clock frequencies were 1% of what they are now.

Posted by: Curt Wilson on October 25, 2002 06:09 PM