Intel to Show Off 1 TFlop, 80-core CPU

By Scott M. Fulton, III | Published February 12, 2007, 3:04 PM

At a meeting of the International Solid-State Circuits Conference in Santa Clara later this afternoon (early this evening East coast time), engineers from Intel are slated to demonstrate a working version of a conceptual CPU, using designs the company may integrate into future product lines. As promised during the last Intel Developer Forum, this concept CPU will incorporate 80 cores using an experimental "network-on-a-chip" architecture, which enables the cores to share data without depositing it in memory first.

Last month, in an about-face in AMD's strategy toward waging the "dual-core duel," CEO Hector Ruiz pronounced, "It's not about the cores," in an attempt to deflect attention toward those parts of CPU architecture where AMD may still hold a slight advantage. Today's ISSCC is evidently where Intel responds, "The heck it's not!"

A few new technical details were revealed today in, of all places, the agenda booklet for today's ISSCC. "A 275mm² network-on-chip architecture contains 80 tiles arranged as a 10×8 2D array of floating-point cores and packet-switched routers, operating at 4GHz," reads the brochure. "The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. The 65nm 100M transistor die is designed to achieve a peak performance of 1.0TFLOPS at 1V while dissipating 98W."
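
The brochure's headline figures can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers quoted above (80 tiles, 4 GHz, 1.0 TFLOPS, 275 square millimeters, 98 W). The function names are illustrative, and the results are back-of-envelope values implied by rounded figures, not statements about how the cores are actually built. In particular, dividing the whole die area by the tile count assumes nothing else occupies the die, which is unlikely to be exactly true.

```python
# Figures quoted from the ISSCC agenda booklet.
TILES = 10 * 8          # 10 x 8 array of tiles
CLOCK_HZ = 4.0e9        # 4 GHz operating frequency
PEAK_FLOPS = 1.0e12     # 1.0 TFLOPS peak performance
DIE_AREA_MM2 = 275.0    # total die area in square millimeters
POWER_W = 98.0          # dissipation quoted at 1 V


def gflops_per_tile(peak_flops=PEAK_FLOPS, tiles=TILES):
    """Peak throughput each tile would have to deliver, in GFLOPS."""
    return peak_flops / tiles / 1e9


def flops_per_cycle_per_tile(peak_flops=PEAK_FLOPS, tiles=TILES,
                             clock_hz=CLOCK_HZ):
    """Floating-point operations per clock tick per tile implied by the peak."""
    return peak_flops / (tiles * clock_hz)


def die_area_per_tile_mm2(area_mm2=DIE_AREA_MM2, tiles=TILES):
    """Average die area per tile (assumes the whole die is tiles)."""
    return area_mm2 / tiles


def gflops_per_watt(peak_flops=PEAK_FLOPS, power_w=POWER_W):
    """Peak throughput per watt dissipated, in GFLOPS/W."""
    return peak_flops / power_w / 1e9


if __name__ == "__main__":
    print(f"GFLOPS per tile:          {gflops_per_tile():.2f}")
    print(f"FLOPs per cycle per tile: {flops_per_cycle_per_tile():.3f}")
    print(f"mm^2 of die per tile:     {die_area_per_tile_mm2():.2f}")
    print(f"GFLOPS per watt:          {gflops_per_watt():.2f}")
```

On these figures, each tile would need to deliver 12.5 GFLOPS, or a little over three floating-point operations per clock tick, and the chip as a whole would deliver roughly 10 GFLOPS per watt.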

If you're a regular ISSCC attendee, you already know what the above means. For the rest of humanity, here's a bit of explanation:

  • Mesochronous clocking refers to the capability of a chip to accompany data with clock synchronization signals, rather than have an independent system clock generate ticks. With processor designs becoming vastly more diverse and segmented, there's a point at which the work required to maintain operations throughout all those compartments at one time becomes futile. One alternative approach is asynchronous clocking, which simply lets each department march to its own drum beat, although the beats are brought together in tandem. The mesochronous approach concedes that as the chambers of the CPU become more diverse and cavernous, you can't make the clock tick "loud" enough to pervade every room at once. So instead, clock signals are inserted into the data stream, and permitted to flow through the system as they will, with the latencies worked through later on.
  • Tiles refers to the multi-core architecture Intel is trying with this concept, in which processor cores are quite literally stamped one adjacent to the other, with the patterns in one running flush with those in the other, like laying linoleum tile. Here, tiles are laid in a 10 x 8 block, with each tile containing a core and a router. The latter is used for distributing packets within the chip itself, managing its own "micro-internet."
  • Fine-grained clock gating sounds like a feature you'd find on an antique timepiece. Actually, it's a feature that Intel has exploited since the design of the Pentium 4. It enables segments of a processor to be told to go to sleep, essentially, by turning off their clock signals. If a component doesn't "hear" a clock tick, it concludes it has nothing to do. To be able to control power consumption at a granular level, addressing as small a segment as possible, engineers fine-grain the CPU's clock gating. The danger with this approach has historically been power leakage - the more clocks you can switch off, the more opportunities there are for current drain.
  • Dynamic sleep transistors and body-bias techniques are Intel's approach to compensating for this very danger. With dynamic sleep, transistors that don't receive their clock signals can be turned off when in sleep mode, eliminating the possibility for current leakage there. Body biasing then enables the threshold voltage for those sleeping transistors to be lowered, so it doesn't take as much power to switch them back on again.
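
The tile-and-router arrangement described in the list above can be modeled as a simple 2D mesh. The Python sketch below assumes dimension-order (X first, then Y) routing purely for illustration. The brochure says only that the routers are packet-switched, not which routing scheme they use, and `xy_route` and `hops` are hypothetical helper names.

```python
from itertools import product

COLS, ROWS = 10, 8  # 10 x 8 tile array, as described in the brochure


def xy_route(src, dst):
    """Return the tiles a packet visits from src to dst, inclusive.

    Assumed scheme: travel along the X axis until the column matches,
    then along the Y axis. Each step moves to an adjacent tile's router.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def hops(src, dst):
    """Number of router-to-router links crossed between two tiles."""
    return len(xy_route(src, dst)) - 1


# Every tile is addressed by its (column, row) position in the array.
tiles = list(product(range(COLS), range(ROWS)))

# Worst case is corner to corner: 9 steps across plus 7 steps down.
max_hops = max(hops(a, b) for a in tiles for b in tiles)

if __name__ == "__main__":
    print("tiles:", len(tiles))
    print("route (0,0) -> (2,1):", xy_route((0, 0), (2, 1)))
    print("worst-case hops:", max_hops)
```

Under this assumed scheme, no packet crosses more than 16 links, which gives a rough sense of why an on-chip mesh can move data between cores without a trip through memory.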

Intel's goal is to demonstrate turning over a teraflop of performance (or "TFLOPS") - one trillion floating-point operations per second - in a component that utilizes a mere 62-watt power envelope. This isn't so Intel can immediately begin mass producing supercomputers, but instead so the company can start planning its next quantum leap - the next big change in Intel architecture, comparable to its shift away from NetBurst and toward Core Microarchitecture last year.
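
To put the power figures in perspective, the Python sketch below converts a throughput and a power draw into GFLOPS per watt and energy per operation. It assumes that both the 62 W goal and the brochure's 98 W figure apply to the same one-teraflop throughput, which this article implies but does not spell out. The function names are illustrative.

```python
TERAFLOP = 1.0e12  # floating-point operations per second


def gflops_per_watt(flops, watts):
    """Throughput delivered per watt, in GFLOPS/W."""
    return flops / watts / 1e9


def picojoules_per_flop(flops, watts):
    """Energy spent per floating-point operation, in picojoules."""
    return watts / flops * 1e12


if __name__ == "__main__":
    for label, watts in (("62 W goal", 62.0), ("98 W brochure figure", 98.0)):
        print(f"{label}: "
              f"{gflops_per_watt(TERAFLOP, watts):.1f} GFLOPS/W, "
              f"{picojoules_per_flop(TERAFLOP, watts):.0f} pJ per operation")
```

At one teraflop, the energy per operation in picojoules equals the wattage: about 62 pJ at the 62 W goal against about 98 pJ at the brochure's figure.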

In a recent interview with BetaNews, In-Stat principal analyst Jim McGregor explained the historical difference between Intel's and AMD's evolutionary strategies:

"Intel is kind of brute-force, where they put a single process in place and multiply it by multiple fabs, [to] make sure the products run on a single process. AMD, and a lot of the rest of the industry, [takes the approach], 'Let's continue tweaking the process around the product.' So where Intel does these major node transitions, AMD and IBM do these sub-node transitions. Within a process generation, they may change things like even the transistor design several times before they go to the next node...So the jumps from process node to process node are bigger for Intel than they are for a lot of the other industry players."

Intel has forgone its usual spring Intel Developer Forum for the first half of this year, which may be why the company is showing off its Core Microarchitecture developments, along with its 80-core concept CPU, at ISSCC this year instead.

Comments

So what?...

Intel has been great at simply stacking cores on a bus. And the result is a spectacular-sounding configuration with SERIOUS bus contention issues!

The irony is that simply stacking cores seems to make so many of those who cannot fathom any environment beyond their PC gaming rig drool.

What is most interesting is that this announcement is like Porsche announcing that their new series car will have 78 cm wheels and listening to the ooohs and aaahs. Unfortunately, the really critical structure that should be the real story is missing.

In this case it is not even a simple matter of a non-contentious bus structure. And this chip obviously does not feature a 'shared nothing MIMD (multiple instruction, multiple data)' configuration. But even if it did and was built to perform solely as a massively parallel processing unit (implying that each core essentially functions as a separate node), in order for the machines to work together effectively when each one is executing different instructions on different data independently of the other cores, there has to be a good message passing protocol. And for parallelism to be effective with completely independent nodes, the message passing has to be very fast and reliable.

Previously IBM has created a high performance communications network known appropriately as The Switch for the RS/6000 SP configuration - the legacy of which IS the world's most powerful supercomputer!

But without details of the interprocess communications, this announcement is simply a bunch of Pentiums stacked up and glued together in a configuration that is doomed by the scalability limits of the von Neumann Bottleneck in this spectacular kludge of SISD programming.

Aside from this Rube Goldberg announcement, the REAL news is the cancellation of the Spring Intel Developer's Conference!

And the lack of industry participation sufficient to justify this function speaks volumes that are indeed deafening.

Stacking cores is easy. The next battle is to watch AMD and Intel attempt to develop what IBM has already done with spectacular success - and that is an effective message passing infrastructure coupled with a robust and efficient redundant non-contentious bus structure that will allow the cores to scale.

And what will be even more interesting, complete with lots of opportunity for patent infringement cases is how they will do it without stepping all over IBM patents!

It will be fun to watch the kings of the baby CPUs try to grow up and move into the territory OWNED by IBM and Power.

The real irony that Intel is loath to have mentioned is that they do NOT develop on Pentiums.

Intel develops on the IBM RS/6000 SP and its variants!
So I guess the really great ad would be, "Why buy a Pentium when you can buy what Intel buys!"

By the way, there is a really fascinating story regarding the certification of Intel's RS6000SP for Y2K (despite UNIX not having a Y2K issue! - even conceptually!!!) due to the RS6000 nodes utilizing a capacitor rather than a battery to maintain real memory (RAM). Someone might have 'thunk' that some genius inside Intel might have understood the functional advantages of employing a capacitor as opposed to a battery in a unit that is powered 24/7/365! But No!!!! Curious just how that would relate to Y2K certification? So was I as they reopened the case over a 3 week period!! But you can't sneak anything by those swooft Intel managers!

Maybe if we got the Intel engineers together and explained how electrons are your friends as they all hold hands and skip down the wire...

And with this level of incompetency, it will be a real howl to watch this next stage in the market development.

Score: 0

What part of the new "275mm2 network-on-chip architecture" did you not understand?

It's a different architecture. Be excited or don't. Why get mad?

Score: 0

Who is mad?

Stacking cores without an efficient message and resource passing software and bus structure is simply akin to some kid putting 25 audio amplifiers in a car and playing his hip hop music at 150 dB. Much ado about nothing. At best this is a big math coprocessor.

Stacking cores has never been the limitation; it was the ability to scale due to the bus and resource management.

Intel has simply proven that they can stack more than others, while others have focused on more productive features. AMD has spent more time on the bus and IBM is so far ahead in massively scalable technology that Intel isn't even in their rear view mirror.

It's a Pyrrhic victory. Much ado about nothing, unless you need a really big calculator, and not much else.

Score: 0

I heard it went past 1.5 TFlops actually.

"What's the Futuremark 3DMark06 score??"

You are kidding, right? This isn't a chip that will have anything to do with gaming, I assure you. It is compared to the fastest supercomputer in the world's processors, and it even beats it if you had 200 of them like the supercomputer did.

This is a "superprocessor" per se, like the Department of Defense uses this type of cpu in a supercomputer to calculate every possible route to intercept incoming nuclear warheads before they reach America, calculating the national debt (lol just kidding), and that sort of thing. Plus the processor is just a processor and not an entire computer (was tested in a supercomputer not a PC).

Score: 0

Lay off, the guy must be a true enthusiast. If given the chance I would immediately start SPEC testing and then brag to my buddies ;0

Score: 0

Nope!

It's not massively parallel.

And no, it is not like the CPU in any supercomputer! You might want to become familiar with the RS6000 family that owns the market.

If you are interested in discovering why it's not a 'supercomputer' and what is necessary, you might want to do some research and learn about the technology represented most easily by IBM's "The Switch" used with the RS6000 SP and the software component whose acronym is PSSP.

The CPU is almost a minor concern compared to the IPC message passing and resource management system.

Score: 0

Hahaha! Some of us actually work in and with the technology!

That should confuse you!

Score: 0

What's the Futuremark 3DMark06 score??

Score: 0

My guess is: Very very poor 3DMark scoring if I could get one done at all. Most of that class of machines aren't noted for their DX or OpenGL performance.

Score: 0
