Sun's New UltraSPARC T2 Has 64 Threads

By Scott M. Fulton, III | Published August 8, 2007, 4:16 PM

Sun Microsystems' new UltraSPARC T2 processor, announced yesterday, promises to break new ground in CPU parallelism by offering not only eight cores per chip but also eight threads per core. But will this necessarily mean 64 times the processor power? The answer depends on how you define "thread."

For at least the past half-decade, Sun Microsystems has known that the key to performance improvement in the "post-megahertz" era of microprocessors is to discover how to implement parallelism without dedicating a whole processor core to each thread.

Multicore architecture is efficient in many respects, and it satisfies the "requirement" of Moore's Law to cram more transistors into each new design. But it's inefficient in one sense: doing four times the work (or less) should not always require four times the processors, even if everything is compacted onto a single die.

Parallelism in computing is the capability to execute multiple sequences of instructions at once. Semiconductor manufacturers have taken different approaches to this concept. Just prior to the dawn of the multicore era, Intel tried (and in some instances, is still trying) "hyperthreading," which refers to its CPUs' capability to suspend one set of instructions, complete with their respective registers, and concentrate for a time upon a second set. This kind of implicit parallelism lets programs compiled for a single-threaded processor still receive some of the benefits of being bunched up.

True explicit multithreading, which Intel initially tried in its first-generation Itanium architecture, enables the CPU to schedule the execution of instruction sequences much more logically, though it depends on programmers to compile their software so that it explicitly instructs the CPU how those threads should be scheduled. Itanium and Itanium 2 processors don't need multiple cores to pull this off; they use simultaneous multithreading (SMT), which might have revolutionized the CPU industry much earlier than Core Microarchitecture did, had its instruction set been compatible with x86.

Then there is the type of parallelism exhibited by today's graphics cards, which is altogether different. There, a single instruction can be executed on multiple groups of data simultaneously, by way of multiple pipelines. This is very useful in an environment with a small instruction set, where your code spends most of its time shading polygons.
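The graphics-card model can be illustrated in a few lines of Python. This is a conceptual sketch only, and the function name `simd_execute` is hypothetical: one operation is applied across every data lane, with no separate instruction stream per lane, which is what sets it apart from the multithreading approaches described above. Real GPUs do this in hardware within the same clock cycle; the list comprehension merely models the idea.

```python
from typing import Callable, List


def simd_execute(op: Callable[[float, float], float],
                 a: List[float], b: List[float]) -> List[float]:
    """Conceptual SIMD: apply ONE operation to every data lane in lockstep.

    There is a single instruction (op) shared by all lanes and no per-lane
    program counter. Hardware would process all lanes in the same cycle;
    this loop only models the single-instruction, multiple-data idea.
    """
    if len(a) != len(b):
        raise ValueError("both operands must have the same number of lanes")
    return [op(x, y) for x, y in zip(a, b)]


if __name__ == "__main__":
    # Hypothetical shading step: scale four pixel intensities by one light level.
    pixels = [0.5, 1.0, 0.25, 0.75]
    light = [0.5] * len(pixels)
    print(simd_execute(lambda px, lvl: px * lvl, pixels, light))
    # prints [0.25, 0.5, 0.125, 0.375]
```

Every lane receives the same multiply; only the data differs, which is why this style suits repetitive work like shading but does nothing for unrelated instruction streams.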

Sun's UltraSPARC T2 series, announced yesterday as the culmination of the long-rumored Niagara 2 project, is a so-called "server-on-a-chip" that borrows a little from all three of these concepts. In a concerted effort to blast its way right back into the CPU market, where its designs once caught fire among workstation builders, Sun has given the T2 eight cores with eight threads apiece, all on a single chip.

But what's a thread? Or rather, how does Sun think of a "thread"? It isn't clear from the company's marketing literature, but Sun does have its own idea, and it isn't shy about sharing what that idea truly is. In fact, by opening up the T2 architecture and its associated documentation under the General Public License, Sun is being very direct and honest about the fact that its threads are different from others'.

An up-close look at Sun's UltraSPARC T2 processor. Here you can plainly see the eight processor cores in the center, flanked on the left and right by shared resources such as caching and embedded microcode. Among the features of this microcode are built-in resources for expediting cryptography algorithms. (Courtesy Sun Microsystems)

In one critical respect, Sun's multithreading and Intel's hyperthreading are quite similar. The way Sun sees it, software doesn't have to know it's being executed in parallel; in other words, you don't have to compile it as multithreaded. As Sun has designed the T2, each of the eight cores can maintain a virtual machine state (all that work with Java finally paid off) for as many as eight independent threads. Each of those VMs maintains a separate set of registers and resources, giving each thread the appearance of having a single-threaded processor all to itself. In fact, Sun's documentation likens the effect to multiplying a single first-generation UltraSPARC processor.

Each virtual machine state, complete with its pipeline full of instructions waiting to be executed and the registers it needs to maintain that state, is referred to as a strand. In fact, Sun might have called its new processor "multi-stranded" if not for the negative connotation. And in the sense that we tend to say a CPU executes a set of instructions, in the virtualized model of UltraSPARC parallelism, the device that executes instructions is called a CMT. It stands for...something. Chip MultiThreaded, or "Chip MultiThreading," or maybe something else. Never mind the ambiguity; these are the three letters Sun chose.

Sun's documentation describes the setup like this: "In general, each virtual processor of a CMT processor behaves functionally as if it was an independent processor. This is an important aspect of CMT processors because user code running on a virtual processor does not need to know whether or not that virtual processor is part of a CMT processor."

That's seven "processors" in just one paragraph. To put it another way, each virtual machine has all the digitally represented resources needed to give running software the impression that it has a dedicated, single-threaded processor to itself. By default, software is non-privileged, which means it doesn't know of the existence of other threads or even other CMTs. But system software (such as the Solaris operating system) and security programs may be given higher privileges.
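The strand idea, and the scalability question it raises, can be made concrete with a toy model. The Python sketch below is a hypothetical illustration: the `Strand` class, `simulate_core`, and its `run_length` and `miss_latency` parameters are invented for this example and are not taken from Sun's documentation. Each strand keeps its own program counter and registers, and the core issues one instruction per cycle from whichever strand is ready, skipping strands stalled on a cache miss. It assumes one issue slot per core, zero-cost switching between strands, and a fixed, made-up miss pattern, so it shows the shape of the effect rather than actual T2 performance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Strand:
    """One hardware thread context with its own (toy) architectural state."""
    strand_id: int
    pc: int = 0  # private program counter
    registers: List[int] = field(default_factory=lambda: [0] * 8)  # private registers
    stall_until: int = 0  # cycle at which this strand's cache fill completes


def simulate_core(num_strands: int, cycles: int,
                  run_length: int, miss_latency: int) -> float:
    """Return the fraction of cycles in which the core issued an instruction.

    Toy assumptions, not Sun's actual pipeline: one issue slot per core per
    cycle; every run_length-th instruction of a strand misses the cache and
    stalls only that strand for miss_latency cycles; the core can issue from
    a different ready strand on the very next cycle at no cost.
    """
    strands = [Strand(i) for i in range(num_strands)]
    issued = 0
    next_pick = 0  # round-robin starting point
    for cycle in range(cycles):
        for offset in range(num_strands):
            s = strands[(next_pick + offset) % num_strands]
            if s.stall_until > cycle:
                continue  # still waiting on memory; try the next strand
            s.pc += 1
            s.registers[s.pc % len(s.registers)] += 1  # stand-in for real work
            issued += 1
            if s.pc % run_length == 0:  # this instruction missed the cache
                s.stall_until = cycle + 1 + miss_latency
            next_pick = (s.strand_id + 1) % num_strands
            break  # only one instruction issues per cycle
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        util = simulate_core(n, cycles=800, run_length=2, miss_latency=6)
        print(f"{n} strand(s): pipeline busy {util:.0%} of cycles")
```

With these made-up parameters, a single strand keeps the pipeline busy only a quarter of the time, while eight strands fill every cycle. Utilization can never exceed 100 percent however many strands are added: extra strands hide memory stalls but add no execution resources, which is why eight strands per core need not mean eight times the throughput.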

Is this necessarily a good thing? Originally, the impetus for developing parallelism in processors was to let software that encountered relatively heavy tasks break those tasks down into easier-to-digest chunks and distribute those chunks among its logic units. A program compiled for Itanium, for instance, can break itself down and distribute its functionality among the available SMT threads.

But as Intel discovered with HT as opposed to SMT, compartmentalizing single threads into two pigeonholes didn't lead to performance improvements across the board. In fact, some testers in 2005 discovered that the faster an HT chip was clocked, the slower it performed on certain HT benchmarks, evidence of a real bottleneck.

Could Sun's choice of architecture multiply that problem by four? The question is one of scalability: specifically, can UltraSPARC T2's performance scale upward in rough proportion to its thread count? Initial performance numbers revealed by Sun yesterday may leave us scratching our heads. First, Sun made sure we knew these were performance estimates, which suggests they may not be based on direct measurements. But Sun reports that the T2 scored 78.3 on the latest SPECint_rate2006 (integer tasks) benchmark and 62.3 on SPECfp_rate2006 (floating-point tasks); we'll assume Sun means "peak" performance and not "base."

How good is that? In the latest performance rankings from the SPEC organization, an HP ProLiant DL360 G5 using 2.66 GHz Intel Xeon X5355 processors posted a peak score of 82.1 on SPECint_rate2006 and 58.6 on SPECfp_rate2006. Of course, that's with two quad-core chips, not one eight-core chip. Still, Sun's single chip can run 64 threads, yet it was pretty much matched by a pair of chips that can run a total of 8.
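Because the two systems differ in both chip count and thread count, it can help to normalize the quoted scores. The short Python sketch below does that arithmetic; the `SYSTEMS` layout and the `normalized` helper are illustrative names, and the T2 inputs remain Sun's estimates rather than published SPEC results.

```python
from typing import Dict

# Scores as quoted above. The UltraSPARC T2 figures are Sun's own estimates;
# the Xeon figures are the HP ProLiant DL360 G5 peak results (two quad-core
# Xeon X5355 chips, eight hardware threads in total).
SYSTEMS: Dict[str, Dict[str, float]] = {
    "UltraSPARC T2": {"chips": 1, "threads": 64, "int_rate": 78.3, "fp_rate": 62.3},
    "2x Xeon X5355": {"chips": 2, "threads": 8, "int_rate": 82.1, "fp_rate": 58.6},
}


def normalized(system: str, metric: str, per: str) -> float:
    """Divide a whole-system SPEC rate score by its chip or thread count."""
    s = SYSTEMS[system]
    return s[metric] / s[per]


if __name__ == "__main__":
    for metric in ("int_rate", "fp_rate"):
        for per in ("chips", "threads"):
            t2 = normalized("UltraSPARC T2", metric, per)
            xeon = normalized("2x Xeon X5355", metric, per)
            print(f"{metric} per {per[:-1]}: T2 {t2:.2f}, Xeon {xeon:.2f}")
```

On these figures, the single T2 chip scores about 1.9 times the per-chip Xeon result on integer work (78.3 versus 41.05) and about 2.1 times on floating point (62.3 versus 29.3), while each Xeon hardware thread accounts for roughly 7.5 to 8.4 times as much of its system's score as each T2 strand. Dividing a whole-system SPECrate score by thread count is only a rough normalization, but it frames the trade-off described above.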

There could still be a payoff, however, in the price department. Sun hasn't announced official prices for UltraSPARC T2, but it promises a roster "starting well below $1,000." That could make its processor power half as expensive as Xeon's. It doesn't necessarily mean servers based on this chip will be half as expensive, though, and if they're not, all of Sun's promises of an 8x8 multi-strand future for processors may lead it into a sadly familiar corner of the marketplace where SPARC has gone before.

Comments

You missed the big feature of Sun's CMT technology: the T1 and T2 can supposedly do a single-cycle thread switch, so that a thread that is blocked waiting for a cache fill can be replaced by one that is ready to run. As to the performance, SPECrate benchmarks aren't likely to be the best candidates for the T2, since they aren't heavily threaded. The best-case benchmarks for the T2 are going to be things like Java app servers and threaded Apache servers running HTTPS. The fact that the T2 can match a dual quad-core Xeon on SPECrate is outstanding, especially since the T2 will do it with half the power.

The big unknown is price. If Sun prices the T2 chips and the systems that use them in the same ballpark as quad-core Xeons they will sell like hotcakes. However, since this is Sun we are talking about, I doubt that they can resist the temptation to gouge their customers. If so, another outstanding engineering job will indeed be just an interesting footnote in computing history.

Small correction: it is called chip multithreading (not chip multithreaded). The key message about throughput computing that Sun has been advocating for a while is not really spelled out in this review. (To sum it up in a few words: in the not-so-distant future, the exponential growth of communicating devices will require architectures that can process a lot of data: decrypt it, process it, encrypt it, and send it again.) This is where a processor like the T2 fits into that vision. (I am not trying to judge whether this vision is right or wrong.) The T2 has two 10 Gb Ethernet interfaces on the chip and one crypto accelerator per processor core. Comparing the SPECint benchmark of this chip to an HP server is not fair given the typical workload of server machines. It is unknown at this point what kind of performance you would see if you took into account moving, encrypting, and decrypting data as well (once Sun comes out with a T2 system, you could make those comparisons too). Those running data centers are also interested in the electricity consumption of these systems: would the HP server fare as well there as systems with the T2 chip? Time will tell.

If Sun is trying to fill a specific niche, then they should market it that way. Despite the different uses for servers, they seem to be lumped together as one generic commodity in the marketing crap (they might mention this or that advantage, but they make it damn hard to compare). For example if I'm looking for a single server to run an Oracle database on Win2003, I want TPM-C numbers for each server I'm considering and I'll pick the one with top performance. But if I'm building a huge datacenter, then watts/TPM-C would be the critical spec for me. I guess this is an old rant, but I'm in a ranting mood.

What's a TPM-C? For instance, if I were a layman and I wanted to know what server to buy for my MySQL DBs, where could I find such numbers, and how could I compare them?

TPC-C (www.tpc.org) reports its results in tpmC: transactions per minute (tpm) followed by the benchmark code (C).
