AMD: Will More CPU Cores Always Mean Better Performance?

By Scott M. Fulton, III | Published August 14, 2007, 4:57 PM

The company that helped inaugurate the multicore era of CPUs has begun studying the question, will more cores always yield better processing? Or is there a point where the law of diminishing returns takes over? A new tool for developers to take advantage of available resources could help find the answers, and perhaps make 16 cores truly feel more powerful than eight cores.

Two years ago, at the onset of the multicore era, testers examining how simple tasks took advantage of the first CPUs with two on-board logic cores discovered less of a performance boost than they might have expected. In the earliest tests, some were shocked to discover that a few tasks actually slowed down under a dual-core scheme.

While some were quick to blame CPU architects, it turned out the problem was the way software was designed: If a task can't be broken down, two or four or 64 cores won't be able to make sense of it, and you'll only see performance benefits if you try to do other things at the same time.
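The usual way to put numbers on this is Amdahl's law, which the article itself does not cite: if only a fraction p of a task can be split up, n cores yield a speedup of at most 1 / ((1 - p) + p / n). What follows is a minimal sketch of that formula; the function name and the sample fractions are illustrative assumptions, not measurements from any of the tests mentioned here.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction is the share of the work that can be split across
    cores (0.0 = nothing can be split, 1.0 = everything can).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: fully serial, half parallel, mostly parallel.
    for p in (0.0, 0.5, 0.9):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (1, 2, 4, 8, 16, 64)
        )
        print(f"p={p:.1f} -> {row}")
```

With p = 0 the result stays at 1x no matter how many cores are added, which is the case of a task that can't be broken down. Even at p = 0.9, the gain from going from 16 to 64 cores is modest.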

So when AMD a few months back debuted the marketing term "mega-tasking," defining it to refer to the performance benefits you can only really see when you're doing a lot of tasks at once, some of us got skeptical. Maybe the focus of architectural development would be diverted for a while to stacking tasks atop one another, rather than streamlining the scheme by which processes are broken down and executed within a logic core.

Today, AMD gave us some substantive reassurance with the announcement of what's being called lightweight profiling (LWP). The idea is to give programmers new tools with which to help a CPU (specifically, AMD's own) determine how their programs can best utilize its growing stash of resources. In a typical x86 environment, CPUs often have to make their own "best guesses" about how tasks can be split up among multiple cores.

Low-level language programmers do have compiler tools available to them that can help CPUs make better decisions, but they don't often choose to use them. So as CPU testers often discover, many tasks that ran on one core before continue to run on one core today.

"We think the hardware needs to work together with the software, to enable new ways to achieve parallelism," remarked Earl Stahl, AMD's vice president for software engineering, in an interview with BetaNews. "Lightweight Profiling is what we're going to be releasing as a first step in this direction."

As Stahl explained to us, software that truly is designed to take advantage of multiple cores will set up resources intentionally for that purpose: for example, shared memory pools, which a single-threaded process probably wouldn't need. But how much shared memory should be established? If this were an explicit multithreading environment like Intel's Itanium, developers would be making educated guesses such as this one in advance, on behalf of the CPU.

So LWP tries to enable the best of both worlds, implicit and explicit parallelism: It sets up the parameters for developers to create profiles for their software. AMD CPUs can then use those profiles on the fly to best determine, based on a CPU's current capabilities and workload, how threads may be scheduled, memory may be pooled, and cache memory may be allocated.

"Lightweight Profiling is a new hardware mechanism that will allow certain kinds of software within real time to dynamically look at performance data provided by the CPU as it's executing," Stahl told us, "and then can take action on that performance data to better optimize its own processing."

AMD believes about 80% of the potential usefulness of LWP will be realized by just two software components: Sun's Java Virtual Machine, and Microsoft's .NET Runtime module. While operating system drivers will not be necessary for operating systems to take advantage of LWP, it's AMD's hope that developers who are using high-level, just-in-time-compiled languages anyway will be able to automatically benefit from LWP, at least for the most part.

Earl Stahl shared with us an example: It involves two data objects being instantiated by the Java VM, which would normally be located in the most convenient spots available at the time. But depending upon how a multicore environment is currently being utilized, those locations that might seem convenient to Java may actually cause performance pressure points that wouldn't have been realized if Java had the system all to itself.

An example of memory allocation by the Java Virtual Machine, where AMD's LWP relocates a data object for better performance. (Courtesy AMD)

"As code is executing within a Java virtual machine, at some point a couple of objects [A and B] may get allocated by the memory management system within the JVM," Stahl explained. "As those objects are being accessed, periodically a JVM can use Lightweight Profiling to identify the potential hot spots, which may actually point to objects A and B actually causing a significant number of misses in the cache - which turns out to be a big performance bottleneck if that occurs. By being able to identify that dynamically, the garbage collector or heap manager can actually move one of those objects then, in response to that information, and improve the performance in a dynamic way without the higher-level applications having to know about that.

"There are significant gains to be had," he continued, "when even in this simple example where objects are conflicting at the cache level, you lose the benefit of the cache and start having to go back to the much slower memory access on each write or read from that object - it can have a significant impact on software."
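Stahl's example of two objects conflicting at the cache level can be illustrated with a toy model. The sketch below simulates a direct-mapped cache and counts misses when a program alternates between objects A and B. The cache geometry, the addresses, and the function names are all assumptions made for illustration; real caches are usually set-associative, and nothing here reflects LWP's actual interface, which the article does not describe.

```python
def count_misses(addresses, num_sets=64, line_size=64):
    """Count misses in a toy direct-mapped cache for a list of byte addresses."""
    lines = {}  # set index -> tag of the block currently cached in that set
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index, tag = block % num_sets, block // num_sets
        if lines.get(index) != tag:
            misses += 1          # block not present: fetch it, evicting the old one
            lines[index] = tag
    return misses


def alternating(addr_a, addr_b, rounds):
    """Access pattern that touches object A, then object B, repeatedly."""
    return [addr_a, addr_b] * rounds


CACHE_BYTES = 64 * 64                       # num_sets * line_size in this toy model
obj_a = 0x10000                             # hypothetical address of object A
obj_b_conflicting = obj_a + CACHE_BYTES     # maps to the same set as A, different tag
obj_b_relocated = obj_b_conflicting + 64    # B moved by one cache line

before = count_misses(alternating(obj_a, obj_b_conflicting, 1000))
after = count_misses(alternating(obj_a, obj_b_relocated, 1000))
print(f"misses before relocation: {before}, after: {after}")
```

In this model the conflicting layout misses on every one of the 2,000 accesses, because A and B keep evicting each other. Once B is moved by a single cache line, only the two initial cold misses remain. That is the kind of relocation a garbage collector or heap manager could make if it had the miss data Stahl describes.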

Historically, when developers have been given tools that enable them either to conserve what resources may yet be available or to "go for the gusto," they've chosen the latter. This has worked against systems that try to budget or schedule times or allocation blocks appropriately, as applications all tended to shout that they deserved the highest priority.

So the question arises with respect to LWP, will this actually enable processes to be conservative, especially in the eight-core era and beyond? As we've already seen, when gigabytes of memory and data storage become cheaply available, the tendency is for applications to consume them. What's to prevent developers from turning up the volume knobs, if you will, on LWP profiles to the highest setting?

For now, AMD's Earl Stahl isn't really sure. "In general, I think your question is a good one, which is really more a question about, how will runtime and applications utilize this?" he responded. "Will it be to make a single thread run faster or to make multiple threads of execution run faster, or perhaps have a more efficient set of those? I think that will be actually driven by the software itself. It's more domain-specific, because [LWP] is not limited to either of those. It could help to achieve either one, and we're not pre-supposing which of those it would most benefit."

Later, he suggested we may just have to implement LWP first, in order for us to find the answer for ourselves. "LWP as a hardware extension - looking ahead at this new era we envision where there's some new techniques in software that have to emerge for achieving parallelism - is within those new software techniques [where] there'll be either implicit or explicit abilities to make some of those decisions," he said.

"Part of what limits things today in the world of many cores is that it is explicitly on the shoulders of the software developer to have to figure out ways to exploit those, and it becomes a level of complexity that is really just not worth the investment. So you're describing and discussing a topic that says, where do the new software techniques take us? How does the developer ensure that they can exploit appropriately whatever the available hardware is? And there are new techniques in the software ecosystem that are going to emerge, to help achieve that."

Comments

I think it's sort of backward to WANT applications to use multiple cores. Shouldn't the point of multiple cores be MULTITASKING? So that you can run a game on one core, while other applications are in the background and don't steal CPU cycles from the game.

|

Yeah, but if this game runs on 3-4 CPUs, it will get much higher FPS. While other processes could steal some resources from each part of the game on each CPU, overall performance would be better than if the game ran on a single CPU exclusively.
For example, encoding video with x264 gains something like an 80-90% fps boost from multithreading on a dual-core system.
People like to have something run as fast as possible, without really caring about other processes in the system, and multithreading (or, in the case of AMD, code separation) should really make a process run faster.

|

Like others it appears AMD is trying to twist what many programmers already know. Unless "we" as in programmers take advantage of multiple cores, the user won't see the kind of performance increase they might get otherwise. I don't think we have reached the number of cores where the return isn't worth it at this point. I think that will happen when we're talking about 64+ cores.

Of course it might all be moot if they can reach terahertz speeds sometime in the near future. Which might not be that far off if we can figure out a way to make a device stable enough to provide those kinds of speeds.

To be honest I think at those speeds, and with the future of solid-state memory storage, we might not need multiple cores.

Think of the following:
1 terahertz processor with TB of cache.

You would only need an external storage device for your data. Once your system was fully running you could run anything within your memory and cache.

In an industry that was selling 2MB memory sticks for $200 in 1994, and 10 years later was selling 2GB memory sticks, I think we can see similar results in 2014.

I will not even get into the fact we went from 300MHz to 3.4GHz processors. Plus, with the amount of changes to the processor itself (faster, less power, etc.), the next couple of years will be interesting.

Sim City 2000 $225 ( 2MB of memory + games ) the joys of 1994

|

Google Amdahl's law. BetaNews doesn't let me link the Wikipedia article.

|

It could give AMD the edge again if it turns out more cores does not necessarily mean better performance.

They redirect their efforts to doing something different than figuring out how to add more cores to a single die and come up with something different; either way, as long as AMD is still in business the consumer does not suffer.

|

Wait... so first AMD was breaking their necks to catch up with Intel in the quad-core race, and now they're down-playing the entire concept?

|

I can see the point right now that more cores does not always = more performance (although 2 cores on XP made a huge difference to my perceived user experience). I think the magic number for extreme performance is 4 cores right now on a desktop (someone who does amateur video and some Photoshop?). The software isn't there yet to really take advantage of multicore, so until the software is written I don't think anyone will know the limits of multicore design :)

And as always some tasks fit better than others for multithreadedness. Games, for instance: most of them still use only a single core. I guess that too will change one day.

I for one welcome our multicore mega tasking overlords ! lol (Slashdot reference)

PS. this could also be either the reason AMD hasn't released a 4 Core CPU yet or an excuse as to why they haven't, depending on how you look at it lol

|

You're absolutely right. I think the tasks that benefit the most right now are batch processing jobs such as audio/video conversion/processing.

"PS. this could also be either the reason AMD hasn't released a 4 Core CPU yet or an excuse as to why they haven't, depending on how you look at it lol"

Yeah, that's what I was poking fun at with my comment. My guess is it's PR to cover for the latter.

|

No offense, but Windows runs just fine on multiprocessor machines. There are a lot of IT professionals that use software on way more than 4 cores, and a lot of programmers too. We're well aware of how to write multithreaded applications quite well. AMD is just trying PR spin on their dismal execution of bringing multicore to the desktop so all the mom and pops can finally see what some of us have for the past 8+ years.

|

I don't think anyone said Windows didn't run well on multi-processor machines. I've seen Windows server running on an eight-way at my job. It definitely handles it fine. The point was that most software doesn't really take advantage of all the extra cores right now (4+).

|

Hmmm I wonder what happened with that 4X4 thing hehehehe... Sorry to the AMD supporters but that idea was almost as bad as Netburst (well almost... lol)
I don't know why AMD hasn't made some sort of 4 core super chip? I would think that ultra cool HyperTransport would be perfect to stuff 4-8 cores on... That would really stick it to Intel when the multithreaded software hits! (I tend to cheer for both sides of a fight hmmm "Rip her s*** off !!!" LOL)

When the software rolls out that's when that bus will kick in; if AMD can hang on until then things should be better for them. One dark spot in the silver lining is if Intel decides to do the same thing, they will probably do it better on a smaller node...

"Yeah, that's what I was poking fun at with my comment. My guess is it's PR to cover for the latter."

I think so too. If AMD could pop out a 4 core chip right now, they would. I actually think they could do it right now but the production cost for them might be too high? I'm not sure as I haven't been paying too much attention but I think Intel started out 4 cores on a 65nm chip but either already has or is moving to 45nm.

Edit: I feel the reason Intel could get away with 4 cores at 65nm is that whole on-die memory controller deal.... It has to take up some space on the die after all. AMD might be having trouble fitting all that goodness on the amount of space they have to work with? Hmmm that and maybe getting 4 cores to run fast enough? They all have to match speed-wise I would think.

|

The OS runs fine with multicore; it's the applications that are the issue. On systems with 4+ processors you're usually running server-level software like databases, web servers, or other programs that require many concurrent connections, and these are usually written for multiple processors.

Desktop systems don't usually have multiple processors, so you get software like MS Word or MS Paint and it's usually going to run on one core only. Other desktop software and games are still coded for a single core. There are only a few games that were coded/patched for multicore.

AMD's point was being able to take code that was written for a single core and (in hardware) make it run on multiple cores. Kind of reminds me of how the Cell compiler works: it just scans the code, sees what operations it could run at the same time, and codes it for multicore.

|

Apparently you haven't been reading your tech news. The 4X4 is using HT to talk between the procs. As for their TRUE 4 core proc (unlike Intel's fake dual-dual core, I mean quad core), the Phenom chip is being released at the end of this year.

|

after this, i think i have a new excuse to get a new computer later =)

|

Any excuse is a good one !!!! :) hehehehehe

Always love having a fresh computer to break in and optimize...
