PDC 2009: Scuttling huge chunks of Vista architecture for a faster Windows 7

By Scott M. Fulton, III | Published November 17, 2009, 2:45 AM

The reason Windows Vista seemed slow, and somehow, strangely, seemed even slower over time, is now abundantly clear to Microsoft's architects: the evolution of computer hardware, particularly the CPU, exceeded anyone's expectations at the time of Vista's premiere in early 2007. The surge in virtualization, coupled with the rise of the multicore era, produced a new reality in which Vista suddenly found itself managing systems with more than 64 total cores.

Architects had simply not anticipated that the operating system would be managing this many cores this soon; at least, that appears to be the underlying message we're receiving here at PDC 2009 in Los Angeles. While independent scientists were still speculating about possible performance drop-offs beyond 8 cores, server administrators were already seeing them. Windows Vista's design made tradeoffs, giving up efficiencies that could only have been obtained through complex methods in favor of simplicity.

Those tradeoffs were fair enough for the dual-core era, but that era lasted only a short while. Quad-core processors are quickly becoming commonplace, even in laptops, so with Vista's architecture, users could actually feel the lack of scalability. In fact, they were investing in quad-core systems earlier in Vista's lifecycle than originally anticipated. When they didn't see four cores deliver anything close to double the performance of two, and later when they saw Vista's lag times slow their computers down over time, some critical elements of Vista's architecture became not an advantage but a burden.

Microsoft performance expert Mark Russinovich is one of the more popular presenters every year at PDC, mainly because he demonstrates from the very beginning of his talks that he absolutely understands what his audience is going through. It's difficult for a performance expert to put a good face on Vista, and Russinovich, to his credit, didn't even try.

After asking the audience how many used Windows 7 on a daily basis (virtually all of the crowd of about 400 people), Russinovich followed up: "How many people are sticking with Windows Vista because that's so awesome?" He pretended to wait for an answer, and just before everyone's hands had descended, he answered his own question: "Yeah, that's what I thought."

"One of the things we had decided to do with Windows 7 was, we got a message loud and clear, especially with the trend of netbooks, on top of [other] things," he went on. "People wanted small, efficient, fast, battery-efficient operating systems. So we made a tremendous effort from the start to the finish, from the design to the implementation, measurements, tuning, all the way through the process to make sure that Windows 7 was fast and nimble, even though it provided more features. So this is actually the first release of Windows that has a smaller memory footprint than a previous release of Windows, and that's despite adding all [these] features."

To overcome the Vista burden, Windows 7 had to present scalability that everyday users could see and appreciate.

As kernel engineer Arun Kishan explained, "When we initially decided to be able to support 256 logical processors, we set the scalability goal to be about 1.3-1.4x, up at the high end. And our preliminary TPC-C number was about 1.4x scalability on 128 LPs [logical processors], when compared to a 64 LP system. So that's not bad; but when we dug into that, we saw that about 15% of the CPU time was spent waiting for a contended kernel spinlock." What Kishan means is that while one thread holds the lock protecting a portion of the kernel, other threads that need it have to wait their turn. Rather than going idle, they spin their wheels, quite literally: they run in place in a tight loop, repeatedly checking whether the lock has been released. That busy-waiting lock is called a spinlock.
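To make the "running in place" idea concrete, here is a minimal user-mode sketch of a spinlock in Python. It is an illustration under stated assumptions, not how Windows does it: real kernel spinlocks are built directly on atomic processor instructions, whereas this hypothetical `SpinLock` class simply retries a non-blocking `threading.Lock.acquire` in a loop and keeps an approximate count of failed attempts. Because CPython runs one thread's bytecode at a time, it shows the pattern rather than true parallel contention.

```python
import threading


class SpinLock:
    """Toy spinlock: a waiter busy-loops ("spins") instead of sleeping.

    Hypothetical teaching sketch built on threading.Lock's non-blocking
    acquire; it is not how the Windows kernel implements spinlocks.
    """

    def __init__(self) -> None:
        self._flag = threading.Lock()
        self.spins = 0  # approximate count of failed attempts (not updated atomically)

    def acquire(self) -> None:
        # Retry a non-blocking acquire in a tight loop: the thread stays busy,
        # burning CPU time, but does no useful work until the holder releases.
        while not self._flag.acquire(blocking=False):
            self.spins += 1

    def release(self) -> None:
        self._flag.release()


lock = SpinLock()
counter = 0


def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        lock.acquire()
        counter += 1  # critical section: only one thread at a time gets here
        lock.release()


threads = [threading.Thread(target=worker, args=(2_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 8000
```

Every iteration counted in `spins` is CPU time spent waiting rather than working, which is exactly the cost Kishan measured on the 128-LP system.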

"If you think about it, 15% of the time on a 128-processor system is, more than 15 of these CPUs are pretty much full-time just waiting to acquire contended locks. So we're not getting the most out of this hardware."

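The two figures Kishan quotes can be turned into quick numbers. The sketch below uses only the quoted values; the helper names are hypothetical. It computes scaling efficiency as the measured speedup divided by the ideal 2x from doubling the LP count, and the number of LPs' worth of time lost to spinning.

```python
def scaling_efficiency(speedup: float, base_lps: int, new_lps: int) -> float:
    """Fraction of ideal (linear) speedup achieved going from base_lps to new_lps."""
    ideal = new_lps / base_lps
    return speedup / ideal


def lps_lost_to_spinning(total_lps: int, spin_fraction: float) -> float:
    """Equivalent number of LPs doing nothing but spinning on contended locks."""
    return total_lps * spin_fraction


# Figures quoted by Kishan: ~1.4x TPC-C throughput on 128 LPs vs. 64 LPs,
# with ~15% of CPU time spent waiting on a contended kernel spinlock.
eff = scaling_efficiency(1.4, 64, 128)
lost = lps_lost_to_spinning(128, 0.15)
print(f"scaling efficiency: {eff:.0%}")  # scaling efficiency: 70%
print(f"LPs spent spinning: {lost:.1f}")  # LPs spent spinning: 19.2
```

About 19 LPs' worth of spinning is what "more than 15 of these CPUs" refers to.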
The part of the older Windows kernel responsible for scheduling was the dispatcher, and it was protected by a global lock. "The dispatcher database lock originally protected the integrity of all the scheduler-related data structures," said Kishan. "This includes things like thread priorities, ready queues, any object that you might be able to wait on, like an event, semaphore, mutex, I/O completion port, timers, asynchronous procedure calls; all of it was protected by the scheduler, which protected everything by the dispatcher lock."
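The global-lock design can be pictured as a toy scheduler in which one lock guards everything. This is a hypothetical Python model, not Windows code: the class name, the priority values and the heap-based ready queue are assumptions chosen for brevity. The point is structural. Every operation, whether it queues a thread or signals an event, must first win the same lock, which is why that lock becomes the most contended one as core counts grow.

```python
import heapq
import itertools
import threading


class GlobalLockDispatcher:
    """Toy scheduler in which ONE lock guards every scheduler data structure.

    Hypothetical sketch: the ready queue and the signaled state of waitable
    objects all sit behind self._dispatcher_lock, so any CPU touching any of
    them must first acquire that single lock.
    """

    def __init__(self) -> None:
        self._dispatcher_lock = threading.Lock()
        self._ready = []               # heap of (-priority, seq, thread_name)
        self._seq = itertools.count()  # FIFO tie-break among equal priorities
        self._signaled = {}            # waitable object name -> bool

    def make_ready(self, thread_name: str, priority: int) -> None:
        with self._dispatcher_lock:
            heapq.heappush(self._ready, (-priority, next(self._seq), thread_name))

    def pick_next(self):
        """Return the highest-priority ready thread, or None if none is ready."""
        with self._dispatcher_lock:
            if not self._ready:
                return None
            return heapq.heappop(self._ready)[2]

    def signal(self, obj_name: str) -> None:
        # Even an unrelated operation (setting an event) contends for the same lock.
        with self._dispatcher_lock:
            self._signaled[obj_name] = True


d = GlobalLockDispatcher()
d.make_ready("worker", priority=8)
d.make_ready("ui", priority=13)
d.make_ready("background", priority=4)
d.signal("event_A")
order = [d.pick_next() for _ in range(4)]
print(order)  # ['ui', 'worker', 'background', None]
```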

Photo: Microsoft Windows core engineer Arun Kishan speaks to a Windows 7 workshop during Day 0 of PDC 2009.

"Over time, we moved some paths out of the dispatcher lock by introducing additional locks, such as thread locks, timer table locks, processor control block locks, etc.," Kishan continued. "But still, the key thing that the dispatcher lock was used for was to synchronize thread state transitions. So if a thread's running, and it waits on a set of objects and goes into a wait state, that transition was synchronized by the dispatcher lock. The reason that needed a global lock was because the OS provides pretty rich semantics on what applications can do: an application can wait on a single object, it can wait on a single object with a timeout, it can wait on multiple objects and say, 'I just want to wait on any of these,' or it can say, 'I just want to wait on all of these.' It can mix and match types of objects that it's using in any given wait call. So in order to provide this kind of flexibility, the back end had to employ this global dispatcher lock to manage the complexity. But the downside of that, of course, was that it ended up being the most contended lock in most of our workloads, by an order of magnitude or more as you went to these high-end systems."
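The rich wait semantics Kishan describes can be illustrated with a sketch of wait-any and wait-all with a timeout. With a single global lock (here one `threading.Condition`), a waiter can check every object in its wait set atomically, which is what makes the global design simple. The behavior loosely resembles Win32's `WaitForMultipleObjects` and its `bWaitAll` flag, but the `Dispatcher` class, its method names and the manual-reset behavior (signaled objects stay signaled) are assumptions for illustration.

```python
import threading
import time


class Dispatcher:
    """Hypothetical sketch: wait-any / wait-all with a timeout under one global lock.

    All waitable objects share a single condition variable (and so a single
    lock), letting a waiter inspect its whole wait set atomically.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()  # plays the role of the "dispatcher lock"
        self._signaled = set()              # names of signaled (manual-reset) objects

    def set_event(self, name: str) -> None:
        with self._cond:
            self._signaled.add(name)
            self._cond.notify_all()  # wake every waiter; each re-checks its own wait set

    def wait(self, names, wait_all: bool, timeout: float):
        """Return the list of signaled names that satisfied the wait, or None on timeout."""
        deadline = time.monotonic() + timeout
        with self._cond:
            while True:
                hits = [n for n in names if n in self._signaled]
                if hits and (not wait_all or len(hits) == len(names)):
                    return hits
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                self._cond.wait(remaining)  # loop handles spurious wakeups


d = Dispatcher()
timer = threading.Timer(0.05, d.set_event, args=("io_done",))
timer.start()
any_result = d.wait(["io_done", "shutdown"], wait_all=False, timeout=2.0)
all_result = d.wait(["io_done", "shutdown"], wait_all=True, timeout=0.1)
timer.join()
print(any_result, all_result)  # ['io_done'] None
```

The simplicity comes at a price: every `set_event` and every `wait` on any object serializes on the same lock.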

In the new kernel for Win7 and Windows Server 2008 R2, the dispatcher lock is completely gone; a critical element of Windows architecture up until Vista has been erased. Its replacement is fine-grained locking, with eleven types of locks for the new scheduler (for threads, processors, timers, objects) and rules for how locks may be obtained, to avoid what engineers still call, and rightly so, deadlock. Synchronization is no longer performed at a global level, Kishan explained, so many operations are now lock-free. In its place is a parallel wait path made possible by transactional semantics: a more complex mechanism through which threads, and the LPs that execute them, coordinate their state transitions without a single global lock.
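Kishan did not spell out the eleven lock types or their acquisition rules. One standard rule for taking several fine-grained locks without deadlock is a global lock ordering: every thread acquires locks in the same order. The hypothetical sketch below gives each waitable object its own lock and an ordering key, and checks a wait-all condition while holding only the locks of the objects involved. It does not model the lock-free or transactional parts of the Windows 7 wait path.

```python
import threading


class Waitable:
    """A waitable object with its own lock (fine-grained) instead of a global one."""

    _next_id = 0

    def __init__(self, name: str) -> None:
        self.name = name
        self.lock = threading.Lock()
        self.signaled = False
        Waitable._next_id += 1
        self.order = Waitable._next_id  # hypothetical global ordering key


def lock_in_order(objs):
    """Acquire each distinct object's lock in ascending `order`.

    Because every thread follows the same ordering rule, no two threads can
    each hold a lock the other is waiting for, so this cannot deadlock.
    """
    unique = {id(o): o for o in objs}.values()  # avoid re-acquiring the same lock
    ordered = sorted(unique, key=lambda o: o.order)
    for o in ordered:
        o.lock.acquire()
    return ordered


def unlock_all(ordered) -> None:
    for o in reversed(ordered):
        o.lock.release()


def try_satisfy_wait_all(objs) -> bool:
    """Check a wait-all condition while holding only the locks of the objects involved."""
    held = lock_in_order(objs)
    try:
        return all(o.signaled for o in held)
    finally:
        unlock_all(held)


a, b = Waitable("event_A"), Waitable("event_B")
with a.lock:
    a.signaled = True


def worker(objs, rounds: int) -> None:
    for _ in range(rounds):
        try_satisfy_wait_all(objs)


# The two threads name the same objects in opposite orders; ordered acquisition
# prevents the classic A-then-B / B-then-A deadlock.
t1 = threading.Thread(target=worker, args=([a, b], 1000))
t2 = threading.Thread(target=worker, args=([b, a], 1000))
t1.start()
t2.start()
t1.join()
t2.join()
first = try_satisfy_wait_all([a, b])   # False: event_B is not signaled yet
with b.lock:
    b.signaled = True
second = try_satisfy_wait_all([b, a])  # True
print(first, second)
```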

But the threads themselves won't really "know" about the change. "Everything works exactly as it did before," Kishan said, "and this is a totally under-the-covers, transparent change to applications, except for the fact that things scale better now."

Next: Speeding processes up by putting processors to sleep...


Is this article true??? So what Microsoft engineers are saying is that every Windows OS before 7 was so crappy that it had a GLOBAL LOCK to control virtual memory access, that they did not know that putting a processor to sleep is better than a do_nothing_loop(), or that a complex design is a better "tradeoff" than a less efficient one???? That working with 2 or 2000 processors is just the same (in the programming world)???
Linux, Mac, or any other OS has done those things since it was designed (and that's a lot more than a decade!!!).
After this article, I've decided I will never use Windows again :-)

Score: -2

|

If you think Linux or OS X (or worse, MacOS, as you indicate "a decade") have a better SMP scaling story than NT (and now 7), you are very mistaken.

Score: 2

|

Again, none of this (except possibly for the timer coalescing and power management) has anything to do with why Vista was poorly received or why 7 is being better received. Everything you said about Vista applies to XP and previous just as well, and the rap on Vista had to do with its performance on lower-cost and smaller form-factor hardware, and there it was more about RAM than CPU. People complaining about Vista being slow weren't all running 64-core machines.

Your title and lead-in are misleading, and embody the unfortunate journalistic habit of trying to squeeze everything into a preexisting narrative.

Score: 0

|

Exactly--and very well-put. The title and conclusion (written at the beginning) miss the point--these improvements aren't noticed until you get into hardware that most people just don't have, even today. These are big iron limitations... consumers certainly haven't run into those limits in Vista yet. That day is coming, but to characterize this as "scuttling huge chunks of Vista" is just plain wrong on many levels. These are very important changes, but they hardly constitute "huge chunks" of Vista's code or architecture. And they certainly have nothing to do with the allegation that Vista slows down over time.

Score: 1

|

Wow, this was worth reading. This is one of the best in-depth articles on Windows 7 that I've read so far.

Score: 6

|

I'm one of those rare people (I guess) who actually had no problems with Vista; in fact, Vista's performance got much better after each service pack!

My video card crapped out, but once I replace it I will be using Vista in my office.

I have Mac/Windows 7 on my living room TV... but I use XP instead of Windows 7, since Windows 7 runs horribly on VMware.

Win7 is cute, but certainly not enough reason to actually run out and upgrade. Stick with XP/Vista if it works for you. I see no reason to upgrade at all until SP1!

flame on...

Score: -2

|

There is a good video on Channel 9 from Mark on this subject from earlier this year (late last year?) that talks about the changes in both specific and general terms, which would help this make more sense to the average user/developer.

One example Mark gave was how SQL Server on previous Windows versions had to self-schedule operations apart from what the kernel was doing, because of the granularity and lock speed SQL Server needed. (This is why dedicated SQL Servers were significantly more efficient, as SQL was running in a special mode.)

The end result is that not only is Win7's kernel significantly smarter and more forgiving, it now allows things like SQL Server to run as a normal application and let the OS fully handle the scheduling/threads/locking.

So it's pretty impressive that lock and scheduling efficiency has reached the point where something as specialized as SQL Server no longer has to handle this itself or run in a special mode on the Win7 kernel, while still getting the same level of performance.

The improvements to multi-core support are related and part of this as well, freeing up cores and threads and thus reducing the overhead of managing a 'large' number of cores in large servers.

The overhead of multi-CPU/multi-core processing starts to hit ceilings fairly early, where the performance gained from additional cores is offset by the cost of the kernel managing and scheduling threads on those cores.

That's why it's important that Win7 is achieving 1.4x at 128. A good contrast would be previous NT versions or other kernel technologies; for example, in Linux you can easily configure the kernel to use 256 cores/CPUs, but at about 16 CPUs/cores you almost fully negate the multi-core/multi-CPU advantages, as kernel overhead starts to choke the system and it ends up spending more time handing out threads to CPUs than on general processing.

(This is why most multi-CPU server technologies are cluster farms: one machine running 64 or 128 or 256 CPUs with a single kernel has always performed horribly. There are also virtualization and multi-kernel technologies that run multiple OSes on one machine to avoid the drop-off in multi-CPU performance scaling.)

The Win7 changes should not only set the stage for better multi-core/CPU performance on personal systems, but also greatly reduce the complexity of implementing large-scale multi-processor servers.

In theory, reaching the performance of a 128- or 256-core system should now be as easy as getting the supporting hardware and running a single copy of Windows Server on it. So instead of 10-20 copies of Windows Server or Linux running to meet this level of performance, you could have ONE SERVER with Windows Server, which in terms of management is crazy simple and makes low-end supercomputing available to even an average geek with the coins for the hardware.

And the side effect is a very efficient and highly granular kernel for desktop users with i7 and other multi-core CPUs, which users are already feeling with Windows 7. (The extra 'intelligence' in how Win7 handles virtual or 'HT' cores also contributes to a faster and smoother experience; even Atom-based netbook users can feel the difference, as the common Atom is an HT-enabled CPU.)

Go find Mark's video on this on Channel 9 and any paper he has written about it in the last year. I am writing this off the top of my head, so I don't guarantee 'technical' accuracy, just a 'bigger' picture of how important this is and how it will affect the next few years in redefining both desktop and server SMP performance.

Score: 2
