Microsoft: Patch Tuesday Didn't Cause Thursday Skype Outage

By Scott M. Fulton, III | Published August 21, 2007, 12:32 PM

Yesterday afternoon, Microsoft's security engineers formally ruled out the company's regular monthly "Patch Tuesday" update sequence as the trigger for the worldwide outage of Skype's VoIP communication service, which began Thursday and lasted about two days.

In a company blog post yesterday morning, Skype engineers disclosed that they suspected Patch Tuesday had triggered a wave of client-side reboots at roughly the same time. The temporary reduction in P2P traffic capacity that followed, Skype's Villu Arak said, triggered a failure of the Skype VoIP network. During the outage itself, Arak had blamed an internal server software glitch.

An official explanation written for Microsoft's Security Response Center blog by Christopher Budd was phrased as though it were ruling out what didn't cause the outage, in order to reveal what might have done so. "First, we checked to see if there were any issues introduced by the security updates that could have caused the situation," Budd wrote, "and we found that there were no issues introduced by the security updates themselves."

That sounded a bit shifty, at least taken by itself. But Budd continued: Microsoft also checked whether anything unusual about the size of the patches, the duration of the reboots, or the timing of the distribution could have led to Skype's problems. "We confirmed that there is nothing unusual in this month's release that could have contributed to this situation," he added.

Public response to Skype's altered explanation, coupled with Budd's comments about Microsoft's response, led Skype's Villu Arak this morning to adopt a more gracious tone. "Some reactions to the explanation," he wrote, "have reminded us of one of the basic tenets of communication: It's not what you say. It's what they hear."

It was a mea culpa on Skype's part, acknowledging that the Windows updates were merely the catalyst for a problem that could have conceivably been triggered by something else, if that something else were a massive, widely installed operating system that needed security patches every month. Arak's description of the cause this time around more closely approximated his original reports of last week: "a previously unseen fault in the P2P network resource allocation algorithm Skype used."

Arak acknowledged Skype's use of supernodes - algorithmically chosen clients that are promoted by Skype's servers, and charged with extra duties to marshal VoIP traffic. Normally, being able to promote and distribute supernodes on the fly helps Skype respond quickly to variations in service loads. During previous Patch Tuesdays, this system has been able to respond to and tune itself for load disruptions caused by reboots. Not this time, he said, since there had never before been such a high usage load during the time supernodes were rebooting.

So the problem wasn't that Skype clients everywhere were rebooting; it was that the supernodes were. By virtue of having been selected in the first place, they may be high-performance systems, raising the likelihood that they'd be set for automatic patching during the week.
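The arithmetic behind that scenario can be sketched with a toy capacity model. This is only an illustration of how a shortfall could appear when supernodes drop out under high load; the function name, parameters, and all figures are hypothetical, and it does not represent Skype's actual resource allocation algorithm.

```python
def unserved_clients(supernodes: int, rebooting: int,
                     clients_per_supernode: int, active_clients: int) -> int:
    """Toy model: how many active clients exceed the remaining capacity.

    All parameters are hypothetical illustration values, not Skype data.
    """
    # Supernodes that are mid-reboot cannot relay traffic.
    online = supernodes - rebooting
    # Assume each online supernode can serve a fixed number of clients.
    capacity = online * clients_per_supernode
    # Anything above capacity goes unserved; never report a negative shortfall.
    return max(0, active_clients - capacity)


# A small reboot wave leaves enough headroom for the same load...
print(unserved_clients(1000, 100, 100, 80_000))
# ...while a larger wave at the same usage level leaves clients stranded.
print(unserved_clients(1000, 400, 100, 80_000))
```

In this simplified picture, the same number of active users is harmless in one case and an outage in the other; the only variable that changed is how many supernodes were offline at once.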

In a presentation at the Recon security conference last year, engineers Fabrice Desclaux and Kostya Kortchinsky presented data they had extrapolated from a professional reverse engineering of Skype traffic. There, they said they had discovered that one of the factors Skype servers use to determine whether a node is viable for promotion to a supernode, besides a good connection and high bandwidth, is the absence of a firewall.
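As a rough illustration of the three criteria Desclaux and Kortchinsky describe, an eligibility check might look like the sketch below. The field names and numeric thresholds are hypothetical, chosen only to make the example concrete; Skype's real selection logic is proprietary and is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the values Skype actually uses are not public.
MIN_BANDWIDTH_KBPS = 512
MAX_LATENCY_MS = 150


@dataclass
class Node:
    bandwidth_kbps: int    # stand-in for "high bandwidth"
    latency_ms: int        # stand-in for "a good connection"
    behind_firewall: bool  # True if a firewall blocks inbound connections


def eligible_for_supernode(node: Node) -> bool:
    """Return True only if the node meets all three illustrative criteria."""
    return (
        node.bandwidth_kbps >= MIN_BANDWIDTH_KBPS
        and node.latency_ms <= MAX_LATENCY_MS
        and not node.behind_firewall
    )


# A fast, unfirewalled machine qualifies; the same machine behind a firewall does not.
print(eligible_for_supernode(Node(2048, 40, behind_firewall=False)))
print(eligible_for_supernode(Node(2048, 40, behind_firewall=True)))
```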

This lends credence to John Bambenek's theory, which we presented yesterday: that supernodes affected by system reboots may not be highly customized systems. Rather, they could be set to system defaults, which would make their behaviors when compared to one another more uniform, and more likely to cause a problem all at once.

A portion of an e-mail to Skype users, which appeared in their inboxes on August 13. Clicking on the attached link begins a software upgrade process to version 3.5.0.214.

But one factor which Arak didn't mention is that on Monday, August 13, Skype users received e-mails notifying them of software upgrades to version 3.5.0.214. The e-mail promised improved sound quality and an updated front end. In BetaNews' tests, we saw the upgrade procedure trigger a separate process - not unlike what happens when upgrading Firefox - which polls Skype's servers separately to check for new add-ons. This is not a P2P operation at all, but a centralized function.

Certainly, this wasn't the first Skype upgrade to receive an e-mail blast to users, some of whom would undoubtedly have been supernodes, at least from time to time. Arak's explanation this morning did not mention the client software upgrade, though it did reference a "perfect storm" of events to which Patch Tuesday was but a contributor.

Comments


Thanks for the follow up, Scott. Very informative.

Does anyone else find it strange that Microsoft tried to diagnose Skype's problem for them? Don't they have enough Vista problems to work on? =p

It was a mea culpa on Skype's part, acknowledging that the Windows updates were merely the catalyst for a problem that could have conceivably been triggered by something else, if that something else were a massive, widely installed operating system that needed security patches every month. Arak's description of the cause this time around more closely approximated his original reports of last week: "a previously unseen fault in the P2P network resource allocation algorithm Skype used."

Yeah. That's what I was trying to point out on yesterday's story.

Score: 0

|

I never got an e-mail from Skype and I have four accounts with four different e-mail addresses. (Pfft! They're still liars.)

Score: 0

|

Yeah, I never got the e-mail either. And I maintain 3 different accounts. Odd.

Score: 0

|

Even if computers rebooted at the same time... I have seen pc´s that reboot in 20 seconds thru 20 minutes. Each one of them has different quantity of Ram, different installed programs, different CPU, and so on. It is very difficult that pc's come to the outgoing petition time to skype network at the same time. Not to mention the 24 DIFFERENT timezones existing on earth.

Score: 0

|

Yeah, but they were all at least off at the same time, which was the largest contributor to the problem. If your supernode is still rebooting, you can't connect.

Score: 0

|

Wow. Even if patch Tuesday did cause the outage (which I doubt), that exposes a problem deep within the Skype model itself.

Perhaps if everyone in Mexico flushes their toilets at the same time, the Gulf will drain, and Hurricane Dean will lose its power.

Score: 0

|

Perhaps if everyone in Mexico flushes their toilets at the same time, the Gulf will drain, and Hurricane Dean will lose its power.

*laughs*

Toilets in Mexico....

Silly man.

Score: 0

|

Wow PC_Tool. That was harsh, even for you...

Score: 0

|

It was a joke. Relax, man. :)

Score: 0

|

3.5.0.214 came out on the 17th not the 13th.

Even if WU was only one trigger in the so-called perfect storm, it sounds like it was the major push since it's been the focus of their recent explanations. So I still don't see why the problem didn't start Wednesday.

I think before people will start to buy this explanation that they need to list some of the other trigger factors as well. What made it happen last week as opposed to any other week? Has their internal bug not been there all along?

They say "there had not been such a combination of high usage load during supernode rebooting." Really? Skype *always* has 6, 7, 8, 9 million people on it as far as I've ever seen. Usage is quite constant by all appearances.

Score: 0

|

Scott, I think your suggestions are damn right. Even more, I would like to mention that the topology of the network resembles FastTrack (no wonder Zennstrom and Friis created both networks), only the packets contain different sort of data. AFAIK, MPAA & RIAA many times tried to attack or block Kazaa supernodes but the network has been very resistant to any such attempts. It proves that something had to happen from within rather than from outside of the network. In that sense I fully agree with PC_Tool. It seems that Skype should predominantly or solely blame itself.

Score: 0

|

*laughs*

So, to paraphrase...

"Sorry, guys. We put all of our eggs in a basket we had no understanding or control over, and amazingly, got burned. Sorry we tried to blame someone else for the fact that our entire operation hinges on a network of user systems we have no control over, no backup for, and no way to provide reliable or managed service without."

/me being thankful I haven't jumped on the VoIP bandwagon quite yet.

Score: 0

|

Actually PC_Tool, it's okay to jump on. (Really, the water's fine.) Although Skype went down for me, I never missed receiving or placing a call, not one.

You see, I did what Skype didn't do--built redundancy into my own systems. :^)

Score: 0

|

Skype has always hailed their P2P technology as their main advantage over the other guys. It allows them to offer a great service with amazing reliability that doesn't strain their servers to the breaking point (this recent snafu notwithstanding).

They've never made a secret of how their systems work and how they rely on P2P. They do have control over the network distribution, but there was a bug in their software that made this one case balloon into a major outage.

I don't think the paraphrase is a fair statement.

Score: 0

|
