Microsoft: Patch Tuesday Didn't Cause Thursday Skype Outage

By Scott M. Fulton, III, BetaNews

August 21, 2007, 12:32 PM

Yesterday afternoon, Microsoft's security engineers formally ruled out the company's regular monthly "Patch Tuesday" update sequence as the trigger of Thursday's worldwide outage of the Skype VoIP communication service, which lasted about two days.

In a company blog post yesterday morning, Skype engineers disclosed that they suspected Patch Tuesday had triggered a wave of client-side reboots, all at roughly the same time. The temporary reduction in P2P traffic capacity that followed, Villu Arak said, triggered a failure of the Skype VoIP network. During the outage itself, Arak had blamed an internal server software glitch.

An official explanation written for Microsoft's Security Response Center blog by Christopher Budd was phrased as though it were ruling out what didn't cause the outage, in order to reveal what might have done so. "First, we checked to see if there were any issues introduced by the security updates that could have caused the situation," Budd wrote, "and we found that there were no issues introduced by the security updates themselves."

That sounded a bit shifty, at least taken by itself. But then, Budd continued, Microsoft checked to see if there was anything unusual about the size of the patches, the duration of the reboots, or the timing of the distribution that could have led to Skype's problems. "We confirmed that there is nothing unusual in this month's release that could have contributed to this situation," he added.

Public response to Skype's altered explanation, coupled with Budd's comments about Microsoft's response, led Skype's Villu Arak this morning to adopt a more gracious tone. "Some reactions to the explanation," he wrote, "have reminded us of one of the basic tenets of communication: It's not what you say. It's what they hear."

It was a mea culpa on Skype's part, acknowledging that the Windows updates were merely the catalyst for a problem that could have conceivably been triggered by something else, if that something else were a massive, widely installed operating system that needed security patches every month. Arak's description of the cause this time around more closely approximated his original reports of last week: "a previously unseen fault in the P2P network resource allocation algorithm Skype used."

Arak acknowledged Skype's use of supernodes - algorithmically chosen clients that are promoted by Skype's servers, and charged with extra duties to marshal VoIP traffic. Normally, being able to promote and distribute supernodes on the fly helps Skype respond quickly to variations in service loads. During previous Patch Tuesdays, this system has been able to respond to and tune itself for load disruptions caused by reboots. Not this time, he said, since there had never before been such a high usage load during the time supernodes were rebooting.

So the problem wasn't that Skype clients everywhere were rebooting, just that the supernodes were. By virtue of their having been selected in the first place, they may be high-performance systems, raising the likelihood that they'd be set for automatic patching during the week.

In a presentation at the Recon security conference last year, engineers Fabrice Desclaux and Kostya Kortchinsky presented data they had extrapolated from a professional reverse engineering of Skype traffic (PDF available here). There, they said they discovered that one of the factors Skype servers use for determining the viability of a node for promotion to a supernode, besides a good connection and high bandwidth, is the absence of a firewall.
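
As a rough illustration of the three criteria named above, the following Python sketch filters a list of clients down to supernode candidates. It is not Skype's actual selection logic, which is proprietary. The Node type, the is_supernode_candidate function, and both numeric thresholds are hypothetical, since the presentation as summarized here gives criteria but no numbers.

```python
from dataclasses import dataclass

# Hypothetical cutoffs chosen only for illustration; the article names
# the criteria but does not give any numeric thresholds.
MIN_BANDWIDTH_KBPS = 512
MAX_LATENCY_MS = 100


@dataclass
class Node:
    name: str
    bandwidth_kbps: int
    latency_ms: int
    behind_firewall: bool


def is_supernode_candidate(node: Node) -> bool:
    """Apply the three criteria: no firewall, high bandwidth, good connection."""
    return (
        not node.behind_firewall
        and node.bandwidth_kbps >= MIN_BANDWIDTH_KBPS
        and node.latency_ms <= MAX_LATENCY_MS
    )


nodes = [
    Node("laptop-a", bandwidth_kbps=256, latency_ms=40, behind_firewall=False),
    Node("desk-b", bandwidth_kbps=2048, latency_ms=30, behind_firewall=False),
    Node("desk-c", bandwidth_kbps=4096, latency_ms=20, behind_firewall=True),
]

# Only desk-b passes: laptop-a lacks bandwidth, desk-c sits behind a firewall.
candidates = [n.name for n in nodes if is_supernode_candidate(n)]
print(candidates)
```

Note that the fastest machine in the list is rejected purely because of its firewall, which is the point the researchers' finding raises.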

This lends credence to John Bambenek's theory, which we presented yesterday: that supernodes affected by system reboots may not be highly customized systems. Rather, they could be set to system defaults, which would make their behaviors when compared to one another more uniform, and more likely to cause a problem all at once.
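
The arithmetic behind that theory can be sketched in a few lines of Python. This is a toy model, not a measurement: the peak_offline function, the fleet sizes, and the reboot hours are all made up for illustration, and the hours are assumed to be on one common clock so that time zones are already accounted for.

```python
from collections import Counter


def peak_offline(reboot_hours):
    """Return the largest number of nodes that reboot in the same hour."""
    counts = Counter(reboot_hours)
    return max(counts.values()) if counts else 0


# Hypothetical fleet of eight supernodes all left on the same default
# update schedule (hour 3 on a shared clock is an assumed value).
default_fleet = [3] * 8

# Hypothetical fleet whose owners each picked a different update hour.
custom_fleet = [1, 3, 5, 9, 12, 18, 20, 23]

print(peak_offline(default_fleet))  # all eight go down together
print(peak_offline(custom_fleet))   # at most one is down at a time
```

The same number of reboots happens in both fleets. Only the uniform one loses all of its supernodes in a single hour.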


A portion of an e-mail to Skype users, which appeared in their inboxes on August 13. Clicking on the attached link begins a software upgrade process to version 3.5.0.214.


But one factor that Arak didn't mention is that on Monday, August 13, Skype users received e-mails notifying them of software upgrades to version 3.5.0.214. The e-mail promised improved sound quality and an updated front end. In BetaNews' tests, we saw the upgrade procedure trigger a separate process - not unlike what happens when upgrading Firefox - which polls Skype's servers separately to check for new add-ons. This is not a P2P operation at all, but a centralized function.

Certainly, this wasn't the first Skype upgrade to be announced in an e-mail blast to users, some of whom would undoubtedly have been supernodes, at least from time to time. Arak's explanation this morning did not mention the client software upgrade, though it did reference a "perfect storm" of events to which Patch Tuesday was but a contributor.

14 Comments

By wincement

posted Aug 21, 2007 - 9:24 PM

Thanks for the follow up, Scott. Very informative.

Does anyone else find it strange that Microsoft tried to diagnose Skype's problem for them? Don't they have enough Vista problems to work on? =p

It was a mea culpa on Skype's part, acknowledging that the Windows updates were merely the catalyst for a problem that could have conceivably been triggered by something else, if that something else were a massive, widely installed operating system that needed security patches every month. Arak's description of the cause this time around more closely approximated his original reports of last week: "a previously unseen fault in the P2P network resource allocation algorithm Skype used."

Yeah. That's what I was trying to point out on yesterday's story.

Score: 0

By ZenWarrior

posted Aug 21, 2007 - 9:21 PM

I never got an e-mail from Skype and I have four accounts with four different e-mail addresses. (Pfft! They're still liars.)

Score: 0

By wincement

edited Aug 21, 2007 - 9:25 PM

Yeah, I never got the e-mail either. And I maintain 3 different accounts. Odd.

Score: 0

By arq_carlos1

posted Aug 21, 2007 - 4:13 PM

Even if computers rebooted at the same time... I have seen PCs that reboot in anywhere from 20 seconds to 20 minutes. Each one of them has a different quantity of RAM, different installed programs, a different CPU, and so on. It is very unlikely that PCs would send their outgoing requests to the Skype network at the same time. Not to mention the 24 DIFFERENT time zones existing on earth.

Score: 0

By wincement

posted Aug 21, 2007 - 9:26 PM

Yeah, but they were all at least off at the same time, which was the largest contributor to the problem. If your supernode is still rebooting, you can't connect.

Score: 0

By frankwick

posted Aug 21, 2007 - 3:38 PM

Wow. Even if patch Tuesday did cause the outage (which I doubt), that exposes a problem deep within the Skype model itself.

Perhaps if everyone in Mexico flushes their toilets at the same time, the Gulf will drain, and Hurricane Dean will lose its power.

Score: 0

By PC_Tool

posted Aug 21, 2007 - 4:00 PM

Perhaps if everyone in Mexico flushes their toilets at the same time, the Gulf will drain, and Hurricane Dean will lose its power.

*laughs*

Toilets in Mexico....

Silly man.

Score: 0

By wincement

posted Aug 21, 2007 - 9:32 PM

Wow PC_Tool. That was harsh, even for you...

Score: 0

By PC_Tool

posted Aug 22, 2007 - 9:06 AM

It was a joke. Relax, man. :)

Score: 0

By rseiler

posted Aug 21, 2007 - 3:09 PM

3.5.0.214 came out on the 17th not the 13th.

Even if WU was only one trigger in the so-called perfect storm, it sounds like it was the major push since it's been the focus of their recent explanations. So I still don't see why the problem didn't start Wednesday.

I think before people will start to buy this explanation that they need to list some of the other trigger factors as well. What made it happen last week as opposed to any other week? Has their internal bug not been there all along?

They say "there had not been such a combination of high usage load during supernode rebooting." Really? Skype *always* has 6, 7, 8, 9 million people on it as far as I've ever seen. Usage is quite constant by all appearances.

Score: 0

By yokozuna

posted Aug 21, 2007 - 2:41 PM

Scott, I think your suggestions are damn right. Even more, I would like to mention that the topology of the network resembles FastTrack (no wonder Zennstrom and Friis created both networks), only the packets contain different sort of data. AFAIK, MPAA & RIAA many times tried to attack or block Kazaa supernodes but the network has been very resistant to any such attempts. It proves that something had to happen from within rather than from outside of the network. In that sense I fully agree with PC_Tool. It seems that Skype should predominantly or solely blame itself.

Score: 0

By PC_Tool

posted Aug 21, 2007 - 2:04 PM

*laughs*

So, to paraphrase...

"Sorry, guys. We put all of our eggs in a basket we had no understanding or control over, and amazingly, got burned. Sorry we tried to blame someone else for the fact that our entire operation hinges on a network of user systems we have no control over, no backup for, and no way to provide reliable or managed service without."

/me being thankful I haven't jumped on the VoIP bandwagon quite yet.

Score: 0

By wincement

posted Aug 21, 2007 - 9:31 PM

Skype has always hailed their P2P technology as their main advantage over the other guys. It allows them to offer a great service with amazing reliability that doesn't strain their servers to the breaking point (this recent snafu notwithstanding).

They've never made a secret of how their systems work and how they rely on P2P. They do have control over the network distribution, but there was a bug in their software that made this one case balloon into a major outage.

I don't think the paraphrase is a fair statement.

Score: 0

By ZenWarrior

posted Aug 21, 2007 - 9:26 PM

Actually PC_Tool, it's okay to jump on. (Really, the water's fine.) Although Skype went down for me, I never missed receiving or placing a call, not one.

You see, I did what Skype didn't do--built redundancy into my own systems. :^)

Score: 0