Skype: Users Rebooting Brought Service Down

By Scott M. Fulton, III, BetaNews

August 20, 2007, 12:03 PM

A system outage that impacted Skype users for about 48 hours last week has officially been attributed to a multitude of Windows-based clients receiving critical security patches and rebooting at roughly the same time. According to the company, the reboots triggered a flood of logon requests that collided at Skype's network hub, producing what amounted to an accidental denial of service.

Coupled with a shortage of peer-to-peer network resources while those Windows reboots were under way, the network simply did not have enough capacity to handle the traffic, as the company's Villu Arak explained this morning.

"The high number of restarts affected Skype's network resources," Arak stated on the company's blog. "This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact."
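The "chain reaction" Arak describes can be pictured with a toy queue model. The sketch below is not Skype's algorithm; it assumes a hypothetical network that completes a fixed number of log-ins per tick, a steady trickle of ordinary log-ins, and clients that retry on the next tick if they are not served. All names and numbers are illustrative.

```python
def drain_time(burst, capacity, baseline):
    """Return the number of ticks until the log-in backlog clears.

    burst    -- clients that all try to log in at tick 0 (hypothetical figure)
    capacity -- log-ins the network can complete per tick (hypothetical figure)
    baseline -- ordinary new log-ins arriving every tick
    Clients that are not served simply retry on the next tick.
    """
    if baseline >= capacity:
        raise ValueError("backlog never clears when baseline >= capacity")
    backlog = burst
    ticks = 0
    while backlog > 0:
        backlog += baseline                # ordinary arrivals join the queue
        backlog -= min(backlog, capacity)  # serve as many as capacity allows
        ticks += 1
    return ticks


# Same burst, first with healthy capacity, then with capacity cut
# by a loss of peer-to-peer resources.
healthy = drain_time(burst=1000, capacity=200, baseline=100)
degraded = drain_time(burst=1000, capacity=110, baseline=100)
print(healthy, degraded)  # prints "10 100"
```

In this model the same burst takes ten times longer to clear once spare capacity shrinks from 100 to 10 log-ins per tick, which is one way a burst and a capacity shortfall could compound each other.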

It's an unusual explanation, which omits the obvious fact that Patch Tuesday happened...on a Tuesday, while the Skype failure upon which the reboots were blamed happened on a Thursday. The new explanation doesn't appear to coincide with status reports given by Arak to customers during the outage.

Last Thursday, while connection problems were on the rise, he attributed the problem to Skype server software as though it had already been diagnosed there. "This problem occurred because of a deficiency in an algorithm within Skype networking software. This controls the interaction between the user's own Skype client and the rest of the Skype network."

This morning's explanation shifts the blame from the server to problems on the client side, or at least server problems that were triggered by unforeseen network conditions on the client side.

Immediately, one of the world's most trusted security researchers - University of Illinois, Urbana-Champaign programmer John C. Bambenek - saw evidence of a possible huge security hole, if Arak's explanation proves to be accurate.

Obviously, the first problem is the need for huge monthly security patches in the first place. But simply addressing that need points to a sadly common consumer behavior, Bambenek pointed out: "The second interesting note, is that if Skype's explanation is true," he wrote, "that means that vast majority of Skype users have machines that don't require a login on boot. Those machines simply happily login as the default user (and I bet almost all have full admin rights) and the login in to Skype (and their other start-on-boot applications)."

If Skype were an application installed mostly on servers as opposed to clients, admins might be cautious enough to reset the defaults so patches and updates didn't always get requested at 3:00 am. That way, requests wouldn't collide and the network would run more smoothly. (Microsoft did not report any problems in actually serving patches at roughly the same time.) Typical customer behavior, Bambenek pointed out, is for consumers to leave software set to its defaults. Download patches at the default time...reboot computers in the default way.
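The point about default update times can be illustrated with a small simulation. This is only a sketch under stated assumptions: the client count, the three-hour window, and the function name are hypothetical, and each client is assumed to log in once, in the minute it reboots.

```python
import random
from collections import Counter


def peak_logins_per_minute(n_clients, window_minutes, seed=0):
    """Return the largest number of log-ins landing in any single minute.

    Each client reboots at a uniformly random minute inside the window.
    window_minutes=1 models every client using the same default time.
    """
    rng = random.Random(seed)
    per_minute = Counter(rng.randrange(window_minutes) for _ in range(n_clients))
    return max(per_minute.values())


# 10,000 hypothetical clients: all at the default time, then spread over 3 hours.
same_time = peak_logins_per_minute(10_000, window_minutes=1)
spread_out = peak_logins_per_minute(10_000, window_minutes=180)
print(same_time, spread_out)
```

With every client on the same default, the whole population lands in one minute; spread across 180 minutes, the busiest minute sees only a small fraction of that.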

But being a security engineer, Bambenek was smart enough to ask first, all things being equal, why did all those reboots take 48 hours?

At its height, the outage prompted some comments from one of Skype's competitors in the P2P conferencing field, SightSpeed. CTO and founder Aron Rosenberg remarked to one of his company's bloggers that Skype's network infrastructure is actually a kind of star/P2P hybrid topology, in which systems in-between the hub and general clients act as supernodes. The identity or location of supernodes isn't planned in advance; rather, certain client systems with higher capacity for marshaling and regulating P2P conferencing traffic get promoted on the fly.
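On-the-fly promotion of the kind Rosenberg describes might look roughly like the sketch below. Skype's actual selection criteria are not given here, so the `capacity` score, the `Client` class, and `pick_supernodes` are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    capacity: int        # hypothetical score for relaying P2P traffic
    online: bool = True


def pick_supernodes(clients, count):
    """Promote the `count` highest-capacity clients that are currently online."""
    online = [c for c in clients if c.online]
    return sorted(online, key=lambda c: c.capacity, reverse=True)[:count]


clients = [Client("a", 90), Client("b", 40), Client("c", 75), Client("d", 10)]

before = [c.name for c in pick_supernodes(clients, 2)]  # ['a', 'c']

# The strongest client reboots; the next-best online client is promoted.
clients[0].online = False
after = [c.name for c in pick_supernodes(clients, 2)]   # ['c', 'b']
print(before, after)
```

Nothing in this scheme fixes the supernodes in advance, so whoever runs the promoted machines decides, knowingly or not, when those nodes disappear.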

A Cornell University study in 2006 concluded that Skype's supernode architecture was key to minimizing bandwidth use across the network, while at the same time reducing the quantity of network traffic in which noise reduction algorithms needed to be used. This could be the key to Skype's relatively high quality of service, last week notwithstanding.

SightSpeed doesn't think so. "In theory this is a good idea," writes SightSpeed's Peter D. Csathy, "but the problem happens if your network starts to destabilize. Skype, as a company, has no physical or programmatic control over the most vital piece of its product. Skype instead is at the mercy of and vulnerable to the people who unknowingly run the SuperNodes."

Theoretically, since there would be fewer supernodes than general nodes in the P2P network, it would take less leverage on the part of a massive software update by Microsoft or anyone else to force those supernodes to reboot at about the same time.

Readers of the independent blog Skype-Watch have expressed skepticism about the company's ability to communicate well with its customer base, especially since it was purchased by eBay. One of its readers writes for its forum, "Microsoft has released several forced reboot patches in the past, including the last three years - the timeframe that the Skype Bug was present. So what makes this one different? And since there was no massive forced Skype Update, then the Skype community can have this happen [again] until there is one, next month or next time MS releases a forced update security patch. When will Skype update the vast majority of clients to avoid this ticking time bomb? It would seem Skype is not very worried about this happening again, yet doing nothing to update the hive to protect it."

Once again, as was the case with AOL Instant Messenger at the turn of the decade, users who pay nothing find themselves paying the price.

By keir

posted Aug 21, 2007 - 8:31 PM

mine? I fail to see what your comment has to do with the outage.

Score: 0

By NULLedge

posted Aug 21, 2007 - 11:25 AM

unlike AIM, skype does charge its customers for things like the phone calls they make and often times at $10 a pop, so i fail to see the relevance of the last comment.

Score: 0

By keir

posted Aug 21, 2007 - 5:24 AM

what about people like me who don't need a password to login to their PC but don't let Skype auto-run on startup either? They need to be taken into account too.

Score: 0

By ZenWarrior

posted Aug 20, 2007 - 9:56 PM

Virtually no one on the net anywhere in the world, especially true geeks who actually know a thing or two, is buying Skype's explanation. It appears Skype may have simply created another PR nightmare with its decidedly vacuous explanation.

As for me, whatever trust I had in Skype did not vanish with The Great Outage of 2007. It vanished with the not-so-great explanation.

Score: 0

By jbaltz69

posted Aug 20, 2007 - 6:05 PM

What a load of BS.

Score: 0

By rseiler

posted Aug 20, 2007 - 4:54 PM

I nominate the term "Skypegate" as handy shorthand. Is there one better?

Score: 0

By Program86

posted Aug 20, 2007 - 2:48 PM

You'd have to be a complete moron to believe that lame azz excuse.

LOL@computernoobexcuses

Score: 0

By wincement

edited Aug 20, 2007 - 2:07 PM

From another article I read about this:

"Skype has said that they have a self healing system built into their network, but a software bug knocked that out and prevented a network resource algorithm from working."

So, no, just because the outage happened two days after the Windows update, it doesn't mean they're lying.

Also, the way the Skype client is built is it just keeps trying to logon until it's successful. It doesn't timeout or ask you if you want to abandon the attempt. Knowing this, that means the already crippled Skype network was being hammered with logon requests the *ENTIRE* time.

Try looking a little deeper before jumping to conclusions.

Score: 0

By SMFulton3

posted Aug 20, 2007 - 2:55 PM

Well, digging deep is what we do for a living, wincement. I didn't mean to infer Skype was lying, although it does sound like its initial sequence of explanations are a bit contradictory to one another...and that 48-hour gap is really tough to explain. Now, if something about the updates themselves made mincemeat (or "mincement," to coin a phrase) of Skype, then that might make sense...except I've upgraded several systems this week without apparent damage to Skype.

We're not at the point where we can jump to conclusions yet, but what we've dug up is a fairly odd assortment, so that's why I shared it with you.

-SF3

Score: 0

By wincement

edited Aug 20, 2007 - 4:56 PM

I'm sorry if it sounded like I was talking to you. I was talking to the comments on the story. That was my mistake for not clarifying (after all, I was commenting on your story).

On the contrary, I believe your article was very responsible in the way you handled the "suspicious" nature of the story. The comments however, sound like conspiracy theorists desperate for a new conspiracy.

I won't deny more explanation is needed. I just don't think the "This is PR BS" comments are justified yet.

EDIT:
I didn't notice this before:

"...It would seem Skype is not very worried about this happening again, yet doing nothing to update the hive to protect it."

Again... written by a user. They have no basis to say that. Who says Skype is "doing nothing" about it?

Score: 0

By computershack

posted Aug 20, 2007 - 2:39 PM

"So, no, just because the outage happened two days after the Windows update, it doesn't mean they're lying."

OF COURSE THEY ARE. How many other 'Patch Tuesdays' have there been? ONE A MONTH, every month for as long as Skype has been on the go.

Yet this is the first time such an outage has happened and it's the fault of all those PCs rebooting due to the Windows updates?

And how is that different to people turning on their PC? Not everyone keeps them running and logged on 24/7 so Skype would be getting the same number of logon requests as normal.

TRY LOOKING A LITTLE DEEPER BEFORE BUYING THE LIES.

Score: 0

By BAlGaInTl

posted Aug 21, 2007 - 9:00 AM

Of course... there have probably also been more and more Skype users each of those months.

It may have hit a breaking point.

Score: 0

By wincement

edited Aug 20, 2007 - 3:34 PM

Please don't shout at me. I didn't shout at you.

If you had bothered to read the quote I put just before that, you might have drawn a conclusion similar to:

"Skype said the problem was a combination of the rebooting users and a bug in their network recovery software."

See how easy that was?

"And how is that different to people turning on their PC? Not everyone keeps them running and logged on 24/7 so Skype would be getting the same number of logon requests as normal."

That just shows you haven't even begun to understand the first part of the problem. The point was that everyone rebooted around the exact same time.

Score: 0

By ZenWarrior

posted Aug 20, 2007 - 2:19 PM

On the other hand, Skype posted to its blog that users should keep their Skype clients running and waiting for login.

If a flood of login requests was the problem, then why ask for the offending behavior to continue?

On another note, I want to congratulate Scott and BetaNews for the best article I've seen about this on all the net. Real reporting there with the right questions asked. Thanks!

Score: 0

By wincement

posted Aug 20, 2007 - 3:36 PM

"On the other hand, Skype posted to its blog that users should keep their Skype clients running and waiting for login."

You're right. That is strange. However, we must also remember that they didn't know themselves what the problem was when this was happening.

Score: 0

By BAlGaInTl

posted Aug 20, 2007 - 1:11 PM

I'm not buying it either. The downtime was just too long for that to be the problem.

Score: 0

By Anastasia2007

posted Aug 20, 2007 - 12:41 PM

So.. I suppose they think that nobody uses passwords on their machines. Even the biggest computer idoits I know use multiple accounts for family members which make the system not auto-login.

Score: 0

By wincement

edited Aug 20, 2007 - 3:45 PM

You must know some smart "idoits" =p

Seriously though, my experience has been a polar opposite from yours. Maybe I just know a bunch of dumb people. I don't know. But if someone told me that 80% of Windows users used a single administrator account with no password, I would believe them.

Score: 0

By ZenWarrior

posted Aug 20, 2007 - 12:33 PM

First, thanks for getting the story up, Scott.

Yea, this sounds like total BS from Skype. As the article points out, something just doesn't sound right. In fact, something downright stinks and I think it's the smell of Skype trying to cover its butt.

Bottom line (IMHO): Skype was the victim of a serious attack. But even more, it's still vulnerable and thus the inconsistent (nonsensical?) explanations and Skype vs. Microsoft update time frames.

Score: 0

By kbsoftware

posted Aug 20, 2007 - 12:30 PM

I think Skype is just pushing some very bad pr b.s. to hide the real problem. Unfortunately their pr has created a whole new set of problems.

I'm going to be taking a closer look at Skype now, I'm rather curious at just how week the entire system really is and if it could be exploited.

Score: 0

By wincement

posted Aug 20, 2007 - 2:02 PM

I think you meant "weak".

Score: 0

By kbsoftware

posted Aug 20, 2007 - 3:22 PM

It's good to see you are using your powers for good and not evil lol

Score: 0

By yokozuna

posted Aug 20, 2007 - 12:25 PM

The world has 24 time zones and I do not think that all users updated their systems at once. Moreover, it would be easy to solve the problem but the outage was long lasting (AFAIK 53 hours).

Score: 0

By wincement

edited Aug 20, 2007 - 2:05 PM

Correct me if I'm wrong, but the large majority of Skype users are in the U.S. That would mean at 3a.m., GMT -0500, -0600, -0700, and -0800, most users were rebooting and reconnecting.

The part about no password protected logins is right-on too. I have *very* rarely worked on a person's computer that had their one and only account protected by password. I would almost venture to say "never," but there must have been one in there ...somewhere.

Score: 0

By yokozuna

posted Aug 20, 2007 - 3:12 PM

I'm afraid that you are wrong. The traffic looks like this: http://skypejournal.com/...6/why_skype_peaks_1.php The users of the service come from the following countries: http://eurotelcoblog.blo...ay-stumbled-across.html In fact most traffic is generated by users from Europe. If you check the history of the outage (but Skype forums, not heartbeat.skype.com!) you will see that the outage started in the early morning GMT, which does not make any sense if you take the claims of Skype seriously.

Score: 0

By wincement

posted Aug 20, 2007 - 3:42 PM

Umm... you link an article from June 1, 2005 and expect it to be an accurate representation of current usage?

I'd like to see more recent stats before I take that as Gospel.

Score: 0

By yokozuna

edited Aug 20, 2007 - 5:59 PM

The difference is rather minor: http://homepage.mac.com/...growth/skypegrowth.html and http://beyondthebleeding...king-by-country-us.html

edit: bad link

Score: 0

By dhjdhj

posted Aug 20, 2007 - 2:02 PM

How long do you think it should have taken to find out WHAT happened? Then how long do you think it should have taken to figure out how to fix it? And then how long to test it?

Score: 0

By yokozuna

posted Aug 20, 2007 - 3:04 PM

OK, I will tell you what I think. I am very suspicious of the explanations which Skype gave to their users, because they have changed their mind twice, as you can see if you trace the rather laconic communiques they gave. It rather clearly means that they are hiding something and do not want to tell the truth. P2P analysts already said that if Skype believes that they are right, supernodes should not be switched off. To some extent Skype works like BitTorrent or eMule - if trackers or servers are down no new users will be able to connect. But those users who are already connected through Kademlia/supernode routing should be able to exchange data or voice packets. It means that the network of Skype has a completely different topology than the company claims or the company does not tell the truth about the nature of the outage. Tertium non datur.

Score: 0

By dhjdhj

posted Aug 20, 2007 - 4:09 PM

What the '****ium' is 'Tertium non datur'?

Did you mean 'Veritas' (truth) not given/provided etc?

Score: 0

By yokozuna

posted Aug 20, 2007 - 5:54 PM

I mean that you can tell the truth or not. You can be pregnant or not. You cannot be pregnant a little bit or pregnant in a sense. Get it?

Score: 0

By dhjdhj

posted Aug 20, 2007 - 9:00 PM

I know what you were TRYING to say ---- but it's not what you actually said!

(sigh)

Score: 0

By ZenWarrior

posted Aug 20, 2007 - 12:35 PM

Excellent points, especially the time required to fix the problem.

Score: 0