Amazon investigating problem after S3 suffers 8-hour outage

By Tim Conneally | Published July 21, 2008, 5:40 PM

Amazon's Simple Storage Service (S3) was down for more than eight hours over the weekend, affecting many prominent sites, and the company is still investigating the cause of the problem.

Cloud-based services such as those offered by Amazon provide cost-effective computing and storage. However, the oft-cited drawback of relying on such offerings is that customers are left with little or no control if something goes wrong. The only option is to wait -- and in cases like this, wait nearly half a day.

Amazon's Simple Storage Service (S3), which was introduced in 2006, is part of the Amazon Web Services (AWS) suite, which also includes the Elastic Compute Cloud (EC2) and SimpleDB services.

On July 20, the S3 component of AWS was down for more than eight hours, affecting sites like SmugMug, Twitter, Centernetworks, and many of Amazon's own sites. The Amazon Web Service Health Dashboard shows that the Simple Storage Service and Simple Queue Service experienced a "service disruption."

In a communication with the company, GigaOM's Om Malik received a rather general explanation as to why the service was down: "As a distributed system, the different components of S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide which redundant physical storage server to route a request to."

"We experienced a problem with those internal system communications, leaving the components unable to interact properly, and customers unable to successfully process requests. After exploring several alternatives, the team determined it had to take the service offline to restore proper communication and then bring service online again."

"These are sophisticated systems and it generally takes a while to get to root cause in such a situation -- we will be providing our customers with more information when we've fully investigated the incident," the company added.

Many companies utilize AWS, so a loss of functionality has the potential to affect a huge number of services. Both Red Hat and Sun utilize EC2, which has also experienced various outages. Consumer-aimed services like HP's Upline have faced numerous outages as well.

Comments

I noticed the outage as a jungledisk user, but I've got to be honest S3 is the fastest, cheapest, most awesome online backup solution out there right now. 8 hours out of the 2 months I've been using it wasn't a problem. Now for me it's a backup solution, I don't host data I need to access immediately up there (unless of course my drives fail and then I would :).
