Amazon investigating problem after S3 suffers 8-hour outage

By Tim Conneally | Published July 21, 2008, 5:40 PM

Amazon's Simple Storage Service (S3) was down for more than eight hours over the weekend, affecting many prominent sites, and the company is still investigating the cause of the problem.

Cloud-based services such as those offered by Amazon provide cost-effective computing and storage. However, the oft-cited drawback of relying on such offerings is that customers are left with little or no control if something goes wrong. The only option is to wait -- and in cases like this, to wait for a third of a day or more.

Amazon's Simple Storage Service (S3), which was introduced in 2006, is part of the Amazon Web Services (AWS) suite, which also includes the Elastic Compute Cloud (EC2) and SimpleDB services.

On July 20, the S3 component of AWS was down for more than eight hours, affecting sites like SmugMug, Twitter, Centernetworks, and many of Amazon's own sites. The AWS Service Health Dashboard shows that both the Simple Storage Service and the Simple Queue Service experienced a "service disruption."

In a communication with the company, GigaOM's Om Malik received a rather general explanation of why the service was down: "As a distributed system, the different components of S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide which redundant physical storage server to route a request to."

"We experienced a problem with those internal system communications, leaving the components unable to interact properly, and customers unable to successfully process requests. After exploring several alternatives, the team determined it had to take the service offline to restore proper communication and then bring service online again."

"These are sophisticated systems and it generally takes a while to get to root cause in such a situation -- we will be providing our customers with more information when we've fully investigated the incident," the company added.

Many companies use AWS, so a loss of functionality can affect a huge number of services. Both Red Hat and Sun use EC2, which has also experienced various outages. Consumer-oriented services like HP's Upline have faced numerous outages as well.

Comments

I noticed the outage as a Jungle Disk user, but I've got to be honest: S3 is the fastest, cheapest, most awesome online backup solution out there right now. 8 hours out of the 2 months I've been using it wasn't a problem. For me it's a backup solution; I don't host data up there that I need to access immediately (unless of course my drives fail, and then I would :).
