Facebook apologizes for worst outage in 4 years
TAIPEI, 24 SEPTEMBER 2010 -
Many Facebook users were unable to access the social networking site for up to two and a half hours on Thursday, the worst outage the website has had in over four years, Facebook said in a posting.
The problems were traced back to a change made by Facebook in one of its systems.
The change was made to a piece of data that was called upon whenever an error-checking routine found invalid data in Facebook's system. However, that piece of data was itself interpreted as invalid, which caused the system to try to replace it with the same piece of data, and so a feedback loop began.
The loop sent hundreds of thousands of queries per second to Facebook's database cluster, overwhelming the system.
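To make the mechanism concrete, here is a small Python simulation. It is a sketch under stated assumptions, not Facebook's actual code: the names ConfigDatabase, Client, is_valid and simulate are hypothetical, the idea that each front-end server keeps a local copy of the value is an assumption, and the validity rule is invented for illustration. It shows why the loop never settles. When the authoritative copy in the database is the very value the checker rejects, every check triggers another database query and the "repair" never succeeds.

```python
# Hypothetical simulation of the feedback loop described above.
# All names and the validity rule are illustrative assumptions.


class ConfigDatabase:
    """Stand-in for the database cluster holding the authoritative value."""

    def __init__(self, value):
        self.value = value
        self.queries = 0  # counts every fetch, i.e. load on the cluster

    def fetch(self):
        self.queries += 1
        return self.value


def is_valid(value):
    # Assumption: a valid value is a non-negative integer.
    return isinstance(value, int) and value >= 0


class Client:
    """A hypothetical front-end server holding a local copy of the value."""

    def __init__(self, db):
        self.db = db
        self.local_copy = db.fetch()

    def check(self):
        # Error-checking routine: if the local copy looks invalid,
        # replace it with a fresh copy from the database. If the database
        # copy is the same invalid value, this repeats on every check.
        if not is_valid(self.local_copy):
            self.local_copy = self.db.fetch()


def simulate(num_clients, rounds, db_value):
    """Return how many extra database queries the checks generate."""
    db = ConfigDatabase(db_value)
    clients = [Client(db) for _ in range(num_clients)]
    baseline = db.queries  # ignore the initial loads
    for _ in range(rounds):
        for client in clients:
            client.check()
    return db.queries - baseline
```

For example, simulate(1000, 10, 5) generates no extra queries, while simulate(1000, 10, -1) generates 10,000: one per client per check, with no end in sight. The only way out in this model is to stop the checks or fix the stored value, which mirrors why Facebook had to cut traffic to the cluster.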
The result for users was a "DNS error" message and no access to the site.
"The way to stop the feedback cycle was quite painful - we had to stop all traffic to this database cluster, which meant turning off the site," wrote Robert Johnson, director of software engineering at Facebook, in a post on the site. "Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site."
The problem hasn't been entirely fixed. Johnson said Facebook had to turn off the automated error-checking system to get the website back up and running, but that system plays an integral role in protecting the website. Facebook is now exploring new ways to handle the situation so that it won't lead to another feedback loop.
"We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously," he wrote.
It was the second day in a row that Facebook went down for some users. On Wednesday, Facebook blamed a third-party networking provider for making the site inaccessible to some of them.
- wong chee tat :)