Tuesday, November 10, 2009

Router Ripples Cited in Gmail Outage

Google has published an update on this afternoon’s Gmail downtime. “Today’s outage was a Big Deal, and we’re treating it as such,” writes Ben Treynor, Google’s VP of Engineering and Site Reliability Czar. “We’re currently compiling a list of things we intend to fix or improve as a result of the investigation.”

The problem? Treynor says Google underestimated the load that routine maintenance on “a small fraction” of Gmail servers would place on the routers supporting the application. “At about 12:30 pm Pacific a few of the request routers became overloaded and in effect told the rest of the system ‘stop sending us traffic, we’re too slow!’,” Treynor writes. “This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded.”
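The ripple Treynor describes can be illustrated with a toy model. The sketch below is not Google's architecture: the function name, the router capacities, and the load figures are all hypothetical, and it assumes that traffic is split evenly across whichever routers are still accepting it. It only shows how a small rise in load can knock out a few routers and then take down the rest as their traffic is redistributed.

```python
def simulate_cascade(capacities, total_load):
    """Toy cascade model (hypothetical, not Google's actual system).

    capacities: per-router capacity, in arbitrary request units.
    total_load: total traffic, split evenly across healthy routers.
    Returns the number of healthy routers after each round.
    """
    healthy = list(capacities)
    history = [len(healthy)]
    while healthy:
        # Every router still in service takes an equal share of the load.
        share = total_load / len(healthy)
        # A router whose share exceeds its capacity stops accepting traffic.
        survivors = [c for c in healthy if c >= share]
        if len(survivors) == len(healthy):
            break  # No new failures, so the system is stable.
        healthy = survivors
        history.append(len(healthy))
    return history


# Made-up capacities for ten routers.
HYPOTHETICAL_CAPACITIES = [105, 105, 115, 120, 130, 140, 160, 180, 200, 220]

if __name__ == "__main__":
    print(simulate_cascade(HYPOTHETICAL_CAPACITIES, 1000))  # [10]
    print(simulate_cascade(HYPOTHETICAL_CAPACITIES, 1100))  # [10, 8, 5, 1, 0]
```

In this made-up scenario a 10 percent increase in load is enough to overload the two weakest routers, and each round of redistribution pushes more routers past their limits until none remain.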
