Tuesday, November 10, 2009

Google Sorts 1 Petabyte of Data in 6 Hours

Google has rewritten the record book and perhaps extended the benchmark for sorting massive volumes of data. The company said Friday that it had sorted 1 terabyte of data in just 68 seconds, eclipsing the previous mark of 209 seconds established in July by Yahoo. Google’s effort included 1,000 computers using MapReduce, while Yahoo’s effort featured a 910-node Hadoop cluster.

Then, just for giggles, they expanded the challenge: “Sometimes you need to sort more than a terabyte, so we were curious to find out what happens when you sort more and gave one petabyte (PB) a try,” wrote Grzegorz Czajkowski of the Google Systems Infrastructure Team. “It took six hours and two minutes to sort 1PB (10 trillion 100-byte records) on 4,000 computers. We’re not aware of any other sorting experiment at this scale and are obviously very excited to be able to process so much data so quickly.”

No comments: