Your chance to download the internet (kind of)
Get 80 terabytes of archived web crawl data
It’s happened to all of us at some time or another – you’re out and about, and you have to show somebody a cat video you saw online, but there’s no internet connectivity. Maybe it’s because you’re in a submarine, or in the middle of a dramatic hostage situation, or you’ve been abducted by space monsters and taken through a wormhole to a distant galaxy or something. The details are inconsequential, but the point is – what now?
Fortunately, the people over at the Internet Archive have the ultimate solution – 80 terabytes of archived web crawl data that you can download and keep forever just in case.
“We are interested in exploring how others might be able to interact with or learn from this content if we make it available in bulk,” reads an update over on the Internet Archive blog.
“To that end, we would like to experiment with offering access to one of our crawls from 2011 with about 80 terabytes of WARC files containing captures of about 2.7 billion URIs. The files contain text content and any media that we were able to capture, including images, flash, videos, etc.”
The archive spans 9 March 2011 to 23 December 2011 and includes:
- Number of captures: 2,713,676,341
- Number of unique URLs: 2,273,840,159
- Number of hosts: 29,032,069
So, approximately 79.9 terabytes of porn plus a few hundred megabytes of cat videos, rage comics, and Call of Duty frag montages. That should cover you for most of the weekend, at least.
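For a rough sense of scale, the figures quoted above can be turned into a few back-of-the-envelope numbers. This is only a sketch: it assumes "80 terabytes" means decimal terabytes and, for the download estimate, a steady 100 Mbit/s connection. Neither assumption comes from the Internet Archive's announcement, and the function name `crawl_summary` is made up for illustration.

```python
# Figures quoted from the Internet Archive's announcement.
CAPTURES = 2_713_676_341
UNIQUE_URLS = 2_273_840_159
HOSTS = 29_032_069

# Assumptions (not from the announcement): decimal terabytes and a
# steady 100 Mbit/s connection with no protocol overhead.
TOTAL_BYTES = 80 * 10**12
LINK_BITS_PER_SECOND = 100 * 10**6


def crawl_summary():
    """Return rough per-capture, per-URL and download-time estimates."""
    return {
        "avg_kb_per_capture": TOTAL_BYTES / CAPTURES / 1000,
        "captures_per_unique_url": CAPTURES / UNIQUE_URLS,
        "unique_urls_per_host": UNIQUE_URLS / HOSTS,
        "download_days": TOTAL_BYTES * 8 / LINK_BITS_PER_SECOND / 86_400,
    }


if __name__ == "__main__":
    for name, value in crawl_summary().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions the average capture works out to a little under 30 KB, and pulling down the whole set would take roughly two and a half months of continuous downloading, so it may be worth starting well before the space monsters arrive.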