Sunday, 27 September 2015

Breakthrough in Wikipedia spidering project - 3 million links checked

Today a scan of 3 million links finished. That in itself isn't a breakthrough, because I've made such a scan before. But this time, thanks to a very small change at the very heart of the new v6 crawling engine, the app was still working within expected resources at the end of the 3 million link scan, and both the app and the Mac were still responsive.
The scan reported a very large number of broken links, but it does seem that this is a true result.
So we're now 'game on' for a 5 million link crawl. The aim of the game is to find out whether the 'six degrees' theory is true (whether you can really reach any of the 5 million English language pages within six clicks).
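
For anyone wondering what the 'six clicks' test looks like in practice, here is a minimal sketch in Python. It is not the crawling engine's own code. It assumes the crawl results are available as a dictionary (called `links` here, a made-up name) that maps each page to the pages it links to, and it only checks reachability from one chosen starting page using a breadth-first search capped at six clicks.

```python
from collections import deque

def clicks_from(start, links, max_clicks=6):
    """Return {page: minimum clicks from start} for pages within max_clicks.

    `links` is an assumed structure: {page: [pages it links to]}.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_clicks:
            continue  # don't follow links beyond the click limit
        for target in links.get(page, ()):
            if target not in depth:  # first visit is the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

def unreachable(start, links, max_clicks=6):
    """Return the set of known pages that can't be reached within max_clicks."""
    all_pages = set(links)
    for targets in links.values():
        all_pages.update(targets)
    return all_pages - set(clicks_from(start, links, max_clicks))
```

If `unreachable` comes back empty for a given starting page, every crawled page is within six clicks of it. Testing the theory for any pair of pages would mean repeating this for each starting page, which is a much bigger job.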
