I love starting new things. This project uses the Integrity V6 Engine for the crawling, which meant that I could get straight on with building the output functionality.
Extracting data from websites is something I've noticed people trying to achieve with Scrutiny's search functionality. Scrutiny will report which pages contain (or don't contain) your search term, either in the visible text or anywhere in the source code, and you can export the results to CSV, choosing which columns to include.
But Scrutiny (currently) can't extract the content of elements with a particular CSS class or id.
This is where WebScraper comes in. It quickly scans a website and can output the data (currently) as CSV or JSON. (Anyone want XML?) The output can include various metadata (more choices to be added), the entire content of each page (as text, HTML or Markdown), and extracted parts of each page (currently the content of divs or spans with a named class or id).
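WebScraper's own code isn't described here, so what follows is only a minimal sketch of the general idea, in Python using the standard library's `html.parser`: pull the text out of divs or spans that have a given class or id, then write the results as CSV or JSON. It assumes the pages have already been fetched (the crawling, which WebScraper hands to the Integrity V6 Engine, is left out), and the names `ClassOrIdExtractor`, `extract`, `to_csv` and `to_json` are hypothetical.

```python
import csv
import io
import json
from html.parser import HTMLParser


class ClassOrIdExtractor(HTMLParser):
    """Hypothetical helper: collect the text inside <div> or <span> elements
    whose class list contains target_class or whose id equals target_id."""

    TAGS = {"div", "span"}

    def __init__(self, target_class=None, target_id=None):
        super().__init__()
        self.target_class = target_class
        self.target_id = target_id
        self.depth = 0      # div/span nesting depth inside the current match
        self.current = []   # text fragments of the current match
        self.matches = []   # finished, whitespace-normalised matches

    def _is_match(self, tag, attrs):
        if tag not in self.TAGS:
            return False
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        by_class = self.target_class is not None and self.target_class in classes
        by_id = self.target_id is not None and attrs.get("id") == self.target_id
        return by_class or by_id

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if self._is_match(tag, attrs):
                self.depth = 1
                self.current = []
        elif tag in self.TAGS:
            self.depth += 1  # nested div/span inside a match

    def handle_endtag(self, tag):
        if self.depth > 0 and tag in self.TAGS:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.matches.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract(pages, target_class=None, target_id=None):
    """pages: mapping of URL -> HTML source (fetching is out of scope here)."""
    rows = []
    for url, html in pages.items():
        parser = ClassOrIdExtractor(target_class, target_id)
        parser.feed(html)
        parser.close()
        for text in parser.matches:
            rows.append({"url": url, "extracted": text})
    return rows


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "extracted"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    pages = {
        "https://example.com/a": '<div class="price main">$9.99</div><span id="sku">A1</span>',
        "https://example.com/b": '<div class="price"><span>$</span>12.50</div>',
    }
    print(to_csv(extract(pages, target_class="price")))
    print(to_json(extract(pages, target_id="sku")))
```

Only div and span tags are counted when tracking nesting, because void elements such as `<br>` have no closing tag and would throw off a naive depth counter.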
WebScraper is new and in beta. You're welcome to use it for free, and please get in touch with any requests, bug reports or observations.
There's a short demo video here.
Awesome! I'm excited to see how this turns out!