Friday, July 21, 2006

 

Broken Examples

So it appears that some of the examples on the website are now broken. I haven't been checking them lately and hadn't noticed! This is a fundamental problem with screen scraping: when the target screen format changes, your scrapes may well stop working.

As PageScrape uses regular expressions, it is possible to design scrape expressions that are quite robust in the face of simple HTML format changes. It is never possible to protect against all changes, though, so you need to check periodically that your scrapes are still working. The good news is that this process can be automated. However, it appears that I am just too bone lazy to automate the testing of the PageScrape examples!
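
To make the idea concrete, here is a minimal sketch in Python (not PageScrape's own syntax). The pattern, the function names, and the sample HTML snippets are all made up for illustration. The expression tolerates extra attributes, different quoting, stray whitespace, and tag case, and the check reports any page where the scrape comes back empty.

```python
import re

# Hypothetical scrape expression: find the number inside a table cell
# marked class="price", allowing extra attributes, either quote style
# (or none), surrounding whitespace, and upper- or lower-case tags.
PRICE_PATTERN = re.compile(
    r'<td[^>]*class\s*=\s*["\']?price["\']?[^>]*>\s*([\d.,]+)\s*</td>',
    re.IGNORECASE,
)


def scrape_price(html):
    """Return the captured price text, or None if the pattern does not match."""
    match = PRICE_PATTERN.search(html)
    return match.group(1) if match else None


def check_scrapes(pages):
    """Return the names of the pages whose scrape returned nothing."""
    return [name for name, html in pages.items() if scrape_price(html) is None]


# Made-up sample pages: the original layout, a small formatting change,
# and a redesign that drops the table cell altogether.
old_layout = '<td class="price">12.34</td>'
new_layout = '<TD align="right" class=\'price\' > 12.34 </TD>'
redesigned = '<span id="quote">12.34</span>'

broken = check_scrapes(
    {"old": old_layout, "new": new_layout, "redesigned": redesigned}
)
print(broken)  # only the redesigned page is reported
```

Run on a schedule against the live pages instead of fixed strings, a check along these lines would flag a broken scrape soon after the target site changes, rather than whenever someone happens to notice.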

I hope to get around to fixing the broken examples soon... I might remove the stock quote examples, as they are a bit complex, a bit boring, and just a bit clichéd, and I am getting loads of stock-related spam!

Comments:
Then use PageScrape in combination with a free link validation service like http://www.linktiger.com

They send you an e-mail if links are broken so you can fix that :)
 
Why can't I scrape a web page that starts with https:?
Seems dumb to build a tool that fails here...
 