Friday, July 21, 2006
Broken Examples
So it appears that some of the examples on the website are now broken. I haven't been checking them lately and hadn't noticed! This is a fundamental problem with screen scraping: when the target screen format changes, your scrapes may well stop working.
As PageScrape uses regular expressions, it is possible to design scrape expressions that are quite robust in the face of simple HTML format changes. It is never possible to protect against all changes, though, so it becomes necessary to periodically check that your scrapes are still working. The good news is that this process can be automated; however, it appears that I am just too bone lazy to automate the testing of the PageScrape examples!
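For what it's worth, here is a rough sketch of what such an automated check could look like. It uses Python's `re` module as a stand-in rather than PageScrape's own syntax, and the pattern, the sample pages, and the names `check_scrape` and `run_checks` are all made up for illustration. The pattern tolerates extra attributes and whitespace, so a simple reformat still matches, while a full redesign gets reported as broken.

```python
import re

# Hypothetical scrape expression (not a real PageScrape example): allow extra
# attributes and whitespace so simple markup changes do not break the match.
PRICE_PATTERN = re.compile(
    r'<td[^>]*class="price"[^>]*>\s*([0-9]+\.[0-9]{2})\s*</td>',
    re.IGNORECASE,
)


def check_scrape(pattern, html):
    """Return the first captured group, or None if the scrape no longer matches."""
    match = pattern.search(html)
    if match is None:
        return None
    value = match.group(1).strip()
    return value or None


def run_checks(checks):
    """Take (name, pattern, html) tuples and return the names of broken scrapes."""
    return [
        name
        for name, pattern, html in checks
        if check_scrape(pattern, html) is None
    ]


# Saved sample pages stand in for live fetches (assumed data, for illustration).
OLD_PAGE = '<tr><td class="price">12.34</td></tr>'
REFORMATTED_PAGE = '<tr><td id="p1" class="price" align="right">\n 12.34 </td></tr>'
REDESIGNED_PAGE = '<div><span data-price="12.34"></span></div>'

broken = run_checks([
    ("old", PRICE_PATTERN, OLD_PAGE),
    ("reformatted", PRICE_PATTERN, REFORMATTED_PAGE),
    ("redesigned", PRICE_PATTERN, REDESIGNED_PAGE),
])
print(broken)
```

In practice the sample strings would be replaced by freshly downloaded pages, and the script would run on a schedule and send a warning whenever the list of broken scrapes is not empty.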
I hope to get around to fixing the broken examples soon... I might remove the stock quote examples, as they are a bit complex, a bit boring, and just a bit cliched, and I am getting loads of stock-related spam!
Comments:
Then use PageScrape in combination with a free link validation service like http://www.linktiger.com
They send you an e-mail if links are broken so you can fix that :)
