Friday, July 21, 2006
Formatting Fiasco
One of PageScrape's stalwart users has reported a bug in the output formatting option (-f). The option only works for the first 9 regular expression buffers; it does not work for buffers 10 and higher. For example, consider the following:
pscrape -u"www.webscrape.com" -e"([^<]*)" -f"The Title is: \$1"
This will return the title of the webscrape web page as follows:
The Title is: PageScrape - A HTML Screen Scrape Utility for Web Pages
In the format string, \$1 refers to the first result buffer and causes the contents of that buffer to be inserted into the output string on a successful match. This works, and so will \$9 for the ninth buffer (if there is one in the corresponding regular expression). However, it will not work for buffers denoted by \$10 and higher.
The formatting algorithm only checks for one decimal digit!
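To make the difference concrete, here is a small Python sketch of the two behaviours. It is not PageScrape's actual code: the function names, the 1-based buffer list, and the choice to substitute an empty string for a missing buffer are all assumptions made for illustration. The first function reads a single digit after \$, as the bug does; the second reads every consecutive digit.

```python
import re
import string


def _lookup(buffers, index):
    # Buffers are numbered from 1. A missing buffer becomes an empty
    # string here; that is an assumption, not documented behaviour.
    return buffers[index - 1] if 1 <= index <= len(buffers) else ""


def expand_single_digit(fmt, buffers):
    # Mimics the reported bug: only ONE digit after \$ is consumed,
    # so \$10 is read as \$1 followed by a literal "0".
    return re.sub(r"\\\$(\d)", lambda m: _lookup(buffers, int(m.group(1))), fmt)


def expand_multi_digit(fmt, buffers):
    # Consumes every consecutive digit, so \$10 means the tenth buffer.
    return re.sub(r"\\\$(\d+)", lambda m: _lookup(buffers, int(m.group(1))), fmt)


if __name__ == "__main__":
    buffers = list(string.ascii_uppercase[:12])  # twelve buffers: "A" .. "L"
    fmt = r"first=\$1 tenth=\$10"
    print(expand_single_digit(fmt, buffers))
    print(expand_multi_digit(fmt, buffers))
```

With twelve hypothetical buffers holding the letters A to L, the single-digit version turns \$10 into the first buffer followed by a stray 0, while the multi-digit version picks up the tenth buffer.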
This bug should be corrected soon.