PageScrape (blog feed, last updated 2008-03-29; author: PageScrape, http://www.blogger.com/profile/03568203859429081101)<br /><br />PageScrape SMS (2006-07-28)<br /><br />As part of the aforementioned experiment into using PageScrape to build a simple SMS text message based weather forecast system, I registered with <a href="http://www.clickatell.com">Clickatell</a>. They run a gateway that allows clients to send SMS messages to mobile phones, and clients can submit messages for sending via HTTP GET requests --> Enter PageScrape!<br /><br />It's free to register and they give you 10 credits free. It seems that sending a txt mostly costs about 0.8 credits, with each credit costing about €0.04, so sending a txt to Vodafone or O2 costs about €0.032 (roughly 3 cent), which isn't too bad....<br /><br />The minimum number of credits you can purchase is 200, which works out at a minimum payment of about €8.80 to get going. Again this is grand; some gateways expect you to spend way more money!<br /><br />Anyway, to send a txt you just need your API ID, username and password, the target mobile number and a message. You can then send the txt by issuing the following GET:<br /><br />http://api.clickatell.com/http/sendmsg?api_id=26437&user=username&password=passwd&to=&text=The Message!<br /><br />Using PageScrape:<br /><br />pscrape -u"http://api.clickatell.com/http/sendmsg?api_id=26437&user=username&password=passwd&to=&text=The Message!" -e"(.*)" -g<br /><br />It seems to work really well; the messages arrive quickly &c.<br /><br />They have a full HTTP API with methods for querying your credit balance and the like, and all of these could be invoked using PageScrape.<br /><br />Anyway, that's the outgoing part of the Weather Forecast System more or less squared away. Next, how do we receive SMS messages? 
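The message text in the sendmsg GET request above contains a space, which is not allowed in a URL and has to be encoded (as `+` or `%20`) before the request goes out; whether pscrape does this for you is not covered here. A minimal Python sketch that builds the same request with every value encoded: the endpoint and parameter names come from the request above, while the recipient number is a made-up placeholder.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are the ones in the sendmsg GET request above;
# every value passed in below is a placeholder.
SENDMSG_URL = "http://api.clickatell.com/http/sendmsg"


def build_sendmsg_url(api_id: str, user: str, password: str, to: str, text: str) -> str:
    """Return a sendmsg GET URL with each query value URL-encoded."""
    query = urlencode({
        "api_id": api_id,
        "user": user,
        "password": password,
        "to": to,
        "text": text,  # the space becomes "+", and "!" becomes "%21" (harmless)
    })
    return f"{SENDMSG_URL}?{query}"


if __name__ == "__main__":
    # "353870000000" is a HYPOTHETICAL recipient number, not a real one.
    print(build_sendmsg_url("26437", "username", "passwd", "353870000000", "The Message!"))
```

Running it prints `http://api.clickatell.com/http/sendmsg?api_id=26437&user=username&password=passwd&to=353870000000&text=The+Message%21`, which could be handed to `pscrape -u` in place of the hand-typed URL.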
Although it costs money to send the weather forecast txts, we want to make sure that it costs us _nothing_ to receive the initial request SMSsesses!<br /><br />Now where's that box of dusty old mobiles....<br /><br />Gosh it's Hot (2006-07-24)<br /><br />Well, it has been boiling here lately, very uncharacteristically warm and sunny! This has resulted in 'the weather' overtaking 'the house prices' as the topic of most conversations (be they in the pub or elsewhere).<br /><br />Anyway, sticking to the current conversational norm: Met Éireann has updated its website. The new site is much nicer than the old one, but it also means that we need a new PageScrape expression to scrape the weather forecast. The following does the trick; it gets today's weather....<br /><br />pscrape -u"www.met.ie/forecasts/" -e"> Today.*p>([^<]*)<"<br /><br />This should result in something like --><br /><br />"Most places will have another warm, dry day with very good sunshine. However outbreaks of rain and drizzle will affect western and northwester areas throughout the day, some thundery downpours are possible. Highest temperatures of 20 to 25 degrees."<br /><br />Here's a question: armed with the above information, how would you go about building an SMS text message based system that sends this weather forecast to whoever requests it by SMS? Where possible, the system should use old stuff (perhaps mobile phone(s)?) 
found lying around the house at the bottom of boxes and dusty drawers, as well as broadband!<br /><br />Here's one interesting starting point: <a href="http://www.gnokii.org/">gnokii</a><br /><br />CamScrape (2006-07-24)<br /><br />These days many devices (such as routers, MP3 streamers, IP cameras &c.) have small embedded HTTP servers through which they can be configured and controlled. Control and communication normally happen through the user doing stuff in a web browser pointed at the device.<br /><br />As usual (in the PageScrape spirit), there is no reason why these HTTP requests cannot be issued programmatically, and if the device relies primarily on GET requests, there is no reason why PageScrape can't be used to control it (and analyse the returned data).<br /><br /><a href="http://photos1.blogger.com/blogger/7477/3404/1600/at.gif"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/7477/3404/320/at.png" border="0" alt="" /></a><br /><br />A few years ago I used PageScrape to control Turtle Beach's AudioTron, a network-based MP3 streamer. The user had full control of the player's features through simple GET requests (list tracks and albums, play, pause and so on); it was all available via a published set of commands. 
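Controlling a device this way comes down to a small table of commands, each pairing a GET request with a regular expression that picks the useful part out of the reply. A minimal Python sketch of such a table follows. The device address (192.168.1.50), the paths and the reply formats are all hypothetical placeholders, not the AudioTron's real command set.

```python
import re
from urllib.parse import urlencode

# HYPOTHETICAL device address, paths and reply formats, for illustration only;
# they are not the AudioTron's real commands.
DEVICE = "http://192.168.1.50"

# Each command pairs a GET request (path + query parameters) with a regular
# expression whose single capture group pulls the answer out of the reply.
COMMANDS = {
    "play": ("/control", {"action": "play"}, r"Status:\s*(\w+)"),
    "pause": ("/control", {"action": "pause"}, r"Status:\s*(\w+)"),
    "tracks": ("/list", {"type": "tracks"}, r"<li>([^<]*)</li>"),
}


def command_url(name: str) -> str:
    """Return the GET URL for a named command."""
    path, params, _ = COMMANDS[name]
    return f"{DEVICE}{path}?{urlencode(params)}"


def parse_reply(name: str, body: str) -> list[str]:
    """Apply the command's expression to a reply body and return every capture."""
    _, _, pattern = COMMANDS[name]
    return re.findall(pattern, body)


if __name__ == "__main__":
    print(command_url("play"))
    # A made-up reply standing in for what the device might send back.
    print(parse_reply("tracks", "<ul><li>Track One</li><li>Track Two</li></ul>"))
```

Each remote-control button would then fetch `command_url(name)` (with pscrape or any other HTTP client) and run `parse_reply` on the response; supporting another command means adding one row to `COMMANDS`.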
I was easily able to implement a WiFi-based remote control by invoking PageScrape with different GET requests and associated regular expressions.<br /><br /><a href="http://photos1.blogger.com/blogger/7477/3404/1600/peak_ipcam.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/7477/3404/200/peak_ipcam.jpg" border="0" alt="" /></a><br /><br />More recently I have been playing with a really nice WiFi IP camera (Peak SOHO) that I came across in a <a href="http://www.reghardware.co.uk/2006/04/25/peak_wireless_enhanced_soho_internet_camera/">review</a> by <a href="http://www.theregister.com/">ElReg</a>.<br /><br />It's an amazing camera for the price, and once again it deals with the world via an HTTP interface. So now I am wondering what changes would need to be made to PageScrape to let it acquire an image from the camera and save it to disk. I want to acquire images automatically from a script, and I hope that with only a few small changes PageScrape can help out.....<br /><br />By the way, I found it quite hard to find a decent UK distributor to buy the camera from; the distributors listed on the <a href="http://www.peakhardware.com/index.aspx">Peak website</a> were useless to a man!<br /><br />In the end I bought it from <a href="http://www.bigpockets.co.uk/">BigPockets</a>.<br /><br />Formatting Fiasco (2006-07-21)<br /><br />One of PageScrape's stalwart users has reported a bug in the output formatting option (-f): it only works for the first 9 regular expression buffers, not for buffers 10 and higher. 
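The fix is to read every digit after the dollar sign before looking the buffer up. Below is a minimal Python sketch of a format expander that does this, assuming the `\$N` syntax used by -f and buffers numbered from 1; it is an illustration of the idea, not PageScrape's own formatting code, and the function name is hypothetical.

```python
import re

# A backslash, a dollar sign, then one OR MORE digits, so "\$10" is read
# as buffer 10 instead of buffer 1 followed by a literal "0".
BUFFER_REF = re.compile(r"\\\$(\d+)")


def expand_format(fmt: str, buffers: list[str]) -> str:
    """Replace each \\$N in fmt with buffers[N - 1] (buffers are numbered from 1)."""

    def substitute(match: re.Match) -> str:
        n = int(match.group(1))
        if 1 <= n <= len(buffers):
            return buffers[n - 1]
        # Assumption: a reference to a buffer that does not exist is left as written.
        return match.group(0)

    return BUFFER_REF.sub(substitute, fmt)


if __name__ == "__main__":
    title = ["PageScrape - A HTML Screen Scrape Utility for Web Pages"]
    print(expand_format(r"The Title is: \$1", title))
    buffers = [f"b{i}" for i in range(1, 12)]  # eleven made-up buffer values
    print(expand_format(r"\$1 \$9 \$10 \$11", buffers))  # prints: b1 b9 b10 b11
```

Reading digits greedily does mean a format string can no longer put a literal digit straight after `\$1` (`\$10` always means buffer 10); that case would need some extra delimiter syntax, which this sketch does not provide.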
For example, consider the following:<br /><br /><b><span style="font-family:Times New Roman;font-size:100%;">pscrape -u"www.webscrape.com" -e"([^<]*)" -f"The Title is: \$1"<br /><br /></span></b>This returns the title of the webscrape web page as follows:<br /><br /><span style="font-style: italic;">The Title is: PageScrape - A HTML Screen Scrape Utility for Web Pages</span><br /><br />In the format string, \$1 refers to the first result buffer; on a successful match, the contents of that buffer are inserted into the output string. This works, and so does \$9 for the ninth buffer (if the corresponding regular expression has one), but it does not work for buffers denoted by \$10 and higher.<br /><br />The formatting algorithm only checks for one decimal digit!<br /><br />This bug should be corrected soon.<br /><br />Broken Examples (2006-07-21)<br /><br />So it appears that some of the examples on the <a href="http://www.webscrape.com">website</a> are now broken; I haven't been checking them lately and hadn't noticed! This is a fundamental problem with screen scraping: when the format of the target page changes, your scrapes may well stop working.<br /><br />As PageScrape uses regular expressions, it is possible to design scrape expressions that are quite robust to simple HTML format changes, but it is never possible to protect against all changes, so it becomes necessary to check periodically that your scrapes are still working. The good news is that this process can be automated; however, it appears that I am just too bone lazy to automate the testing of the PageScrape examples!<br /><br />I hope to get around to fixing the broken examples soon... 
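A minimal sketch of such an automated check in Python: fetch each example page and confirm that its expression still matches. The `EXAMPLES` list and the helper names are hypothetical; only the Met Éireann URL and expression come from an earlier post, and a real list would hold every example on the website.

```python
import re
from typing import Callable
from urllib.request import urlopen

# Each entry: (name, URL, regular expression). The Met Éireann entry reuses the
# expression from the "Gosh it's Hot" post; the other examples would go here too.
EXAMPLES = [
    ("met.ie today", "http://www.met.ie/forecasts/", r"> Today.*p>([^<]*)<"),
]


def fetch(url: str) -> str:
    """Download a page as text (assumes UTF-8; undecodable bytes are replaced)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def broken_examples(examples: list[tuple[str, str, str]],
                    get_page: Callable[[str], str] = fetch) -> list[str]:
    """Return the names of examples whose expression no longer matches its page."""
    broken = []
    for name, url, pattern in examples:
        try:
            page = get_page(url)
        except OSError:  # network errors (URLError, timeouts) count as broken too
            broken.append(name)
            continue
        # Note: "." in Python regexes does not match newlines unless re.DOTALL is passed.
        if re.search(pattern, page) is None:
            broken.append(name)
    return broken
```

Running `broken_examples(EXAMPLES)` from a scheduled job (cron, say) and mailing any names it returns would flag a broken scrape without anyone having to check by hand. The `get_page` parameter lets the check be exercised against saved pages instead of the live sites.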
I might remove the <a href="http://www.webscrape.com/StockQuoteExample.htm">stock quote examples</a>, as they are a bit complex, a bit boring and just a bit clichéd, and I am getting loads of stock-related spam!<br /><br />Brave New World (2006-07-21)<br /><br />I have decided to set up a blog for posting any vaguely interesting guff relating to PageScrape and screen scraping in general. So welcome one and all!<br /><br />http://www.webscrape.com