The road near where I live is in the process of being gentrified, with the street itself being upgraded and made more pedestrian friendly. To keep up with the constant shifts in road closures and parking changes I have been monitoring the council web site for updates. However, it's a bit inconvenient having to remember to check the site every so often when there may be no change for a month, so I found a better solution whereby changes come to me. Remember Yahoo Pipes?
In 2007 Yahoo launched Pipes to usher in a new era of web-generated mash-ups. After some initial glowing reviews most people forgot about it, and I was surprised to find that it is still running today, seemingly used only occasionally by those ancient coders who care about RSS. After some experimentation I have joined this small group, and am now a believer in the power of the pipe.
Some time ago I linked my rarely used Yahoo account to my often-used Google account, so when I came to log in to Yahoo it used my Google authentication. I wish all sites let me do this, instead of forcing me to remember more passwords. Plus, my Google account uses two-factor authentication, so any site I sign in to through it gets that extra protection automatically. So, a seamless entrance to Pipes was a good start.
Figure: HTML of Page to Scrape
The page I was interested in converting to RSS had a "Latest News" section, with a bunch of news items underneath. See screenshot of the HTML. Luckily, these items were the only elements on the entire page using h2 tags, allowing the creation of an amazingly simple Pipe to grab the page from the web site, parse it, search for elements within h2 tags, and output those elements to an RSS feed.
Figure: Yahoo Pipe Designer
The Pipe I created is shown in its entirety here. It consists of an XPath Fetch Page module with two parameters: the URL of the page to parse, and the XPath that finds the relevant elements within that page. I worked out the XPath with the $x('') command in the Chrome Developer Tools Console. As you can see from the screenshot, it returns the elements of interest.
Figure: Validating the XPath in Chrome
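The extraction step can also be approximated outside Pipes. What follows is a minimal Python sketch, not the Pipe itself. It assumes the XPath boils down to "select every h2 element" (the exact expression is not reproduced here), it swaps the XPath engine for the standard library's HTMLParser, and it runs on an invented SAMPLE_HTML snippet rather than the real council page. The names H2Collector and extract_headings are made up for illustration.

```python
from html.parser import HTMLParser


class H2Collector(HTMLParser):
    """Collect the text content of every <h2> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headings = []    # finished heading strings, in document order
        self._buffer = None   # list of text chunks while inside an <h2>, else None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so "H2" and "h2" both match.
        if tag == "h2":
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. a link in the heading) is kept too.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._buffer is not None:
            # Join the chunks and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headings.append(text)
            self._buffer = None


def extract_headings(html):
    """Return the text of each <h2> in the given HTML string."""
    parser = H2Collector()
    parser.feed(html)
    parser.close()
    return parser.headings


# Invented stand-in for the "Latest News" page; not the real markup.
SAMPLE_HTML = """
<html><body>
<h1>Street upgrade</h1>
<h2>Parking changes from Monday</h2>
<p>Details...</p>
<h2>Road closed  <a href="/closure">this weekend</a></h2>
</body></html>
"""

if __name__ == "__main__":
    for heading in extract_headings(SAMPLE_HTML):
        print(heading)
```

To point this at a live page, the HTML string would have to be downloaded first, for example with urllib.request.urlopen. That step is left out so the sketch runs offline.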
The output of the XPath Fetch Page module was piped into the Create RSS module, where each data element is mapped to the matching RSS field. Here, each item carried its text only in the content field, so I used content to fill in the Title field of the RSS. For this simple Pipe I didn't bother with the description field, though I did later develop a fancier Pipe incorporating descriptions.
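The same mapping, one scraped string per item title, can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration of the idea, not what the Create RSS module does internally. The function name build_rss, the feed title, and the example.org link are all placeholders I have assumed.

```python
import xml.etree.ElementTree as ET


def build_rss(titles, feed_title, feed_link):
    """Return an RSS 2.0 document (as a string) with one <item> per title.

    Hypothetical helper: only the item title is filled in, mirroring the
    simple Pipe, which skipped the description field.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")

    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_title

    for title in titles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title  # special characters are escaped on output

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Placeholder headlines and URL, for illustration only.
    print(build_rss(
        ["Parking changes from Monday", "Road closed this weekend"],
        "Latest News",
        "https://example.org/news",
    ))
```

A feed reader would need this served at a stable URL and regenerated on a schedule, which Pipes handles for you.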
That was all! A person with some experience in developing Pipes would be able to create this web page to RSS pipe in about five minutes. Doing this in code, such as Python, PHP or even C#, would take much longer once you add hosting and scheduling. For a short-lived feed like this (assuming these road-works eventually finish) it would not be worth the effort in a scripting or programming language, but it's a perfect task for Yahoo Pipes. Did I mention it's free?