We often think about PowerShell v3 as being a management tool for the cloud. One new PowerShell v3 cmdlet that lends substance to this idea is Invoke-WebRequest. This is a handy cmdlet for retrieving data from a web resource. It might be a public web site or something on your intranet. For today’s fun I have a few lines of code I run to “scrape” information from http://manning.com. Since all of my recent books are through Manning, I like to keep track of best sellers to see if any of my books make the list. Here’s how.
First, I need to grab the web page.
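A minimal sketch of that first step might look like the following. The function name Get-WebPage and its parameter are my own illustrative choices, not anything built in; the only real cmdlet involved is Invoke-WebRequest.

```powershell
# Hypothetical helper (the name is an assumption): fetch a page and
# return the response object produced by Invoke-WebRequest.
function Get-WebPage {
    param(
        [Parameter(Mandatory=$true)]
        [string]$Uri
    )
    # -ErrorAction Stop turns a failed request into a terminating error
    # so the caller can catch it instead of getting a partial result.
    Invoke-WebRequest -Uri $Uri -ErrorAction Stop
}
```

For example, `$r = Get-WebPage -Uri 'http://manning.com'` saves the response in a variable, and `$r | Get-Member` is a quick way to see what it offers.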
There is a potential memory leak you can run into if you run Invoke-WebRequest in the ISE, so I recommend trying this in the console. The cmdlet returns a structured object, which I’ll let you explore on your own. The fun part is that the cmdlet creates a property called ParsedHTML. This property is the page structured in such a way that I can use DOM (document object model) methods like getElementsByTagName.
I looked at the source on manning.com and found the HTML code surrounding the best seller boxes. Knowing the tag information, I can use the DOM from the ParsedHTML property and retrieve the information I want. I know there are div tags with class names of bestsellHeader and bestSellbox, which the DOM exposes through each element's className property.
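Here is one way the DOM lookup could be sketched. Get-DivByClass is a hypothetical helper name, and the class names are simply the ones mentioned above; the page's markup may well have changed since. The comparison uses -contains, which ignores case in PowerShell, so the exact capitalization of the class names does not matter.

```powershell
# Hypothetical helper (the name is an assumption): return the <div>
# elements of a parsed page whose className is one of the given names.
function Get-DivByClass {
    param(
        [Parameter(Mandatory=$true)]
        $Document,                 # e.g. the ParsedHTML property of the response

        [Parameter(Mandatory=$true)]
        [string[]]$ClassName
    )
    # getElementsByTagName is a standard DOM method. The pipeline enumerates
    # the returned collection and keeps only the divs with a matching class.
    $Document.getElementsByTagName('div') |
        Where-Object { $ClassName -contains $_.className }
}
```

Assuming `$r` holds the result of Invoke-WebRequest, something like `Get-DivByClass -Document $r.ParsedHTML -ClassName 'bestsellHeader','bestSellbox' | Select-Object -ExpandProperty innerText` should list the text inside those boxes.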
And what do you know? Learning PowerShell v3 in a Month of Lunches is the number 1 print bestseller. Thank you, by the way. This is a quick and dirty screen scrape, but it is just fine for my purposes. I have to admit I like using PowerShell to find out if my PowerShell books are best sellers.
I’d love to hear how you are using this new cmdlet.