Friday Fun: Scraping the Web with PowerShell v3

We often think about PowerShell v3 as being a management tool for the cloud. One new PowerShell v3 cmdlet that lends substance to this idea is Invoke-WebRequest. This is a handy cmdlet for retrieving data from a web resource. It might be a public web site or something on your intranet. For today’s fun I have a few lines of code I run to “scrape” information from http://manning.com. Since all of my recent books are through Manning, I like to keep track of the best sellers to see if any of my books make the list. Here’s how.

First, I need to grab the web page.

PS C:\> $data = Invoke-Webrequest "http://manning.com"

There is a potential memory leak you can run into if you run Invoke-WebRequest in the ISE, so I recommend trying this in the console. The cmdlet returns a structured object, which I’ll let you explore on your own. The fun part is that the object includes a property called ParsedHtml. This property is the page structured in such a way that I can use DOM (Document Object Model) methods like getElementsByTagName.
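As a starting point for exploring that object, here is a minimal sketch. The function name Get-WebResponseSummary is hypothetical, not something that ships with PowerShell. It only reads properties the response object exposes (StatusCode, RawContentLength, Links and Images) and assumes you pass it the result of Invoke-WebRequest.

```powershell
# Hypothetical helper: summarize a response returned by Invoke-WebRequest.
function Get-WebResponseSummary {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $Response
    )
    process {
        [pscustomobject]@{
            StatusCode = $Response.StatusCode
            Length     = $Response.RawContentLength
            # @() forces an array so Count works even for a single item
            LinkCount  = @($Response.Links).Count
            ImageCount = @($Response.Images).Count
        }
    }
}
```

With the page already saved in $data, something like this should give a quick overview before digging in with Get-Member:

PS C:\> $data | Get-WebResponseSummary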

I looked at the source on manning.com and found the HTML code surrounding the best seller boxes. Knowing the tag information, I can use the DOM from the ParsedHtml property and retrieve the information I want. I know there are div tags with class names of bestsellHeader and bestSellbox. Both start with “bestsell”, and because -match is case-insensitive, a single pattern catches both.

PS C:\> $data.ParsedHtml.getElementsByTagName("div") | Where "classname" -match "^bestsell" | Select -ExpandProperty InnerText
PRINT BESTSELLERS
December 20, 2012
Learn Windows PowerShell 3 in a Month of Lunches, Second Edition
Hello World!
Spring in Action, Third Edition
The Quick Python Book, Second Edition
The Well-Grounded Java Developer
C# in Depth, Second Edition
Windows PowerShell in Action, Second Edition
jQuery in Action, Second Edition
Hadoop in Action
Hadoop in Practice
MEAP BESTSELLERS
December 20, 2012
F# Deep Dives
Node.js in Action
AOP in .NET
Secrets of the JavaScript Ninja
HTML5 for .NET Developers
The Responsive Web
Taming Text
Single Page Web Applications
Play for Scala
Scala in Action

And what do you know? Learn Windows PowerShell 3 in a Month of Lunches is the number 1 print bestseller. Thank you, by the way. This is a quick and dirty screen scrape, but it is just fine for my purposes. I have to admit I like using PowerShell to find out if my PowerShell books are best sellers.
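If flat text is not enough, the scraped lines can be turned into objects. What follows is only a sketch under a few assumptions: the function name ConvertTo-Bestseller is made up for illustration, each list is assumed to start with an all-caps line ending in BESTSELLERS followed by a date line like the ones in the output above, and every line after that is treated as a title ranked in order of appearance. If Manning changes the page layout, the patterns will need adjusting.

```powershell
# Hypothetical helper: turn the scraped text lines into ranked objects.
function ConvertTo-Bestseller {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string[]]$Line
    )
    $list = $null
    $date = $null
    $rank = 0
    # A div's InnerText may hold several lines, so split before processing
    foreach ($raw in ($Line -split "\r?\n")) {
        $text = $raw.Trim()
        if (-not $text) { continue }
        if ($text -cmatch '^[A-Z ]+BESTSELLERS$') {
            # Header line: start a new list and reset the ranking
            $list = $text
            $date = $null
            $rank = 0
        }
        elseif ($list -and -not $date -and $rank -eq 0 -and
                $text -match '^[A-Z][a-z]+ \d{1,2}, \d{4}$') {
            # Date line directly under a header, e.g. "December 20, 2012"
            $date = [datetime]::ParseExact($text, 'MMMM d, yyyy',
                [cultureinfo]::InvariantCulture)
        }
        else {
            # Anything else is assumed to be a book title
            $rank++
            [pscustomobject]@{
                List  = $list
                Date  = $date
                Rank  = $rank
                Title = $text
            }
        }
    }
}
```

Assuming the output of the scraping command was saved to a variable called $text, the PowerShell titles could then be picked out along with their rank:

PS C:\> ConvertTo-Bestseller -Line $text | Where Title -match "PowerShell"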

I’d love to hear how you are using this new cmdlet.
