2015-October Scripting Games Wrap-Up

The October puzzler was a tough one, especially if you're not used to dragging information from the Web in XML form. Adam Bertram provides our Celebrity Entry.

Celebrity Entry

Note: Adam's entry is helpfully posted in his GitHub repo.

As Don mentioned in the challenge, the functionality of finding items in an RSS feed should be put into a function. Why? Because chances are you're going to want to do this for more than one RSS feed. The moment you think to yourself, "Hmm... getting the feed for FOX News is nice, but maybe I'd want to get it for CNN as well," is the moment you need to build a function out of it.

Every time I begin writing a function, I brainstorm what attributes of that function might change over time. When you want to find items in an RSS feed, what are the things that might be different? I'm probably going to want to get more than one RSS feed, right? Well, what are all the attributes that go into that? In our case, it was just a single attribute: the URL, or URI (uniform resource identifier) to be specific. That became my first parameter. When I create parameters, I always default to putting some kind of validation attribute on them. It's always better to limit input as tightly as possible and catch potential errors ahead of time. For a URL, the perfect validation attribute is the ValidatePattern attribute. Because I'm not the best at building regex patterns from scratch, I looked up a good URL match and found one on the web. I then tested it with a few different URLs using the -match operator, found that it was working just as I had hoped, and used that one.
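As a rough sketch of the idea above, a URI parameter guarded by ValidatePattern might look like the following. The function name Get-RssFeedItem and the deliberately simple pattern are assumptions for illustration; they are not the name or the regex from Adam's entry.

```powershell
function Get-RssFeedItem {
    [CmdletBinding()]
    param (
        # Hypothetical, simplified pattern: http or https, a host, then an optional path.
        [Parameter(Mandatory)]
        [ValidatePattern('^https?://[^\s/]+(/\S*)?$')]
        [string]$Uri
    )

    # Placeholder body; the real work of fetching the feed would go here.
    Write-Verbose "Would query $Uri"
}

# Trying the same pattern with -match before trusting it in the attribute.
$pattern = '^https?://[^\s/]+(/\S*)?$'
'http://example.com/feed.xml' -match $pattern   # True
'not a url' -match $pattern                     # False
```

With the attribute in place, a call such as Get-RssFeedItem -Uri 'not a url' fails during parameter binding, before any of the function body runs.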

After I got the URI parameter in there, I began to work on the actual functionality to get the RSS feed items. The first cmdlet that came to mind was Invoke-WebRequest. Although it's not good for downloading large files and content (it can be pretty slow sometimes), it is, by far, the easiest and most straightforward way to download HTTP content and, to greatly simplify it, an RSS feed is just a properly formatted document served over HTTP. I found an RSS feed, pointed Invoke-WebRequest at it and started checking out the result. I soon found it contained all of the data I was looking for, but I couldn't find the actual items anywhere.

Then I noticed that the data was in XML format. This gave me an idea. I changed the type to [xml], which parsed the XML and allowed me to finally find the items hidden in the rss.channel.item property. This got me all of the items I was looking for. At first, I was just casting the $result variable directly to [xml], but then I thought, "What if the HTTP response didn't succeed?" I needed some error handling in there, so instead of doing [xml]$result straight away I decided to first check the HTTP status code on the response. A successful HTTP status code is 200, so I'm checking to ensure I got that before proceeding. If I didn't, I throw an exception to my catch block, which stops the function's execution.

Only after I can confirm that the status was successful do I cast $result to [xml]. Again, as an error handling step, I then ensure that $xRss.rss.channel.item actually has something in it and, if so, I continue.
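The status check, the [xml] cast and the empty-feed check described above could be sketched roughly as follows. The helper name ConvertTo-RssItem, the error messages and the stand-in response object are assumptions made so the example runs without a network call; in practice the response would come from Invoke-WebRequest.

```powershell
# Hypothetical helper: accepts any object with StatusCode and Content properties.
function ConvertTo-RssItem {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        $Response
    )

    try {
        # 200 is the HTTP status code for success; anything else is treated as a failure.
        if ($Response.StatusCode -ne 200) {
            throw "Request failed with HTTP status code $($Response.StatusCode)"
        }

        # Only cast to [xml] once the request is known to have succeeded.
        $xRss = [xml]$Response.Content

        # Make sure the feed actually contains items before continuing.
        $items = $xRss.rss.channel.item
        if (-not $items) {
            throw 'No items found in the feed'
        }

        $items
    }
    catch {
        # Report the problem; nothing after the failing check has run.
        Write-Error $_.Exception.Message
    }
}

# Stand-in for a real response object.
$fake = [pscustomobject]@{
    StatusCode = 200
    Content    = '<rss><channel><item><title>First</title></item><item><title>Second</title></item></channel></rss>'
}
(ConvertTo-RssItem -Response $fake).title
```

Here the last line lists the two titles, First and Second, because each item element is exposed as an object whose child elements appear as properties.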

I've now got all my items bundled up in a collection, and I'm able to iterate over them with a foreach loop. I could technically just send this out of the function, but it's going to be an XML element object. I prefer to create my own objects and tend to always use the [pscustomobject] type accelerator. Creating your own objects to send to the pipeline allows you more control. In my example here, I'm creating an $output hashtable for every item. I'm then choosing the various attributes of each item that I would like to show. You can see that some properties had multiple items in them, such as the comments property. I had to break that up to show the link and the count separately. Also, since there might be multiple categories associated with an item, I had to convert those from an array into a string by joining them together with a comma. Once I've built the hashtable as I see fit, the only thing left to do is create the object and send it out to the pipeline.
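A minimal sketch of that hashtable-to-object step is shown here, using a small inline feed so it runs on its own. The chosen property names (Title, Link, Categories) and the sample XML are assumptions; the actual entry selects more properties than this.

```powershell
# Small inline feed; element names follow the usual RSS 2.0 layout.
$xRss = [xml]@'
<rss>
  <channel>
    <item>
      <title>Example post</title>
      <link>http://example.com/post</link>
      <category>PowerShell</category>
      <category>RSS</category>
    </item>
  </channel>
</rss>
'@

$objects = foreach ($item in $xRss.rss.channel.item) {
    # Pick only the properties worth showing; [ordered] keeps them in this order.
    $output = [ordered]@{
        Title      = $item.title
        Link       = $item.link
        # Several category elements come back as an array, so join them into one string.
        Categories = $item.category -join ','
    }

    # Create the object and send it down the pipeline.
    [pscustomobject]$output
}

$objects
```

The resulting object has a Categories value of PowerShell,RSS, which is easier to read and export than a nested array.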

At this point, the function is pretty good, but it still didn't have many configuration options, so I decided to add some filtering options to it similar to Get-EventLog. It's always a good idea to think about similarities between your function and the default cmdlets. If I can find one that's similar, I'll use the exact same parameter names just to keep things simple. Get-EventLog has a Newest parameter that does exactly what my function's Newest parameter does, so I used the same name and added Oldest as its counterpart. I could have called them something like GetTheNewest and changed up the functionality slightly, but it would be confusing.

I decided to add the Newest, Oldest and Author parameters last. This allows the user to filter by parameter instead of using Where-Object. Again, using as much parameter validation as possible, I ensured the Newest and Oldest parameters were both integers, since that is how Get-EventLog types Newest. I'm also using ValidateRange here in case someone tries to put a negative number or some crazy big number in there. Since the Author parameter could be just about any string, I have no validation on it.

You'll also notice that I implemented some parameter sets. I didn't want someone trying to get the newest 10 items AND the oldest 2 items at the same time. This prevented them from using both in tandem.
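The validation and the parameter sets described above might be declared along these lines. This is only a sketch: the function name, the set names and the upper bound of 1000 are assumptions, and the body simply reports which parameter set PowerShell resolved.

```powershell
function Get-RssFeedItem {
    [CmdletBinding(DefaultParameterSetName = 'All')]
    param (
        [Parameter(Mandatory)]
        [string]$Uri,

        # Newest and Oldest live in different parameter sets, so they cannot be combined.
        [Parameter(ParameterSetName = 'Newest')]
        [ValidateRange(1, 1000)]   # the upper bound is an arbitrary assumption
        [int]$Newest,

        [Parameter(ParameterSetName = 'Oldest')]
        [ValidateRange(1, 1000)]
        [int]$Oldest,

        # Any string is acceptable here, so there is no validation attribute.
        [string]$Author
    )

    # Report the resolved set; a real function would fetch and filter the feed here.
    $PSCmdlet.ParameterSetName
}

Get-RssFeedItem -Uri 'http://example.com/feed.xml' -Newest 10   # Newest
```

Passing both -Newest and -Oldest in one call makes PowerShell refuse to bind the parameters, because no single parameter set contains both.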

You'll see that adding these parameters added quite a bit of code. Which parameters get passed to the Select-Object, Where-Object and Sort-Object cmdlets depends on the combination of function parameters that was used. You'll notice that I didn't repeat code. It's always a good idea to create, as early as possible, the default set of parameters that applies to everything and then incrementally add to it as you go down. There were some instances where I technically didn't need the Where-Object cmdlet, for example, but to prevent code duplication I simply put an expression in there that would always evaluate to $true. You'll also notice that I used $PSBoundParameters.ContainsKey() instead of just using the variable name. This is another best practice I use to be as specific as possible. It allows me to look down through the code and easily see that I'm asking whether this value came from a parameter, not just from some variable created in the function somewhere.
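One way to picture that approach is to build a hashtable of default arguments for each cmdlet, adjust it based on $PSBoundParameters, and then splat all three into a single pipeline. The sketch below assumes a hypothetical Select-FeedItem helper that works on already-built objects with Author and PublishDate properties; it is not the code from Adam's entry.

```powershell
function Select-FeedItem {
    [CmdletBinding(DefaultParameterSetName = 'All')]
    param (
        [Parameter(Mandatory)]
        [object[]]$Item,

        [Parameter(ParameterSetName = 'Newest')]
        [int]$Newest,

        [Parameter(ParameterSetName = 'Oldest')]
        [int]$Oldest,

        [string]$Author
    )

    # Defaults that apply to every call: keep everything, sort ascending by date.
    $whereParams  = @{ FilterScript = { $true } }
    $sortParams   = @{ Property = 'PublishDate' }
    $selectParams = @{ Property = '*' }

    # Adjust only what the caller actually asked for.
    if ($PSBoundParameters.ContainsKey('Author')) {
        $whereParams.FilterScript = { $_.Author -eq $Author }
    }
    if ($PSBoundParameters.ContainsKey('Newest')) {
        $sortParams.Descending = $true
        $selectParams.First = $Newest
    }
    elseif ($PSBoundParameters.ContainsKey('Oldest')) {
        $selectParams.First = $Oldest
    }

    # One pipeline serves every combination of parameters.
    $Item | Where-Object @whereParams | Sort-Object @sortParams | Select-Object @selectParams
}

# Sample data standing in for real feed items.
$items = @(
    [pscustomobject]@{ Title = 'A'; Author = 'Adam'; PublishDate = [datetime]'2015-10-01' }
    [pscustomobject]@{ Title = 'B'; Author = 'Don';  PublishDate = [datetime]'2015-10-03' }
    [pscustomobject]@{ Title = 'C'; Author = 'Adam'; PublishDate = [datetime]'2015-10-02' }
)
(Select-FeedItem -Item $items -Newest 1).Title   # B
```

The always-true filter is what lets the Where-Object stage stay in the pipeline even when no author was requested, so the pipeline itself never has to be written twice.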

Finally, the last piece to mention is line 68, where I'm using a Select-Object calculated property. This was one of the last pieces I put in. Since I wanted the ability to output only the newest or the oldest X number of items, I had to be able to sort them. I was going to sort by PubDate. However, PubDate was a string, and a string doesn't sort chronologically the way a [datetime] does. Replacing PubDate with a PublishDate property of type [datetime] allowed me to sort the items.
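A calculated property of that kind can be sketched as below. The sample item and its date string are assumptions; the point is only that the string is cast to [datetime] and emitted under the new PublishDate name.

```powershell
# Stand-in item carrying an RFC 1123 style date string, as RSS feeds commonly do.
$item = [pscustomobject]@{
    Title   = 'Example post'
    PubDate = 'Thu, 01 Oct 2015 14:30:00 GMT'
}

# Calculated property: emit PublishDate as a real [datetime] in place of the PubDate string.
$converted = $item | Select-Object -Property Title, @{
    Name       = 'PublishDate'
    Expression = { [datetime]$_.PubDate }
}

$converted.PublishDate.GetType().Name   # DateTime
```

Once the property is a [datetime], Sort-Object -Property PublishDate orders items chronologically rather than alphabetically by the day-of-week prefix.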

This was fun!  I left my errors here so you can see that even those of us who pretty much do PowerShell for a living are just normal folks who typo from time to time, and rely on the tools we have built into the console to succeed. 

Official Answer

 

Adam's answer is actually really close to the official answer, so we're letting Adam's answer stand as official!

About the Author

PowerShell.org Announcer

This is the official account for PowerShell.org and sponsor announcements.