Parse info from a website


This topic contains 9 replies, has 4 voices, and was last updated by js 1 year, 4 months ago.

  • #74648

    Participant
    Points: 13
    Rank: Member

    As I have been unable to find a current RSS feed for Windows updates, I am trying to parse some data from the Microsoft support site. I can build the URL dynamically, as the only part that changes is the knowledge base article number. I am interested in the text under the Summary heading, but I can't find a way to extract this information with Invoke-WebRequest.

    An example URL is below:

    $web= Invoke-WebRequest "https://support.microsoft.com/en-us/help/4022887/title#!/en-us/help/4022887/title"
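
Since only the KB number changes, the URL construction described above could be wrapped in a small helper. This is a minimal sketch: the function name `Get-KbArticleUrl` is hypothetical, and the URL pattern is simply copied from the example, so it is an assumption that other articles follow it.

```powershell
# Hypothetical helper: builds the support-article URL for a KB number,
# reusing the pattern of the example URL above (assumed to hold for other articles).
function Get-KbArticleUrl {
    param(
        [Parameter(Mandatory)]
        [ValidatePattern('^\d+$')]   # accept digits only
        [string]$KbNumber
    )
    "https://support.microsoft.com/en-us/help/$KbNumber/title#!/en-us/help/$KbNumber/title"
}

$url = Get-KbArticleUrl -KbNumber 4022887
$url
```

The resulting string can then be passed to Invoke-WebRequest or to the browser object, whichever approach ends up working.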

  • #74669

    Keymaster
    Points: 12
    Team Member
    Rank: Member

    There are a couple of ways you could do it, but before I start jumping down the wrong rabbit hole, would this site get you the data you need?

    https://support.microsoft.com/en-us/gp/selectrss?target=rss

  • #74671

    Participant
    Points: 13
    Rank: Member

    Hi, thanks for the reply. I looked at that site's RSS feeds, but it appears they are no longer being updated.

  • #74674

    Participant
    Points: 13
    Rank: Member

    I am trying:

    $web ="https://support.microsoft.com/en-us/help/4022887/title#!/en-us/help/4022887/title"
    $data = invoke-Webrequest $web
    $result = $data.ParsedHtml.body.getElementsByClassName('kb-summary-section section ng-scope.x-hidden-focus')

    I only want the information in the summary, but nothing is being returned in $result.

  • #74724
    js

    Participant
    Points: 207
    Helping Hand
    Rank: Participant

    invoke-restmethod?

  • #74734

    Participant
    Points: 0
    Rank: Member

    Hi Simon,

    It seems to me that you are doing everything right, but when you output the complete raw result of the request, it doesn't seem to contain anything useful. So I tried using the Internet Explorer COM object through PowerShell, and it worked. Not pretty, but it gets the result you are looking for:

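       # Load the page in Internet Explorer via COM and read the rendered DOM
       $ie = New-Object -ComObject "InternetExplorer.Application"
       $ie.Silent = $true
       $ie.Navigate($web)   # $web holds the article URL from the earlier post
       while ($ie.Busy) { Start-Sleep -Seconds 1 }   # wait until the page has finished loading
       $result = $ie.Document.body.getElementsByClassName("kb-summary-section") | Select-Object -ExpandProperty innerText
       $ie.Quit()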
    

    Cheers
    Wilm

  • #74736

    Participant
    Points: 13
    Rank: Member

    Thanks Wilm that does the trick.

  • #74853

    Participant
    Points: 13
    Rank: Member

    Just when I thought it was safe to go back into the water 🙂
    When I use the following:
    $ie = new-object -ComObject "InternetExplorer.Application"
    $ie.silent = $true
    $web ="https://support.microsoft.com/en-us/help/4022887/title#!/en-us/help/4022887/title"
    $ie.navigate($web)
    $result = ""
    $result = $ie.document.body.getElementsByClassName("container section-body") | select -ExpandProperty innertext
    $kbarticle = $result -split "Symptom" | select -first 1
    $ws.cells.item($intRow,4) = $kbarticle
    $ws.cells.item($intRow,5) = $web

    It writes the contents of $kbarticle to the cell in Excel (I have not included the code to open Excel here), but there are two carriage returns at the top of the data, so to see the data you have to click into the cell. I spent hours thinking it wasn't writing the data before I spotted the two carriage returns 🙂. I have tried $kbarticle.Trim(), but that does not seem to work. Any ideas?

  • #74863

    Participant
    Points: 13
    Rank: Member

    I fixed the issue with

    $trimmedkbArticle = $kbarticle.ToString()
    $ws.cells.item($intRow,4) = $trimmedkbarticle.Trim()
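
One likely reason the earlier `$kbarticle.Trim()` call appeared to do nothing (an assumption, since the thread does not show how it was called) is that .NET strings are immutable: `Trim()` returns a new string and leaves the variable unchanged unless the result is assigned or used directly. A minimal sketch with made-up sample text:

```powershell
# Made-up sample standing in for the scraped article text (two leading CR/LF pairs).
$kbarticle = "`r`n`r`nSummary text"
$lengthBefore = $kbarticle.Length

# Calling Trim() and discarding the result does not change the variable.
$kbarticle.Trim() | Out-Null
$lengthAfterDiscard = $kbarticle.Length

# Assigning the result keeps the trimmed text.
$kbarticle = $kbarticle.Trim()
$kbarticle
```

This matches the fix above, where the trimmed value is what actually gets written to the cell.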

    • #74896
      js

      Participant
      Points: 207
      Helping Hand
      Rank: Participant

      What about using Invoke-RestMethod with an RSS feed? This returns an array of [XmlElement] objects.

      $a = Invoke-RestMethod https://support.microsoft.com/en-us/rss?rssid=18165
      show-object $a  # PowershellCookbook module
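
If the PowershellCookbook module is not available, the items can also be inspected with built-in cmdlets. The sketch below uses a made-up RSS fragment in place of the live feed so that it runs offline; the element names `title` and `link` are standard RSS, but whether the real feed still responds is not something this thread confirms.

```powershell
# Made-up RSS fragment standing in for the live feed (illustration only).
[xml]$feed = @'
<rss version="2.0"><channel>
  <item><title>Sample update A</title><link>https://example.com/a</link></item>
  <item><title>Sample update B</title><link>https://example.com/b</link></item>
</channel></rss>
'@

# Each item is an [XmlElement]; its child elements are exposed as properties.
$items = $feed.rss.channel.item | Select-Object title, link
$items | Format-Table -AutoSize
```

With the live feed, `$a` from Invoke-RestMethod should be usable in place of `$feed.rss.channel.item`, since it already holds the item elements.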
      

The topic ‘Parse info from a website’ is closed to new replies.