Parse info from a website

This topic contains 9 replies, has 4 voices, and was last updated by js 3 months, 1 week ago.

  • #74648

    Simon B
    Participant

    I have been unable to find a current RSS feed for Windows updates, so I am trying to parse some data from the Microsoft support site. I can build the URL dynamically, as the only parts that change are the digits of the Knowledge Base article number. I am interested in the text under the Summary heading, but I can't find a way to extract it with Invoke-WebRequest.

    An example URL is below:

    $web= Invoke-WebRequest "https://support.microsoft.com/en-us/help/4022887/title#!/en-us/help/4022887/title"
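
The URL-building step described above could be sketched roughly as follows. `Get-KbUrl` is a hypothetical helper name, and the pattern is copied from the single example URL; that other articles follow the same pattern is an assumption.

```powershell
# Hypothetical helper: builds a support-site URL from a KB article number.
# Assumes every article follows the pattern of the example URL above.
function Get-KbUrl {
    param(
        [Parameter(Mandatory)]
        [int]$KbNumber
    )
    "https://support.microsoft.com/en-us/help/$KbNumber/title#!/en-us/help/$KbNumber/title"
}

$url = Get-KbUrl -KbNumber 4022887
$url
```

The resulting string can then be passed to whichever retrieval method is used.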

  • #74669

    Will Anderson
    Keymaster

    So there are a couple of ways you could do it, but before I start jumping down the wrong rabbit hole, would this site get you the data you need?

    https://support.microsoft.com/en-us/gp/selectrss?target=rss

  • #74671

    Simon B
    Participant

    Hi, thanks for the reply. I looked at the RSS feeds on that site, but it appears they are no longer being updated.

  • #74674

    Simon B
    Participant

    I am trying:

    $web ="https://support.microsoft.com/en-us/help/4022887/title#!/en-us/help/4022887/title"
    $data = invoke-Webrequest $web
    $result = $data.ParsedHtml.body.getElementsByClassName('kb-summary-section section ng-scope.x-hidden-focus')

    I only want the information in the summary, but nothing is being returned in $result.

  • #74724

    js
    Participant

    Invoke-RestMethod?

  • #74734

    Wilm Reiche
    Participant

    Hi Simon,

    It seems to me that you are doing everything right, but when you output the complete raw result of the request, it doesn't seem to contain anything useful. So I tried using the Internet Explorer COM object through PowerShell, and it worked. It's not pretty, but it gets the result you are looking for:

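       # Load the page in Internet Explorer via COM instead of Invoke-WebRequest
       $ie = New-Object -ComObject "InternetExplorer.Application"
       $ie.Silent = $true
       $ie.Navigate($web)    # $web holds the article URL from the earlier post
       # Wait until the browser reports that it is no longer busy
       while ($ie.Busy) { Start-Sleep -Seconds 1 }
       $result = $ie.Document.body.getElementsByClassName("kb-summary-section") | Select-Object -ExpandProperty innerText
       $ie.Quit()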
    

    Cheers
    Wilm

  • #74736

    Simon B
    Participant

    Thanks Wilm that does the trick.

  • #74853

    Simon B
    Participant

    Just when I thought it was safe to go back into the water 🙂
    When I use the following:
    $ie = new-object -ComObject "InternetExplorer.Application"
    $ie.silent = $true
    $web ="https://support.microsoft.com/en-us/help/4022887/title#!/en-us/help/4022887/title"
    $ie.navigate($web)
    $result = ""
    $result = $ie.document.body.getElementsByClassName("container section-body") | select -ExpandProperty innertext
    $kbarticle = $result -split "Symptom" | select -first 1
    $ws.cells.item($intRow,4) = $kbarticle
    $ws.cells.item($intRow,5) = $web

    It writes the contents of $kbarticle to the cell in Excel (I have not included the code that opens Excel here), but there are two carriage returns at the top of the data, so you have to click into the cell to see it. (I spent hours thinking it wasn't writing the data before I spotted the two carriage returns 🙂.) I have tried $kbarticle.Trim(), but that does not seem to work. Any ideas?

  • #74863

    Simon B
    Participant

    I fixed the issue with:

    $trimmedkbArticle = $kbarticle.ToString()
    $ws.cells.item($intRow,4) = $trimmedkbarticle.Trim()
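
One possible reason the earlier `$kbarticle.Trim()` call seemed to do nothing: .NET strings are immutable, so `Trim()` returns a new string and leaves the variable untouched unless the return value is assigned or used directly, as it is in the cell assignment above. A minimal sketch, using a made-up sample string rather than real article text:

```powershell
# Sample text with two leading CR/LF pairs (made up for illustration).
$kbarticle = "`r`n`r`nSummary text for the article"

# Trim() returns a new string; calling it on its own does not change $kbarticle.
$null = $kbarticle.Trim()
$lengthBefore = $kbarticle.Length

# Capturing (or directly using) the return value gives the cleaned text.
$trimmed = $kbarticle.Trim()
$lengthAfter = $trimmed.Length

"Before: $lengthBefore characters, after: $lengthAfter characters"
```

Under that reading, the `ToString()` call is not what removes the whitespace; using the value returned by `Trim()` is.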

    • #74896

      js
      Participant

      What about using Invoke-RestMethod with an RSS feed? It returns an array of [XmlElement] objects.

      $a = Invoke-RestMethod https://support.microsoft.com/en-us/rss?rssid=18165
      show-object $a  # PowershellCookbook module
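
To illustrate the shape of that data without a network call, here is a minimal offline sketch that parses a small RSS snippet with the `[xml]` type accelerator and reads each item's `title` and `link`. The feed content is invented for illustration; the real feed's items and fields may differ.

```powershell
# Made-up RSS 2.0 sample (not real feed data), so the sketch runs offline.
$rss = [xml]@"
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First sample item</title><link>https://example.com/1</link></item>
    <item><title>Second sample item</title><link>https://example.com/2</link></item>
  </channel>
</rss>
"@

# Each <item> is an XmlElement; its child elements are exposed as properties.
$items = @($rss.rss.channel.item)
$items | Select-Object title, link
```

The same property access and `Select-Object` call should apply to the elements that Invoke-RestMethod returns for a feed.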
      
