Multiple Invoke-WebRequest, XML result sorting

This topic contains 3 replies, has 2 voices, and was last updated by  Charles Downing 2 years, 8 months ago.

  • #20593


    First post, be gentle. I'm trying to write a script in PS ISE 3.0 that reads a list of reference numbers (currently in a text file) into an array, inserts each reference number into an Invoke-WebRequest URI, and then takes each 'result' from the returned XML (the web request goes to an enterprise search tool) and outputs it to a .csv, formatted in columns.

    So far I've managed to get the script to output the full search results for each reference number in its own .xml file (named after the searched reference number), but I'm struggling to get it to output to a .csv file in a format I can pass to my user – they essentially want a report.

    This is what I have so far. I'm only working with the first 20 numbers until I get it working, and the -OutFile on Invoke-WebRequest is only there during development, unless I really need to do this another way and process each XML file afterwards. I've had to omit the URL and some of the XML node names before posting this, but that shouldn't be an issue.

    ## Define Variables
    $SearchTermNumber = get-content C:\temp\Phones_JustNumbersDirty.txt
    ## Take number, add to search string, submit to Search
    $SearchResult = $SearchTermNumber[1..20] | ForEach-Object -Process {Invoke-WebRequest -outfile C:\temp\Phones_Results_$_.xml -PassThru -URI "[Search tool URI]" }
    ## Output to CSV file
    ForEach-Object -process {
    $SearchResult.[Search result XML Node Structure].hit.reference, 
    $SearchResult.[Search result XML Node Structure].hit.database, 
    $SearchResult.[Search result XML Node Structure].hit.content.document.bridgelink
    } | Out-File C:\temp\temp_results.csv

    I've tried several different iterations of the last section, and I've managed to get a CSV with column headers using something like this:

    ForEach-Object -process {
        $SearchResultCurrent = new-object PSObject
        $SearchResultCurrent | Add-Member ReferenceNumber $SearchResult.[Search result XML Node Structure].hit.reference
        $SearchResultCurrent | Add-Member Source $SearchResult.[Search result XML Node Structure].hit.database
        $SearchResultCurrent | Add-Member BridgeURL $SearchResult.[Search result XML Node Structure].hit.content.document.bridgelink
        $SearchResultCurrent | Export-Csv C:\temp\temp_results.csv -NoTypeInformation
    }

    ...but I can't get it to format nicely with data in there. Can someone point me in the right direction?

  • #20596

    Charles Downing

    What does your CSV look like after running this code? I'm assuming it's empty? Or maybe it has only one row? The first thing that stands out is that you aren't looping through anything in your ForEach-Object. I would expect to see it look something like this so that you are looping through each object in the $SearchResult array.

    $SearchResult | ForEach-Object -process {
        # $_ is the current item from $SearchResult
        $SearchResultCurrent = new-object PSObject
        $SearchResultCurrent | Add-Member ReferenceNumber $_.[Search result XML Node Structure].hit.reference
        $SearchResultCurrent | Add-Member Source $_.[Search result XML Node Structure].hit.database
        $SearchResultCurrent | Add-Member BridgeURL $_.[Search result XML Node Structure].hit.content.document.bridgelink
        $SearchResultCurrent | Export-Csv C:\temp\temp_results.csv -NoTypeInformation
    }

    Of course, that calls Export-Csv once for each object in $SearchResult, overwriting the file every time, which I don't think is what you want, either. To get all of the data into one CSV, you'll need to collect all of the results into an array and then export that array to a single CSV, or just do the same thing using the pipeline:

    $SearchResult | ForEach-Object -process {
        # Emit one object per item; the objects flow down the pipeline
        New-Object -TypeName PSObject -Property @{
            ReferenceNumber = $($_.[Search result XML Node Structure].hit.reference)
            Source = $($_.[Search result XML Node Structure].hit.database)
            BridgeURL = $($_.[Search result XML Node Structure].hit.content.document.bridgelink)
        }
    } |
    Export-Csv C:\temp\temp_results.csv -NoTypeInformation
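
    For comparison, here is a rough, self-contained sketch of the "collect into an array first" variant. The data is made up (REF1, SampleDb and the example.invalid URLs are placeholders, and the temp file name is arbitrary), so treat it as an illustration of the pattern rather than something tuned to your search tool. Select-Object is there because a plain hashtable does not guarantee property order, so it pins the column order in the CSV.

    ```powershell
    # Stand-in data: in the real script these objects would be built from the search results.
    # All values below are invented for illustration.
    $rows = foreach ($n in 1..3) {
        New-Object -TypeName PSObject -Property @{
            ReferenceNumber = "REF$n"
            Source          = 'SampleDb'
            BridgeURL       = "http://example.invalid/$n"
        }
    }

    # $rows now holds three objects, so a single Export-Csv call writes them all.
    $csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'temp_results_demo.csv'
    $rows |
        Select-Object ReferenceNumber, Source, BridgeURL |
        Export-Csv -Path $csvPath -NoTypeInformation

    # Read the file back to check that there is one row per object.
    $imported = Import-Csv -Path $csvPath
    $imported
    ```

    Either form works; the key point is that Export-Csv runs once, after all of the objects exist.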
  • #20920


    The CSV has just one row: the three column headers, with no data. I see what you're saying, I wasn't actually looping through! I've tried the first code snippet above, and it returns the same thing: one row, just the headers. I know the XML nodes are correct, because the -outfile I'm using created an XML file with them in it. The second code snippet does exactly the same.

  • #20937

    Charles Downing

    Ok! This is where I should have looked at the output of Invoke-WebRequest before responding... I was just looking at the code presented and focused on the loop. So, the second loop I posted should still be the one you want to go with, but what needs to change is how you are accessing the fields you want in the CSV. Instead of using the XML format that you see in the outfile, you're going to need to traverse what is returned by Invoke-WebRequest.
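
    This is probably why the columns come out empty. A small sketch, using a hand-built stand-in object instead of a real response (the property values are invented): outside of strict mode, asking PowerShell for a property that doesn't exist quietly gives you $null instead of an error.

    ```powershell
    # Stand-in for one response object: it has StatusCode and Content,
    # but no properties shaped like the XML nodes.
    $response = New-Object -TypeName PSObject -Property @{
        StatusCode = 200
        Content    = '<hit><reference>12345</reference></hit>'
    }

    # 'hit' is not a property of $response, so the whole chain evaluates to $null.
    $value = $response.hit.reference
    $isNull = $null -eq $value
    $isNull
    ```

    So the loop builds objects with the right property names, which gives you the headers, but every value is $null.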

    After running the real version of "$SearchResult = $SearchTermNumber[1..20] | ForEach-Object -Process {Invoke-WebRequest -outfile C:\temp\Phones_Results_$_.xml -PassThru -URI "[Search tool URI]" }", take a look at the output from $SearchResult[0]. That will give you the HtmlWebResponseObject returned by the first Invoke-WebRequest. You'll see that it has several properties related to the HTML response, as in the output below (based on the examples in the help for Invoke-WebRequest):

    PS C:\>  $r = Invoke-WebRequest -URI -OutFile c:\temp\iwrtest.xml -PassThru
    PS C:\>  $r
    StatusCode        : 200
    StatusDescription : OK
    Content           : //; outerText=;
                        tagName=IMG; id=id_p; class=b_icon id_avatar; style=DISPLAY: none;
    InputFields       : {@{innerHTML=; innerText=; outerHTML=; outerText=; tagName=INPUT; spellcheck=false;
                        onfocus=_ge('b_header').className='b_focus';; id=sb_form_q; title=Enter your search term;
                        class=b_searchbox; maxLength=1000; value=how many feet in a mile; name=q; autocomplete=off;
                        autocorrect=off; autocapitalize=off}, @{innerHTML=; innerText=; outerHTML=;
                        outerText=; tagName=INPUT; tabIndex=0; id=sb_form_go; title=Search; class=b_searchboxSubmit;
                        type=submit; value=Submit Query; name=go}, @{innerHTML=; innerText=; outerHTML=; outerText=; tagName=INPUT; id=sa_qs; type=hidden; value=ds;
                        name=qs}, @{innerHTML=; innerText=; outerHTML=;
                        outerText=; tagName=INPUT; type=hidden; value=QBRE; name=form}...}
    Links             : {@{innerHTML=Web; innerText=Web; outerHTML=Web; outerText=Web; tagName=A; href=/?scope=web&FORM=HDRSC1;
                        h=ID=SERP,5032.1}, @{innerHTML=Images; innerText=Images; outerHTML=Images;
                        outerText=Images; tagName=A; href=/images/search?q=how+many+feet+in+a+mile&FORM=HDRSC2;
                        h=ID=SERP,5033.1}, @{innerHTML=Videos; innerText=Videos; outerHTML=Videos;
                        outerText=Videos; tagName=A; href=/videos/search?q=how+many+feet+in+a+mile&FORM=HDRSC3;
                        h=ID=SERP,5034.1}, @{innerHTML=Maps; innerText=Maps; outerHTML=Maps; outerText=Maps; tagName=A;
                        href=/maps/default.aspx?q=how+many+feet+in+a+mile&mkt=en&FORM=HDRSC4; h=ID=SERP,5035.1}...}
    ParsedHtml        : mshtml.HTMLDocumentClass
    RawContentLength  : 75540
    PS C:\>  $r | gm
       TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject
    Name              MemberType Definition
    ----              ---------- ----------
    Equals            Method     bool Equals(System.Object obj)
    GetHashCode       Method     int GetHashCode()
    GetType           Method     type GetType()
    ToString          Method     string ToString()
    AllElements       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
    BaseResponse      Property   System.Net.WebResponse BaseResponse {get;set;}
    Content           Property   string Content {get;}
    Forms             Property   Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
    Headers           Property   System.Collections.Generic.Dictionary[string,string] Headers {get;}
    Images            Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
    InputFields       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
    Links             Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
    ParsedHtml        Property   mshtml.IHTMLDocument2 ParsedHtml {get;}
    RawContent        Property   string RawContent {get;}
    RawContentLength  Property   long RawContentLength {get;}
    RawContentStream  Property   System.IO.MemoryStream RawContentStream {get;}
    Scripts           Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
    StatusCode        Property   int StatusCode {get;}
    StatusDescription Property   string StatusDescription {get;}

    You're going to have to figure out what part of that response you actually want to include in your CSV. That data is probably going to come from the innerHTML property of the objects in the AllElements property of the response. That may not be too bad, depending on the complexity of the HTML returned by your search tool...
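
    One other thing that might be worth trying, since your search tool returns XML: the Content property is a plain string, so it can be cast to [xml] and then walked with the same dotted paths you were using. Below is a rough sketch on a made-up response. The <response>/<results> wrapper, the values and the example.invalid URLs are assumptions standing in for your omitted node structure; only hit, reference, database and content.document.bridgelink come from your post.

    ```powershell
    # Stand-in for $SearchResult[0].Content. The wrapper elements and values are invented;
    # substitute the real node structure from the saved .xml files.
    $content = '<response><results>' +
        '<hit><reference>12345</reference><database>SampleDb</database>' +
        '<content><document><bridgelink>http://example.invalid/a</bridgelink></document></content></hit>' +
        '<hit><reference>67890</reference><database>OtherDb</database>' +
        '<content><document><bridgelink>http://example.invalid/b</bridgelink></document></content></hit>' +
        '</results></response>'

    # Casting the string to [xml] produces an XmlDocument that supports dotted navigation.
    $xml = [xml]$content

    # Build one object per <hit> so that each hit becomes one CSV row.
    $rows = foreach ($hit in $xml.response.results.hit) {
        New-Object -TypeName PSObject -Property @{
            ReferenceNumber = $hit.reference
            Source          = $hit.database
            BridgeURL       = $hit.content.document.bridgelink
        }
    }

    # Select-Object pins the column order; ConvertTo-Csv shows the text Export-Csv would write.
    $rows | Select-Object ReferenceNumber, Source, BridgeURL | ConvertTo-Csv -NoTypeInformation
    ```

    In the real script the cast would go inside the loop, something like [xml]$_.Content for each response, with Export-Csv at the end of the pipeline in place of ConvertTo-Csv. I haven't been able to test that against your search tool, so check the node paths against one of the saved files first.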

    Someone here may have a better solution than using Invoke-WebRequest...
