
April 9, 2018 at 6:51 pm

Hi,

I am trying to scrape a web page to get a value for a particular message destination. Below is the XML content I am interested in. Of everything on the page, I only need the entry with href="browse.jsp;jsessionid=oillma25wtod1qzbhi3jenj5t?JMSDestination=Consumer.Siebel.VirtualTopic.catalog_changed_events", and within that entry I want only the first value, 0.

I tried ParsedHtml with getElementsByTagName for the TR elements and also tried innerText, but nothing works. Kindly let me know if anyone has any thoughts on this.

——————————

Consumer.Siebel.VirtualTopic.catalog_changed_ev... Consumer.Siebel.VirtualTopic.catalog_changed_events

0
10
0
0

Browse
Active Consumers
Active Producers

Send To
Purge
Delete

———————-

April 10, 2018 at 8:02 pm

As far as I can remember this forum doesn't deal well with XML code pasted into the post.
So it's better if you paste the XML into Gist and add the Gist URL in the post.

April 16, 2018 at 7:28 pm

gist:5909bda28943fde8d80c475c09a5e09d

April 16, 2018 at 7:29 pm

Thanks for your input. Above is the link to my XML.

April 17, 2018 at 3:20 pm

Not 100% sure what you're after.

But if you have the above data in a variable (called htmlData in my example), you could do something like this:

$htmlData = Get-Content test.html -Raw # I put your example into a file, so you would change this to whatever suits you.
$htmlValue = $htmlData | ConvertFrom-String | Select P7

You can skip the Select P7 just to see the layout of the data.

This will only work if the data is consistent, meaning that the value always ends up in P7. Otherwise you would need something that identifies the specific tag you're searching for.
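One way to identify the specific entry instead of relying on the P7 position is to anchor a regular expression on the destination name in the href and capture the first TD cell that follows it. This is only a sketch: the $sampleHtml markup is a made-up stand-in for the real page (the actual markup is in the gist), and the variable names are hypothetical.

```powershell
# Hypothetical sample markup; the real page layout may differ.
$sampleHtml = @'
<tr>
<td><a href="browse.jsp?JMSDestination=Consumer.Siebel.VirtualTopic.catalog_changed_events">Consumer.Siebel.VirtualTopic.catalog_changed_events</a></td>
<td>0</td>
<td>10</td>
<td>0</td>
<td>0</td>
</tr>
'@

$queueName = 'Consumer.Siebel.VirtualTopic.catalog_changed_events'

# (?s) lets . match newlines; .*? is lazy, so the first numeric TD cell
# after the link is the one that gets captured.
$pattern = '(?s)JMSDestination=' + [regex]::Escape($queueName) + '".*?<td>\s*(\d+)\s*</td>'

$firstValue = $null
if ($sampleHtml -match $pattern) {
    $firstValue = [int]$Matches[1]
}
$firstValue
```

With the sample markup above this should yield 0. If the real cells carry attributes or nested tags, the pattern would need adjusting.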

Another, somewhat crude, option would be to split on the closing /TD tag:

$htmlValues = $htmlData -split '</td>'

This creates an array by splitting the raw data at each closing /TD tag.
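As a rough illustration of the split approach, the sketch below splits a made-up fragment on the closing TD tag and then strips the leftover markup from each piece. The $sampleRow string and the variable names are assumptions for the example, not the real page content.

```powershell
# Hypothetical fragment standing in for the raw page content.
$sampleRow = '<td>0</td><td>10</td><td>0</td><td>0</td>'

# -split treats its right-hand side as a regex and is case-insensitive by default.
$pieces = $sampleRow -split '</td>'

# Remove any leftover tags, trim whitespace, and drop empty pieces.
$cells = $pieces |
    ForEach-Object { ($_ -replace '<[^>]+>').Trim() } |
    Where-Object { $_ -ne '' }

$cells[0]   # first cell value
```

The same caveat applies as with P7: you still need to know which array position holds the value you want.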

Otherwise you may want to look at HTML parsers such as Html Agility Pack.
But then you're leaving the PowerShell realm and going into C#, XPath and LINQ.

April 17, 2018 at 5:54 pm

Wow, it worked. Can I ask for one last bit of help?

From your script, below is the output of it:

————————
P7

0
————————

I want only the value '0' that sits between the opening and closing TD tags. I tried regular expressions with -match and -pattern, but nothing works. Below is the Get-Member output for the variable that stores the above value.

PS C:\Users\kd****> $htmldata | Get-Member

TypeName: Selected.System.Management.Automation.PSCustomObject

Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
P7          NoteProperty string P7=0

April 17, 2018 at 5:56 pm

gist:0d019e8c5050b352f3e189441086a6d2

April 17, 2018 at 5:58 pm

April 17, 2018 at 8:34 pm

You could do it in multiple ways; which one you pick depends on how readable you want the code to be.
Here is one example.

$htmlData = Get-Content test.html -Raw # I put your example into a file, so you would change this to whatever suits you.
$htmlValue = $htmlData | ConvertFrom-String | Select -ExpandProperty P7
$htmlTagValue = $htmlValue[4]

The first extra step is -ExpandProperty, which returns just the content of P7 rather than an object with a P7 header.
Then you can decide how you want to extract the value.
The option above is quick and dirty: if the data is not consistent every time (the same issue as with P7), you will get errors or wrong values.
The [4] takes the fifth character of the string; strings can be indexed as if they were arrays of characters.
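To make the indexing concrete, here is a small sketch. It assumes P7 holds the text of a single TD element, which is a guess since the forum scrubbed the markup from the posts; the sample string is hypothetical.

```powershell
# Assumed content of P7; the real value may carry different markup.
$htmlValue = '<td>0</td>'

# Indexes 0..3 are the characters of the opening tag, so index 4 is the value.
$htmlValue[4]                # returned as a [char]
$htmlValue.Substring(4, 1)   # same position, returned as a [string]
```

Note that a two-digit value such as 10 would be cut down to its first digit with this approach.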

To make it a bit more robust, and if the value you want contains only digits, you could use a simple regex instead.

$htmlTagValue = $htmlValue -replace '\D'

But it depends on the possible values in the tag, what you need, and so on.
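As a sketch of where the -replace '\D' approach holds up and where it does not, the example below uses two made-up strings. The sample values and variable names are assumptions; the second string has an attribute that contains a digit.

```powershell
# Assumed sample values; the second has an attribute containing a digit.
$plain     = '<td>0</td>'
$withClass = '<td class="col1">10</td>'

# -replace '\D' removes every non-digit character, so digits inside
# attributes end up in the result as well.
$plain -replace '\D'
$withClass -replace '\D'

# Capturing only the digits between the tags avoids that.
$tagValue = $null
if ($withClass -match '>\s*(\d+)\s*<') {
    $tagValue = $Matches[1]
}
$tagValue
```

If the tags never contain digits of their own, the plain -replace is the shorter option; the capture group is safer when the markup may vary.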

April 17, 2018 at 9:07 pm

Thank you sincerely for your quick and detailed response. It worked perfectly. Many thanks, Mr. Fredrik Kacsmarck.