This topic contains 9 replies, has 2 voices, and was last updated by
April 9, 2018 at 6:51 pm #98328
I am trying to scrape a web page to get a value for a particular message. Below is the XML content I am interested in. Of everything in the web page, I need to find only the entry with href="browse.jsp;jsessionid=oillma25wtod1qzbhi3jenj5t?JMSDestination=Consumer.Siebel.VirtualTopic.catalog_changed_events" and, within that, I want only the first value, 0.
I tried ParsedHtml by tag name for TR and also tried the inner text, but nothing works. Kindly let me know if anyone has any thoughts on this.
April 10, 2018 at 8:02 pm #98392
As far as I can remember this forum doesn't deal well with XML code pasted into the post.
So it's better if you paste the XML into Gist and add the Gist URL in the post.
April 16, 2018 at 7:29 pm #98968
Thanks for your input. Below is the link for my xml.
April 16, 2018 at 7:28 pm #98965
April 17, 2018 at 3:20 pm #99051
Not 100% sure what you're after.
But if you have the above data in a variable (called htmlData in my example), you could do something like this.
# I put your example into a file, so you would change this to whatever suits you.
$htmlData = Get-Content test.html -Raw
$htmlValue = $htmlData | ConvertFrom-String | Select P7
You can skip the Select P7 just to see the layout of the data.
This will only work if the data is consistent, meaning that the value entry always ends up in P7. Otherwise you would need something to identify the specific tag you're searching for.
Another, somewhat crude, option would be:
$htmlValues = $htmlData -split '</td>'
Edit: The split pattern is the closing /TD tag. The forum scrubbed the chevrons from my original example, for the same reason I mentioned above.
This creates an array by splitting the raw data on the /TD tag.
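As a rough illustration of the split approach, here is a minimal sketch. The sample row is hypothetical (the real markup from the page was not preserved in this thread), so the element index used below is an assumption that only holds for markup laid out like the sample.

```powershell
# Hypothetical sample row; the real markup was not preserved in this thread.
$htmlData = '<tr><td><a href="browse.jsp?JMSDestination=Example.Queue">Example.Queue</a></td><td>0</td><td>3</td></tr>'

# Split on the closing TD tag; each element keeps one cell's opening markup and text.
$htmlValues = $htmlData -split '</td>'

# In this sample the second element is '<td>0'.
# Remove everything up to and including the last '>' to keep only the cell text.
$firstValue = $htmlValues[1] -replace '^.*>'

$firstValue
```

If the cell position can vary, the index would need to be found by searching the array rather than being hard-coded.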
Otherwise you may want to look at HTML parsers such as Html Agility Pack.
But then you're kind of leaving the PowerShell realm and going into C#, XPath and LINQ.
April 17, 2018 at 5:54 pm #99061
Wow, it worked. Can I ask for one last bit of help?
Below is the output of your script:
I want the value '0' that is between the TD tags. I tried a regular expression with -match and -Pattern, but nothing is working. Below is the Get-Member output for the variable storing the above value.
PS C:\Users\kd****> $htmldata | Get-Member
Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
P7          NoteProperty string P7=0
April 17, 2018 at 5:56 pm #99063
April 17, 2018 at 5:58 pm #99066
April 17, 2018 at 8:34 pm #99073
You could do it in multiple ways; it mostly depends on how readable you want the code to be.
But here is an example.
# I put your example into a file, so you would change this to whatever suits you.
$htmlData = Get-Content test.html -Raw
$htmlValue = $htmlData | ConvertFrom-String | Select -ExpandProperty P7
$htmlTagValue = $htmlValue[4]
So the extra steps are -ExpandProperty, which returns just the content of P7 and not the header itself, and the [4] index on the last line.
Then you can decide how you want to extract the value.
The option above is quick and dirty in the sense that if the data is not consistent every time (the same issue as with P7), you will get errors.
What the [4] does is take the fifth character from the string; strings can be indexed as if they were arrays of characters.
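To make the character-index idea concrete, here is a small sketch. It assumes the extracted value looks like a TD tag wrapped around a single digit; that shape is an assumption, since the forum scrubbed the original output.

```powershell
# Assumed shape of the value returned from P7 (the forum scrubbed the real one).
$htmlValue = '<td>0</td>'

# Indexing a string returns a [char]; index 4 is the fifth character.
$fifthChar = $htmlValue[4]
$asString  = [string]$fifthChar

# The same index on a two-digit value only picks up the first digit,
# which is why this approach is fragile.
$wider     = '<td>12</td>'
$truncated = [string]$wider[4]

$asString
$truncated
```

The second case shows the kind of silent error to expect when the data does not have the same layout every time.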
To make it a bit more robust, and if the value you want contains only digits, you could use a simple regex instead.
$htmlTagValue = $htmlValue -replace '\D'
But it depends on the possible values in the tag, what you need, and so forth.
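A short sketch of the regex route, again using an assumed TD-wrapped value. It also shows one case where stripping all non-digits gives a misleading result, and a capture-group alternative (my own suggestion, not something from the replies above) that only reads the digits between the tags.

```powershell
# Assumed shape of the value (the forum scrubbed the real one).
$htmlValue = '<td>0</td>'

# -replace with no replacement string removes every match; \D is any non-digit.
$digitsOnly = $htmlValue -replace '\D'

# Pitfall: digits inside attributes survive too (hypothetical attribute).
$noisy       = '<td class="c1">0</td>'
$noisyDigits = $noisy -replace '\D'

# Alternative: capture only the digits between the opening and closing TD tags.
$match    = [regex]::Match($noisy, '(?i)<td[^>]*>\s*(\d+)\s*</td>')
$tagValue = if ($match.Success) { $match.Groups[1].Value } else { $null }

$digitsOnly
$noisyDigits
$tagValue
```

Which variant fits depends on how much the surrounding markup can change between page loads.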
April 17, 2018 at 9:07 pm #99088
I sincerely thank you for your quick and detailed response. It worked perfectly. Many thanks, Mr. Fredrik Kacsmarck.
The topic ‘Web Scraping. I want to pull only one value within one of the tag in the xml.’ is closed to new replies.