Regex help

This topic contains 18 replies, has 6 voices, and was last updated by Curtis Smith 4 months ago.

Viewing 15 posts - 1 through 15 (of 19 total)
  • #41026
    ertuu85
    Participant

    I have a multi-line string, and I'm trying to grab a portion of it, such as:

    $body
    
    [html]
    whatever
    whatever
    whatever
    [table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]
    ...
    ...
    ...
    [/table]
    whatever
    [/html]
    

    I've tried

    $body -match '[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"].*[/table]'
    

    Which just returns false. I imagine it's only returning one line and not reading until EOF. How can I get it to read everything between [table...[/table]?

    Edited to remove the < > characters and replace them with [ ].

    • This topic was modified 4 months ago by ertuu85.
    #41039
    Don Jones
    Keymaster

    -match is supposed to return True/False, but it also creates the $matches collection, which is what you'd look at to see what it matched. Whether it matches the first instance or continues to look for additional instances depends on whether your regular expression was written to do that. And honestly, for this purpose, you might find Select-String to be a bit more useful than -match.

    But to go further, -match is only designed to tell you _if it found a match or not_. If you want to _capture_ what it matched, you need to write a capturing (group) subexpression in your regex. That will populate $matches with what it captured. You can even give your capture group a name in your regex, and $matches will use that name, making it easier to reference what it found.
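
A minimal sketch of the named capture group idea described above. The sample string and the group name `inner` are made up for illustration. Note that `$Matches[0]` also holds the whole matched text, while the named entry holds only what the group captured.

```powershell
# Hypothetical single-line sample; the brackets stand in for the real tag delimiters.
$sample = '[b]hello[/b]'

# (?<inner>...) is a named capturing group. Literal brackets are escaped,
# because [ ] would otherwise start a regex character class.
if ($sample -match '\[b\](?<inner>.+)\[/b\]') {
    $whole = $Matches[0]        # the entire matched text
    $inner = $Matches['inner']  # only what the named group captured
    "Whole match: $whole"
    "Captured:    $inner"
}
```

`$Matches.inner` works as well as `$Matches['inner']`; both read the same entry.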

    #41052
    ertuu85
    Participant

    I'm not sure how to use Select-String here to grab and return my match; the command below returns false...

    Select-String -InputObject $body -simplematch "[table class=`"MsoNormalTable`" border=`"1`" cellspacing=`"0`" cellpadding=`"0`" width=`"900`" style=`"width:675.0pt;border:solid black 1.0pt`"]*[/table]"
    
    #41057
    Don Jones
    Keymaster

    Well, a couple of things. -SimpleMatch isn't a regular expression; it's just a literal string match, so the * in your pattern isn't treated as a wildcard, either. And, by default, letting you know you have a match is all the cmdlet is supposed to do.

    Also, if you delimit your pattern in single quotes, you can use double quotes within and not have to escape them ;).

    You should also know a bit about how regular expressions and patterns work. They're fairly literal – meaning if the attributes in that TABLE tag are in a different order, it won't match them. I'm assuming you already thought of that, and that the HTML you're using is consistent. But a -SimpleMatch isn't intended to _capture_ anything. As I wrote earlier, you need a _capturing subexpression_ in a regex.

    That means using -Pattern to specify your pattern. And, instead of "*" to match the inside of the TABLE, you're probably going to want to use something like (.+). Keep in mind that . only matches a single character; .+ means match one or more. By default, . also doesn't match a line break, so a pattern that has to span several lines needs the (?s) option in front of it. The (parentheses) create a capturing subexpression. However, that example is a _greedy_ subexpression. That means, if your HTML contains more than one TABLE, it'll match from the beginning of the first one to the end of the last one, and everything in between. I'm not sure what your HTML looks like, or what your goal is, but you may need to modify it to be a _non-greedy_ subexpression, such as (.+?).

    You probably want to use the -AllMatches switch, also.

    What you're trying to do is certainly straightforward, I think, but regular expressions aren't as straightforward as I wish they were ;). It'd be worth some time to read up on capturing subexpressions and greedy vs. non-greedy subexpressions, so you can figure out what the right technique is to meet your goal.
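
To make the greedy vs. non-greedy difference concrete, here is a rough sketch using Select-String with -Pattern and -AllMatches. The input string and variable names are invented for the example, and the patterns are one way to write this, not necessarily the best fit for the real email body.

```powershell
# Hypothetical input with two tables; brackets stand in for the real tag delimiters.
$html = "[table]first[/table]`n[p]text[/p]`n[table]second[/table]"

# (?s) lets . match newlines as well, so a match can span several lines.
$greedyPattern = '(?s)\[table\](.+)\[/table\]'   # .+ grabs as much as possible
$lazyPattern   = '(?s)\[table\](.+?)\[/table\]'  # .+? stops at the first [/table]

# With -InputObject, the whole string is searched as a single item.
$greedy = Select-String -InputObject $html -Pattern $greedyPattern -AllMatches
$lazy   = Select-String -InputObject $html -Pattern $lazyPattern -AllMatches

# Each result exposes .Matches; Groups[1] is the first capturing group.
$greedyInner = @($greedy.Matches | ForEach-Object { $_.Groups[1].Value })
$lazyInner   = @($lazy.Matches   | ForEach-Object { $_.Groups[1].Value })

"Greedy matches: $($greedyInner.Count)"
"Lazy matches:   $($lazyInner.Count)"
$lazyInner
```

The greedy pattern finds one match running from the first opening tag to the last closing tag, while the non-greedy one finds each table separately.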

    #41060
    ertuu85
    Participant

    Here is an example of the HTML: http://pastebin.com/MtSa06ue

    Basically I just want to grab the pertinent table and analyze the data in it

    The table will always start with

    [table class=`"MsoNormalTable`" border=`"1`" cellspacing=`"0`" cellpadding=`"0`" width=`"900`" style=`"width:675.0pt;border:solid black 1.0pt`"]
    

    I can get it to match on

    Select-String -InputObject $body -pattern "[table class=`"MsoNormalTable`" border=`"1`" cellspacing=`"0`" cellpadding=`"0`" width=`"900`" style=`"width:675.0pt;border:solid black 1.0pt`"].*"
    

    But I can't get it to return everything up to the point where it hits [/table].

    But I've wasted more than enough of your time and I'll do some more research on my own, I'm sure experienced users are saying 'HE TOLD YOU WHAT TO DO ALREADY!!' 😉

    Thanks Don!

    #41064
    Dan Potter
    Participant

    What do you aim to do with that string? Would it be easier to work with objects?

    $web = Invoke-WebRequest -Uri 'http://www.w3schools.com/html/html_tables.asp'
    $Web.ParsedHtml.getElementsByTagName("TABLE") | select -First 1

    #41071
    ertuu85
    Participant

    I just need to grab the table starting on line 747 and ending on 868.

    I thought I could just use regex since it will always start (and should be unique) with:

    [table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt"]

    and then grab all the text after it, up to and including the [/table].

    So at the end I would have the complete [table]...[/table] which I could create reports/alerts for and send in email form

    #41081
    Don Jones
    Keymaster

    You know, if it's consistently at those line numbers, it's easy:

    Get-Content filename.html | Select-Object -Skip 746 -First 122

    😉

    #41083
    ertuu85
    Participant

    I wish it was that easy 😉

    The entire HTML will actually be the body of an email that was retrieved through PowerShell; it never makes it to a file. And I'm not sure if it always starts on line 747, but the table header should be unique.

    If $body is the entire HTML, and I do a

    $body -match '.*' it only matches the first line. How would I make it match the entire string?
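
For what it's worth, this is standard .NET regex behavior: by default `.` matches any character except a newline, so `.*` stops at the end of the first line. The inline option `(?s)` (single-line mode) changes that. A small sketch on a made-up three-line string:

```powershell
# Hypothetical three-line string.
$text = "line one`nline two`nline three"

# By default . does not match a newline, so .* stops at the end of the first line.
$null = $text -match '.*'
$default = $Matches[0]

# The inline option (?s) (single-line mode) lets . match newlines too.
$null = $text -match '(?s).*'
$singleLine = $Matches[0]

"Default:   $($default.Length) characters"
"With (?s): $($singleLine.Length) characters"
```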

    #41089
    Dan Potter
    Participant
    
    $body = @'
    [html]
    whatever
    whatever
    whatever
    [table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]
    ...
    ...
    ...
    [/table]
    whatever
    [/html]
    '@
    
    
    ($body -split 'table class' | ? {$_ -like "=*"}).trimstart('=')
    
    
    #41091
    random commandline
    Participant
    $body = '
    [html]
    whatever
    whatever
    whatever
    [table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]
    random text 1
    [/table]
    whatever
    [/html]
    '
    $body -match "table(?'table'.*)\[/table" ; $Matches.table
    
    #41095
    Dan Potter
    Participant

    I guess if you just want that string and not what follows it.

    $body -split "`n" | ? {$_ -match 'table class'}

    #41097
    ertuu85
    Participant

    Random Commandline, when I run your example, it comes back false.
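
One plausible reason for the false result is that the opening tag and [/table] sit on different lines, and `.` does not cross line breaks unless `(?s)` is set. The sketch below rebuilds the sample body with explicit newlines (an assumption about how the string is stored) and compares the two patterns. The variable names are only for illustration.

```powershell
# Sample body from the thread, rebuilt with explicit newlines.
$body = @(
    '[html]'
    'whatever'
    '[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]'
    'random text 1'
    '[/table]'
    'whatever'
    '[/html]'
) -join "`n"

# Without (?s), . cannot cross the line breaks between the opening and closing tags.
$withoutOption = $body -match 'table(?<table>.*)\[/table'

# With (?s) and a lazy quantifier, the match runs up to the first [/table].
$withOption = $body -match '(?s)\[table class="MsoNormalTable".*?\[/table\]'
$tableText = $Matches[0]

"Without (?s): $withoutOption"
"With (?s):    $withOption"
$tableText
```

In the real email the delimiters are angle brackets rather than square brackets, so the escaped `\[` and `\]` would become plain `<` and `>`.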

    #41099
    Dan Potter
    Participant
    
    Import-Module -Name "C:\Program Files\Microsoft\Exchange\Web Services\2.0\Microsoft.Exchange.WebServices.dll"
    
    $s = New-Object Microsoft.Exchange.WebServices.Data.ExchangeService([Microsoft.Exchange.WebServices.Data.ExchangeVersion]::Exchange2010_SP1)
    
    $s.Credentials = New-Object Microsoft.Exchange.WebServices.Data.WebCredentials('me', 'Password', 'domain')
    
    $s.AutodiscoverUrl('[email protected]', { $true })
    
    $inbox = [Microsoft.Exchange.WebServices.Data.Folder]::Bind($s, [Microsoft.Exchange.WebServices.Data.WellKnownFolderName]::Inbox)
    
    $emails = $inbox.FindItems(1)
    
    $emails.load()
    
    $emails.body.text |ConvertTo-Html | Select-String -Pattern 'head' -Context 0,3
    
    
    • This reply was modified 4 months ago by Dan Potter.
    #41116
    random commandline
    Participant

    Make sure you run it in the console host, not the ISE.

