need help splitting text

This topic contains 6 replies, has 5 voices, and was last updated by Rob Campbell 2 years, 6 months ago.

  • #15487
    John Mooper
    Participant

    Hi everyone, I need to split text that contains dash '-' characters, some surrounded by whitespace and some not. I want to split only on the dashes that are not surrounded by whitespace. I assume this can be done with a regex, but I can't figure out the pattern.

    Sample text: "this is a - sample test to demonstrate-what i wrote - above"
    Desired split output: "this is a - sample test to demonstrate", "what i wrote - above"

    Any help is appreciated.

  • #15488
    Stein Petersen
    Participant

    Hi John. You could use -split "\b-\b" to get the split you want.
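
    As a quick check of the suggestion above, here is a minimal sketch that applies the pattern to the sample text from the question. It assumes the dashes in the sample are plain ASCII hyphens; the variable names are illustrative only.

    ```powershell
    # Sample text from the question, assuming plain ASCII hyphens throughout.
    $text = 'this is a - sample test to demonstrate-what i wrote - above'

    # Split only where the hyphen sits directly between two word characters.
    $parts = $text -split '\b-\b'

    # Print each piece in quotes so leading/trailing spaces are visible.
    $parts | ForEach-Object { "'$_'" }
    # 'this is a - sample test to demonstrate'
    # 'what i wrote - above'
    ```

    The hyphens with spaces around them are left alone because there is no word character directly next to them.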

  • #15490
    John Mooper
    Participant

    Thank you Stein, that worked. Can you explain what exactly it does? I found that \b means word boundary, but that doesn't really help me understand the regex.

  • #15498
    Martin Nielsen
    Participant

    "Matches at a position that is followed by a word character but not preceded by a word character, or that is preceded by a word character but not followed by a word character."

    http://www.regular-expressions.info/refwordboundaries.html
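
    To make that definition concrete, the sketch below lists every position where \b matches in a short, made-up string ('a - b-c' is an illustrative input, not from the thread). \b is zero-width, so each match is a position between characters rather than a character.

    ```powershell
    # Character positions:  a=0  ' '=1  -=2  ' '=3  b=4  -=5  c=6
    $text = 'a - b-c'

    # Collect the index of every zero-width word-boundary match.
    $indexes = [regex]::Matches($text, '\b') | ForEach-Object { $_.Index }

    $indexes -join ','
    # 0,1,4,5,6,7
    ```

    The hyphen at index 5 has a boundary directly before it (5) and directly after it (6), so '\b-\b' matches there. The hyphen at index 2 has no boundary at 2 or 3, because both of its neighbors are spaces, so it is skipped.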

  • #15626
    John Mooper
    Participant

    Thanks Dave. I guess I should spend some time on regular expressions; I've been trying to avoid them until now 🙂

  • #15527
    Dave Wyatt
    Moderator

    Keep in mind that "word characters" means letters, numbers and underscores (mostly; there's a little bit of variance here between different regex implementations.) Relying on \b might cause you problems if the hyphen has some other non-whitespace character on either side, such as parentheses, punctuation marks, etc.

    Here's a tweak to the pattern which uses negative lookbehind and lookahead assertions to make sure that the hyphen does not have a whitespace character on either side of it:

    $text = 'This - is - a-(test) - One - Two - Three'
    
    Write-Verbose -Verbose 'Split with \b pattern'
    $text -split '\b-\b'
    
    Write-Verbose -Verbose 'Split with whitespace negative assertions pattern'
    $text -split '(?<!\s)-(?!\s)'
    

    Reference on lookahead / lookbehind assertions: http://www.regular-expressions.info/lookaround.html

  • #15647
    Rob Campbell
    Participant

    You can also use positive lookaround assertions with \S:

    '(?<=\S)-(?=\S)' will match any - that is immediately preceded and immediately followed by a non-whitespace character.
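
    The positive and negative lookaround patterns agree in the middle of a string but can differ at its edges. The sketch below compares them on two made-up inputs (the strings and variable names are illustrative assumptions, not from the thread).

    ```powershell
    $negative = '(?<!\s)-(?!\s)'   # no whitespace on either side
    $positive = '(?<=\S)-(?=\S)'   # a non-whitespace character on both sides

    # Mid-string: both patterns split at the first hyphen only.
    $middle = 'a-(test) - b'
    $negParts = $middle -split $negative   # 'a', '(test) - b'
    $posParts = $middle -split $positive   # 'a', '(test) - b'

    # Leading hyphen: nothing precedes it, so "not whitespace before" holds,
    # but "a non-whitespace character before" does not.
    $edge = '-leading'
    $negEdge = @($edge -split $negative)   # '', 'leading'
    $posEdge = @($edge -split $positive)   # '-leading'

    "negative: $($negEdge.Count) parts, positive: $($posEdge.Count) part"
    # negative: 2 parts, positive: 1 part
    ```

    Which behavior is preferable depends on whether a hyphen at the very start or end of the text should count as a separator.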
