removing multiline blocks from a text file based on a pattern

This topic contains 9 replies, has 5 voices, and was last updated by  John Mooper 1 week, 4 days ago.

  • #81794

    John Mooper
    Participant

    Hello. I have a text file that looks like this:
    {
    "something" "else"
    "even" "2"
    "moretext" "704 1696 -40"
    }
    {
    "text" "random"
    "odd" "1"
    "never" "more"
    }
    ...

    Basically it's a series of multiline blocks enclosed in curly brackets. There can be any number of lines between the brackets.
    What I need is to search this file for a string and if the string is found then remove the block(s) in curly brackets (including them) that contain it. So for example if I search for 'more', it should delete the whole block:
    {
    "text" "random"
    "odd" "1"
    "never" "more"
    }
    If I search for '1', it should remove both blocks.

    I'm sure it can be done with regex, but my regex skills are too low and I can't figure this out.
    Any help is appreciated.

  • #81796

    Jeremy Murrah
    Participant

    Short answer:

    $($(get-content sample.txt) -join '').split('}').trimstart('{') | where-object {$_ -notlike "*random*"}
    

    Long answer:
    Get-Content imports your text file with each line as an item in an array.

    get-content sample.txt
    
    {
    "something" "else"
    "even" "2"
    "moretext" "704 1696 -40"
    }
    {
    "text" "random"
    "odd" "1"
    "never" "more"
    }
    

    Step one would be to use -join to make one long string out of the input.

    $(get-content sample.txt) -join ''
    
    {"something" "else""even" "2""moretext" "704 1696 -40"}{"text" "random""odd" "1""never" "more"}
    

    Step 2 would be to then turn that string back into an array by splitting on the closing curly brace.

    $($(get-content sample.txt) -join '').split('}')
    
    {"something" "else""even" "2""moretext" "704 1696 -40"
    {"text" "random""odd" "1""never" "more"
    

    Step 3 we clean up the opening curly brace

    $($(get-content sample.txt) -join '').split('}').trimstart('{')
    
    "something" "else""even" "2""moretext" "704 1696 -40"
    "text" "random""odd" "1""never" "more"
    

    Step 4 we use where-object to filter out any items that have the magic word

    $($(get-content sample.txt) -join '').split('}').trimstart('{') | where-object {$_ -notlike "*random*"}
    
    "something" "else""even" "2""moretext" "704 1696 -40"
    
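If the braces and line breaks need to survive, a variant of the same join/split/filter steps might look like the sketch below. It is only an illustration: the sample lines are inlined rather than read from sample.txt, the variable names are made up, and it assumes every closing brace sits at the end of its own line.

```powershell
# Sample lines inlined so the sketch runs without sample.txt (same data as above).
$lines = @(
    '{', '"something" "else"', '"even" "2"', '"moretext" "704 1696 -40"', '}',
    '{', '"text" "random"', '"odd" "1"', '"never" "more"', '}'
)

# Join with a newline instead of '' so the line breaks are not lost.
$text = $lines -join "`n"

# Split only at a newline that directly follows a closing brace,
# which leaves both braces attached to their block.
$blocks = $text -split '(?<=\})\n'

# Keep the blocks that do not contain the search string, then rebuild the text.
$kept   = $blocks | Where-Object { $_ -notlike '*random*' }
$result = $kept -join "`n"
$result
```

Because the split happens on the newline after each brace rather than on the brace itself, nothing has to be trimmed or re-added afterwards.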
  • #81803

    John Mooper
    Participant

    Jeremy, thank you, that works.
    I need to preserve the original file format though (I didn't mention that explicitly above). How can I achieve that with your code?

  • #81806

    Hi John,

    You should use a RegEx like this:

    \{(.|\n)*?more(.|\n)*?\}

    If it finds the string "more", it matches the whole text within {}. Note that it also matches parts of words: in your example it selects both blocks, since the first contains "moretext" and the second contains "more".
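To illustrate the partial-word point, here is a small sketch comparing a plain search with one wrapped in \b word boundaries. The two sample lines come from the first post; the variable names are only for illustration.

```powershell
# One line from each block in the original example.
$first  = '"moretext" "704 1696 -40"'
$second = '"never" "more"'

# A plain pattern matches anywhere, including inside "moretext".
$partialFirst = $first -match 'more'

# \b requires a word boundary on both sides, so "moretext" no longer
# qualifies while the standalone "more" still does.
$wholeFirst  = $first  -match '\bmore\b'
$wholeSecond = $second -match '\bmore\b'

"partial in first: $partialFirst, whole in first: $wholeFirst, whole in second: $wholeSecond"
```

Whether the boundaries are wanted depends on whether partial matches should count.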

  • #81809

    John Mooper
    Participant

    Leandro, I'm not getting any matches using your regex. Maybe because Get-Content loads the file as an array of lines?
    Edit: but yes, it should include partial matches too. My example above was incorrect; I didn't notice both blocks contained 'more'.

  • #81814

    Yeah, Get-Content reads the file as an array of lines, and the regex needs the full text as a single string. Try:

    Get-Content -Path FILE_PATH -Raw
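
The difference can be seen with a small sketch like the one below. It writes a throwaway file to the temp folder (the file name is made up for the example) and runs the same pattern against both forms of the content.

```powershell
# Hypothetical temp file so the sketch does not depend on a real path.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'blocks-sample.txt'
Set-Content -Path $path -Value @('{', '"odd" "1"', '"never" "more"', '}')

$asLines = Get-Content -Path $path        # array of strings, one per line
$asText  = Get-Content -Path $path -Raw   # one string, line breaks included

$pattern = '\{(.|\n)*?more(.|\n)*?\}'

# Tested line by line, no single element holds both braces, so nothing matches.
$lineHits = @($asLines | Where-Object { $_ -match $pattern })

# Tested against the raw string, the pattern can span the line breaks.
$rawHit = $asText -match $pattern

Remove-Item -Path $path
"line hits: $($lineHits.Count), raw match: $rawHit"
```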
    
  • #81841

    Curtis Smith
    Participant

    You are going to run into a problem where it crosses multiple blocks with the \{(.|\n)*?more(.|\n)*?\} pattern.

    For example: \{(.|\n)*?never(.|\n)*?\} matches

    {
    "something" "else"
    "even" "2"
    "moretext" "704 1696 -40"
    }
    {
    "text" "random"
    "odd" "1"
    "never" "more"
    }

    Rather than just

    {
    "text" "random"
    "odd" "1"
    "never" "more"
    }

    What I would do is first find all of my blocks, then filter out the ones I don't want, then join all the remaining blocks back together.

    Something like this:

    cls
    $exclude = "odd"
    ((Get-Content -Path "D:\New Text Document.txt" -Raw | Select-String -Pattern "(?s)\{.*?\}" -AllMatches).matches.value | Select-String -Pattern $exclude -NotMatch) -join "`n"
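
For anyone who wants to try this without a file, here is a rough in-memory sketch of both the cross-block problem and the find/filter/join idea. It swaps Select-String for [regex]::Matches and a literal, case-insensitive IndexOf check; that substitution and the variable names are choices made for this example, not part of the code above.

```powershell
# The two blocks from the original post, joined into one string.
$text = @(
    '{', '"something" "else"', '"even" "2"', '"moretext" "704 1696 -40"', '}',
    '{', '"text" "random"', '"odd" "1"', '"never" "more"', '}'
) -join "`n"

$exclude = 'never'

# The single-pattern approach starts at the first '{' and keeps expanding
# until it reaches 'never', so the match swallows the first block as well.
$crossBlock = [regex]::Match($text, '\{(.|\n)*?never(.|\n)*?\}').Value

# 1. Find every block on its own.
$blocks = [regex]::Matches($text, '(?s)\{.*?\}') | ForEach-Object { $_.Value }

# 2. Keep the blocks that do not contain the search string
#    (literal comparison, ignoring case).
$kept = @($blocks | Where-Object {
    $_.IndexOf($exclude, [System.StringComparison]::OrdinalIgnoreCase) -lt 0
})

# 3. Join what is left back together.
$result = $kept -join "`n"
$result
```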
  • #81860

    postanote
    Participant

    One more for your consideration...

    $RandomData = @'
    {
    "something" "else"
    "even" "2"
    "A new record" "704 1696 -40"
    }
    {
    "something" "else"
    "even" "2"
    "I want this one" "704 1696 -40"
    }
    {
    "text" "random"
    "odd" "1"
    "never" "more"
    }
    {
    "something" "else"
    "even" "2"
    "moretext" "704 1696 -40"
    }
    {
    "text" "random"
    "odd" "1"
    "never" "And this one also"
    }
    {
    "something" "else"
    "even" "2"
    "moretext" "704 1696 -40"
    }
    {
    "something" "else"
    "even" "2"
    "The last record" "704 1696 -40"
    }
    '@

    # Remove all record entries that match the string 'more'

    # Validate pattern match
    $RandomData -match '.[^]\b[^?{]*(.*more.*)\b(.|\n)*?\}*}'
    # True

    # Get all matches
    [regex]::Matches($RandomData,'.[^]\b[^?{]*(.*more.*)\b(.|\n)*?\}*}').Value

    {
    "text" "random"
    "odd" "1"
    "never" "more"
    }
    {
    "something" "else"
    "even" "2"
    "moretext" "704 1696 -40"
    }
    {
    "something" "else"
    "even" "2"
    "moretext" "704 1696 -40"
    }

    # Remove matches from the data
    $RandomData -replace '.[^]\b[^?{]*(.*more.*)\b(.|\n)*?\}*}'

    {
    "something" "else"
    "even" "2"
    "A new record" "704 1696 -40"
    }
    {
    "something" "else"
    "even" "2"
    "I want this one" "704 1696 -40"
    }
    {
    "text" "random"
    "odd" "1"
    "never" "And this one also"
    }
    {
    "something" "else"
    "even" "2"
    "The last record" "704 1696 -40"
    }

    • #81871

      Curtis Smith
      Participant

      Nice, I never even considered using -replace, but it makes a lot of sense.

      Here is another regex that could be used with -replace and takes fewer steps to process.

      $exclude = "odd"
      (Get-Content -Path "D:\New Text Document.txt" -Raw) -replace "{[^\}]*$exclude[^\}]*}(\r\n|\n)"
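
A possible refinement, sketched below on in-memory data: if the search string might contain regex metacharacters, [regex]::Escape can neutralize them, and making the trailing line break optional lets the pattern also remove a final block that has no newline after it. Both tweaks and the variable names are assumptions added for this example, not something the code above requires.

```powershell
# Two blocks joined with Windows line endings; the second has no trailing newline.
$block1 = @('{', '"something" "else"', '"even" "2"', '}') -join "`r`n"
$block2 = @('{', '"text" "random"', '"odd" "1"', '}') -join "`r`n"
$text   = ($block1, $block2) -join "`r`n"

$exclude = 'odd'

# Escape the search string so characters such as . or ( are taken literally,
# and allow the line break after the closing brace to be absent.
$pattern = '\{[^}]*' + [regex]::Escape($exclude) + '[^}]*\}(\r?\n)?'

# -replace with no replacement text deletes every match.
$result = $text -replace $pattern
$result
```

The [^}] character class is what keeps each match inside a single block, since it cannot step over a closing brace.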
  • #81880

    John Mooper
    Participant

    Thanks everyone for the input. I went with Curtis' code in the end, works well.
    At least I got some regexes to study.
