Group by block of lines between two strings

This topic contains 13 replies, has 4 voices, and was last updated by Curtis Smith 5 months ago.

  • #44721
    Ricardo Bragança
    Participant

    Good afternoon everyone,
    I have a .txt file like this:

    BLUE|        |     |     |1111  | BEGIN
    BLUE|        |     |     |1112  | BTATI
    YELL|        |     |     |1113  | CBDE2
    YELL|        |     |     |1114  | MATTT
    YELL|        |     |     |1115  | MATTT
    YELL|        |     |     |1116  | END00
    BLUE|        |     |     |1117  | BEGIN
    BLUE|        |     |     |1118  | BTATI
    BLUE|        |     |     |1119  | CBDE2
    BLUE|        |     |     |1110  | MATTT
    BLUE|        |     |     |1121  | END00

    I want to process the .txt file in blocks of lines that start at "BEGIN" and end at "END00". When the first word of the "BEGIN" line is BLUE and the first word of the "END00" line is YELL, I want to replace YELL with BLUE in that block. The result should look like this:

    BLUE|        |     |     |1111  | BEGIN
    BLUE|        |     |     |1112  | BTATI
    BLUE|        |     |     |1113  | CBDE2
    BLUE|        |     |     |1114  | MATTT
    BLUE|        |     |     |1115  | MATTT
    BLUE|        |     |     |1116  | END00
    BLUE|        |     |     |1117  | BEGIN
    BLUE|        |     |     |1118  | BTATI
    BLUE|        |     |     |1119  | CBDE2
    BLUE|        |     |     |1110  | MATTT
    BLUE|        |     |     |1121  | END00

    Important: The output must be a .txt file with the same characteristics as the original (field sizes, delimiter, ...), but with the changes made.

    Can someone help me?

  • #44856
    Don Jones
    Keymaster

    I would use Import-CSV to import the file, specifying a -Delimiter "|" to have it break on the pipe characters. From there, a ForEach loop would let you enumerate each line one at a time and work with the column values.

    ForEach ($line in (Import-CSV whatever.txt -Delim "|")) {
    $line[0] # first column
    $line[5] # last column
    }

    Unfortunately, I'm not understanding what it is you want to do with the data. Additionally, Export-CSV, while it can use the | as a delimiter, might not preserve the column widths. You might need to manage that yourself.
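
A minimal sketch of the Import-CSV idea, under a few assumptions: the sample file has no header row, so column names are supplied with -Header (the names C1 to C6 and the temp-file path are made up for illustration). Imported rows are objects whose columns are read by property name. Values are trimmed explicitly here, so the sketch does not depend on how Import-CSV treats the padding around each field.

```powershell
# Hypothetical sample file; in practice $path would point at the real .txt file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'import-csv-sample.txt'
@(
    'BLUE|        |     |     |1111  | BEGIN'
    'YELL|        |     |     |1116  | END00'
) | Set-Content -Path $path

# No header row in the file, so name the six columns explicitly (C1..C6 are assumed names).
$rows = Import-Csv -Path $path -Delimiter '|' -Header 'C1','C2','C3','C4','C5','C6'

$summary = foreach ($row in $rows) {
    # Columns are addressed by property name; Trim() removes any padding.
    '{0} -> {1}' -f $row.C1.Trim(), $row.C6.Trim()
}
$summary
```

Because the exact field widths matter for the output file, working on the raw lines (for example with Get-Content) may be simpler than rebuilding each row from imported objects.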

    • #44868
      Ricardo Bragança
      Participant

      Good evening, and first of all thanks for the help.

      These lines within the foreach are not valid because, in the real data, the "END" is not always in position 5; it can appear in position 4, position 6, or any other.

      $line[0] # first column
      $line[5] # last column


      I think it should be something that searches for the strings "BEGIN" and "END", but I do not know what.

      What I want to do with the data is something like this:

      foreach {IF (H5 like "BEGIN" and H1 like "BLUE" and H5 Like "END" and H1 like "YELL") {
      H1 replace "YELL", "BLUE"
      }
      }

      That is, change only these cases in the original document and keep the lines that do not meet this condition as they are.

      How can I manipulate the size of the fields?

  • #44935
    random commandline
    Participant

    Do you need to change the word 'YELL' to 'BLUE'? If so, this should work.

    (Get-Content \\path\to\textfile) -replace 'YELL','BLUE' 
    

    Do all lines start with 'BLUE' or 'YELL'?
    If so, do you need to capture all lines between the 'BEGIN' and 'END00' lines of each text file?
    If not, do you need to capture ONLY lines that begin with 'BLUE' or 'YELL' that are between the
    'BEGIN' and 'END00' lines of each text file?

    Should your results look like this?

    Group 1
    BLUE|        |     |     |1111  | BEGIN
    BLUE|        |     |     |1112  | BTATI
    YELL|        |     |     |1113  | CBDE2
    YELL|        |     |     |1114  | MATTT
    YELL|        |     |     |1115  | MATTT
    YELL|        |     |     |1116  | END00
    
    Group 2
    BLUE|        |     |     |1117  | BEGIN
    BLUE|        |     |     |1118  | BTATI
    BLUE|        |     |     |1119  | CBDE2
    BLUE|        |     |     |1110  | MATTT
    BLUE|        |     |     |1121  | END00
    
    • #45452
      Ricardo Bragança
      Participant

      Yes, all lines start with "BLUE" or "YELL".

      I need to capture ONLY the lines between 'BEGIN' and 'END00' in blocks where the 'BEGIN' line starts with "BLUE" and the 'END00' line starts with "YELL".

      The next step is to replace "YELL" with "BLUE" in the captured lines.

      Important: the columns of the output document must have the same widths as in the imported document.

      The result I want is something like:

      BLUE|        |     |     |1111  | BEGIN
      BLUE|        |     |     |1112  | BTATI
      BLUE|        |     |     |1113  | CBDE2
      BLUE|        |     |     |1114  | MATTT
      BLUE|        |     |     |1115  | MATTT
      BLUE|        |     |     |1116  | END00
      BLUE|        |     |     |1117  | BEGIN
      BLUE|        |     |     |1118  | BTATI
      BLUE|        |     |     |1119  | CBDE2
      BLUE|        |     |     |1110  | MATTT
      BLUE|        |     |     |1121  | END00
  • #45500
    random commandline
    Participant

    If I am understanding, this should work.

    (Get-Content \\path\to\textfile1.txt) -replace 'YELL','BLUE' | Add-Content \\path\to\textfile2.txt
    
  • #45509
    Ricardo Bragança
    Participant

    But this is missing the first step:

    I need to capture ONLY the lines between 'BEGIN' and 'END00' in blocks where the 'BEGIN' line starts with "BLUE" and the 'END00' line starts with "YELL".

    I need to pass on only the lines that meet this condition. This code passes on all the lines and changes every "YELL" to "BLUE", even in lines that do not meet the condition.

  • #45550
    random commandline
    Participant

    I think a switch statement would solve your problem. I am still confused as to why my previous post does not work for you.

    $files = Get-ChildItem "\\path\to\textfile1.txt"
    $count = $null
    switch -Regex -File $files
    {
        'BEGIN' {$count++;Write-Verbose "Group $count found" -Verbose} 
        '^BLUE|^YELL' {$_ -replace 'YELL','BLUE' | Add-Content -Path "\\path\to\textfile2.txt"}
        'END00' {continue}
    }
    
    • #45620
      Ricardo Bragança
      Participant

      Good morning,

      Sorry if I did not explain myself properly.

      The problem with both posts is that they search every line of the .txt file and replace "YELL" with "BLUE" wherever the string is found, regardless of whether it is in the first column or the 10th.

      A big problem is that if a line contains the string "YELLOW", both posts turn it into "BLUEOW", because the replace changes "YELL" to "BLUE".

      Another problem is that I want to replace "YELL" with "BLUE" only when I have something like this:

      BLUE|        |     |     |1111  | BEGIN
      BLUE|        |     |     |1112  | BTATI
      YELL|        |     |     |1113  | CBDE2
      YELL|        |     |     |1114  | MATTT
      YELL|        |     |     |1115  | MATTT
      YELL|        |     |     |1116  | END00

      If I have something like this, I don't want to replace "YELL" with "BLUE":

      YELL|        |     |     |1111  | BEGIN
      YELL|        |     |     |1112  | BTATI
      YELL|        |     |     |1113  | CBDE2
      YELL|        |     |     |1114  | MATTT
      YELL|        |     |     |1115  | MATTT
      YELL|        |     |     |1116  | END00
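
The "BLUEOW" problem described above can be reproduced and avoided with an anchored pattern. This is a small sketch, not a full solution: the second sample line, with YELLOW in the second field, is made up for illustration, and the sketch does not yet handle the BEGIN/END00 block condition.

```powershell
$lines = @(
    'YELL|        |     |     |1113  | CBDE2'
    'BLUE|YELLOW  |     |     |1114  | MATTT'   # hypothetical line containing YELLOW
)

# Unanchored: every occurrence of YELL is replaced, so YELLOW becomes BLUEOW.
$unanchored = $lines -replace 'YELL', 'BLUE'

# Anchored: only a first field that is exactly YELL, followed by the pipe, is replaced.
$anchored = $lines -replace '^YELL\|', 'BLUE|'

$unanchored
$anchored
```

Since YELL and BLUE have the same length, the anchored replace leaves every column width unchanged.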
  • #45623
    Curtis Smith
    Participant

    Hi Ricardo,
    Here is an example of the logic you need. Basically you have to test each line to determine if it is the appropriate begin or end line and match them up to create your block of text. You can then do your replace on just that block. Let me know if you need an explanation of what the code below is doing.

    $data = @'
    BLUE|        |     |     |1111  | BEGIN
    BLUE|        |     |     |1112  | BTATI
    YELL|        |     |     |1113  | CBDE2
    YELL|        |     |     |1114  | MATTT
    YELL|        |     |     |1115  | MATTT
    YELL|        |     |     |1116  | END00
    BLUE|        |     |     |1117  | BEGIN
    BLUE|        |     |     |1118  | BTATI
    BLUE|        |     |     |1119  | CBDE2
    BLUE|        |     |     |1110  | MATTT
    BLUE|        |     |     |1121  | END00
    BLUE|        |     |     |2111  | BEGIN
    BLUE|        |     |     |2112  | BTATI
    YELL|        |     |     |2113  | CBDE2
    YELL|        |     |     |2114  | MATTT
    YELL|        |     |     |2115  | MATTT
    YELL|        |     |     |2116  | END00
    BLUE|        |     |     |2117  | BEGIN
    BLUE|        |     |     |2118  | BTATI
    BLUE|        |     |     |2119  | CBDE2
    BLUE|        |     |     |2110  | MATTT
    BLUE|        |     |     |2121  | END00
    '@ -split "`n"
    
    For ($i=0;$i -lt $data.count;$i++) {
        Switch -Regex ($data[$i]) {
            # Remember the index of the most recent BEGIN line that starts with BLUE
            "^BLUE.*BEGIN$" {$bluebegin = $i}
            "END00$" {
                        If (Get-Variable bluebegin -ErrorAction SilentlyContinue) {
                            # Output the block only when its END00 line starts with YELL
                            If ($data[$i] -match "^YELL") {
                                $data[$bluebegin..$i] | ForEach-Object {$_ -replace "^YELL\|", "BLUE|"}
                            }
                            # Forget the BEGIN either way so it cannot pair with a later END00
                            Remove-Variable bluebegin
                        }
                     }
        }
    }
    

    Results:

    BLUE|        |     |     |1111  | BEGIN
    BLUE|        |     |     |1112  | BTATI
    BLUE|        |     |     |1113  | CBDE2
    BLUE|        |     |     |1114  | MATTT
    BLUE|        |     |     |1115  | MATTT
    BLUE|        |     |     |1116  | END00
    BLUE|        |     |     |2111  | BEGIN
    BLUE|        |     |     |2112  | BTATI
    BLUE|        |     |     |2113  | CBDE2
    BLUE|        |     |     |2114  | MATTT
    BLUE|        |     |     |2115  | MATTT
    BLUE|        |     |     |2116  | END00
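
The loop above outputs only the converted blocks. The original question also asked to keep every other line unchanged, so here is a hedged variant that returns all lines and rewrites only the qualifying blocks. The function name Convert-YellBlock and the shortened sample data are assumptions made for this sketch.

```powershell
function Convert-YellBlock {
    # Hypothetical helper: returns every input line, with YELL changed to BLUE
    # only inside blocks that start at a BLUE BEGIN line and end at a YELL END00 line.
    param([string[]]$Lines)

    $result = [string[]]$Lines.Clone()
    $begin = -1                                   # index of the pending BLUE BEGIN line, or -1
    for ($i = 0; $i -lt $Lines.Count; $i++) {
        $line = $Lines[$i].TrimEnd()              # tolerate a trailing carriage return
        if ($line -match 'BEGIN$') {
            # Track the block only when its BEGIN line starts with BLUE.
            $begin = if ($line -match '^BLUE\|') { $i } else { -1 }
        }
        elseif ($line -match 'END00$') {
            if ($begin -ge 0 -and $line -match '^YELL\|') {
                for ($j = $begin; $j -le $i; $j++) {
                    $result[$j] = $result[$j] -replace '^YELL\|', 'BLUE|'
                }
            }
            $begin = -1                           # the block is closed either way
        }
    }
    $result
}

$data = @(
    'BLUE|        |     |     |1111  | BEGIN'
    'YELL|        |     |     |1113  | CBDE2'
    'YELL|        |     |     |1116  | END00'
    'YELL|        |     |     |2111  | BEGIN'
    'YELL|        |     |     |2113  | CBDE2'
    'YELL|        |     |     |2116  | END00'
)
$converted = @(Convert-YellBlock -Lines $data)
$converted
```

With this sample, the first three lines should come back starting with BLUE, while the block that begins with YELL is returned as it was.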
    
    • #45632
      Ricardo Bragança
      Participant

      Hi Curtis,

      Two questions for now: if I want to import a .txt file instead of a multi-line string, how do I do that?
      And how do I export to a .txt file with the same structure as the original files?

  • #45710
    Curtis Smith
    Participant

    Hi Ricardo,
    You would use Get-Content to get the content of the text file.

    You would use Out-File to write the data out to a file. Since the output data is being generated in a for loop, you would need to either output the data with Out-File using the -Append parameter inside the for loop, or collect all the data into a variable inside the for loop and then write it out to file once after the loop has completed.
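
The second option, collecting inside the loop and writing once, might look like the sketch below. The temp-file paths and the two sample lines are assumptions, the loop body is only a stand-in for the real per-line logic, and -Encoding ascii is chosen here because the sample data is plain ASCII.

```powershell
# Hypothetical input and output paths in the temp folder.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'blocks-in.txt'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'blocks-out.txt'
@(
    'BLUE|        |     |     |1111  | BEGIN'
    'YELL|        |     |     |1116  | END00'
) | Set-Content -Path $inPath

# Read the file as an array of lines, one string per line.
$data = @(Get-Content -Path $inPath)

# Collect whatever the loop emits into a variable instead of writing per iteration.
$collected = for ($i = 0; $i -lt $data.Count; $i++) {
    # Stand-in for the real processing: emit each line unchanged.
    $data[$i]
}

# Write everything out once, after the loop has completed.
$collected | Out-File -FilePath $outPath -Encoding ascii

$written = @(Get-Content -Path $outPath)
$written
```

Writing once at the end also means the output file is overwritten on each run, rather than growing as it would with -Append.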

    • #45837
      Ricardo Bragança
      Participant

      Hi Curtis,

      I made my changes to the code, but the result is not what I expected.
      I don't know why; I think I did something wrong. This is what I have:

      $data = get-content C:\Users\ricardo.braganca\Desktop\NEW\*.txt 
      $Output = "C:\Users\ricardo.braganca\Desktop\NEW\dados.txt"
      
      
      For ($i=0;$i -lt $data.count;$i++) {
          Switch -Regex ($data[$i]) {
              "^FE.*MAI000$" {$bluebegin = $i}
              "MAIFIM$" {
                          If (($data[$i] -match "^01") -and (Get-variable bluebegin -ErrorAction SilentlyContinue)) {
                              $data[$bluebegin..$i] | ForEach-Object {$_ -replace "^FE\|", "01|"}
                          } elseif (Get-variable bluebegin -ErrorAction SilentlyContinue) {
                              Remove-Variable bluebegin
                          }
                       }
          }
         $_ |Add-Content $Output 
      }
  • #45898
    Curtis Smith
    Participant

    Your Add-Content is in the wrong place and is adding the wrong content. Additionally, your replace logic is backward from the original example: replacing FE with 01 is like replacing BLUE with YELL instead of YELL with BLUE. In any case, here is the right way, replacing 01 with FE instead of FE with 01 and using Add-Content in the correct spot.

    $data = Get-Content C:\temp\*.txt
    $output = "C:\temp\out.txt"
    
    For ($i=0;$i -lt $data.count;$i++) {
        Switch -Regex ($data[$i]) {
            # Remember the index of the most recent MAI000 line that starts with FE
            "^FE.*MAI000$" {$bluebegin = $i}
            "MAIFIM$" {
                        If (Get-Variable bluebegin -ErrorAction SilentlyContinue) {
                            # Write the block only when its MAIFIM line starts with 01
                            If ($data[$i] -match "^01") {
                                $data[$bluebegin..$i] | ForEach-Object {$_ -replace "^01\|", "FE|" | Add-Content $output}
                            }
                            # Forget the start line either way so it cannot pair with a later MAIFIM
                            Remove-Variable bluebegin
                        }
                     }
        }
    }
    

    Sample input files:
    1.txt

    FE|        |     |     |1111  | MAI000
    FE|        |     |     |1112  | BTATI
    01|        |     |     |1113  | CBDE2
    01|        |     |     |1114  | MATTT
    01|        |     |     |1115  | MATTT
    01|        |     |     |1116  | MAIFIM
    FE|        |     |     |1117  | MAI000
    FE|        |     |     |1118  | BTATI
    FE|        |     |     |1119  | CBDE2
    FE|        |     |     |1110  | MATTT
    FE|        |     |     |1121  | MAIFIM
    

    2.txt

    FE|        |     |     |2111  | MAI000
    FE|        |     |     |2112  | BTATI
    01|        |     |     |2113  | CBDE2
    01|        |     |     |2114  | MATTT
    01|        |     |     |2115  | MATTT
    01|        |     |     |2116  | MAIFIM
    FE|        |     |     |2117  | MAI000
    FE|        |     |     |2118  | BTATI
    FE|        |     |     |2119  | CBDE2
    FE|        |     |     |2110  | MATTT
    FE|        |     |     |2121  | MAIFIM
    

    Results: out.txt

    FE|        |     |     |1111  | MAI000
    FE|        |     |     |1112  | BTATI
    FE|        |     |     |1113  | CBDE2
    FE|        |     |     |1114  | MATTT
    FE|        |     |     |1115  | MATTT
    FE|        |     |     |1116  | MAIFIM
    FE|        |     |     |2111  | MAI000
    FE|        |     |     |2112  | BTATI
    FE|        |     |     |2113  | CBDE2
    FE|        |     |     |2114  | MATTT
    FE|        |     |     |2115  | MATTT
    FE|        |     |     |2116  | MAIFIM
    
