How to locate nth position of \"^\" in record, Cut 2 Cols & INS a EOR \"|\"

This topic contains 11 replies, has 4 voices, and was last updated by Profile photo of ron slaton ron slaton 3 years, 1 month ago.

  • Author
    Posts
  • #14704
    Profile photo of Carlos Vargas
    Carlos Vargas
    Participant

    I have over 100 text files with about 25 fields per record, fields delimited with "^".
    I need to loop through the directory for files matching *TAT*.txt (about 100+ files)
    For all records in each file:
    delete all columns after field #15 to the end of record
    Insert a "|" at the end of each record
    Overwrite original file.

    I have been searching for days, have been able to find the loop for files, the substring for the parsing, but I am unable to find a way to locate the position of the 15th delimiter "^", parse and overwrite.
    Have ZERO experience with Powershell, but a lot of programming exp.

    Please assist if you can.
    Thanks,
    CV

  • #14715
    Profile photo of Dave Wyatt
    Dave Wyatt
    Moderator

    I'd probably do it like this, but I'm a regular expressions junkie. 🙂

    Get-Content -Path .\test.txt |
    ForEach-Object { $_ -replace '(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|' } |
    Set-Content -Path .\tempFile.txt
    

    I know it looks like gibberish, but that -replace operation basically says "take everything after the 15th ^-separated field on each line, and replace it with a | character". In this case, I didn't overwrite the original file, but you can modify the code to do that if you'd like. If you want it to work in a single pipeline, though, you need to put parentheses around the Get-Content command. Otherwise both Get-Content and Set-Content would try to open the same file at the same time, which doesn't work. For example:

    (Get-Content -Path .\test.txt) |
    ForEach-Object { $_ -replace '(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|' } |
    Set-Content -Path .\test.txt
    
    • #14798
      Profile photo of ron slaton
      ron slaton
      Participant

      Dave Wyatt;

      I am working with Carlos on this problem with files. Your code works great with one file but I have 100+ files and some are 40mg. I need a like script to read each file and overwrite each file with same name.

      Please assist if you can.
      Thanks

      Ron Slaton

  • #14800
    Profile photo of Dave Wyatt
    Dave Wyatt
    Moderator

    To process multiple files, you'd just need a loop instead of hard-coding a file name. Assuming that you have an array of paths named $paths:

    foreach ($path in $paths)
    {
        (Get-Content -Path $path) |
        ForEach-Object { $_ -replace '(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|' } |
        Set-Content -Path $path
    }
    

    However, this approach reads the entire file into memory. 40MB is not a huge size, so that's probably not a big deal here. ForEach-Object is also extremely slow when you're processing lots of data, and you can improve the performance quite a bit with a filter:

    filter Get-FirstFifteenFields
    {
        $_ -replace '(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|'
    }
    
    foreach ($path in $paths)
    {
        (Get-Content -Path $path) |
        Get-FirstFifteenFields |
        Set-Content -Path $path
    }
    

    If you needed to process files so large that this approach produced an OutOfMemoryException, you can instead use a streaming approach to a temporary file, then move it back over the original file when done. In this example, I've also thrown in -ReadCount 1000 to process lines in large chunks and modified the regex slightly to account for that. This is another performance tweak when dealing with very large files.

    filter Get-FirstFifteenFields
    {
        $_ -replace '(?m)(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|'
    }
    
    foreach ($path in $paths)
    {
        $tempFile = [System.IO.Path]::GetTempFileName()
    
        Get-Content -Path $path -ReadCount 1000 |
        Get-FirstFifteenFields |
        Set-Content -Path $tempFile
    
        [System.IO.File]::Replace($tempFile, (Resolve-Path $path), [NullString]::Value)
    }
    
    • #14859
      Profile photo of ron slaton
      ron slaton
      Participant

      Dave,

      So I have a group of files in a specific directory with name like tmatat_fbhr.txt. How do I pass the path a file names to the script?

      Thanks again.

      Ron

    • #14883
      Profile photo of ron slaton
      ron slaton
      Participant

      Dave,

      So I have a group of files in a specific directory with name like tmatat_fbhr.txt. How do I pass the path a file names to the script?

      Thanks again.

      Ron

  • #14808
    Profile photo of Tore Groneng
    Tore Groneng
    Participant

    Dave, you ARE the regex king. You are beyond loving it 🙂

    If you know regex (really know how to use it) and frequently rely on it, it is great. I however have to think of the next guy most of the time and have never seen the need for it (yet). Then again maybe I am just surrounded by "regular" people 🙂

    Cheers

    Tore

  • #14937
    Profile photo of Dave Wyatt
    Dave Wyatt
    Moderator

    The code I posted already has a loop; all it needs is an array of strings named $paths (though there are other ways you could change the code.) Here's one way you might accomplish that:

    $paths = Get-ChildItem -Path c:\your\directory\*.txt | Select-Object -ExpandProperty FullName
    
    • #15077
      Profile photo of ron slaton
      ron slaton
      Participant

      Dave,

      After editing code with new code with $path string I am getting this error related to NullString>

      Unable to find type [NullString]: make sure that the assembly containing this type is loaded.

      Thanks Ron

  • #15080
    Profile photo of Dave Wyatt
    Dave Wyatt
    Moderator

    The NullString type was new in PowerShell 3.0, I believe. Without it, it's very awkward to call certain .NET methods (such as File.Replace), because PowerShell automatically converts $null to an empty string.

    If you need to execute this code on PowerShell 2.0, there are other options. You could maybe use Move-Item (though that might modify permissions, audit settings, etc on the original file location), or you could do something like this, to just replace the original file's contents without anything else being modified (and still keeping the memory advantages of streaming.)

    Get-Content $tempFile -ReadCount 1000 | Set-Content $path
    Remove-Item $tempFile
    
    • #15084
      Profile photo of ron slaton
      ron slaton
      Participant

      Thanks Dave for all your assistance.

  • #14708
    Profile photo of Tore Groneng
    Tore Groneng
    Participant

    hi,

    Not quite sure what your goal is, however I would look into building a function.

    To list all files matching *TAT*.txt

    
    get-childitem -Filter *TAT*.txt
    

    To read the contents of a file:

    
    get-content -path file.txt
    

    If we assign the content to a variable:

    
    $filecontent = get-content -path file.txt
    

    Then we can create an array every line by splitting at each new line(zero based):

    
    $Array = $filecontent -split [environment]::Newline
    

    Say we have this line "1^2^3^4^5^6^7^8^9^10^11^12^13^14^15^16" in a file, lets split it to an array and keep the 15 first items:

    
    [string]$line = "1^2^3^4^5^6^7^8^9^10^11^12^13^14^15^16"
    $linearray = $line.split("^")
    $ReducedArray = $linearray[0..14]
    

    So how do we go the other way, from array to a string, separating all fields with a "^" and adding an "|" to the end?

    
    [string]$newline = ($ReducedArray -join "^") + "|"
    

    Now instead of replacing each line, I would add each new line to a new array and save the array when I had processed all the lines in the file. To add an item to an array:

    
    $ArrayOfNewContent = @()
    $ArrayOfNewContent += $newline
    

    Next is creating an string to hold all the items in the ArrayOfNewContent:

    
    [string]$NewFileContent = ($ArrayOfNewContent -join "") + [environment]::NewLine
    set-content -path c:\temp\file.txt -value $NewFileContent
    

    You now have enough information to create this function:

    
    function Set-FileNewContent
    {
    [cmdletbinding()]
    Param(
            [string[]]$FilePath
    )
        
        foreach($File in $FilePath)
        {
            $NewContentArray = @()
            Write-Verbose "Processing file $file"
            $FileContent = Get-Content -path (Resolve-Path -path $file)
            $Array = $FileContent -split [System.Environment]::Newline
            Foreach ($line in $Array)
            {           
                $LineArray = $line.split("^")
                $ReducedArray = $linearray[0..14]
                [string]$newline = ($ReducedArray -join "^") + "|"
                $NewContentArray += $newline + [System.Environment]::NewLine
            }
            [string]$NewFileContent = ($NewContentArray -join "") 
            Write-Verbose "Saving file $file"
            Set-Content -Path $File -value $NewFileContent
        }
    }
    

    You can call the function like so:

    
    Set-FileNewContent -FilePath (Get-ChildItem -Filter "*tat*.txt") -Verbose
    

    Note the -verbose switch is not needed, however it gives you feedback as the script progresses.

    Cheers

    Tore

You must be logged in to reply to this topic.