Author Posts

April 22, 2014 at 12:15 pm

I have over 100 text files with about 25 fields per record, fields delimited with "^".
I need to loop through the directory for files matching *TAT*.txt (about 100+ files)
For all records in each file:
delete all columns after field #15 to the end of record
Insert a "|" at the end of each record
Overwrite original file.

I have been searching for days, have been able to find the loop for files, the substring for the parsing, but I am unable to find a way to locate the position of the 15th delimiter "^", parse and overwrite.
Have ZERO experience with Powershell, but a lot of programming exp.

Please assist if you can.
Thanks,
CV

April 22, 2014 at 2:17 pm

hi,

Not quite sure what your goal is, however I would look into building a function.

To list all files matching *TAT*.txt


get-childitem -Filter *TAT*.txt

To read the contents of a file:


get-content -path file.txt

If we assign the content to a variable:


$filecontent = get-content -path file.txt

Then we can create an array every line by splitting at each new line(zero based):


$Array = $filecontent -split [environment]::Newline

Say we have this line "1^2^3^4^5^6^7^8^9^10^11^12^13^14^15^16" in a file, lets split it to an array and keep the 15 first items:


[string]$line = "1^2^3^4^5^6^7^8^9^10^11^12^13^14^15^16"
$linearray = $line.split("^")
$ReducedArray = $linearray[0..14]

So how do we go the other way, from array to a string, separating all fields with a "^" and adding an "|" to the end?


[string]$newline = ($ReducedArray -join "^") + "|"

Now instead of replacing each line, I would add each new line to a new array and save the array when I had processed all the lines in the file. To add an item to an array:


$ArrayOfNewContent = @()
$ArrayOfNewContent += $newline

Next is creating an string to hold all the items in the ArrayOfNewContent:


[string]$NewFileContent = ($ArrayOfNewContent -join "") + [environment]::NewLine
set-content -path c:\temp\file.txt -value $NewFileContent

You now have enough information to create this function:


function Set-FileNewContent
{
[cmdletbinding()]
Param(
        [string[]]$FilePath
)
    
    foreach($File in $FilePath)
    {
        $NewContentArray = @()
        Write-Verbose "Processing file $file"
        $FileContent = Get-Content -path (Resolve-Path -path $file)
        $Array = $FileContent -split [System.Environment]::Newline
        Foreach ($line in $Array)
        {           
            $LineArray = $line.split("^")
            $ReducedArray = $linearray[0..14]
            [string]$newline = ($ReducedArray -join "^") + "|"
            $NewContentArray += $newline + [System.Environment]::NewLine
        }
        [string]$NewFileContent = ($NewContentArray -join "") 
        Write-Verbose "Saving file $file"
        Set-Content -Path $File -value $NewFileContent
    }
}

You can call the function like so:


Set-FileNewContent -FilePath (Get-ChildItem -Filter "*tat*.txt") -Verbose

Note the -verbose switch is not needed, however it gives you feedback as the script progresses.

Cheers

Tore

April 22, 2014 at 4:43 pm

I'd probably do it like this, but I'm a regular expressions junkie. 🙂

Get-Content -Path .\test.txt |
ForEach-Object { $_ -replace '(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|' } |
Set-Content -Path .\tempFile.txt

I know it looks like gibberish, but that -replace operation basically says "take everything after the 15th ^-separated field on each line, and replace it with a | character". In this case, I didn't overwrite the original file, but you can modify the code to do that if you'd like. If you want it to work in a single pipeline, though, you need to put parentheses around the Get-Content command. Otherwise both Get-Content and Set-Content would try to open the same file at the same time, which doesn't work. For example:

(Get-Content -Path .\test.txt) |
ForEach-Object { $_ -replace '(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|' } |
Set-Content -Path .\test.txt

April 24, 2014 at 11:55 am

Dave Wyatt;

I am working with Carlos on this problem with files. Your code works great with one file but I have 100+ files and some are 40mg. I need a like script to read each file and overwrite each file with same name.

Please assist if you can.
Thanks

Ron Slaton

April 24, 2014 at 12:07 pm

To process multiple files, you'd just need a loop instead of hard-coding a file name. Assuming that you have an array of paths named $paths:

foreach ($path in $paths)
{
    (Get-Content -Path $path) |
    ForEach-Object { $_ -replace '(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|' } |
    Set-Content -Path $path
}

However, this approach reads the entire file into memory. 40MB is not a huge size, so that's probably not a big deal here. ForEach-Object is also extremely slow when you're processing lots of data, and you can improve the performance quite a bit with a filter:

filter Get-FirstFifteenFields
{
    $_ -replace '(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|'
}

foreach ($path in $paths)
{
    (Get-Content -Path $path) |
    Get-FirstFifteenFields |
    Set-Content -Path $path
}

If you needed to process files so large that this approach produced an OutOfMemoryException, you can instead use a streaming approach to a temporary file, then move it back over the original file when done. In this example, I've also thrown in -ReadCount 1000 to process lines in large chunks and modified the regex slightly to account for that. This is another performance tweak when dealing with very large files.

filter Get-FirstFifteenFields
{
    $_ -replace '(?m)(?< =^[^\^]*(?:\^[^\^]*){14})\^.*$', '|'
}

foreach ($path in $paths)
{
    $tempFile = [System.IO.Path]::GetTempFileName()

    Get-Content -Path $path -ReadCount 1000 |
    Get-FirstFifteenFields |
    Set-Content -Path $tempFile

    [System.IO.File]::Replace($tempFile, (Resolve-Path $path), [NullString]::Value)
}

April 24, 2014 at 2:00 pm

Dave, you ARE the regex king. You are beyond loving it 🙂

If you know regex (really know how to use it) and frequently rely on it, it is great. I however have to think of the next guy most of the time and have never seen the need for it (yet). Then again maybe I am just surrounded by "regular" people 🙂

Cheers

Tore

April 28, 2014 at 6:07 am

Dave,

So I have a group of files in a specific directory with name like tmatat_fbhr.txt. How do I pass the path a file names to the script?

Thanks again.

Ron

April 29, 2014 at 7:16 am

Dave,

So I have a group of files in a specific directory with name like tmatat_fbhr.txt. How do I pass the path a file names to the script?

Thanks again.

Ron

May 1, 2014 at 8:01 pm

The code I posted already has a loop; all it needs is an array of strings named $paths (though there are other ways you could change the code.) Here's one way you might accomplish that:

$paths = Get-ChildItem -Path c:\your\directory\*.txt | Select-Object -ExpandProperty FullName

May 6, 2014 at 11:05 am

Dave,

After editing code with new code with $path string I am getting this error related to NullString>

Unable to find type [NullString]: make sure that the assembly containing this type is loaded.

Thanks Ron

May 6, 2014 at 11:19 am

The NullString type was new in PowerShell 3.0, I believe. Without it, it's very awkward to call certain .NET methods (such as File.Replace), because PowerShell automatically converts $null to an empty string.

If you need to execute this code on PowerShell 2.0, there are other options. You could maybe use Move-Item (though that might modify permissions, audit settings, etc on the original file location), or you could do something like this, to just replace the original file's contents without anything else being modified (and still keeping the memory advantages of streaming.)

Get-Content $tempFile -ReadCount 1000 | Set-Content $path
Remove-Item $tempFile

May 6, 2014 at 12:05 pm

Thanks Dave for all your assistance.