
October 20, 2017 at 2:30 pm

I have a CSV that needs some formatting during import.

The first 4 lines need to be removed completely.
The 5th line includes the header.

I've tried the Import-Csv cmdlet, but it imports everything and I haven't found a way to remove the lines.
Using Get-Content (GC) takes a very long time.
I've used Get-Content "c:/temp/filename.csv" | Select-Object -Skip 4 | ConvertFrom-Csv
This also took a very long time and froze my machine.

Adam Bertram suggested adding the "-RAW" switch to the GC cmdlet but it's still taking a long long time and freezing my box.

October 20, 2017 at 3:32 pm

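You could use the -ReadCount parameter of Get-Content (GC) and set it to 0 or maybe 1000. ReadCount controls how many lines are sent down the pipeline at a time, and 0 sends them all at once. You could also use the .NET StreamReader and write the output on the fly as you go through the input file, so it isn't trying to store it all in RAM. Here is an example: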

$Hugefile = New-Object System.IO.StreamReader -ArgumentList "MyHuuuugeFile.txt"
# Compare against $null so that an empty line does not end the loop early
while ($null -ne ($Line = $Hugefile.ReadLine()))
{
    # example to do something with the line before saving
    $Line = $Line -replace 'aaaaa', 'bbb'

    # save line to new file
    $Line | Out-File 'MyNewHuuuuggeFile.txt' -Append
}
$Hugefile.Close()
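Applied to the original question (drop the first 4 lines, keep the header on line 5), the same streaming idea might look like the sketch below. The function name Copy-FileSkippingLines and its parameters are made up for illustration and are not built-in cmdlets. It assumes the file is UTF-8 or carries a byte order mark, and it writes through a StreamWriter rather than calling Out-File once per line, which should save reopening the output file for every line.

```powershell
# Hypothetical helper (not a built-in cmdlet): copies a text file while
# discarding its first $Skip lines, one line at a time, so the whole file
# is never held in memory.
function Copy-FileSkippingLines {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        [int] $Skip = 4
    )

    # Resolve to full paths: .NET uses the process working directory,
    # which can differ from the current PowerShell location.
    $inFile  = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    $outFile = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

    $reader = $null
    $writer = $null
    try {
        $reader = New-Object System.IO.StreamReader -ArgumentList $inFile
        $writer = New-Object System.IO.StreamWriter -ArgumentList $outFile

        # Discard the unwanted leading lines
        for ($i = 0; $i -lt $Skip; $i++) { [void]$reader.ReadLine() }

        # Copy the rest (header line first); the $null check keeps
        # empty lines from ending the loop early
        while ($null -ne ($line = $reader.ReadLine())) {
            $writer.WriteLine($line)
        }
    }
    finally {
        if ($reader) { $reader.Dispose() }
        if ($writer) { $writer.Dispose() }
    }
}
```

With placeholder paths, you would call it as Copy-FileSkippingLines -Path C:\temp\filename.csv -Destination C:\temp\trimmed.csv and then run Import-Csv against the trimmed file.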

October 20, 2017 at 3:32 pm

Have you tried using import-csv? How large is the file?

October 20, 2017 at 4:34 pm

Chrissy LeMaire has blogged on techniques for working with large CSV files. You might find this article has useful pointers:

Quickly Find Duplicates in Large CSV Files using PowerShell