Parse CSV files with PoSh

This topic contains 4 replies, has 2 voices, and was last updated by Глеб Боушев 10 months, 3 weeks ago.

  • Author
  • #32716

    Hello, I'm looking for a way to parse CSV files with PoSh, really big files (1 GB+). Import-Csv works kind of slowly. What would be an alternative way to do so?

    Right now I'm doing this:

    $newcsv = Import-Csv -Path $_.FullName -Encoding Default -Delimiter ";" `
        -Header date,0,user,1,folder,session,2,pc,3,4 |
        Where-Object { ($_.date -like "2015*") -and ($_.session -like "opensession") }
    foreach ($row in $newcsv) {
        # $row is a reference into $newcsv, so this rewrites pc in place
        $row.pc = $row.pc.Split(" ").Trim('(m)') | Select-Object -Last 1
    }
    $newcsv | Select-Object user,pc -Unique | Export-Csv -Delimiter ";" -NoTypeInformation -Path "$folder\Unique\unique$name.csv" -Encoding Default
    $newcsv | Select-Object date,user,folder,session,pc | Export-Csv -Delimiter ";" -NoTypeInformation -Path "$folder\NotUnique\notunique$name.csv" -Encoding Default

    And it takes 30 minutes on a really good laptop (i7 / 16 GB / 240 GB SSD) to parse an 800 MB file.

    I saw this thing (Microsoft.VisualBasic.FileIO.TextFieldParser), but I have no clue what to do with it, and I can't find anything relevant. I have no background in coding and can't seem to figure this one out.
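    [Editor's note: for reference, TextFieldParser can be driven from PowerShell along these lines. This is a minimal sketch, not the poster's actual script: it writes a tiny one-row sample file to %TEMP% so it runs as-is, and the column positions follow the -Header list above (0 = date, 2 = user, 5 = session, 7 = pc), which is an assumption about the real file layout.]

    ```powershell
    # Load the assembly that contains TextFieldParser
    Add-Type -AssemblyName Microsoft.VisualBasic

    # Hypothetical sample file; point $csvPath at your real file instead
    $csvPath = Join-Path $env:TEMP 'tfp-sample.csv'
    '2015-01-02;0;alice;1;docs;opensession;2;PC-01 (m);3;4' | Set-Content -Path $csvPath

    $parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser($csvPath)
    $parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
    $parser.SetDelimiters(';')

    while (-not $parser.EndOfData) {
        # ReadFields returns a string[] for the current row
        $fields = $parser.ReadFields()
        # Assumed positions: 0=date, 2=user, 5=session, 7=pc
        if ($fields[0] -like '2015*' -and $fields[5] -eq 'opensession') {
            '{0};{1}' -f $fields[2], $fields[7]
        }
    }
    $parser.Close()
    ```

    Because it streams one row at a time instead of materializing every row as an object the way Import-Csv does, memory use stays flat regardless of file size.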

  • #32717
    Matt Bloomfield

    Have a look at Chrissy LeMaire's blog. She demonstrates several techniques for working with large CSV files:

    High-Performance Techniques for Importing CSV to SQL Server using PowerShell

    Quickly Find Duplicates in Large CSV Files using PowerShell

  • #32718

    Wow, this is great; now I need to wrap my head around it! Thank you!
    Hm, on further reading, all of this seems to require SQL. Is there any method of doing this without SQL? I need to give this script out to other people, and it should just work, without the need to install SQL or configure anything...

    Well, do you have any ideas?

  • #32723
    Matt Bloomfield

    The first example does use SQL Server, but only because it's demonstrating how to quickly import a CSV file into SQL. You should find that the second script, for finding duplicates, works without any additional configuration, which at least proves the concept could work for you.

    That method uses the ODBC drivers, which are (probably) already installed. It effectively treats your CSV file as a SQL database table.

    You will need to understand how to use SQL queries and how to call the driver's properties and methods though – that's the hard bit.
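    [Editor's note: a rough sketch of what that looks like, assuming the Microsoft Text ODBC driver is installed. The driver name, folder, file name, and column names here are all made up; on Windows, Get-OdbcDriver lists what is actually installed.]

    ```powershell
    # Hypothetical connection string: Dbq points at the FOLDER containing the CSV;
    # the file name then acts as the table name in the query.
    $connString = 'Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=C:\data\;Extensions=csv;'
    $conn = New-Object System.Data.Odbc.OdbcConnection($connString)

    # Hypothetical file and columns; square brackets guard reserved words like "user"
    $sql = 'SELECT DISTINCT [user], pc FROM big.csv'

    try {
        $conn.Open()
        $adapter = New-Object System.Data.Odbc.OdbcDataAdapter($sql, $conn)
        $table   = New-Object System.Data.DataTable
        [void]$adapter.Fill($table)     # runs the query, fills $table with rows
        $table | Format-Table -AutoSize
    }
    catch {
        Write-Warning "Text ODBC driver not available here: $_"
    }
    finally {
        $conn.Close()
    }
    ```

    The attraction is that DISTINCT, WHERE, and GROUP BY run inside the driver, so you never build a giant array of PowerShell objects in memory.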

  • #32729

    Ah, I see. I saw several SQL queries in the file, so I assumed it needed a SQL database. Thanks, I'll try to wrap my head around it!
