Parse CSV files with PoSh

This topic contains 4 replies, has 2 voices, and was last updated by  Глеб Боушев 1 year, 9 months ago.

  • #32716

    Hello, I'm looking for a way to parse really big CSV files (1 GB+) with PoSh. Import-Csv works kind of slowly. What would be an alternative way to do this?

    Right now I'm doing this:

    # (this runs inside a loop over files, so $_.FullName, $name and $folder come from the surrounding script)
    $newcsv = Import-Csv -Path $_.FullName -Encoding Default -Delimiter ";" `
        -Header date,0,user,1,folder,session,2,pc,3,4 |
        Where-Object { ($_.date -like "2015*") -and ($_.session -like "opensession") }

    # Import-Csv returns the rows as objects by reference, so each row can be
    # edited directly instead of being looked up again with [array]::IndexOf
    # (which rescans the whole array on every iteration)
    foreach ($row in $newcsv) {
        $row.pc = $row.pc.Split(" ").Trim('(m)') | Select-Object -Last 1
    }

    $newcsv | Select-Object user,pc -Unique |
        Export-Csv -Delimiter ";" -NoTypeInformation -Path "$folder\Unique\unique$name.csv" -Encoding Default
    $newcsv | Select-Object date,user,folder,session,pc |
        Export-Csv -Delimiter ";" -NoTypeInformation -Path "$folder\NotUnique\notunique$name.csv" -Encoding Default

    And it takes 30 minutes on a really good laptop (i7, 16 GB RAM, 240 GB SSD) to parse an 800 MB file.

    I saw this thing (Microsoft.VisualBasic.FileIO.TextFieldParser), but I have no clue what to do with it and can't find anything relevant. I have no background in coding and can't seem to figure this one out.
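    For reference, a minimal sketch of how TextFieldParser can be driven from PowerShell, assuming a semicolon-delimited file at a made-up path and the same column layout as the -Header list above (so date is field 0, user is field 2, session is field 5, pc is field 7):

    ```powershell
    # Load the assembly that contains TextFieldParser
    Add-Type -AssemblyName Microsoft.VisualBasic

    $path = "C:\data\big.csv"   # hypothetical path
    $parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser($path)
    $parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
    $parser.SetDelimiters(";")

    $results = while (-not $parser.EndOfData) {
        $fields = $parser.ReadFields()      # string[] for one row
        if ($fields[0] -like "2015*" -and $fields[5] -like "opensession") {
            # emit only the columns of interest
            [pscustomobject]@{ date = $fields[0]; user = $fields[2]; pc = $fields[7] }
        }
    }
    $parser.Close()
    ```

    Because the file is read one row at a time, memory use stays flat however large the file is, which is the main advantage over Import-Csv on 1 GB+ files.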

  • #32717

    Matt Bloomfield
    Participant

    Have a look at Chrissy Le Maire's blog. She demonstrates several techniques for working with large CSV files:

    High-Performance Techniques for Importing CSV to SQL Server using PowerShell

    Quickly Find Duplicates in Large CSV Files using PowerShell

  • #32718

    wow, this is great, now I need to wrap my head around it! Thank you!
    Hmm, with further reading, all of this seems to require SQL. Is there any way of doing this without SQL? I need to hand this script out to other people and it should just work, without them having to install SQL or configure anything...

    Well, have you got any ideas?

  • #32723

    Matt Bloomfield
    Participant

    The first example does use SQL Server, but only because it's demonstrating how to quickly import a CSV file into SQL. You should find that the second script, for finding duplicates, works without any additional configuration, which at least proves the concept could work for you.

    This method uses the ODBC drivers, which are (probably) already installed. It effectively treats your CSV file as a SQL database table.

    You will need to understand how to use SQL queries and how to call the driver's properties and methods, though; that's the hard bit.
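    A rough sketch of what that ODBC approach looks like — the path, file name and column names here are made up, and it assumes the CSV's first row holds the column names the driver will use:

    ```powershell
    # Treat C:\temp\ as the "database" and each CSV file in it as a table (hypothetical location)
    $connString = "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=C:\temp\;Extensions=asc,csv,tab,txt;"
    $conn = New-Object System.Data.Odbc.OdbcConnection($connString)

    # The driver executes the SQL against the file, so the grouping work
    # never touches PowerShell's object pipeline
    $sql = "SELECT [user], [pc], COUNT(*) AS cnt FROM [big.csv] GROUP BY [user], [pc]"

    $conn.Open()
    $cmd     = New-Object System.Data.Odbc.OdbcCommand($sql, $conn)
    $adapter = New-Object System.Data.Odbc.OdbcDataAdapter($cmd)
    $table   = New-Object System.Data.DataTable
    [void]$adapter.Fill($table)
    $conn.Close()

    $table | Format-Table
    ```

    One caveat: the text driver is bitness-sensitive, so depending on which ODBC drivers are installed you may need to run this from a 32-bit PowerShell console or switch the connection string to a newer driver.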

  • #32729

    ah, I see. I saw several SQL queries in the file, so I assumed it needed a SQL database. Thanks, I'll try to wrap my head around it!
