Remove or Skip First Line of CSV for Massive Files

This topic contains 2 replies, has 2 voices, and was last updated by Participant 1 month ago.

  • #171694

    Participant
    Topics: 13
    Replies: 42
    Points: 243
    Rank: Participant

    I have a CSV whose size is measured in gigabytes. I need to read it in but skip the first line, since the first line doesn't actually contain the headers.

    I can use Get-Content -First 1 to get the first line of the file, but is there a way to do the opposite? Or, is there a way to simply remove the first line of the file?

    I'm aware I can simply use Import-Csv and pipe it to Select-Object -Skip 1, but that takes way too long for a huge file. If there is a way to do it using Get-Content, I'd still have to pipe it to ConvertFrom-Csv, which would also take way too long.

    What's the fastest way to accomplish this?

    For some background: the program doesn't know whether the file it is given will have headers in the first line. I have logic that finds the row that has the headers. Now I just need to be able to read the file, starting at the row with the headers, in a way that doesn't eat up my 16GB of RAM.

  • #171700

    Participant
    Topics: 0
    Replies: 44
    Points: 235
    Helping Hand
    Rank: Participant

    Basically, if you know which line the content you need starts on, you can simply start from that line number. For example, if you want to skip the first 2 lines of the CSV file, you can do something like this:

    Example:
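    # Read every line of the file into an array (this loads the whole file into memory)
    $csv = Get-Content C:\file.csv
    # Keep everything from index 2 onward, which drops the first 2 lines
    $csv = $csv[2..($csv.count - 1)]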
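
A variation on the snippet above, offered only as a sketch: instead of holding every line in $csv and then slicing the array, the lines can be streamed through the pipeline and the first 2 dropped with Select-Object -Skip. The temp file name and its sample contents are made up for the demo and stand in for C:\file.csv.

```powershell
# Hypothetical sample file standing in for C:\file.csv
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'skip-demo.csv'
Set-Content -Path $path -Value 'junk line 1', 'junk line 2', 'Name,Value', 'a,1', 'b,2'

# Stream the lines through the pipeline and drop the first 2,
# rather than loading the raw lines into an array and slicing it.
# The first line that survives the skip is treated as the header row.
$rows = Get-Content -Path $path | Select-Object -Skip 2 | ConvertFrom-Csv

$rows | Format-Table

# Clean up the demo file
Remove-Item -Path $path
```

Assigning the result to $rows still keeps every parsed object in memory, so for a very large file it is probably better to pipe the output straight into whatever consumes it. This does not make ConvertFrom-Csv itself any faster.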
  • #171766

    Participant
    Topics: 13
    Replies: 42
    Points: 243
    Rank: Participant

    Basically, if you know which line the content you need starts on, you can simply start from that line number. For example, if you want to skip the first 2 lines of the CSV file, you can do something like this:

    $csv = Get-Content C:\file.csv
    $csv = $csv[2..($csv.count - 1)]

    That works well and is very quick. However, I still have to run ConvertFrom-Csv on it, which means I'm basically processing the data twice: once to read in the file, once to convert it. ConvertFrom-Csv takes a solid 6 minutes for a 300MB file.

    Is there an alternative that allows the processing to be done in one go? For instance, if I could chop off the first few lines of the file and then run an Import-CSV on it, that would be much faster.
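
One way the "chop off the first few lines, then Import-Csv" idea might look, as a rough sketch rather than a measured solution: copy the file line by line with .NET's System.IO.StreamReader and System.IO.StreamWriter, skipping the leading lines, and then import the trimmed copy. The paths, the sample contents, and the $linesToSkip variable are all assumptions made for the demo; in practice $linesToSkip would come from the existing header-detection logic. The [Type]::new() syntax assumes PowerShell 5 or later.

```powershell
# Hypothetical paths; a tiny sample file stands in for the multi-gigabyte CSV.
$source  = Join-Path ([System.IO.Path]::GetTempPath()) 'big-demo.csv'
$trimmed = Join-Path ([System.IO.Path]::GetTempPath()) 'big-demo-trimmed.csv'
Set-Content -Path $source -Value 'not a header', 'Name,Value', 'a,1', 'b,2'

# Assumed to be supplied by the logic that locates the header row.
$linesToSkip = 1

$reader = [System.IO.StreamReader]::new($source)
$writer = [System.IO.StreamWriter]::new($trimmed)
try {
    # Read and discard the leading lines without storing them.
    for ($i = 0; $i -lt $linesToSkip -and -not $reader.EndOfStream; $i++) {
        $null = $reader.ReadLine()
    }
    # Copy the remainder one line at a time; only the current line is held in memory.
    while ($null -ne ($line = $reader.ReadLine())) {
        $writer.WriteLine($line)
    }
}
finally {
    # Always release the file handles, even if the copy fails.
    $reader.Dispose()
    $writer.Dispose()
}

# The trimmed copy now starts at the header row.
$data = Import-Csv -Path $trimmed

# Clean up the demo files
Remove-Item -Path $source, $trimmed
```

This writes a second copy of the file to disk, so it trades disk space and one extra pass for flat memory use during the trim. Whether it beats ConvertFrom-Csv on a real file is something to time on your own data. The writer here uses its default encoding (UTF-8), which is an assumption that may not match the source file.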
