Remove or Skip First Line of CSV for Massive Files

    • #171694

      I have a CSV whose size is measured in gigabytes. I need to read it in but skip the first line, since the first line doesn’t actually contain the headers.

      I can use Get-Content -First 1 to get the first line of the file, but is there a way to do the opposite? Or, is there a way to simply remove the first line of the file?

      I’m aware I can simply Import-CSV and pipe it to Select-Object -skip 1 but that takes way too long for a huge file. If there is a way to do it using Get-Content, I’d still have to pipe it to ConvertFrom-CSV which would also take way too long.

      What’s the fastest way to accomplish this?

      For some background: The program doesn’t know whether or not the file it is given will have headers in the first line or not. I have logic that finds the row that has the headers. Now I just need to be able to read the file, starting at the row with the headers, in a way that doesn’t eat up my 16GB of RAM.

    • #171700

      Basically, if you know the line from which you need the contents of the CSV file, you can simply start from that line number. So, suppose you want to skip the first 2 lines of the CSV file; then you can do something like this:

      Example:
      $csv = Get-Content C:\file.csv
      $csv = $csv[2..($csv.count - 1)]
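      One caveat with the indexing approach above: assigning Get-Content to a variable collects every line into an array first, which may be a problem for a multi-gigabyte file. A minimal sketch of a pipeline variant that drops the leading lines as they stream past is shown below. The sample file name and its contents are made up for illustration, and whether this is fast enough for a given file would need to be measured.

```powershell
# Build a small sample file (hypothetical data) with two junk lines before the header.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'skip-demo.csv'
Set-Content -Path $path -Value @(
    'report generated by some tool'
    'another preamble line'
    'Name,Value'
    'alpha,1'
    'beta,2'
)

# Read the file line by line, drop the first 2 lines, and parse the rest as CSV.
# The first line that survives the skip is treated as the header row.
$rows = Get-Content -Path $path | Select-Object -Skip 2 | ConvertFrom-Csv

# Clean up the sample file.
Remove-Item -Path $path
```

      Here $rows is only used to keep the demo small; for a huge file it would probably be better to pipe the objects straight into the next command instead of storing them all.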
    • #171766

      That works well and is very quick. However, I still have to run ConvertFrom-CSV on it, which means I’m basically processing the data twice: once to read in the file, once to convert it. ConvertFrom-CSV takes a solid 6 minutes for a 300 MB file.

      Is there an alternative that allows the processing to be done in one go? For instance, if I could chop off the first few lines of the file and then run an Import-CSV on it, that would be much faster.
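      One way the "chop off the first few lines, then Import-CSV" idea could be sketched is with the .NET System.IO.StreamReader and System.IO.StreamWriter classes, copying the file line by line so that only one line is held in memory at a time. The function name Copy-FileSkippingLines, its parameters, and the sample file names are hypothetical and chosen only for this illustration. The sketch assumes PowerShell 5 or later (for the ::new() syntax), full paths (the .NET classes do not follow the PowerShell current location), and skipped lines that contain no quoted fields with embedded line breaks.

```powershell
# Hypothetical helper: copy $Source to $Destination, omitting the first $SkipLines lines.
function Copy-FileSkippingLines {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination,
        [int] $SkipLines = 1
    )
    $reader = [System.IO.StreamReader]::new($Source)
    try {
        $writer = [System.IO.StreamWriter]::new($Destination)
        try {
            # Discard the unwanted leading lines (stops early if the file is shorter).
            for ($i = 0; $i -lt $SkipLines -and $null -ne $reader.ReadLine(); $i++) { }
            # Copy the remainder one line at a time.
            while ($null -ne ($line = $reader.ReadLine())) {
                $writer.WriteLine($line)
            }
        }
        finally { $writer.Dispose() }
    }
    finally { $reader.Dispose() }
}

# Demo with made-up data: one junk line, then the real header.
$src = Join-Path ([System.IO.Path]::GetTempPath()) 'chop-demo-in.csv'
$dst = Join-Path ([System.IO.Path]::GetTempPath()) 'chop-demo-out.csv'
Set-Content -Path $src -Value 'junk line', 'Name,Value', 'alpha,1', 'beta,2'

Copy-FileSkippingLines -Source $src -Destination $dst -SkipLines 1
$data = Import-Csv -Path $dst

Remove-Item -Path $src, $dst
```

      This trades extra disk I/O for a single parsing pass: the file is rewritten once without its leading lines, and Import-CSV then reads the trimmed copy directly.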
