Deleting occasional rows above header row in csv file


    • #205347
      Participant
      Topics: 1
      Replies: 0
      Points: -25
      Rank: Member

      Hey everyone. I am a SQL programmer who is taking a crash course in PowerShell. I am brand new to PowerShell and also a brand-new employee at my job, and I have been sinking into PowerShell for the past couple of days. I apologize if this question is very basic, and I am very open to feedback on how I can ask it better.

      Here is my scenario: I work for a consulting company that receives client data, so our data source is completely external. We typically work with whatever they send us with no questions asked. The clients typically provide us with a standard set of data that we ask for. In other words, the columns are consistent from customer to customer. Clients typically provide us with CSV files filled with data.

      On occasion, these CSV files contain a line above the headers that may contain information. For example, the latest files we received from a client have a row count on line 1. Then on line 2 the headers begin. Keep in mind that for some client data we receive, the headers will correctly be on row 1. Each client is different.

      I am looking for a way to remove any lines above the header. Is there a way for PowerShell to identify where my headers begin and delete any rows above them? Does PowerShell have the flexibility to (scenario 1) delete data or rows above the headers and (scenario 2) know when there is nothing above the headers and do nothing? I hope that makes sense. I do see folks in the forums talking about deleting blank rows or deleting a fixed number of rows, but I did not see any discussion that touched on my specific scenario.

      What the data currently looks like:

           A       B            C     D            E              F
      1    Co : ABC 8
      2    Entity  Entity Name  Year  Employee ID  Employee Name  Payroll Period
      3    ABC     ABC Company  2019  43445        John Doe       2019
      4    ABC     ABC Company  2019  23441        Jane Doe       2019
      5    ABC     ABC Company  2019  46810        Bill Doe       2019
      6    ABC     ABC Company  2019  99981        Sally Doe      2019
      7    ABC     ABC Company  2019  52221        Frank Doe      2019
      8    ABC     ABC Company  2019  99432        Jim Doe        2019
      9    ABC     ABC Company  2019  12345        Lise Doe       2019
      10   ABC     ABC Company  2019  98765        Karl Doe       2019

       

      Below is how I need the file to look:

           A       B            C     D            E              F
      1    Entity  Entity Name  Year  Employee ID  Employee Name  Payroll Period
      2    ABC     ABC Company  2019  43445        John Doe       2019
      3    ABC     ABC Company  2019  23441        Jane Doe       2019
      4    ABC     ABC Company  2019  46810        Bill Doe       2019
      5    ABC     ABC Company  2019  99981        Sally Doe      2019
      6    ABC     ABC Company  2019  52221        Frank Doe      2019
      7    ABC     ABC Company  2019  99432        Jim Doe        2019
      8    ABC     ABC Company  2019  12345        Lise Doe       2019
      9    ABC     ABC Company  2019  98765        Karl Doe       2019

       

      Any feedback on whether PowerShell can accomplish this would be appreciated.

      Thanks,
      Frank Sweeney

    • #205359
      Participant
      Topics: 4
      Replies: 2249
      Points: 5,494
      Helping Hand
      Rank: Community MVP

      "We typically work with whatever they send us with no questions asked. The clients typically provide us with a standard set of data that we ask for. In other words, the columns are consistent from customer to customer. Clients typically provide us with CSV files filled with data."

      I know it's not really what you were asking for, but the proper way would be to ask the clients to provide valid CSV files with no extra rubbish in them.

      "In other words, the columns are consistent from customer to customer. On occasion, these CSV files contain a line above the headers that may contain information. For example, the latest files we received from a client have a row count on line 1. Then on line 2 the headers begin."

      Does this particular client always provide the data this way?

      "Is there a way for PowerShell to identify where my headers begin and delete any rows above this."

      How do you distinguish between the header/data rows and the unnecessary extra rows? Is the delimiter always the same for all clients? How many columns do the CSV files usually have?

      For example: if the delimiter is always a comma and the files always have at least a certain number of columns, you could check whether the first row of each CSV file (or the first few rows, if some files have more than one extra row) contains the expected number of delimiters.
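A minimal sketch of that delimiter-count check. It assumes a comma delimiter, six expected columns, and no quoted fields that contain the delimiter. The function name `Find-HeaderLine`, its parameters, and the exact text of the sample lines are hypothetical, made up for illustration.

```powershell
# Hypothetical helper: returns the zero-based index of the first line that
# contains at least ($ExpectedColumns - 1) delimiters, or -1 if no such line
# is found within the first $MaxScan lines.
# Assumption: fields never contain the delimiter inside quotes.
function Find-HeaderLine {
    param(
        [string[]]$Lines,
        [int]$ExpectedColumns = 6,
        [char]$Delimiter = ',',
        [int]$MaxScan = 10
    )
    $limit = [Math]::Min($Lines.Count, $MaxScan)
    for ($i = 0; $i -lt $limit; $i++) {
        # Splitting on n delimiters yields n + 1 parts.
        $delimiterCount = $Lines[$i].Split($Delimiter).Count - 1
        if ($delimiterCount -ge ($ExpectedColumns - 1)) { return $i }
    }
    return -1
}

# Sample lines modelled on the data in the question (exact CSV text assumed).
$sample = @(
    'Co : ABC,8',
    'Entity,Entity Name,Year,Employee ID,Employee Name,Payroll Period',
    'ABC,ABC Company,2019,43445,John Doe,2019'
)
$headerAt = Find-HeaderLine -Lines $sample
```

If `$headerAt` is zero or greater, the lines above the header can be dropped with `$sample | Select-Object -Skip $headerAt`. A result of -1 means no line looked like a header, so the file is probably best left untouched.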

      "Keep in mind that for some client data we receive, the headers will correctly be on row 1."

      Which brings me back to my second question: if a given client always provides the data in a consistent way, even if it includes unnecessary extra rows, you could treat the data differently "by client".
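One way the "by client" idea could look is a small lookup table that records how many lines each client puts above the header. This is only a sketch: the client names, the `Remove-ClientPreamble` function, and the shortened sample data are all hypothetical.

```powershell
# Hypothetical per-client settings: number of preamble lines above the header.
$preambleLinesByClient = @{
    'ClientA' = 1   # sends a row count on line 1
    'ClientB' = 0   # header is already on line 1
}

# Hypothetical helper: drops the configured number of leading lines.
function Remove-ClientPreamble {
    param(
        [string[]]$Lines,
        [string]$Client,
        [hashtable]$Settings
    )
    # Clients without an entry are assumed to send clean files.
    $skip = 0
    if ($Settings.ContainsKey($Client)) { $skip = $Settings[$Client] }
    $Lines | Select-Object -Skip $skip
}

# Shortened, illustrative sample data.
$sample = @(
    'Co : ABC,8',
    'Entity,Entity Name,Year',
    'ABC,ABC Company,2019'
)
$cleanA = @(Remove-ClientPreamble -Lines $sample -Client 'ClientA' -Settings $preambleLinesByClient)
$records = $cleanA | ConvertFrom-Csv
```

This only works if each client really is consistent from delivery to delivery. If that cannot be guaranteed, detecting the header per file is the safer route.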

      Regardless of all that: when you crosspost a question to several forums at the same time, you should at least include links to the other forums along with your question, so that people willing to help you do not end up doing the same work twice or more.

      https://social.technet.microsoft.com/Forums/en-US/5fe171d5-5b5b-4602-ad57-bfd8b5c693c6/powershell-deleting-occasional-rows-above-header-row-in-csv-file?forum=winserverpowershell#348f7277-2146-41ed-a1c9-54e0f0e0a782

      … and BTW: when you post code or example data, you should format both as code using the code tag button named "PRE". That makes it possible to properly copy the code or example data and reproduce your issue.

    • #205368
      Moderator
      Topics: 3
      Replies: 222
      Points: 1,206
      Helping Hand
      Rank: Community Hero

      [I replied before Olaf's reply was released from the spam filter]

      Hi there,

      Did you try anything out here? There are plenty of approaches to accomplish this task.

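      $dataFromCSV = Get-Content C:\Temp\data.csv

      # Approach 1: keep only lines with at least $MinimumColumnCount non-empty fields.
      # Note: this also drops data rows that have fewer non-empty fields than that.
      $MinimumColumnCount = 3
      $dataYouRequired = $dataFromCSV | Where-Object { @( $_.Split(',') | Where-Object { $_ } ).Count -ge $MinimumColumnCount } | ConvertFrom-Csv

      # Approach 2: find the known header line and skip everything above it.
      $Header = 'Entity,Entity Name,Year,Employee ID,Employee Name,Payroll Period'
      # -SimpleMatch treats the header as literal text rather than a regular expression.
      $HeaderIndex = $dataFromCSV | Select-String -Pattern $Header -SimpleMatch | Select-Object -First 1 | ForEach-Object LineNumber
      $dataYouRequired = $dataFromCSV | Select-Object -Skip ($HeaderIndex - 1) | ConvertFrom-Csv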
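Since the goal in the question is a cleaned-up file rather than objects in memory, here is a hedged sketch of writing the result back to disk. It assumes the header text is known exactly and unquoted. The temp-file path and the sample contents are illustrative only.

```powershell
# Build a throwaway sample file; the path and contents are illustrative only.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'clientdata-sample.csv'
@(
    'Co : ABC,8',
    'Entity,Entity Name,Year,Employee ID,Employee Name,Payroll Period',
    'ABC,ABC Company,2019,43445,John Doe,2019',
    'ABC,ABC Company,2019,23441,Jane Doe,2019'
) | Set-Content -Path $path

$header = 'Entity,Entity Name,Year,Employee ID,Employee Name,Payroll Period'

# Read everything into memory first so the same file can be overwritten safely.
$lines = @(Get-Content -Path $path)

# Find the zero-based position of the header line; -1 means "not found".
$headerIndex = -1
for ($i = 0; $i -lt $lines.Count; $i++) {
    if ($lines[$i].Trim() -eq $header) { $headerIndex = $i; break }
}

if ($headerIndex -gt 0) {
    # Keep the header and everything after it, and overwrite the file.
    $lines | Select-Object -Skip $headerIndex | Set-Content -Path $path
}
# $headerIndex -eq 0: the file is already clean.
# $headerIndex -eq -1: header not found, so the file is left untouched.

$cleaned = @(Get-Content -Path $path)
```

For real client files it is probably wiser to write to a new path and keep the original, at least until the header detection has been checked against a few deliveries.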
      

      Since you mentioned that you are a newbie: if you understand this, please go ahead and complete your script. If not, please start with the basics of PowerShell. I would suggest starting with the video tutorial below.

      https://channel9.msdn.com/Series/GetStartedPowerShell3/01

      It is still valid and will give you a clear-cut overview of the power of PowerShell.

      Thank you.

    • #205794
      Participant
      Topics: 5
      Replies: 322
      Points: 446
      Helping Hand
      Rank: Contributor
      # Client data source path
      $cdata = '.\clientdata.csv'

      # If the first line is the header (quoted or not), convert the file as is;
      # otherwise skip the first line and convert using the header on line 2.
      if ((Get-Content $cdata -First 1) -match '^"?Entity') {
          Get-Content $cdata | ConvertFrom-Csv
      } else {
          Get-Content $cdata | Select-Object -Skip 1 | ConvertFrom-Csv
      }
      
    • #205821
      Participant
      Topics: 4
      Replies: 2249
      Points: 5,494
      Helping Hand
      Rank: Community MVP

      Just FYI: the OP already has several answers on the TechNet forum.
