Welcome Forums General PowerShell Q&A Remove entire rows from csv with same value under specific column heading.

Viewing 11 reply threads
  • Author
    Posts
    • #169642
      Participant
      Topics: 1
      Replies: 5
      Points: -18
      Rank: Member

      Hi folks,
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I hope someone will be able to help me with this.</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I have a csv file called ‘original.csv’. Within this file there are around 1000 rows of data. The issue I have is that there are many duplicate rows in this file whereby some rows are exactly the same as other rows except one column value being different – this column value (Col2) has a different time value but all other values in each of the columns are the same. Please see the example below:</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”> Col1,Col2,Col3,Col4,Col5,Col6</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>1,11:56,3,4,Smith,6</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>1,11:57,3,4, Smith,6</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>1,11:58,3,4, Smith,6</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I’d like to keep the most recent value (11.58) and export this row and only this row to a separate csv file based on the value of Col5 (Smith). So, I need to have the most recent row for each surname (Smith). At the moment I have multiple rows with the same values whereby only the time value for Col2 differs. I only want one record per surname and this record has to be the most recent.  </p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I have looked into input.csv, export csv, compare, sort and select but cannot get to the bottom of this. Any help would be greatly appreciated. My current csv file also has a header row at the top of the file.</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”> </p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Thanks in advance</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Tess</p>

    • #169651
      Participant
      Topics: 9
      Replies: 430
      Points: 730
      Helping Hand
      Rank: Major Contributor
      #region Input
      
      $CSV = '.\CSV1.csv'
      @'
      Col1,Col2,Col3,Col4,Col5,Col6
      1,11:56,3,4,Smith,6
      1,11:58,3,4, Smith,6
      1,11:57,3,4, Smith,6
      1,10:57,3,4, Tom,6
      1,10:58,3,4, Tom,6
      1,10:56,3,4,Tom,6
      '@ | Out-File $CSV
      
      #endregion
      
      #region read and View CSV
      
      $myData = Import-Csv $CSV 
      $myData | FT -a 
      <#
      Col1 Col2  Col3 Col4 Col5  Col6
      ---- ----  ---- ---- ----  ----
      1    11:56 3    4    Smith 6   
      1    11:57 3    4    Smith 6   
      1    11:58 3    4    Smith 6   
      1    10:56 3    4    Tom   6   
      1    10:57 3    4    Tom   6   
      1    10:58 3    4    Tom   6    
      #>
      
      #endregion
      
      #region Process
      
      $SurNameList = $myData.Col5 | select -Unique
      $SurNameList
      <#
      Smith
      Tom
      #>
      
      $myOutput = foreach ($SurName in $SurNameList) {
          ($myData | where Col5 -EQ $SurName | sort Col2)[-1] 
          # 'Where' selects the rows belonging to the current SurName in the loop
          # 'sort' sorts the resulting rows by Col2
          # [-1] selects the last member of this array
      }
      
      #endregion
      
      #region Output
      
      $myOutput | FT -a 
      <#
      Col1 Col2  Col3 Col4 Col5  Col6
      ---- ----  ---- ---- ----  ----
      1    11:58 3    4    Smith 6   
      1    10:58 3    4    Tom   6   
      #>
      
      #endregion
      
    • #169654
      Participant
      Topics: 9
      Replies: 430
      Points: 730
      Helping Hand
      Rank: Major Contributor

      less verbose..

      $CSV = '.\CSV1.csv'
      @'
      Col1,Col2,Col3,Col4,Col5,Col6
      1,11:56,3,4,Smith,6
      1,11:58,3,4, Smith,6
      1,11:57,3,4, Smith,6
      1,10:57,3,4, Tom,6
      1,10:58,3,4, Tom,6
      1,10:56,3,4,Tom,6
      '@ | Out-File $CSV
      
      
      
      $myData = Import-Csv $CSV 
      
      $myOutput = foreach ($SurName in ($myData.Col5 | select -Unique)) {
          ($myData | where Col5 -EQ $SurName | sort Col2)[-1] 
      }
      
      $myOutput | FT -a 
      
    • #169663
      Participant
      Topics: 1
      Replies: 5
      Points: -18
      Rank: Member

      Many thanks Sam for your speedy and very helpful reply. I already have a formatted csv file and I was therefore wondering how I would export the two lines in your filtered example to a new csv file. I’m trying to get to grips with PowerShell and my apologies if this seems very obvious. Also what does the | FT -a do?
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Thanks in advance.</p>

    • #169669
      Participant
      Topics: 9
      Replies: 430
      Points: 730
      Helping Hand
      Rank: Major Contributor

      FT -a is short for Format-Table -Autosize
      You would export the output to CSV by piping it to Export-Csv

    • #169672
      Participant
      Topics: 1
      Replies: 5
      Points: -18
      Rank: Member

      okay thanks, this is what I have at present:
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I have changed the variable names to better suit their purpose:</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>

      
      

      </p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>$CleanLog= “C:\_logs\CleanAppLog.csv”</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>$myData = Import-Csv $CleanLog</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>$myOutput = foreach ($recordRow in ($myData.HostIP | select -Unique)) {
      ($myData | where HostIP -EQ $recordRow | sort UserIdentifier)[-1] | Export-CSV $CleanLog
      # ‘Where’ selects the rows belonging to the current HostIP in the loop
      # ‘sort’ sorts the resulting rows by UserIdentifier column heading
      # [-1] selects the last member of this array
      }
      </p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”></p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Please note that I have now changed Surname to HostIP and Col5 to UserIdentifier.</p>
      <p style=”outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Thanks again.</p>
       

    • #169675
      Participant
      Topics: 1
      Replies: 5
      Points: -18
      Rank: Member

      I know that I’m currently overwriting the source file with the updated file but it would be better to create a completely new file using export csv rather than overwriting the source. Thanks

    • #169687
      Participant
      Topics: 2
      Replies: 1736
      Points: 3,551
      Helping Hand
      Rank: Community Hero

      Feel free to change the code suggestion you got here according to your needs. There is no copyright on it. 😉

    • #169693
      Participant
      Topics: 1
      Replies: 5
      Points: -18
      Rank: Member

      thanks, do you have any ideas how to solve this?

    • #169708
      Participant
      Topics: 2
      Replies: 1736
      Points: 3,551
      Helping Hand
      Rank: Community Hero

      … do you have any ideas how to solve this?

      Of course. Take a look at Sams last example code … instead of piping the result to FT (Format-Table) you could pipe it to Export-CSV.

      BTW: He already recommended that earlier on. 😉

    • #169801
      Participant
      Topics: 1
      Replies: 5
      Points: -18
      Rank: Member

      Thanks for the reply. I have tried piping it using export-csv to variable path $CleanLog (the code is located a few messages up) instead of a table but the output file does not contain the expected filtered/parsed data. This got me thinking that perhaps the export-csv was not the best/only way to go.

    • #169837
      Participant
      Topics: 2
      Replies: 3
      Points: 42
      Rank: Member

      If export-csv is giving you row counts or row numbers instead of your data you might need to add -NoTypeInformation, you could also tee-object or out-file depending on your script and it’s purpose.

Viewing 11 reply threads
  • The topic ‘Remove entire rows from csv with same value under specific column heading.’ is closed to new replies.