Comparing 2 CSV files

Welcome Forums General PowerShell Q&A Comparing 2 CSV files

Viewing 5 reply threads
  • Author
    Posts
    • #226686
      Participant
      Topics: 2
      Replies: 2
      Points: 50
      Rank: Member

      Hello,

      I have 2 CSV files, let’s call them old.csv and new.csv – it’s an automated file spat out by our HRIS every hour or so and has about 1000 rows and about 20 columns, and I’m comparing the new one with the one that’s an hour old.

      I need to create a third file called changes.csv which lists the entire row for any row where any of the fields in that row has a different value in the new.csv than in the old.csv, and also any new.csv row that isn’t in the old.csv

      Both old.csv and new.csv have a header row, and there’s a unique field in the EmployeeID column.

      What I’ve written so far doesn’t really bear posting here, as it’s tortuous and clunky, and I haven’t tried running it yet.

      As with many scriplets I’m sure someone somewhere has had the same need I do and thrown something together that works and then gets streamlined and then gets simplified until it’s a work of art, haha. So I was wondering if anyone here has something approaching a one-liner I can incorporate into our custom reporting scripts that creates the changes.csv I’m looking for.

      If not, any pointers/suggestions/gotchas would be most welcome.

      Thanks!

    • #226692
      Participant
      Topics: 4
      Replies: 2247
      Points: 5,484
      Helping Hand
      Rank: Community MVP

      What I’ve written so far doesn’t really bear posting here, as it’s tortuous and clunky, and I haven’t tried running it yet.

      So we have to start from scratch to create something for your, right? You could at least post a few sanitized but still representative rows of the source data you’re dealing with.

      As with many scriplets I’m sure someone somewhere has had the same need I do and thrown something together that works ….

      Did you try to search for it? I’m pretty sure you could have find something adaptable in StackOverflow or even here in this forum or in the PowershellGallery.

      Have you tried the cmdlet Compare-Object?

    • #226695
      Participant
      Topics: 2
      Replies: 2
      Points: 50
      Rank: Member

      Compare-Object was the way to go, thanks. I was comparing each entry in the hashtables in two foreach loops. Longwinded and tortuous, as I said.

    • #226713
      Participant
      Topics: 0
      Replies: 1
      Points: 20
      Rank: Member
    • #226725
      Participant
      Topics: 0
      Replies: 1
      Points: 5
      Rank: Member

      AMGPoker adalah situs Judi Online terpercaya se-Asia
      Kami juga menyediakan bonus untuk anda seperti :
      * Bonus new member
      * Bonus rollingan
      * Bonus referral
      WA : +855 1070 2288
      http://ref.mov168.xyz/?slug=“jty6r”&

    • #226761
      Participant
      Topics: 12
      Replies: 1622
      Points: 2,560
      Helping Hand
      Rank: Community Hero

      I’ve actually had similar requirements in the past trying to create reports that showed changes (e.g. deltas) from one HR sync to another. The goal was to get who and what the current and old value was. Compare-Object tells us something is different, but not exactly what changed, only the new value. Be curious if any other experts know a better way to get this information:

      function Get-ObjectDelta {
          [CmdletBinding()]
          param (
              [psobject]$ReferenceObject,
              [psobject]$DifferenceObject,
              [string]$Key,
              [string]$KeyDesc,
              [switch]$ExcludeNewRecord
          )
          begin {
              $exclude = @()
          }
          process {
              $result = foreach ($prop in $ReferenceObject[0].PSObject.Properties.Name) {
      
                  if ($prop -eq $Key) {
                      $cmp = Compare-Object -ReferenceObject $ReferenceObject -DifferenceObject $DifferenceObject -Property $prop -PassThru | 
                             Select @{Name='Key';Expression={$_ | Select -ExpandProperty $Key}}, 
                                    @{Name='KeyDesc';Expression={$_ | Select -ExpandProperty $KeyDesc}}, 
                                    @{Name='Property';Expression={$prop}},
                                    @{Name='NewValue';Expression={'New Record'}}, 
                                    @{Name='OldValue';Expression={'New Record'}}
      
                      $cmp
      
                      $exclude += $cmp.Key
      
                  }
                  else {
                      $cmp = Compare-Object -ReferenceObject $ReferenceObject -DifferenceObject $DifferenceObject -Property $prop -PassThru | 
                             Select @{Name='Key';Expression={$_ | Select -ExpandProperty $Key}},
                                    @{Name='KeyDesc';Expression={$_ | Select -ExpandProperty $KeyDesc}},
                                    @{Name='Property';Expression={$prop}},
                                    @{Name='NewValue';Expression={$_ | Select -ExpandProperty $prop}}, 
                                    @{Name='OldValue';Expression={$currentKey=$_.$Key;$old | Where{$_.$Key -eq $currentKey} | Select -ExpandProperty $prop}}
                      
                      if ($ExcludeNewRecord) {
                          $cmp | Where{$exclude -notcontains $_.Key}
                      }
                      else {
                          $cmp
                      }
                  }
              }
          }
          end {
              $result
          }
      }
      
      
      $old = @"
      EmployeeId,Name,Department,Title,CostCenter
      434242,Joe Smith,Marketing,Analyst I,50000003421
      434245,Julie Johnson,Legal,Director,50000004444
      434248,Terry Tripp,Security,Manager,50000001234
      "@ | ConvertFrom-CSV
      
      
      $new = @"
      EmployeeId,Name,Department,Title,CostCenter
      434242,Joe Smith,Marketing,Analyst I,50000003421
      434245,Julie Johnson,Information,Director,50000004141
      434248,Terry Tripp,Security,Manager,50000001234
      434260,Sal Goodman,Legal,Analyst II,50000004444
      "@ | ConvertFrom-CSV
      
      $params = @{
          ReferenceObject  = $old 
          DifferenceObject = $new 
          Key              = 'EmployeeId'
          KeyDesc          = 'Name'
          ExcludeNewRecord = $true
      }
      
      Get-ObjectDelta @params
      

      Output:

      Key      : 434260
      KeyDesc  : Sal Goodman
      Property : EmployeeId
      NewValue : New Record
      OldValue : New Record
      
      Key      : 434245
      KeyDesc  : Julie Johnson
      Property : Department
      NewValue : Information
      OldValue : Legal
      
      Key      : 434245
      KeyDesc  : Julie Johnson
      Property : CostCenter
      NewValue : 50000004141
      OldValue : 50000004444
      

      I would create PowerBi reports from this data, which would also include a Date to see what happened during each sync. The function could be made more efficient by possibly running a Compare-Object to get only changed rows and then process the properties, but curious how long this would take against 1000 rows * 20 properties. Another option is to use SQL as well:

      Different Ways to Compare SQL Server Tables Schema and Data

Viewing 5 reply threads
  • You must be logged in to reply to this topic.