Comparing two multi-dimensional arrays

This topic contains 13 replies, has 5 voices, and was last updated by  nickm 1 year, 3 months ago.

  • Author
    Posts
  • #27014

    Dan T
    Participant

    I'm trying to come up with the most efficient way to compare two arrays where there could be any number of columns in each array, but we are matching on a specified column from each array. I want to retain all data from each array. Some arrays could contain duplicates. I am finding that when I compare arrays with more than a few thousand lines, the compare can start to take a long time, over an hour in some cases.

    Is my Compare Function the most efficient way to do this?

    Here is my sample code:

    Function RJ-CombinedCompare() {
    [CmdletBinding()]
    PARAM([Parameter(Mandatory=$True)]$List1,$L1Match,$List2,$L2Match)
    $List = $List1 | %{[PSCustomObject]@{L1Data = $_; L2Data = 'NA'}}
    $List += $List2 | %{[PSCustomObject]@{L1Data = 'NA'; L2Data = $_}}
    foreach ($Object in $List.L1Data.$L1Match + $List.L2Data.$L2Match | select -Unique) {
    $Match1 = @()
    $Match2 = @()
    foreach ($Object1 in $List.L1Data -ne 'NA') {
    if ($Object1.$L1Match -eq $Object) {$Match1 += $Object1}
    }
    foreach ($Object2 in $List.L2Data -ne 'NA') {
    if ($Object2.$L2Match -eq $Object) {$Match2 += $Object2}
    }
    [PSCustomObject]@{MatchValue = $Object; L1Matches = $Match1.count; L2Matches = $Match2.count; List1 = $Match1; List2 = $Match2}
    }
    }

    $List1 = @(
    [PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
    [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
    [PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
    [PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
    [PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
    [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
    [PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
    [PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
    [PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
    )

    $List2 = @(
    [PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
    [PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
    [PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
    [PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
    [PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
    [PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
    [PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
    [PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
    )

    $Result = RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name

  • #27020

    Rob Simmers
    Participant

    Have you looked at Compare-Object?

    $List1 = @(
     [PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
     [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
     [PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
     [PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
     [PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
     [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
     [PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
     [PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
     [PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
     )
     
    $List2 = @(
     [PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
     [PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
     [PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
     [PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
     [PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
     [PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
     [PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
     [PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
     )
    
     Compare-Object -ReferenceObject $List1 -DifferenceObject $List2 -Property Place
    
  • #27021

    Dan T
    Participant

    Thanks for the response,

    That works to some degree, but it doesn't seem to maintain all the columns from each array. Can that be done with Compare-Object?

    E.g:
    Compare-Object -ReferenceObject $List1 -DifferenceObject $List2 -Property Place -IncludeEqual

    Place SideIndicator
    —– ————-
    1 ==
    1 ==
    2 ==
    6 ==
    2 ==
    5 =>
    4 =>
    7 =>
    3 <= 3 <= 3 <= 6 <= With my Function, each Array's content is preserved in the result: RJ-CombinedCompare -List1 $List1 -L1Match Place -List2 $List2 -L2Match Place | ft MatchValue L1Matches L2Matches List1 List2 ———- ——— ——— —– —– 1 2 2 {@{Alias=1; Place=1; ... {@{Name=1; Place=1; S... 3 3 0 {@{Alias=2; Place=3; ... {} 2 2 2 {@{Alias=3; Place=2; ... {@{Name=1; Place=2; S... 6 2 1 {@{Alias=5; Place=6; ... {@{Name=12; Place=6; ... 5 0 1 {} {@{Name=1; Place=5; S... 4 0 1 {} {@{Name=2; Place=4; S... 7 0 1 {} {@{Name=2; Place=7; S...

  • #27033

    Max Kozlov
    Participant

    Dan, your way absolutely ineffective, because you combine both arrays in beginning (for what reason? ) $List = $List1 | %{[PSCustomObject]@{L1Data = $_; L2Data = 'NA'}}
    $List += $List2 | %{[PSCustomObject]@{L1Data = 'NA'; L2Data = $_}}

    and later never use it in combined form, but several times filter it for original arrays:
    $Object in $List.L1Data.$L1Match +$List.L2Data.$L2Match
    foreach ($Object1 in $List.L1Data -ne 'NA') {
    if ($Object1.$L1Match -eq $Object) {$Match1 += $Object1}
    and so on.

    The code below 3 times more effective even on your sample. and should be more effective on large arrays. Btw, tell me how much..

    #Require -Version 4.0
    Function RJ-CombinedCompare() {
     [CmdletBinding()]
     PARAM(
    	#Every parameter must be mandatory
    	[Parameter(Mandatory=$True)]$List1,
    	[Parameter(Mandatory=$True)]$L1Match,
    	[Parameter(Mandatory=$True)]$List2,
    	[Parameter(Mandatory=$True)]$L2Match)
     #Fill HASH with arrays of data from both arrays, hash keys is value to compare
     $hash = @{}
     foreach ($data in $List1) {
        $hash[$data.$L1Match] += ,$data
     }
     foreach ($data in $List2) {
        $hash[$data.$L2Match] += ,$data
     }
     # filter every hash value by existance of $L1Match field in data.
     # {$_.$L1Match} - subject to change if $L1Match property exists in both $List1 and $List2
     # or may be $null
     foreach ($kv in $hash.GetEnumerator()) {
    	$m1, $m2 = $kv.Value.where( {$_.$L1Match}, 'Split')
        [PSCustomObject]@{
    		MatchValue = $kv.Key
    		L1Matches = $m1.Count
    		L2Matches = $m2.Count
    		List1 = $m1
    		List2 = $m2
    	}
     }
    }
    
    $List1 = @(
     [PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
     [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
     [PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
     [PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
     [PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
     [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
     [PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
     [PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
     [PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
     )
    
    $List2 = @(
     [PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
     [PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
     [PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
     [PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
     [PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
     [PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
     [PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
     [PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
     )
    
    RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name
    #Speed measurement
    measure-command { 1..10000 | %{ $Result = RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name } }
    
  • #27034

    Max Kozlov
    Participant

    should warn You that in my code List1 and List2 in result objects is not arrays but collections. But may be You do not even notice the difference 🙂

  • #27073

    Dan T
    Participant

    Hi Max,

    Thanks so much for your assistance. In my tests, your Function is extremely fast. The only issue I am having is when I run the compare where I am Matching on columns that have the same name, in the result its combining the result into list 1. I'm trying to see how I can modify your Function to handle that case.

    E.g.:

    
    RJ-CombinedCompare -List1 $List1 -L1Match Place -List2 $List2 -L2Match Place | ft
    
                             MatchValue                           L1Matches                           L2Matches List1                               List2                             
                             ----------                           ---------                           --------- -----                               -----                             
                                      7                                   1                                   0 {@{Name=2; Place=7; Somthing=d4}}   {}                                
                                      6                                   3                                   0 {@{Alias=5; Place=6; Extra=e}, @... {}                                
                                      5                                   1                                   0 {@{Name=1; Place=5; Somthing=a1}}   {}                                
                                      4                                   1                                   0 {@{Name=2; Place=4; Somthing=a3}}   {}                                
                                      3                                   3                                   0 {@{Alias=2; Place=3; Extra=a}, @... {}                                
                                      2                                   4                                   0 {@{Alias=3; Place=2; Extra=c}, @... {}                                
                                      1                                   4                                   0 {@{Alias=1; Place=1; Extra=c}, @... {}                    
    
  • #27090

    Max Kozlov
    Participant

    You can save data owner as

    $hash[$data.$L1Match] += ,[pscustomobject]@{Owner='l1';Value=$data }
    and for List2 too

    use it here
    $m1, $m2 = $kv.Value.where( {$_.Owner -eq 'l1'}, 'Split')

    but then need to modify result assinging:
    List1 = $m1.Value

    I do not test it, correct any erors if i make it 🙂

  • #27091

    Max Kozlov
    Participant

    for owner better use [bool] or [int] identifier, not a [string], I use it only for information purposes

  • #27277

    Dan T
    Participant

    Thanks for all you help Max.

    Here's the final Function. It's a lot more than 3 times faster, my large compare went from 45min to 5 seconds!

    Function RJ-CombinedCompare() {
        [CmdletBinding()]
        PARAM(
            #Every parameter must be mandatory
    	    [Parameter(Mandatory=$True)]$List1,
    	    [Parameter(Mandatory=$True)]$L1Match,
    	    [Parameter(Mandatory=$True)]$List2,
    	    [Parameter(Mandatory=$True)]$L2Match
        )
        #Fill HASH with arrays of data from both arrays, hash keys is value to compare
        $hash = @{}
        foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner='l1';Value=$($data)}}
        foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner='l2';Value=$($data)}}
        # filter every hash value by existance of $L1Match field in data.
        # {$_.$L1Match} - subject to change if $L1Match property exists in both $List1 and $List2
        # or may be $null
        foreach ($kv in $hash.GetEnumerator()) {
            $m1, $m2 = $kv.Value.where({$_.Owner -eq 'l1'}, 'Split')
            [PSCustomObject]@{
                MatchValue = $kv.Key
    		    L1Matches = $m1.Count
    		    L2Matches = $m2.Count
                L1MatchObject = $L1Match
                L2MatchObject = $L2Match
                List1 = $m1.Value
                List2 = $m2.Value
            }
        }
    }
    
  • #27331

    Max Kozlov
    Participant

    Congrats! 🙂

    but change everywhere 'l1' and 'l2' to 1 and 2, it save you some more execution time and memory eliminating string allocation.

    It's already memory hungry.

  • #27359

    Dan T
    Participant

    Thank you Sir! Its done:

    
    Function RJ-CombinedCompare() {
        [CmdletBinding()]
        PARAM(
    	    [Parameter(Mandatory=$True)]$List1,
    	    [Parameter(Mandatory=$True)]$L1Match,
    	    [Parameter(Mandatory=$True)]$List2,
    	    [Parameter(Mandatory=$True)]$L2Match
        )
        $hash = @{}
        foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner='1';Value=$($data)}}
        foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner='2';Value=$($data)}}
        foreach ($kv in $hash.GetEnumerator()) {
            $m1, $m2 = $kv.Value.where({$_.Owner -eq '1'}, 'Split')
            [PSCustomObject]@{
                MatchValue = $kv.Key
                L1Matches = $m1.Count
                L2Matches = $m2.Count
                L1MatchObject = $L1Match
                L2MatchObject = $L2Match
                List1 = $m1.Value
                List2 = $m2.Value
            }
        }
    }
    
    
  • #27360

    Warren Frame
    Participant

    Hello there!

    A short while back, I borrowed some code from Lucio Silveira (MSFT) and Dave Wyatt (superstar) and duct taped together a modified Join-Object.

    That link walks through using it in a number of scenarios. It has quickly becomer one of the 'most-used' functions in my PowerShell toolbox.

    Cheers!

  • #27380

    Max Kozlov
    Participant

    Dan, be more watchful 🙂

    you sould replace Owner='1'; to Owner=1; Without quotes! to convert owner field from [string] to [int]
    and of course $_.Owner -eq '1' to $_.Owner -eq 1

    and if you gonna more faster use arraylist, instead of array as in Join-Object.
    that modification get more time, but lit learn you well, if you need 🙂
    because each = ,obj assignment recreates new array

  • #54198

    nickm
    Participant

    The code works perfectly on v5. When running it on v3 I get the following exception:

    Method invocation failed because [System.Management.Automation.PSCustomObject] doesn't contain a method named 'where'.
    +         $m1, $m2 = $kv.Value.where({$_.Owner -eq 1}, 'Split')
    

    How can I re-write the line

    $m1, $m2 = $kv.Value.where({$_.List -eq 1}, 'Split')
    

    to run it on v3 without changing the functionality? (Installing v5 ist not possible.)

    • This reply was modified 1 year, 3 months ago by  nickm.

You must be logged in to reply to this topic.