Author Posts

July 1, 2015 at 11:20 am

I'm trying to come up with the most efficient way to compare two arrays where there could be any number of columns in each array, but we are matching on a specified column from each array. I want to retain all data from each array. Some arrays could contain duplicates. I am finding that when I compare arrays with more than a few thousand lines, the compare can start to take a long time, over an hour in some cases.

Is my Compare Function the most efficient way to do this?

Here is my sample code:

Function RJ-CombinedCompare() {
[CmdletBinding()]
PARAM([Parameter(Mandatory=$True)]$List1,$L1Match,$List2,$L2Match)
$List = $List1 | %{[PSCustomObject]@{L1Data = $_; L2Data = 'NA'}}
$List += $List2 | %{[PSCustomObject]@{L1Data = 'NA'; L2Data = $_}}
foreach ($Object in $List.L1Data.$L1Match + $List.L2Data.$L2Match | select -Unique) {
$Match1 = @()
$Match2 = @()
foreach ($Object1 in $List.L1Data -ne 'NA') {
if ($Object1.$L1Match -eq $Object) {$Match1 += $Object1}
}
foreach ($Object2 in $List.L2Data -ne 'NA') {
if ($Object2.$L2Match -eq $Object) {$Match2 += $Object2}
}
[PSCustomObject]@{MatchValue = $Object; L1Matches = $Match1.count; L2Matches = $Match2.count; List1 = $Match1; List2 = $Match2}
}
}

$List1 = @(
[PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
[PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
[PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
[PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
[PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
[PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
[PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
[PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
[PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
)

$List2 = @(
[PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
[PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
[PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
[PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
[PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
[PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
[PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
[PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
)

$Result = RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name

July 1, 2015 at 12:07 pm

Have you looked at Compare-Object?

$List1 = @(
 [PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
 [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
 [PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
 [PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
 [PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
 [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
 [PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
 [PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
 [PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
 )
 
$List2 = @(
 [PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
 [PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
 [PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
 [PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
 [PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
 [PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
 [PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
 [PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
 )

 Compare-Object -ReferenceObject $List1 -DifferenceObject $List2 -Property Place

July 1, 2015 at 12:46 pm

Thanks for the response,

That works to some degree, but it doesn't seem to maintain all the columns from each array. Can that be done with Compare-Object?

E.g:
Compare-Object -ReferenceObject $List1 -DifferenceObject $List2 -Property Place -IncludeEqual

Place SideIndicator
—– ————-
1 ==
1 ==
2 ==
6 ==
2 ==
5 =>
4 =>
7 =>
3 <= 3 <= 3 <= 6 <= With my Function, each Array's content is preserved in the result: RJ-CombinedCompare -List1 $List1 -L1Match Place -List2 $List2 -L2Match Place | ft MatchValue L1Matches L2Matches List1 List2 ———- ——— ——— —– —– 1 2 2 {@{Alias=1; Place=1; ... {@{Name=1; Place=1; S... 3 3 0 {@{Alias=2; Place=3; ... {} 2 2 2 {@{Alias=3; Place=2; ... {@{Name=1; Place=2; S... 6 2 1 {@{Alias=5; Place=6; ... {@{Name=12; Place=6; ... 5 0 1 {} {@{Name=1; Place=5; S... 4 0 1 {} {@{Name=2; Place=4; S... 7 0 1 {} {@{Name=2; Place=7; S...

July 1, 2015 at 11:12 pm

Dan, your way absolutely ineffective, because you combine both arrays in beginning (for what reason? ) $List = $List1 | %{[PSCustomObject]@{L1Data = $_; L2Data = 'NA'}}
$List += $List2 | %{[PSCustomObject]@{L1Data = 'NA'; L2Data = $_}}

and later never use it in combined form, but several times filter it for original arrays:
$Object in $List.L1Data.$L1Match +$List.L2Data.$L2Match
foreach ($Object1 in $List.L1Data -ne 'NA') {
if ($Object1.$L1Match -eq $Object) {$Match1 += $Object1}
and so on.

The code below 3 times more effective even on your sample. and should be more effective on large arrays. Btw, tell me how much..

#Require -Version 4.0
Function RJ-CombinedCompare() {
 [CmdletBinding()]
 PARAM(
	#Every parameter must be mandatory
	[Parameter(Mandatory=$True)]$List1,
	[Parameter(Mandatory=$True)]$L1Match,
	[Parameter(Mandatory=$True)]$List2,
	[Parameter(Mandatory=$True)]$L2Match)
 #Fill HASH with arrays of data from both arrays, hash keys is value to compare
 $hash = @{}
 foreach ($data in $List1) {
    $hash[$data.$L1Match] += ,$data
 }
 foreach ($data in $List2) {
    $hash[$data.$L2Match] += ,$data
 }
 # filter every hash value by existance of $L1Match field in data.
 # {$_.$L1Match} - subject to change if $L1Match property exists in both $List1 and $List2
 # or may be $null
 foreach ($kv in $hash.GetEnumerator()) {
	$m1, $m2 = $kv.Value.where( {$_.$L1Match}, 'Split')
    [PSCustomObject]@{
		MatchValue = $kv.Key
		L1Matches = $m1.Count
		L2Matches = $m2.Count
		List1 = $m1
		List2 = $m2
	}
 }
}

$List1 = @(
 [PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
 [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
 [PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
 [PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
 [PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
 [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
 [PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
 [PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
 [PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
 )

$List2 = @(
 [PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
 [PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
 [PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
 [PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
 [PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
 [PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
 [PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
 [PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
 )

RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name
#Speed measurement
measure-command { 1..10000 | %{ $Result = RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name } }

July 1, 2015 at 11:20 pm

should warn You that in my code List1 and List2 in result objects is not arrays but collections. But may be You do not even notice the difference 🙂

July 2, 2015 at 7:09 am

Hi Max,

Thanks so much for your assistance. In my tests, your Function is extremely fast. The only issue I am having is when I run the compare where I am Matching on columns that have the same name, in the result its combining the result into list 1. I'm trying to see how I can modify your Function to handle that case.

E.g.:


RJ-CombinedCompare -List1 $List1 -L1Match Place -List2 $List2 -L2Match Place | ft

                         MatchValue                           L1Matches                           L2Matches List1                               List2                             
                         ----------                           ---------                           --------- -----                               -----                             
                                  7                                   1                                   0 {@{Name=2; Place=7; Somthing=d4}}   {}                                
                                  6                                   3                                   0 {@{Alias=5; Place=6; Extra=e}, @... {}                                
                                  5                                   1                                   0 {@{Name=1; Place=5; Somthing=a1}}   {}                                
                                  4                                   1                                   0 {@{Name=2; Place=4; Somthing=a3}}   {}                                
                                  3                                   3                                   0 {@{Alias=2; Place=3; Extra=a}, @... {}                                
                                  2                                   4                                   0 {@{Alias=3; Place=2; Extra=c}, @... {}                                
                                  1                                   4                                   0 {@{Alias=1; Place=1; Extra=c}, @... {}                    

July 3, 2015 at 12:09 am

You can save data owner as

$hash[$data.$L1Match] += ,[pscustomobject]@{Owner='l1';Value=$data }
and for List2 too

use it here
$m1, $m2 = $kv.Value.where( {$_.Owner -eq 'l1'}, 'Split')

but then need to modify result assinging:
List1 = $m1.Value

I do not test it, correct any erors if i make it 🙂

July 3, 2015 at 12:15 am

for owner better use [bool] or [int] identifier, not a [string], I use it only for information purposes

July 7, 2015 at 10:02 am

Thanks for all you help Max.

Here's the final Function. It's a lot more than 3 times faster, my large compare went from 45min to 5 seconds!

Function RJ-CombinedCompare() {
    [CmdletBinding()]
    PARAM(
        #Every parameter must be mandatory
	    [Parameter(Mandatory=$True)]$List1,
	    [Parameter(Mandatory=$True)]$L1Match,
	    [Parameter(Mandatory=$True)]$List2,
	    [Parameter(Mandatory=$True)]$L2Match
    )
    #Fill HASH with arrays of data from both arrays, hash keys is value to compare
    $hash = @{}
    foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner='l1';Value=$($data)}}
    foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner='l2';Value=$($data)}}
    # filter every hash value by existance of $L1Match field in data.
    # {$_.$L1Match} - subject to change if $L1Match property exists in both $List1 and $List2
    # or may be $null
    foreach ($kv in $hash.GetEnumerator()) {
        $m1, $m2 = $kv.Value.where({$_.Owner -eq 'l1'}, 'Split')
        [PSCustomObject]@{
            MatchValue = $kv.Key
		    L1Matches = $m1.Count
		    L2Matches = $m2.Count
            L1MatchObject = $L1Match
            L2MatchObject = $L2Match
            List1 = $m1.Value
            List2 = $m2.Value
        }
    }
}

July 9, 2015 at 2:00 am

Congrats! 🙂

but change everywhere 'l1' and 'l2' to 1 and 2, it save you some more execution time and memory eliminating string allocation.

It's already memory hungry.

July 9, 2015 at 12:49 pm

Thank you Sir! Its done:


Function RJ-CombinedCompare() {
    [CmdletBinding()]
    PARAM(
	    [Parameter(Mandatory=$True)]$List1,
	    [Parameter(Mandatory=$True)]$L1Match,
	    [Parameter(Mandatory=$True)]$List2,
	    [Parameter(Mandatory=$True)]$L2Match
    )
    $hash = @{}
    foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner='1';Value=$($data)}}
    foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner='2';Value=$($data)}}
    foreach ($kv in $hash.GetEnumerator()) {
        $m1, $m2 = $kv.Value.where({$_.Owner -eq '1'}, 'Split')
        [PSCustomObject]@{
            MatchValue = $kv.Key
            L1Matches = $m1.Count
            L2Matches = $m2.Count
            L1MatchObject = $L1Match
            L2MatchObject = $L2Match
            List1 = $m1.Value
            List2 = $m2.Value
        }
    }
}

July 9, 2015 at 1:41 pm

Hello there!

A short while back, I borrowed some code from Lucio Silveira (MSFT) and Dave Wyatt (superstar) and duct taped together a modified Join-Object.

That link walks through using it in a number of scenarios. It has quickly becomer one of the 'most-used' functions in my PowerShell toolbox.

Cheers!

July 10, 2015 at 1:22 am

Dan, be more watchful 🙂

you sould replace Owner='1'; to Owner=1; Without quotes! to convert owner field from [string] to [int]
and of course $_.Owner -eq '1' to $_.Owner -eq 1

and if you gonna more faster use arraylist, instead of array as in Join-Object.
that modification get more time, but lit learn you well, if you need 🙂
because each = ,obj assignment recreates new array

September 16, 2016 at 8:06 am

The code works perfectly on v5. When running it on v3 I get the following exception:

Method invocation failed because [System.Management.Automation.PSCustomObject] doesn't contain a method named 'where'.
+         $m1, $m2 = $kv.Value.where({$_.Owner -eq 1}, 'Split')

How can I re-write the line

$m1, $m2 = $kv.Value.where({$_.List -eq 1}, 'Split')

to run it on v3 without changing the functionality? (Installing v5 ist not possible.)

  • This reply was modified 2 years, 1 month ago by  nickm.