Author Posts

April 22, 2017 at 9:11 am

Specs

• OS: Windows 7
• PSVersion 3.0

Situation

Comparing objects (about 30.000 items) in an array with 4 other big arrays.
All the arrays share a unique ComputerName value, but each dataset has other data that needs to be added/combined with the 1st array, and sometimes edited in the process.

The Problem

Speed! The solution I have now is slow, and I am looking for a faster approach.

Tried solutions

1. Compare-Object can't do the trick in this situation, or I might have missed a lesson.
2. ForEach in a ForEach.
—a. This is slow.
3. [array]::IndexOf('Dataset2','UniqueData1')
—a. This is faster than the ForEach in a ForEach.
—b. Readability is a downside.

.INPUT

    Example DataSet 1 in CSV, up to 5000 items

Username;Computername
UniqueUser1;UniqueComputer1
UniqueUser2;UniqueComputer2
UniqueUser3;UniqueComputer3

    Example DataSet 2 in CSV, up to 5000 items

Username;KeyData
UniqueUser1;KeyData101
UniqueUser2;KeyData102
UniqueUser3;KeyData103

Code Sample 1

    ForEach in a ForEach

#region Foreach in a Foreach
$DataSet1 = Import-Csv -Path C:\_Temp\DataSet1.csv -Delimiter ';'
$DataSet2 = Import-Csv -Path C:\_Temp\DataSet2.csv -Delimiter ';'

$DataSet1 | Add-Member -MemberType NoteProperty -Value '' -Name 'KeyData'

Measure-Command -Expression {

    Foreach ($CSVDataSet1_Line in $DataSet1) {
        Foreach ($CSVDataSet2_Line in $DataSet2) {
            
            If ($CSVDataSet1_Line.Username -EQ $CSVDataSet2_Line.Username) {
                $CSVDataSet1_Line.Keydata = $CSVDataSet2_Line.Keydata
            }# If
        }#Foreach2
    }#Foreach1

}#Measure

#endregion
Minutes           : 2
Seconds           : 47
Milliseconds      : 166

Code Sample 2

    [array]::IndexOf()

#region Index of

$DataSet1 = Import-Csv -Path C:\_Temp\DataSet1.csv -Delimiter ';'
$DataSet2 = Import-Csv -Path C:\_Temp\DataSet2.csv -Delimiter ';'

$DataSet1 | Add-Member -MemberType NoteProperty -Value '' -Name 'KeyData'

Measure-Command -Expression {

    Foreach ($CSVDataSet1_Line in $DataSet1) {
        $RecordID = [array]::IndexOf(
                                        $DataSet2.Username,
                                        $CSVDataSet1_Line.Username
                                    )

        # IndexOf returns -1 when no match is found
        If ($RecordID -ne -1) {
            $CSVDataSet1_Line.KeyData = $DataSet2[$RecordID].KeyData
        }
    }#Foreach


}#Measure
Minutes           : 0
Seconds           : 40
Milliseconds      : 286
#endregion

.OUTPUT
Username;Computername;KeyData
UniqueUser1;UniqueComputer1;KeyData101
UniqueUser2;UniqueComputer2;KeyData102
UniqueUser3;UniqueComputer3;KeyData103

Has anyone else struggled with a similar problem and found a faster solution?

April 24, 2017 at 1:44 pm

I'm not really sure there is, short of custom-writing something in C# so you can get the benefit of compiled, vs interpreted, execution. Comparing huge sets is naturally memory- and processor-intensive. If there's a way to format the data so Compare-Object will work, that'd be the quickest solution. Honestly, 2-3 minutes isn't bad given the size of the data and the fact that you're in an interpreted, dynamic language.
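For what it's worth, here is a minimal sketch of what Compare-Object can and can't do here (the two inline datasets are made up for illustration): with -Property, -IncludeEqual, and -ExcludeDifferent it can report which Username values appear in both sets, but it returns only the compared property plus SideIndicator, so the other columns still have to be merged in a separate step.

```powershell
# Two tiny stand-in datasets (hypothetical values)
$set1 = @(
    [pscustomobject]@{ Username = 'UniqueUser1'; Computername = 'UniqueComputer1' }
    [pscustomobject]@{ Username = 'UniqueUser2'; Computername = 'UniqueComputer2' }
)
$set2 = @(
    [pscustomobject]@{ Username = 'UniqueUser1'; KeyData = 'KeyData101' }
)

# Reports the overlap on Username, but only that property survives
# (plus SideIndicator) - the KeyData column is not merged.
$common = Compare-Object -ReferenceObject $set1 -DifferenceObject $set2 `
    -Property Username -IncludeEqual -ExcludeDifferent
$common.Username   # UniqueUser1
```

So even if the data can be shaped for Compare-Object, a second lookup is needed to copy KeyData across, which is why it tends not to buy much here.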

April 24, 2017 at 2:55 pm

It seems your data doesn't have a complex structure, so you can try converting the arrays to hashtables and comparing keys.

I tried to simulate your case, and here is what I got:

 D:\> $DataSet1 = 1..30000 | Foreach-Object { [PSCustomObject]@{UserName="User$_"; ComputerName="Computer$_"; KeyData='placeholder' } }
 D:\> $DataSet2 = 1..5000 | Foreach-Object { $rnd = Get-Random -min 1 -max 30000; [PSCustomObject]@{username="User$rnd"; KeyData="KeyData $_ $rnd" } }
 D:\> $DataSet1 | ForEach-Object -Begin { $hash=@{} } -Process { $hash[$_.username] = $_ }
 D:\> measure-command {
   foreach ($d2 in $DataSet2) {
     if ($hash.ContainsKey($d2.username)) {
       $hash[$d2.username].KeyData = $d2.KeyData
     }
   }
 }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 54
Ticks             : 545249
TotalDays         : 6,31075231481482E-07
TotalHours        : 1,51458055555556E-05
TotalMinutes      : 0,000908748333333333
TotalSeconds      : 0,0545249
TotalMilliseconds : 54,5249

 D:\> $result = $DataSet1 | ? { $_.keydata -ne 'placeholder' }
 D:\> $result.count
4616
 D:\> $result[0]

UserName ComputerName KeyData
-------- ------------ -------
User2    Computer2    KeyData 1785 2

Don't mind the result count; it is less than 5000 because of the random numbers, but I think you will be impressed by the speed 🙂 Even with the hash creation it takes less than a second.


April 24, 2017 at 6:27 pm

Posting on behalf of someone who got flagged as spam:

seems your data haven't complex structure, so you can try to convert arrays to hashes and compare keys

I'm try to simulate your case and what I get:

D:\> $DataSet1 = 1..30000 Foreach-Object { PSCustomObject @{UserName="User$_"; ComputerName="
Computer$_"; KeyData='placeholder' } }
D:\> $DataSet2 = 1..5000 Foreach-Object { $rnd = Get-Random -min 1 -max 30000; PSCustomObject
@{username="User$rnd"; KeyData="KeyData $_ $rnd" } }
D:\> $DataSet1 ForEach-Object -Begin { $hash=@{} } -Process { $hash $_.username = $_ }
D:\> measure-command {
foreach ($d2 in $DataSet2) {
if ($hash.ContainsKey($d2.username)) {
$hash $d2.username .KeyData = $d2.KeyData
}
}
}

Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 54
Ticks : 545249
TotalDays : 6,31075231481482E-07
TotalHours : 1,51458055555556E-05
TotalMinutes : 0,000908748333333333
TotalSeconds : 0,0545249
TotalMilliseconds : 54,5249

D:\> $result = $DataSet1 ? { $_.keydata -ne 'placeholder' }
D:\> $result.count
4616
D:\> $result 0

UserName ComputerName KeyData
——– ———— ——-
User2 Computer2 KeyData 1785 2

Don't see to Results count, it lesser than 5000 because of rnd, but I think you will be impressed by speed 🙂


April 25, 2017 at 3:28 am

Here is the same concept as the one Don reposted on behalf of someone else, just cleaned up a little to work with CSVs and to not drop records that don't need to combine data.

DataSet1.csv

Username,Computername
UniqueUser1,UniqueComputer1
UniqueUser2,UniqueComputer2
...
UniqueUser4999,UniqueComputer4999
UniqueUser5000,UniqueComputer5000

DataSet2.csv

UserName,KeyData
UniqueUser2613,KeyData2613
UniqueUser2736,KeyData2736
...
UniqueUser4313,KeyData4313
UniqueUser1695,KeyData1695

Code

Measure-Command {
    $d1 = Import-Csv "C:\Temp\DataSet1.csv"
    $d2 = Import-Csv "C:\Temp\DataSet2.csv"
    $hash = @{}

    $d1 | ForEach-Object {
        $hash[$_.UserName] = [pscustomobject]@{
            UserName = $_.UserName
            ComputerName = $_.ComputerName
            KeyData = $null
        }
    }

    $d2 | ForEach-Object {
        If ($hash[$_.UserName]) {
            $hash[$_.UserName].KeyData = $_.KeyData
        } Else {
            $hash[$_.UserName] = [pscustomobject]@{
                UserName = $_.UserName
                ComputerName = $_.ComputerName
                KeyData = $_.KeyData
            }
        }
    }

    $hash.values | Export-Csv "C:\Temp\CombinedDataSet.csv" -NoTypeInformation
}

Measure Results

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 238
Ticks             : 12380238
TotalDays         : 1.43289791666667E-05
TotalHours        : 0.0003438955
TotalMinutes      : 0.02063373
TotalSeconds      : 1.2380238
TotalMilliseconds : 1238.0238

CombinedDataSet.csv Results

"UserName","ComputerName","KeyData"
"UniqueUser2613","UniqueComputer2613","KeyData2613"
"UniqueUser2736","UniqueComputer2736","KeyData2736"
...
"UniqueUser4313","UniqueComputer4313","KeyData4313"
"UniqueUser1695","UniqueComputer1695","KeyData1695"

April 25, 2017 at 6:58 am

Big thanks to the spam poster! Too bad the formatting was off, but with the rewrite from Curtis I managed to create a working solution.
I still need to rewrite the script and see if I can make other values like DateTime work, or work around them, but the speed is just incredible!

@Don (big fan btw, keep writing books and Pluralsight videos, they're great!)
The example data only took a couple of minutes, but the actual script I am running takes about 30-60 min.
DataSet1 = 14.000 – 17.000 items
And I need to add data from 3 other data sets where the ComputerName matches.
Dataset2 = 20.000 – 40.000
Dataset3 = 5.000 – 10.000
Dataset4 = 300 – 1000

Example solution for anyone reading this thread later on:

#Import Example Data
$DataSet1 = Import-Csv -Path C:\_Temp\DataSet1.csv -Delimiter ';'
$DataSet1 | Add-Member -MemberType NoteProperty -Value 'placeholder' -Name 'KeyData'
$DataSet2 = Import-Csv -Path C:\_Temp\DataSet2.csv -Delimiter ';'

#Create Empty Hash Table
$hash=@{}
 
#Creating $Hash with Data From $DataSet1
$DataSet1 | ForEach-Object -Process { $hash[$_.username] = $_ }
   #Name             Value                                                                                                                                                          
   #----             -----                                                                                                                                                          
   #UniqueUser1      @{Username=UniqueUser1; Computername=UniqueComputer1; KeyData=placeholder}   

measure-command {
    foreach ($d2 in $DataSet2) {
        if ($hash.ContainsKey($d2.username)) {
            $hash[$d2.username].KeyData = $d2.KeyData
        }
    }
}
#Seconds           : 0
#Milliseconds      : 35

#Outputting $Hash values to CSV File 
$hash.values | Export-Csv "C:\_Temp\CombinedDataSet.csv" -NoTypeInformation -Force -Delimiter ';' 
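The example above merges two sets on Username; the same hashtable pattern extends naturally to the 3 extra datasets keyed on ComputerName: build the index once over DataSet1, then make one pass per secondary set and copy its columns across. A minimal sketch with made-up property names (Ip, Os) and inline data instead of CSV imports:

```powershell
# Primary set, indexed once by ComputerName
$dataSet1 = @(
    [pscustomobject]@{ ComputerName = 'PC1'; Ip = ''; Os = '' }
    [pscustomobject]@{ ComputerName = 'PC2'; Ip = ''; Os = '' }
)
$index = @{}
foreach ($row in $dataSet1) { $index[$row.ComputerName] = $row }

# Secondary sets (hypothetical columns), each merged in a single pass
$dataSet2 = @([pscustomobject]@{ ComputerName = 'PC1'; Ip = '10.0.0.1' })
$dataSet3 = @([pscustomobject]@{ ComputerName = 'PC2'; Os = 'Windows 7' })

foreach ($row in $dataSet2) {
    if ($index.ContainsKey($row.ComputerName)) { $index[$row.ComputerName].Ip = $row.Ip }
}
foreach ($row in $dataSet3) {
    if ($index.ContainsKey($row.ComputerName)) { $index[$row.ComputerName].Os = $row.Os }
}

$index['PC1'].Ip   # 10.0.0.1
```

Each extra dataset only adds one linear pass, so the total cost stays roughly proportional to the sum of the dataset sizes rather than their product.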

April 25, 2017 at 8:46 am

that spammer was me 🙂

In my experience this method works very well with arrays of around 100000 records. Some additional speedup for the complete set of data can also be achieved if the $data | foreach {} pattern is replaced with foreach ($i in $data) {}
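That pipeline-vs-statement difference can be checked directly with a quick Measure-Command comparison (a generic sketch; the exact timings will vary by machine):

```powershell
$data = 1..100000

# Pipeline version: each item passes through ForEach-Object's scriptblock
$piped = Measure-Command { $data | ForEach-Object { $_ * 2 } }

# Statement version: no per-item pipeline overhead
$looped = Measure-Command { foreach ($i in $data) { $i * 2 } }

"{0:N0} ms vs {1:N0} ms" -f $piped.TotalMilliseconds, $looped.TotalMilliseconds
```

On most machines the foreach statement comes out noticeably ahead, for the same reason the hash lookup beats the nested loops: less work per item.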

here are some speed measurements for array filtering:

# Prepare to filter 50 records from 100000 by name with different methods:
#records array
$all = 1..100000 | %{  [PSCustomObject]@{ID=$_; Name="Name$_"; Guid=[Guid]::NewGuid().ToString()} }
#records to filter
$wn = 1..50 | %{ Get-Random -Minimum 1 -Maximum 99999 }
$wn = $all[$wn]
# filter by regex
$m = '^' + ($wn.Name -join '$|^') + '$'
$wn_name = $wn.Name
# filter by hash
$hash = @{}; $wn.Name | %{ $hash[$_] = 1 }
# Measurements:
# where-object + regex
 C:\> measure-command { $all | ?{ $_.Name -match $m} }
Seconds           : 3
Milliseconds      : 50
# where-object + contains
C:\> measure-command { $all | ?{ $wn_name -contains $_.Name } }
Seconds           : 3
Milliseconds      : 682
#where-object + hash 
C:\> measure-command { $all | ?{ $hash.ContainsKey($_.Name) } }
Seconds           : 2
Milliseconds      : 675
# .where method + hash
 C:\> measure-command { $all.where( { $hash.ContainsKey($_.Name) } ) }
Seconds           : 0
Milliseconds      : 870
# foreach + regex
C:\> measure-command { foreach ($a in $all) { if ( $a.Name -match $m ) { $a } } }
Seconds           : 0
Milliseconds      : 648
# foreach + contains
 C:\> measure-command { foreach ($a in $all) { if ( $wn_name -contains $a.Name ) { $a } } }
Seconds           : 1
Milliseconds      : 245
# foreach + hash
C:\> measure-command { foreach ($a in $all) { if ( $hash.ContainsKey($a.Name) ) { $a } } }
Seconds           : 0
Milliseconds      : 293

and...
I don't know what the problem with DateTime is, but it can always be converted from a string with [DateTime]::Parse() and [DateTime]::ParseExact()
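A minimal sketch of ParseExact (the input string and format string here are made up; match the format to whatever the CSV column actually contains):

```powershell
# Hypothetical timestamp string as it might appear in a CSV column
$text = '25-04-2017 06:58'

# ParseExact needs the exact format string; InvariantCulture avoids
# locale-dependent date separators
$when = [DateTime]::ParseExact(
    $text,
    'dd-MM-yyyy HH:mm',
    [System.Globalization.CultureInfo]::InvariantCulture
)

$when.Year   # 2017
```

Parse() is more forgiving but culture-dependent, so ParseExact with an explicit format is usually the safer choice for fixed-format CSV data.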