ForEach in an Foreach / Comparing multiple Arrays

This topic contains 8 replies, has 4 voices, and was last updated by Profile photo of Max Kozlov Max Kozlov 2 months, 4 weeks ago.

  • Author
    Posts
  • #69253
    Profile photo of Juliën Zweverink
    Juliën Zweverink
    Participant

    Specs

    • OS: Windows 7
    • PSVersion 3.0

    Situation

    Comparing Objects (About 30.000 items) in an array with 4 other big arrays.
    All the arrays have a unique ComputerName Value, but all dataset have other data that's needs to be added/combined with the 1st array and sometimes edited in the process.

    The Problem

    Speed! The solution I have now is slow, and I am looking for an faster approach.

    Tryed solutions

    1. Compare-Object can't do the trick in this situation, or I might have missed an lesson.
    2. ForEach in an Foreach.
    —a. This is Slow.
    3. [array]::indexof('Dataset2','UniqueData1')
    —a. This is Faster than the ForEach in an Foreach
    —b. Readability is an down side.

    .INPUT

      Example DataSet 1 in CSV up untill 5000 items

    Username;Computername
    UniqueUser1;UniqueComputer1
    UniqueUser2;UniqueComputer2
    UniqueUser3;UniqueComputer3

      Example DataSet 2 in CSV up untill 5000 items

    Username;KeyData
    UniqueUser1;KeyData101
    UniqueUser2;KeyData102
    UniqueUser3;KeyData103

    Code Sample 1

      ForEach in an ForEach

    #region  Foreach in an Foreach
    $DataSet1 = Import-Csv -Path C:\_Temp\DataSet1.csv
    $DataSet2 = Import-Csv -Path C:\_Temp\DataSet1.csv
    
    $DataSet1 | Add-Member -MemberType NoteProperty -Value '' -Name 'KeyData'
    
    Measure-Command -Expression {
    
        Foreach ($CSVDataSet1_Line in $DataSet1) {
            Foreach ($CSVDataSet2_Line in $DataSet2) {
                
                If ($CSVDataSet1_Line.Username -EQ $CSVDataSet2_Line.Username) {
                    $CSVDataSet1_Line.Keydata = $CSVDataSet2_Line.Keydata
                }# If
            }#Forech2
        }#Forech1
    
    }#Measure
    
    #endregion
    Minutes           : 2
    Seconds           : 47
    Milliseconds      : 166
    
    

    Code Sample 2

      [array]::indexof()

    #region Index of
    
    $DataSet1 = Import-Csv -Path C:\_Temp\DataSet1.csv
    $DataSet2 = Import-Csv -Path C:\_Temp\DataSet1.csv
    
    $DataSet1 | Add-Member -MemberType NoteProperty -Value '' -Name 'KeyData'
    
    Measure-Command -Expression {
    
        Foreach ($CSVDataSet1_Line in $DataSet1) {
            $RecordID = [array]::indexof(
                                            $($DataSet2.username),
                                            $($CSVDataSet1_Line.username)
                                        )
    
        }#Foreach
    
        If (($RecordID -ne '-1') -and ($RecordID -ne '')) {
            If ($CSVDataSet1_Line.Username -EQ $DataSet2.GetValue($RecordID).Username) {
                $CSVDataSet1_Line.Keydata = $DataSet2.GetValue($RecordID).Keydata
            }
            
        }
    
    
    }#Measure
    Minutes           : 0
    Seconds           : 40
    Milliseconds      : 286
    #endregion
    

    .OUTPUT
    Username;Computername;KeyData
    UniqueUser1;UniqueComputer1;KeyData101
    UniqueUser2;UniqueComputer2;KeyData102
    UniqueUser3;UniqueComputer3;KeyData103

    Has anyone else struggled with an similar problem and found an faster solution?

  • #69370
    Profile photo of Don Jones
    Don Jones
    Keymaster

    I'm not really sure there is, short of custom-writing something in C# so you can get the benefit of compiled, vs interpreted, execution. Comparing huge sets is naturally memory- and processor-intensive. If there's a way to format the data so Compare-Object will work, that'd be the quickest solution. Honestly, 2-3 minutes isn't bad given the size of the data and the fact that you're in an interpreted, dynamic language.

  • #69396
    Profile photo of Max Kozlov
    Max Kozlov
    Participant

    seems your data haven't complex structure, so you can try to convert arrays to hashes and compare keys

    I'm try to simulate your case and what I get:

     D:\> $DataSet1 = 1..30000 | Foreach-Object { [PSCustomObject]@{UserName="User$_"; ComputerName="
    Computer$_"; KeyData='placeholder' } }
     D:\> $DataSet2 = 1..5000 | Foreach-Object { $rnd = Get-Random -min 1 -max 30000; [PSCustomObject
    ]@{username="User$rnd"; KeyData="KeyData $_ $rnd" } }
     D:\> $DataSet1 | ForEach-Object -Begin { $hash=@{} } -Process { $hash[$_.username] = $_ }
     D:\> measure-command {
       foreach ($d2 in $DataSet2) {
         if ($hash.ContainsKey($d2.username)) {
           $hash[$d2.username].KeyData = $d2.KeyData
         }
       }
     }
    
    
    Days              : 0
    Hours             : 0
    Minutes           : 0
    Seconds           : 0
    Milliseconds      : 54
    Ticks             : 545249
    TotalDays         : 6,31075231481482E-07
    TotalHours        : 1,51458055555556E-05
    TotalMinutes      : 0,000908748333333333
    TotalSeconds      : 0,0545249
    TotalMilliseconds : 54,5249
    
     D:\> $result = $DataSet1 | ? { $_.keydata -ne 'placeholder' }
     D:\> $result.count
    4616
     D:\> $result[0]
    
    UserName ComputerName KeyData
    -------- ------------ -------
    User2    Computer2    KeyData 1785 2
    

    Don't see to Results count, it lesser than 5000 because of rnd, but I think you will be impressed by speed 🙂

  • #69403
    Profile photo of Max Kozlov
    Max Kozlov
    Participant

    seems your data haven't complex structure, so you can try to convert arrays to hashes and compare keys

    I'm try to simulate your case and what I get:

    D:\> $DataSet1 = 1..30000 | Foreach-Object { [PSCustomObject]@{UserName="User$_"; ComputerName="Computer$_"; KeyData='placeholder' } }
     D:\> $DataSet2 = 1..5000 | Foreach-Object { $rnd = Get-Random -min 1 -max 30000; [PSCustomObject]@{username="User$rnd"; KeyData="KeyData $_ $rnd" } }
     D:\> $DataSet1 | ForEach-Object -Begin { $hash=@{} } -Process { $hash[$_.username] = $_ }
     D:\> measure-command {
       foreach ($d2 in $DataSet2) {
         if ($hash.ContainsKey($d2.username)) {
           $hash[$d2.username].KeyData = $d2.KeyData
         }
       }
     }
    
    Days              : 0
    Hours             : 0
    Minutes           : 0
    Seconds           : 0
    Milliseconds      : 54
    Ticks             : 545249
    TotalDays         : 6,31075231481482E-07
    TotalHours        : 1,51458055555556E-05
    TotalMinutes      : 0,000908748333333333
    TotalSeconds      : 0,0545249
    TotalMilliseconds : 54,5249
    
     D:\> $result = $DataSet1 | ? { $_.keydata -ne 'placeholder' }
     D:\> $result.count
    4616
     D:\> $result[0]
    
    UserName ComputerName KeyData
    -------- ------------ -------
    User2    Computer2    KeyData 1785 2
    

    Don't look at Results count, it lesser than 5000 because of rnd, but I think you will be impressed by speed.
    Even with hash creation it's less than a second

  • #69421
    Profile photo of Don Jones
    Don Jones
    Keymaster

    Posting on behalf of someone who got flagged as spam:

    seems your data haven't complex structure, so you can try to convert arrays to hashes and compare keys

    I'm try to simulate your case and what I get:

    D:\> $DataSet1 = 1..30000 Foreach-Object { PSCustomObject @{UserName="User$_"; ComputerName="
    Computer$_"; KeyData='placeholder' } }
    D:\> $DataSet2 = 1..5000 Foreach-Object { $rnd = Get-Random -min 1 -max 30000; PSCustomObject
    @{username="User$rnd"; KeyData="KeyData $_ $rnd" } }
    D:\> $DataSet1 ForEach-Object -Begin { $hash=@{} } -Process { $hash $_.username = $_ }
    D:\> measure-command {
    foreach ($d2 in $DataSet2) {
    if ($hash.ContainsKey($d2.username)) {
    $hash $d2.username .KeyData = $d2.KeyData
    }
    }
    }

    Days : 0
    Hours : 0
    Minutes : 0
    Seconds : 0
    Milliseconds : 54
    Ticks : 545249
    TotalDays : 6,31075231481482E-07
    TotalHours : 1,51458055555556E-05
    TotalMinutes : 0,000908748333333333
    TotalSeconds : 0,0545249
    TotalMilliseconds : 54,5249

    D:\> $result = $DataSet1 ? { $_.keydata -ne 'placeholder' }
    D:\> $result.count
    4616
    D:\> $result 0

    UserName ComputerName KeyData
    ——– ———— ——-
    User2 Computer2 KeyData 1785 2

    Don't see to Results count, it lesser than 5000 because of rnd, but I think you will be impressed by speed 🙂

  • #69424
    Profile photo of Don Jones
    Don Jones
    Keymaster

    Posting on behalf of someone who got flagged as spam:

    seems your data haven't complex structure, so you can try to convert arrays to hashes and compare keys

    I'm try to simulate your case and what I get:

    D:\> $DataSet1 = 1..30000 Foreach-Object { PSCustomObject @{UserName="User$_"; ComputerName="
    Computer$_"; KeyData='placeholder' } }
    D:\> $DataSet2 = 1..5000 Foreach-Object { $rnd = Get-Random -min 1 -max 30000; PSCustomObject
    @{username="User$rnd"; KeyData="KeyData $_ $rnd" } }
    D:\> $DataSet1 ForEach-Object -Begin { $hash=@{} } -Process { $hash $_.username = $_ }
    D:\> measure-command {
    foreach ($d2 in $DataSet2) {
    if ($hash.ContainsKey($d2.username)) {
    $hash $d2.username .KeyData = $d2.KeyData
    }
    }
    }

    Days : 0
    Hours : 0
    Minutes : 0
    Seconds : 0
    Milliseconds : 54
    Ticks : 545249
    TotalDays : 6,31075231481482E-07
    TotalHours : 1,51458055555556E-05
    TotalMinutes : 0,000908748333333333
    TotalSeconds : 0,0545249
    TotalMilliseconds : 54,5249

    D:\> $result = $DataSet1 ? { $_.keydata -ne 'placeholder' }
    D:\> $result.count
    4616
    D:\> $result 0

    UserName ComputerName KeyData
    ——– ———— ——-
    User2 Computer2 KeyData 1785 2

    Don't see to Results count, it lesser than 5000 because of rnd, but I think you will be impressed by speed 🙂

  • #69442
    Profile photo of Curtis Smith
    Curtis Smith
    Participant

    Here is the same concept as that reposted by Don on behalf of someone else, just a little cleaned up to work with CSVs and to not drop records that that don't need to combine data.

    DataSet1.csv

    Username,Computername
    UniqueUser1,UniqueComputer1
    UniqueUser2,UniqueComputer2
    ...
    UniqueUser4999,UniqueComputer4999
    UniqueUser5000,UniqueComputer5000

    DataSet2.csv

    UserName,KeyData
    UniqueUser2613,KeyData2613
    UniqueUser2736,KeyData2736
    ...
    UniqueUser4313,KeyData4313
    UniqueUser1695,KeyData1695

    Code

    Measure-Command {
        $d1 = Import-Csv "C:\Temp\DataSet1.csv"
        $d2 = Import-Csv "C:\Temp\DataSet2.csv"
        $hash = @{}
    
        $d1 | ForEach-Object {
            $hash[$_.UserName] = [pscustomobject]@{
                UserName = $_.UserName
                ComputerName = $_.ComputerName
                KeyData = $null
            }
        }
    
        $d2 | ForEach-Object {
            If ($hash[$_.UserName]) {
                $hash[$_.UserName].KeyData = $_.KeyData
            } Else {
                $hash[$_.UserName] = [pscustomobject]@{
                    UserName = $_.UserName
                    ComputerName = $_.ComputerName
                    KeyData = $_.KeyData
                }
            }
        }
    
        $hash.values | Export-Csv "C:\Temp\CombinedDataSet.csv" -NoTypeInformation
    }

    Measure Results

    Days              : 0
    Hours             : 0
    Minutes           : 0
    Seconds           : 1
    Milliseconds      : 238
    Ticks             : 12380238
    TotalDays         : 1.43289791666667E-05
    TotalHours        : 0.0003438955
    TotalMinutes      : 0.02063373
    TotalSeconds      : 1.2380238
    TotalMilliseconds : 1238.0238

    CombinedDataSet.csv Results

    "UserName","ComputerName","KeyData"
    "UniqueUser2613","UniqueComputer2613","KeyData2613"
    "UniqueUser2736","UniqueComputer2736","KeyData2736"
    ...
    "UniqueUser4313","UniqueComputer4313","KeyData4313"
    "UniqueUser1695","UniqueComputer1695","KeyData1695"
  • #69445
    Profile photo of Juliën Zweverink
    Juliën Zweverink
    Participant

    Big Thanx to the Spam poster! Too Bad the Formatting was off, but with the rewrite from Curtis I managed the create an working solution.
    I Still need to rewrite the script and see if I can make other values like DateTime Work or workaround them, but the speed is just incredible!

    @Don, ( Big fan btw, keep writing books and Pluralsight video's, there great! )
    The example data took a couple of minutes, only the script I am running is taking about 30-60min.
    DataSet1 = 14.000 – 17.000 items
    And I need to add data from 3 other data sets where the ComputerName matches.
    Dataset2 = 20.000 – 40.000
    Dataset3 = 5.000 – 10.000
    Dataset4 = 300 – 1000

    Example solution for anyone reading the Article later on:

    #Import Example Data
    $DataSet1 = Import-Csv -Path C:\_Temp\DataSet1.csv -Delimiter ';'
    $DataSet1 | Add-Member -MemberType NoteProperty -Value 'placeholder' -Name 'KeyData'
    $DataSet2 = Import-Csv -Path C:\_Temp\DataSet2.csv -Delimiter ';'
    
    #Create Empty Hash Table
    $hash=@{}
     
    #Creating $Hash with Data From $DataSet1
    $DataSet1 | ForEach-Object -Process { $hash[$_.username] = $_ }
       #Name             Value                                                                                                                                                          
       #----             -----                                                                                                                                                          
       #UniqueUser1      @{Username=UniqueUser1; Computername=UniqueComputer1; KeyData=placeholder}   
    
    measure-command {
        foreach ($d2 in $DataSet2) {
            if ($hash.ContainsKey($d2.username)) {
                $hash[$d2.username].KeyData = $d2.KeyData
            }
        }
    }
    #Seconds           : 0
    #Milliseconds      : 35
    
    #Outputting $Hash values to CSV File 
    $hash.values | Export-Csv "C:\_Temp\CombinedDataSet.csv" -NoTypeInformation -Force -Delimiter ';' 
    
    
  • #69448
    Profile photo of Max Kozlov
    Max Kozlov
    Participant

    that spammer was me 🙂

    In my experience I can say that with arrays about 100000 records this method works very well. But also some speedup for complete set of data can be achieved if $data | foreach {} pattern replaced to foreach($i in $data) {}

    here is some speedup measurements for array filtering:

    # Prepare to filter 50 records from 100000 by name with different metods:
    #records array
    $all = 1..100000 | %{  [PSCustomObject]@{ID=$_; Name="Name$_"; Guid=[Guid]::NewGuid().ToString()} }
    #records to filter
    $wn = 1..50 | %{ Get-Random -Minimum 1 -Maximum 99999 }
    $wn = $all[$wn]
    # filter by regex
    $m = '^' + ($wn.Name -join '$|^') + '$'
    $wn_name = $wn.Name
    # filter by hash
    $hash = @{}; $wn.Name | %{ $hash[$_] = 1 }
    # Measurements:
    # where-object + regex
     C:\> measure-command { $all | ?{ $_.Name -match $m} }
    Seconds           : 3
    Milliseconds      : 50
    # where-object + contains
    C:\> measure-command { $all | ?{ $wn_name -contains $_.Name } }
    Seconds           : 3
    Milliseconds      : 682
    #where-object + hash 
    C:\> measure-command { $all | ?{ $hash.ContainsKey($_.Name) } }
    Seconds           : 2
    Milliseconds      : 675
    # .where method + hash
     C:\> measure-command { $all.where( { $hash.ContainsKey($_.Name) } ) }
    Seconds           : 0
    Milliseconds      : 870
    # foreach + regex
    C:\> measure-command { foreach ($a in $all) { if ( $a.Name -match $m ) { $a } } }
    Seconds           : 0
    Milliseconds      : 648
    # foreach + contains
     C:\> measure-command { foreach ($a in $all) { if ( $wn_name -contains $a.Name ) { $a } } }
    Seconds           : 1
    Milliseconds      : 245
    # foreach + hash
    C:\> measure-command { foreach ($a in $all) { if ( $hash.ContainsKey($a.Name) ) { $a } } }
    Seconds           : 0
    Milliseconds      : 293
    

    and...
    I don't know what problem with DateTime, but it always can be converted from string witn [DateTine]::Parse() and [DateTime]::ParseExact()

You must be logged in to reply to this topic.