Merging PSCustomObjects. Is there a better way?

This topic contains 3 replies, has 3 voices, and was last updated by Steve 2 months ago.

  • #62506
    Steve
    Participant

    Hi all,

    I have a very large data set: I take some text input and turn each record into a PSCustomObject. I then want to add that object to my list of all objects, unless a matching object already exists, in which case I just want to add one member of the new object to the same member of the existing object in the list. As an example, here is my data structure and execution:

    $allObjs = New-Object System.Collections.ArrayList
    
    $obj = [pscustomobject]@{
        Name = ""
        Gender = ""
        Size = ""
        Value = 0 }
    
    #...
    #Do some stuff, populate each key at different times....
    $obj.Name = "Frankie"
    #...
    $obj.Size = "L"
    $obj.Value = 14993
    #...
    $obj.Gender = "Male"
    
    $allObjs.Add($obj)
    

    Now this works as expected. The only problem is that it can create duplicates. Name, Size, and Gender are the keys that become the fingerprint for a specific object. The variance is the Value attribute. So let's say "Frankie" is in multiple books, each assigning him a different "Value". I want to add those up across all books containing "Frankie".

    I came up with this function, which adds a new object to the list or, if a match already exists, adds its "Value" to the existing entry. It also merges any duplicates that somehow got added into the list, and it has some basic, primitive error checking.

    FUNCTION Add_Obj {
        PARAM ( [pscustomobject]$obj )
        
        # Ignore missing or wrongly typed input.
        IF ($null -eq $obj -Or $obj -isnot [pscustomobject]) {return}
    
        # Find every existing entry with the same fingerprint (Name, Gender, Size).
        $tempObj = $allObjs | Where-Object {($_.Name -eq $obj.Name) -and ($_.Gender -eq $obj.Gender) -and ($_.Size -eq $obj.Size)}
    
        IF ($null -eq $tempObj) { [void]$allObjs.Add($obj)}
        ELSE {
            IF ($tempObj -is [array]) {
                # Several duplicates already exist: collapse them into one entry.
                # Start from the new object's Value so it is counted as well.
                $newObj = [pscustomobject]@{Name="";Gender="";Size="";Value=$obj.Value}
                FOREACH ($dup in $tempObj) {
                    $allObjs.Remove($dup)
                    $newObj.Name = $dup.Name
                    $newObj.Gender = $dup.Gender
                    $newObj.Size = $dup.Size
                    $newObj.Value += $dup.Value
                }
                [void]$allObjs.Add($newObj)
            }    
            ELSEIF ($tempObj -is [pscustomobject]) {
                # Exactly one match: add the new Value to it.
                $tempObj.Value += $obj.Value
            }
        }
    }
    
    ##### Output:
    PS C:\Users\Steve> $allObjs = New-Object System.Collections.ArrayList
    PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "Frankie";Gender = "Male";Size = "L";Value = 1337 })
    PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "Dude";Gender = "Male";Size = "L";Value = 100000 })
    PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "Frank";Gender = "Male";Size = "S";Value = 5002365 })
    PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "Jane";Gender = "Female";Size = "S";Value = 69696969 })
    PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 42 })
    PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 420 })
    PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 4200 })
    PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 42000 })
    PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 420000 })
    PS C:\Users\Steve>
    PS C:\Users\Steve> $allObjs
    
    Name    Gender Size    Value
    ----    ------ ----    -----
    Frankie Male   L        1337
    Dude    Male   L      100000
    Frank   Male   S     5002365
    Jane    Female S    69696969
    June    Female L      466662
    
    

    Now I can spam this function all day long and not end up with multiple "Frankie"s who are Size L and Gender Male. It just seems like there has to be a better way of doing this; the code above seems inefficient. This is only an example, mind you. My actual data set has 2 keys that form the fingerprint and 9 keys that will be either added to or subtracted from, depending on the situation. I'm parsing a few thousand lines of text because some vendor who shall remain nameless can't expose these variables via a JSON/RESTful API... Anyway, I need this to be fast.

    Thank you for your time,
    Steve

  • #62565
    Ron
    Participant

    If you will always have to deal with removing any existing multiple duplicates, there's not much that you can do that's significantly better than what you have. You could just add up all of the existing duplicates and remove them, then add the new summary entry, rather than repeatedly removing/adding entries for each duplicate. If you could start fresh with all existing duplicates cleaned up, you wouldn't need to remove any entries, just add/update.
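
    As a rough illustration of the "just add/update" case, here is a minimal sketch that keeps the objects in a hashtable keyed on the fingerprint instead of scanning the whole list each time. The names Add-OrMergeObj and $index, and the '|' separator, are hypothetical and not from the thread. It assumes the list starts out free of duplicates and that '|' never appears in the key fields.

```powershell
# Hypothetical index: key is "Name|Gender|Size", value is the stored object.
# A hashtable literal compares string keys case-insensitively, like -eq does.
$index = @{}

function Add-OrMergeObj {
    param ( [pscustomobject]$obj )

    if ($null -eq $obj) { return }

    # Build the fingerprint from the identifying properties.
    # The '|' separator is an assumption; pick one that cannot occur in the data.
    $key = $obj.Name, $obj.Gender, $obj.Size -join '|'

    if ($index.ContainsKey($key)) {
        # Already known: fold the new Value into the existing entry.
        $index[$key].Value += $obj.Value
    }
    else {
        # First time this fingerprint is seen: store the object itself.
        $index[$key] = $obj
    }
}

Add-OrMergeObj -obj ([pscustomobject]@{Name = "June"; Gender = "Female"; Size = "L"; Value = 42 })
Add-OrMergeObj -obj ([pscustomobject]@{Name = "June"; Gender = "Female"; Size = "L"; Value = 420 })
Add-OrMergeObj -obj ([pscustomobject]@{Name = "Jane"; Gender = "Female"; Size = "S"; Value = 1 })

# The merged objects are the hashtable's values.
$index.Values
```

    Each call is then a single key lookup rather than a Where-Object pass over every stored object. Note that the first object seen for a fingerprint is stored by reference, so later merges change that object's Value.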

  • #62575
    random commandline
    Participant

    My sample code compares the new objects from the dataset against the existing list, then adds the result to allobj.

    $all = "Name Gender Size Value
    Frankie Male L 12
    Jane Female S 12
    Joe Male M 13
    "
    $dataset = "Name Gender Size Value
    Frankie Male L 12
    Jane Female S 12
    Joe Male M 13
    Frankie Male L 12
    Frankie Male L 14
    Jane Female S 12
    Joe Male M 13
    Jane Male M 13
    Frankie Male L 13
    Joe Male M 14
    "
    $allobj = New-Object System.Collections.ArrayList
    $obj1 = $all | ConvertFrom-Csv -Delimiter " "
    $obj2 = $dataset | ConvertFrom-Csv -Delimiter " "
    
    $addobj = Compare-Object -ReferenceObject $obj1 -DifferenceObject $obj2 -PassThru | 
    Select-Object * -ExcludeProperty sideindicator
    
    $allobj.add($addobj) | Out-Null
    $allobj
    
  • #62599
    Steve
    Participant

    Ron,

    The de-dupe portion of the code isn't explicitly required. It's mainly there as a worst-case scenario in case I make a boo-boo and add data improperly. I suppose I could just add everything and de-dupe outside the loop...
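
    For what it's worth, "add everything and de-dupe outside the loop" might look something like the sketch below, which uses Group-Object and Measure-Object. The sample data and the $merged variable are made up for illustration, and I haven't timed it against the real data set.

```powershell
# Sample data only: every object is added unconditionally, duplicates included.
$allObjs = New-Object System.Collections.ArrayList
[void]$allObjs.Add([pscustomobject]@{Name = "June"; Gender = "Female"; Size = "L"; Value = 42 })
[void]$allObjs.Add([pscustomobject]@{Name = "Jane"; Gender = "Female"; Size = "S"; Value = 1 })
[void]$allObjs.Add([pscustomobject]@{Name = "June"; Gender = "Female"; Size = "L"; Value = 420 })

# After the loop: group on the fingerprint and emit one summed object per group.
$merged = $allObjs | Group-Object -Property Name, Gender, Size | ForEach-Object {
    $first = $_.Group[0]
    [pscustomobject]@{
        Name   = $first.Name
        Gender = $first.Gender
        Size   = $first.Size
        # Measure-Object returns the sum as a floating-point number.
        Value  = ($_.Group | Measure-Object -Property Value -Sum).Sum
    }
}

$merged
```

    With 9 value keys instead of one, each would need its own sum inside the ForEach-Object block.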

    Random,
    I didn't think of doing that... I'll give it a shot and see if there are any speed penalties. Great idea!

    -Steve
