
January 24, 2017 at 7:25 am

Hi all,

So I have a script that takes a very large set of text input and turns each record into a PSCustomObject. I then want to add that object to my list of all objects, unless it already exists, in which case I just want to add one member of the new object to the matching member of the existing object in the list. As an example, here is my data structure and execution:

$allObjs = New-Object System.Collections.ArrayList

$obj = [pscustomobject]@{
    Name = ""
    Gender = ""
    Size = ""
    Value = 0 }

#...
#Do some stuff, populate each key at different times....
$obj.Name = "Frankie"
#...
$obj.Size = "L"
$obj.Value = 14993
#...
$obj.Gender = "Male"

$allObjs.Add($obj)

Now this works as expected. The only problem is that it can create duplicates. Name, Size, and Gender are the keys that become the fingerprint for a specific object. The variance is the Value attribute. So let's say "Frankie" is in multiple books, with a different "Value" assigned to him in each. I want to add those up across all books containing "Frankie".

I came up with this function that merges any duplicates that somehow got added to the list and sums their "Value" keys, with some basic and primitive error checking.

FUNCTION Add_Obj {
    PARAM ( [pscustomobject]$obj )

    # $null goes on the left so the comparison is never treated as an array filter.
    IF ($null -eq $obj -Or $obj -isnot [pscustomobject]) {return}

    # @() guarantees an array, so .Count works for zero, one, or many matches.
    $tempObj = @($allObjs | Where-Object {($_.Name -eq $obj.Name) -and ($_.Gender -eq $obj.Gender) -and ($_.Size -eq $obj.Size)})

    IF ($tempObj.Count -eq 0) { [void]$allObjs.Add($obj) }
    ELSEIF ($tempObj.Count -eq 1) {
        # Exactly one match: fold the new value into it.
        $tempObj[0].Value += $obj.Value
    }
    ELSE {
        # Several matches already in the list: collapse them into one entry.
        # Starting from $obj.Value makes sure the incoming value is counted too.
        $newObj = [pscustomobject]@{Name=$obj.Name;Gender=$obj.Gender;Size=$obj.Size;Value=$obj.Value}
        FOREACH ($dup in $tempObj) {
            $allObjs.Remove($dup)
            $newObj.Value += $dup.Value
        }
        [void]$allObjs.Add($newObj)
    }
}

##### Output:
PS C:\Users\Steve> $allObjs = New-Object System.Collections.ArrayList
PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "Frankie";Gender = "Male";Size = "L";Value = 1337 })
PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "Dude";Gender = "Male";Size = "L";Value = 100000 })
PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "Frank";Gender = "Male";Size = "S";Value = 5002365 })
PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "Jane";Gender = "Female";Size = "S";Value = 69696969 })
PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 42 })
PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 420 })
PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 4200 })
PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 42000 })
PS C:\Users\Steve> Add_Obj -obj ([pscustomobject]@{Name = "June";Gender = "Female";Size = "L";Value = 420000 })
PS C:\Users\Steve>
PS C:\Users\Steve> $allObjs

Name    Gender Size    Value
----    ------ ----    -----
Frankie Male   L        1337
Dude    Male   L      100000
Frank   Male   S     5002365
Jane    Female S    69696969
June    Female L      466662

Now I can spam this function all day long and not end up with multiple "Frankie" entries that are Size L and Gender Male. It just seems like there has to be a better way of doing this. The code above seems inefficient, since every call scans the whole list. This is only an example, mind you. My actual data set has 2 keys that are my fingerprint and 9 keys that will either be added to or subtracted from depending on the situation. I'm parsing a few thousand lines of text because some vendor who shall remain nameless can't expose these variables via a JSON/RESTful API.... Anyway, I need this to be fast.

Thank you for your time,
Steve

January 24, 2017 at 2:46 pm

If you will always have to deal with removing any existing multiple duplicates, there's not much that you can do that's significantly better than what you have. You could just add up all of the existing duplicates and remove them, then add the new summary entry, rather than repeatedly removing/adding entries for each duplicate. If you could start fresh with all existing duplicates cleaned up, you wouldn't need to remove any entries, just add/update.
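The "just add/update" case can be sketched as follows. This is only an illustration, not something from the thread: it swaps the list scan for a hashtable keyed on the fingerprint, and the names $index and Add-OrUpdate are made up for the example. It assumes the list starts out with no duplicates and that the "|" separator never appears in any of the values.

```powershell
# Hypothetical names: $index and Add-OrUpdate are illustrative only.
# Assumes a fresh start with no duplicates already stored.
$index = @{}

function Add-OrUpdate {
    param ([pscustomobject]$obj)

    if ($null -eq $obj) { return }

    # Build the fingerprint from the identifying properties.
    # '|' is assumed not to occur in any of the values.
    $key = '{0}|{1}|{2}' -f $obj.Name, $obj.Gender, $obj.Size

    if ($index.ContainsKey($key)) {
        # Already known: add the new value to the stored object.
        $index[$key].Value += $obj.Value
    }
    else {
        # First time this fingerprint is seen: store the object itself.
        $index[$key] = $obj
    }
}

Add-OrUpdate ([pscustomobject]@{Name = "June"; Gender = "Female"; Size = "L"; Value = 42 })
Add-OrUpdate ([pscustomobject]@{Name = "June"; Gender = "Female"; Size = "L"; Value = 420 })
Add-OrUpdate ([pscustomobject]@{Name = "Jane"; Gender = "Female"; Size = "S"; Value = 7 })

# The merged objects are the hashtable's values.
$allObjs = @($index.Values)
```

Each call is a single key lookup rather than a pass over the whole list, which is where the time should be saved on a few thousand lines. With several value keys, the update branch would just repeat the += (or -=) line for each of them.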

January 24, 2017 at 3:29 pm

My sample code compares the new objects from the dataset against the existing list, then adds the differences to allobj.

$all = "Name Gender Size Value
Frankie Male L 12
Jane Female S 12
Joe Male M 13
"
$dataset = "Name Gender Size Value
Frankie Male L 12
Jane Female S 12
Joe Male M 13
Frankie Male L 12
Frankie Male L 14
Jane Female S 12
Joe Male M 13
Jane Male M 13
Frankie Male L 13
Joe Male M 14
"
$allobj = New-Object System.Collections.ArrayList
$obj1 = $all | ConvertFrom-Csv -Delimiter " "
$obj2 = $dataset | ConvertFrom-Csv -Delimiter " "

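# Compare on explicit properties; without -Property, custom objects are not compared field by field.
$addobj = Compare-Object -ReferenceObject $obj1 -DifferenceObject $obj2 -Property Name, Gender, Size, Value -PassThru | 
Select-Object * -ExcludeProperty sideindicator

# AddRange adds each object individually; Add would store the whole array as one element.
$allobj.AddRange(@($addobj))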
$allobj

January 24, 2017 at 6:40 pm

Ron,

The de-dupe portion of the code isn't strictly required. It's mainly there as a worst-case safeguard in case I make a boo-boo and add data improperly. I suppose I could just add everything and de-dupe outside the loop...
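A rough sketch of what "add everything, then de-dupe outside the loop" could look like, using Group-Object and Measure-Object. The $raw sample data and the $merged name are made up for the illustration, and whether this beats the per-record approach on the real data set would need timing.

```powershell
# Hypothetical sample data standing in for the parsed records.
$raw = @(
    [pscustomobject]@{Name = "June"; Gender = "Female"; Size = "L"; Value = 42 }
    [pscustomobject]@{Name = "June"; Gender = "Female"; Size = "L"; Value = 420 }
    [pscustomobject]@{Name = "Frankie"; Gender = "Male"; Size = "L"; Value = 1337 }
)

# Group on the fingerprint properties, then emit one summed object per group.
$merged = $raw | Group-Object -Property Name, Gender, Size | ForEach-Object {
    $first = $_.Group[0]
    [pscustomobject]@{
        Name   = $first.Name
        Gender = $first.Gender
        Size   = $first.Size
        # Measure-Object totals the Value property across the group.
        Value  = ($_.Group | Measure-Object -Property Value -Sum).Sum
    }
}

$merged
```

This keeps the parsing loop down to a plain Add, at the cost of holding every raw record in memory until the grouping step runs.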

Random,
I didn't think of doing that... I'll give it a shot and see if there are any speed penalties. Great idea!

-Steve