Sort Unique Hashtable results without variables

    by omega.red at 2013-02-27 22:47:34

    I am trying to reduce the memory usage of the following script and uniquely sort its results without storing a section of the script in a variable. Using the variable $testvar is not a good option, because it will result in system memory usage increasing over 200MB. Is it possible to uniquely sort a hashtable collection in the pipeline?


    $testvar =
    Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
    {
    $sr = new-object system.io.streamreader(".\Desktop\logs\$file");
    While (($line = $sr.ReadLine()) -ne $null) {
    If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
    {$Matches | select-object @{name='Date';expression={$Matches.date}},@{name='Mail';expression={$Matches.mail}}
    }}}
    $testvar | sort-object -unique -property Mail

    RESULTS: (WITH OMITTED VARIABLE LINE AT THE END OF SCRIPT)
    TypeName: Selected.System.Collections.Hashtable
    Date Mail
    01/01/2013 jane.doe@ mail.mail
    01/01/2013 jane.doe@ mail.mail
    01/01/2013 john.doe@ mail.mail
    01/01/2013 john.doe@ mail.mail

    Changing the “if” statement bracket section to the following utilizes the least amount of memory. Is it possible to put a space or tab between the date and email address then uniquely sort in the pipeline?


    If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
    {$Matches.date + $Matches.mail}

    RESULTS:
    TypeName: System.String
    01/01/2013jane.doe@ mail.mail
    01/01/2013jane.doe@ mail.mail
    01/01/2013john.doe@ mail.mail
    01/01/2013john.doe@ mail.mail

    by mjolinor at 2013-02-28 03:35:16

    First off, you can eliminate the variable and make your foreach loop "pipeline friendly" by wrapping it in a script block:
    &{
    Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
    {
    $sr = new-object system.io.streamreader(".\Desktop\logs\$file")
    While (($line = $sr.ReadLine()) -ne $null) {
    If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
    {$Matches |
    select-object @{name='Date';expression={$Matches.date}},@{name='Mail';expression={$Matches.mail}}
    }
    }
    }
    } | sort-object -unique -property Mail

    But that isn't really going to help your memory usage much as long as you're piping the result to sort. Sort is a "blocking cmdlet": all the results from the pipeline will accumulate there, since it can't really sort the items until it has them all.
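
The blocking behaviour described above can be observed with a small sketch. It is not part of the original reply; the $events list and the 1..3 input are stand-ins chosen for illustration. Each stage records when it runs, once without and once with Sort-Object in the middle of the pipeline.

```powershell
# Record the order in which the pipeline stages run.
$events = New-Object 'System.Collections.Generic.List[string]'

# Streaming: each item reaches the last stage before the next one is emitted.
1..3 |
    ForEach-Object { $events.Add("emit $_"); $_ } |
    ForEach-Object { $events.Add("got $_") }
$streamed = $events -join ', '

$events.Clear()

# Blocking: Sort-Object holds every item until its input is complete.
1..3 |
    ForEach-Object { $events.Add("emit $_"); $_ } |
    Sort-Object |
    ForEach-Object { $events.Add("got $_") }
$blocked = $events -join ', '

$streamed   # emit 1, got 1, emit 2, got 2, emit 3, got 3
$blocked    # emit 1, emit 2, emit 3, got 1, got 2, got 3
```

In the second run nothing reaches the last stage until all three items have been emitted, which is why every result object has to sit in memory at the sort.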
    Since you seem to be using sort to de-dupe the Mail field, a better way might be to use a hash table using the Mail field as the key, and eliminate the sort -unique:

    $ht = @{}

    Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
    {
    $sr = new-object system.io.streamreader(".\Desktop\logs\$file")
    While (($line = $sr.ReadLine()) -ne $null) {
    If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
    {$ht[$Matches.Mail] = $Matches.Date}
    }
    }

    $ht.GetEnumerator() |
    foreach {
    New-Object psobject -property @{
    Date = $_.Value
    Mail = $_.name
    }
    }

    Edit:
    If you've got multiple dates and are looking for unique date/address combinations, combine the date and address to make the hash table keys, then split them back out to create the objects.

    $ht = @{}
    Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
    {
    $sr = new-object system.io.streamreader(".\Desktop\logs\$file")
    While (($line = $sr.ReadLine()) -ne $null) {
    If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
    {$ht["$($Matches.Mail)#$($Matches.Date)"] = $true
    }
    }
    }

    $ht.keys |
    foreach {
    New-Object psobject -property @{
    Date = $_.split('#')[1]
    Mail = $_.split('#')[0]
    }
    }
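
For readers who want to try the composite-key idea without the log files, here is a minimal self-contained sketch. The $lines array, the example.test addresses and the host names are made up for illustration, and the sketch assumes that '#' never appears in an address or a date, since it is used as the separator.

```powershell
# Hypothetical sample lines standing in for the real log files.
$lines = @(
    '[01/01/2013] session mail=jane.doe@example.test from host1'
    '[01/01/2013] session mail=jane.doe@example.test from host2'
    '[01/02/2013] session mail=jane.doe@example.test from host1'
    '[01/01/2013] session mail=john.doe@example.test from host1'
)

$pattern = '\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from'
$seen = @{}

foreach ($line in $lines) {
    if ($line -match $pattern) {
        # Composite key: the same mail/date pair always lands on the same entry.
        $seen["$($Matches.mail)#$($Matches.date)"] = $true
    }
}

# Split each key back into its two parts to build the output objects.
$result = foreach ($key in $seen.Keys) {
    $mail, $date = $key.Split('#')
    New-Object psobject -Property @{ Date = $date; Mail = $mail }
}

$result
```

The four sample lines collapse to three objects, because the first two share both the address and the date.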

    by omega.red at 2013-02-28 21:18:33

    I see; I was thinking of using a custom object. Your first suggestion uniquely sorted the results.
    I have a few questions about the syntax used.

    1. Why was the mail key put in brackets after the hash table variable, as in
    {$ht[$Matches.Mail] = $Matches.Date}
    Is this another way of creating custom properties, similar to this?
    select-object @{name=??;exp={$_.somekey}}
    2. Why are the results uniquely sorted without using a sorting cmdlet or parameter?

    by mjolinor at 2013-03-01 03:34:59

    That's one way of creating a hash table key. There are two ways of referencing hash table keys: square brackets and dot notation
    ($hashtable.key or $hashtable[key]).
    The syntax rules differ slightly between the two. If you're using an expression for the key, like $Matches.Mail, then you have to use a subexpression with the dot operator ($hashtable.$($Matches.Mail)). If you use the square brackets, you don't need the subexpression syntax.
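
A short sketch of the two notations side by side. It is illustrative only: $m is a hypothetical stand-in for $Matches, and the address is made up.

```powershell
$ht = @{}
$m = @{ Mail = 'jane.doe@example.test' }   # stand-in for $Matches

# Square brackets accept an expression as the key directly.
$ht[$m.Mail] = '01/01/2013'
$viaBrackets = $ht[$m.Mail]

# Dot notation needs the subexpression wrapper around the expression.
$viaDot = $ht.$($m.Mail)

$viaBrackets   # 01/01/2013
$viaDot        # 01/01/2013
```

Both lookups reach the same entry; the brackets are simply less noisy when the key comes from another object.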

    The results are unique without using a sort because hash table keys must be unique (see get-help about_hash_tables).

    {$ht[$Matches.Mail] = $Matches.Date}

    is adding a hash table entry that has a key name taken from $Matches.Mail, and a value taken from $Matches.Date. If there is not already a key with that name in the table, a new entry will be created. If there is already an entry with that key, its value will be replaced with the new one, so you are essentially de-duping the entries in-stream. This saves memory because the only thing you're keeping is the hash table. The original objects are discarded after the hash table entry is created.
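
The create-or-replace behaviour can be checked in a few lines. This is a minimal sketch with made-up addresses, not code from the thread.

```powershell
$ht = @{}
$ht['jane.doe@example.test'] = '01/01/2013'   # new key: an entry is created
$ht['jane.doe@example.test'] = '01/02/2013'   # same key: the value is replaced
$ht['john.doe@example.test'] = '01/01/2013'   # different key: a second entry

$count = $ht.Count                        # 2
$latest = $ht['jane.doe@example.test']    # 01/02/2013

# A hash table created with @{} compares string keys case-insensitively,
# so differently cased addresses collapse into one entry as well.
$sameEntry = $ht['JANE.DOE@example.test']
```

Note that this gives uniqueness, not ordering: a plain hash table does not promise any particular order when it is enumerated, so pipe the small de-duped result to sort-object if the order matters.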

    by omega.red at 2013-03-07 16:14:25

    I have another syntax question.
    Why did you set the value for the key to $true?
    $ht["$($Matches.Mail)#$($Matches.Date)"] = $true

    by mjolinor at 2013-03-07 17:33:00

    You have to set the value to something. $true was an arbitrary choice. It could have been anything.
