Sort Unique Hashtable results without variables

    by omega.red at 2013-02-27 22:47:34

    I am trying to reduce system memory usage and uniquely sort the results of the following script without storing a section of the script's output in a variable. Using the variable $testvar is not a good option, because it results in system memory usage increasing over 200 MB. Is it possible to uniquely sort a hashtable collection in the pipeline?


    $testvar =
    Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
    {
        $sr = new-object system.io.streamreader(".\Desktop\logs\$file");
        While (($line = $sr.ReadLine()) -ne $null) {
            If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
            {
                $Matches | select-object @{name='Date';expression={$Matches.date}},@{name='Mail';expression={$Matches.mail}}
            }
        }
    }
    $testvar | sort-object -unique -property Mail

    RESULTS: (WITH OMITTED VARIABLE LINE AT THE END OF SCRIPT)
    TypeName: Selected.System.Collections.Hashtable
    Date Mail
    01/01/2013 jane.doe@ mail.mail
    01/01/2013 jane.doe@ mail.mail
    01/01/2013 john.doe@ mail.mail
    01/01/2013 john.doe@ mail.mail

    Changing the body of the "if" statement to the following uses the least amount of memory. Is it possible to put a space or tab between the date and email address, then uniquely sort in the pipeline?


    If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
    {$Matches.date + $Matches.mail}

    RESULTS:
    TypeName: System.String
    01/01/2013jane.doe@ mail.mail
    01/01/2013jane.doe@ mail.mail
    01/01/2013john.doe@ mail.mail
    01/01/2013john.doe@ mail.mail

    by mjolinor at 2013-02-28 03:35:16

    First off, you can eliminate the variable and make your foreach loop "pipeline friendly" by wrapping it in a script block:
    &{
        Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
        {
            $sr = new-object system.io.streamreader(".\Desktop\logs\$file")
            While (($line = $sr.ReadLine()) -ne $null) {
                If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
                {
                    $Matches |
                        select-object @{name='Date';expression={$Matches.date}},@{name='Mail';expression={$Matches.mail}}
                }
            }
        }
    } | sort-object -unique -property Mail

    But that isn't really going to help your memory usage much as long as you're piping the result to sort. Sort is a "blocking cmdlet": all the results from the pipeline will accumulate there, since it can't sort the items until it has them all.
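The blocking behavior described above can be seen in a small experiment. This is only a sketch with made-up input (the numbers 1 to 3 and the $log list are not from the thread); it records the order in which items are produced upstream of Sort-Object and received downstream of it.

```powershell
# Hypothetical demo: $log records the order of pipeline events.
$log = New-Object 'System.Collections.Generic.List[string]'

1..3 |
    ForEach-Object { $log.Add("produced $_"); $_ } |   # runs once per input item
    Sort-Object -Descending |                          # holds every item until input ends
    ForEach-Object { $log.Add("received $_") }

# All "produced" entries come before the first "received" entry.
$log
```

Without the Sort-Object stage the "produced" and "received" entries would interleave, because each item would flow through the pipeline as soon as it was emitted.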
    Since you seem to be using sort to de-dupe the Mail field, a better way might be to use a hash table using the Mail field as the key, and eliminate the sort -unique:

    $ht = @{}

    Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
    {
        $sr = new-object system.io.streamreader(".\Desktop\logs\$file")
        While (($line = $sr.ReadLine()) -ne $null) {
            If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
            {$ht[$Matches.Mail] = $Matches.Date}
        }
    }

    $ht.GetEnumerator() |
    foreach {
        New-Object psobject -property @{
            Date = $_.Value
            Mail = $_.name
        }
    }

    Edit:
    If you've got multiple dates and are looking for unique date/address combinations, combine the date and address to make the hash table keys, then split them back out to create the objects.

    $ht = @{}

    Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
    {
        $sr = new-object system.io.streamreader(".\Desktop\logs\$file")
        While (($line = $sr.ReadLine()) -ne $null) {
            If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
            {$ht["$($Matches.Mail)#$($Matches.Date)"] = $true}
        }
    }

    $ht.keys |
    foreach {
        New-Object psobject -property @{
            Date = $_.split('#')[1]
            Mail = $_.split('#')[0]
        }
    }
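To try the combined-key idea without the log files, here is a minimal self-contained sketch. The sample lines in $lines are invented for illustration and only assume the "[date] ... mail=address from" layout implied by the regex; the real log format may differ.

```powershell
# Hypothetical sample lines standing in for the contents of the log files.
$lines = @(
    '[01/01/2013] id=1 mail=jane.doe@mail.mail from host1'
    '[01/01/2013] id=2 mail=jane.doe@mail.mail from host2'
    '[01/02/2013] id=3 mail=jane.doe@mail.mail from host1'
    '[01/01/2013] id=4 mail=john.doe@mail.mail from host3'
)

$ht = @{}
foreach ($line in $lines) {
    if ($line -match '\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from') {
        # Key is "mail#date"; a repeated combination lands on the same key.
        $ht["$($Matches.mail)#$($Matches.date)"] = $true
    }
}

# Split each key back into its two parts to build the output objects.
$result = $ht.Keys | ForEach-Object {
    $mail, $date = $_.Split('#')
    New-Object psobject -Property @{ Date = $date; Mail = $mail }
}

$result
```

Four input lines yield three objects here, since the first two lines share the same address and date. The '#' separator only works if it never appears in an address or date.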

    by omega.red at 2013-02-28 21:18:33

    I see; I was thinking of using a custom object. Your first suggestion uniquely sorted the results.
    I have a few questions about the syntax used.

    1. Why was the mail key put in brackets after the hash table variable? Is this another way of creating custom properties, similar to select-object @{name=??;exp={$_.somekey}}?
    {$ht[$Matches.Mail] = $Matches.Date}
    2. Why are the results uniquely sorted without using a sorting cmdlet or parameter?

    by mjolinor at 2013-03-01 03:34:59

    That's one way of creating a hash table key. There are two ways of referencing hash table keys: square brackets and dot notation
    ($hashtable.key or $hashtable[key]).
    There are slightly different syntax rules for each one. If you're using an expression for the key, like $Matches.Mail, then you have to use a subexpression with the dot operator ($hashtable.$($Matches.Mail)). If you use the square brackets, you don't need the subexpression syntax.
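The two notations can be compared side by side in a short sketch. The $record hash table below is a hypothetical stand-in for $Matches, used so the example runs without a regex match.

```powershell
# Hypothetical stand-in for $Matches after a successful match.
$record = @{ Mail = 'jane.doe@mail.mail'; Date = '01/01/2013' }
$ht = @{}

# Square brackets take any expression as the key.
$ht[$record.Mail] = $record.Date

# Reading the same entry back with each notation.
$viaBracket = $ht[$record.Mail]
$viaDot     = $ht.$($record.Mail)   # dot notation needs $( ) around the expression

$viaBracket
$viaDot
```

Both variables end up holding the same date string, since they refer to the same entry.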

    The results are unique without using a sort because hash table keys must be unique (see get-help about_hash_tables).

    {$ht[$Matches.Mail] = $Matches.Date}

    is adding a hash table entry that has a key name taken from $Matches.Mail, and a value taken from $Matches.Date. If there is not already a key with that name in the table, a new entry will be created. If there is already an entry with that key, its value will be replaced with the new one, so you are essentially de-duping the entries in-stream. This saves memory because the only thing you're keeping is the hash table. The original objects are discarded after the hash table entry is created.
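A minimal sketch of that replace-on-duplicate behavior, using sample addresses and dates rather than real log data:

```powershell
$ht = @{}

$ht['jane.doe@mail.mail'] = '01/01/2013'   # new key: entry is created
$ht['john.doe@mail.mail'] = '01/01/2013'   # new key: entry is created
$ht['jane.doe@mail.mail'] = '01/02/2013'   # existing key: value is replaced

$ht.Count                    # 2
$ht['jane.doe@mail.mail']    # 01/02/2013
```

Three assignments leave only two entries, and the repeated key keeps the value from the last assignment.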

    by omega.red at 2013-03-07 16:14:25

    I have another syntax question.
    Why did you set the value for the key to $true?
    $ht["$($Matches.Mail)#$($Matches.Date)"] = $true

    by mjolinor at 2013-03-07 17:33:00

    You have to set the value to something. $true was an arbitrary choice. It could have been anything.
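If the placeholder value feels awkward, one alternative that is not from this thread is a .NET HashSet, which stores keys only. The sketch below assumes PowerShell 3.0 or later (where the type is available without extra setup) and uses invented sample keys.

```powershell
# Alternative sketch (not the approach used in the thread): a set needs no dummy value.
$seen = New-Object 'System.Collections.Generic.HashSet[string]'

$keys = 'jane.doe@mail.mail#01/01/2013',
        'jane.doe@mail.mail#01/01/2013',
        'john.doe@mail.mail#01/01/2013'

foreach ($key in $keys) {
    # Add returns $true or $false; [void] keeps that out of the output stream.
    [void]$seen.Add($key)
}

$seen.Count   # 2
```

One difference to keep in mind: a HashSet[string] compares keys case-sensitively by default, while a PowerShell hash table literal (@{}) does not.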
