January 1, 2012 at 12:00 am

by omega.red at 2013-02-27 22:47:34

I am trying to reduce the memory usage of the following script and uniquely sort its results without storing a section of the script's output in a variable. Using the variable $testvar is not a good option: it results in system memory usage growing past 200 MB. Is it possible to uniquely sort a hashtable collection in the pipeline?


$testvar = Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
{
    $sr = new-object system.io.streamreader(".\Desktop\logs\$file");
    While (($line = $sr.ReadLine()) -ne $null) {
        If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
        {$Matches | select-object @{name='Date';expression={$Matches.date}},@{name='Mail';expression={$Matches.mail}}
        }}}
$testvar | sort-object -unique -property Mail

RESULTS: (WITH OMITTED VARIABLE LINE AT THE END OF SCRIPT)
TypeName: Selected.System.Collections.Hashtable
Date Mail
01/01/2013 jane.doe@ mail.mail
01/01/2013 jane.doe@ mail.mail
01/01/2013 john.doe@ mail.mail
01/01/2013 john.doe@ mail.mail

Changing the body of the "if" statement to the following uses the least memory. Is it possible to put a space or tab between the date and email address, and then uniquely sort in the pipeline?


If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
{$Matches.date + $Matches.mail}

RESULTS:
TypeName: System.String
01/01/2013jane.doe@ mail.mail
01/01/2013jane.doe@ mail.mail
01/01/2013john.doe@ mail.mail
01/01/2013john.doe@ mail.mail

by mjolinor at 2013-02-28 03:35:16

First off, you can eliminate the variable and make your foreach loop "pipeline friendly" by wrapping it in a script block:
&{
    Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
    {
        $sr = new-object system.io.streamreader(".\Desktop\logs\$file")
        While (($line = $sr.ReadLine()) -ne $null) {
            If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
            {$Matches |
                select-object @{name='Date';expression={$Matches.date}},@{name='Mail';expression={$Matches.mail}}
            }
        }
    }
} | sort-object -unique -property Mail

But that isn't really going to help your memory usage much as long as you're piping the result to sort. Sort is a "blocking cmdlet": all the results from the pipeline will accumulate there, since it can't really sort the items until it has them all.
Since you seem to be using sort to de-dupe the Mail field, a better way might be to use a hash table with the Mail field as the key, and eliminate the sort -unique:

$ht = @{}

Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
{
    $sr = new-object system.io.streamreader(".\Desktop\logs\$file")
    While (($line = $sr.ReadLine()) -ne $null) {
        If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
        {$ht[$Matches.Mail] = $Matches.Date}
    }
}

$ht.GetEnumerator() |
foreach {
New-Object psobject -property @{
Date = $_.Value
Mail = $_.name
}
}

Edit:
If you've got multiple dates and are looking for unique date/address combinations, combine the date and address to make the hash table keys, then split them back out to create the objects.

$ht = @{}
Foreach ($file in get-childitem -path ".\Desktop\logs\" -filter Test.log.* -name)
{
    $sr = new-object system.io.streamreader(".\Desktop\logs\$file")
    While (($line = $sr.ReadLine()) -ne $null) {
        If ($line -match "\[(?<date>\d{2}/\d{2}/\d{4})\].*mail=(?<mail>.*?)\ from")
        {$ht["$($Matches.Mail)#$($Matches.Date)"] = $true
        }
    }
}

$ht.keys |
foreach {
New-Object psobject -property @{
Date = $_.split('#')[1]
Mail = $_.split('#')[0]
}
}

by omega.red at 2013-02-28 21:18:33

I see; I was thinking of using a custom object. Your first suggestion uniquely sorted the results.
I have a few questions about the syntax used.

1. Why was the mail key put in brackets after the hash table variable, as in {$ht[$Matches.Mail] = $Matches.Date}? Is this another way of creating custom properties, similar to select-object @{name=??;exp={$_.somekey}}?
2. Why are the results uniquely sorted without using a sorting cmdlet or parameter?

by mjolinor at 2013-03-01 03:34:59

That's one way of creating a hash table key. There are two ways of referencing hash table keys – square brackets and dot notation.
($hashtable.key or $hashtable[key])
There are slightly different syntax rules for each one. If you're using an expression for the key name, like $Matches.Mail, then you have to use a subexpression with the dot operator ($hashtable.$($Matches.Mail)). If you use the square brackets, you don't need the subexpression syntax.
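
To illustrate the two forms side by side, here is a minimal sketch. The $sample hash table and its example.com address are made up for the illustration and simply stand in for $Matches:

```powershell
# Hypothetical sample data standing in for $Matches (not from the original script)
$sample = @{ Mail = 'jane.doe@example.com'; Date = '01/01/2013' }

$ht = @{}

# Square brackets accept an expression directly as the key
$ht[$sample.Mail] = $sample.Date

# Reading the entry back with square brackets
$viaBracket = $ht[$sample.Mail]

# Reading the same entry with dot notation requires a subexpression for the key
$viaDot = $ht.$($sample.Mail)

"$viaBracket / $viaDot"
```

Both reads should return the same stored date, since they address the same key.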

The results are unique without using a sort because hash table keys must be unique (see get-help about_hash_tables).

{$ht[$Matches.Mail] = $Matches.Date}

is adding a hash table entry that has a key name taken from $Matches.Mail, and a value taken from $Matches.Date. If there is not already a key with that name in the table, a new entry will be created. If there is already an entry with that key, its value will be replaced with the new one, so you are essentially de-duping the entries in-stream. This saves memory because the only thing you're keeping is the hash table. The original objects are discarded after the hash table entry is created.
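
A small sketch of that overwrite behavior, using made-up entries in place of matched log lines ($entries and the example.com addresses are assumptions for the illustration):

```powershell
# Hypothetical parsed entries standing in for the matched log lines
$entries = @(
    @{ Mail = 'jane.doe@example.com'; Date = '01/01/2013' },
    @{ Mail = 'jane.doe@example.com'; Date = '01/02/2013' },
    @{ Mail = 'john.doe@example.com'; Date = '01/01/2013' }
)

$ht = @{}
foreach ($entry in $entries) {
    # A repeated key replaces the earlier value instead of adding a new entry
    $ht[$entry.Mail] = $entry.Date
}

$ht.Count                      # 2
$ht['jane.doe@example.com']    # 01/02/2013 (the last value written wins)
```

Three entries go in, but only two keys remain, and the duplicated address keeps the date from its last occurrence.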

by omega.red at 2013-03-07 16:14:25

I have another syntax question.
Why did you give the key name an equals true value?
$ht["$($Matches.Mail)#$($Matches.Date)"] = $true

by mjolinor at 2013-03-07 17:33:00

You have to set the value to something. $true was an arbitrary choice. It could have been anything.
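
As a rough sketch of why the value does not matter, the following stores $null instead of $true and still ends up with one key per unique combination. The $keys list is made-up sample data:

```powershell
# Hypothetical "mail#date" keys, with one duplicate
$keys = @(
    'jane.doe@example.com#01/01/2013',
    'jane.doe@example.com#01/01/2013',
    'john.doe@example.com#01/01/2013'
)

$seen = @{}
foreach ($key in $keys) {
    # Only the key is used for de-duping; the stored value is never read
    $seen[$key] = $null
}

$seen.Count    # 2

# Split each key back into its parts, as in the earlier example
$result = $seen.Keys | foreach {
    New-Object psobject -property @{
        Mail = $_.split('#')[0]
        Date = $_.split('#')[1]
    }
}
$result
```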