Log parsing for a specific error, counting occurrences, and showing the date

    • #234271
      Participant
      Topics: 1
      Replies: 7
      Points: 32
      Rank: Member

      Hi All,

      I need some help with some log parsing.

      1. (Done) I want to get a count of the number of times each address occurs.
      2. (Need help) I now want to add two columns to the output to show the first and last time the error occurred per user.

      This is what the log format looks like

      2020-06-04 15:06:53 [12655] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:06:54 [12653] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:06:56 [12651] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:10:14 [8276] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:11:01 [6800] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:16:58 [11340] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:16:59 [11340] INFO [email protected], ExAddress=, [email protected]

      What I've written so far:

      $regexA = "DisplayName.*"
      Select-String -Path .\logfile.log* -Pattern $regexA -AllMatches |
          ForEach-Object { $_.Matches.Value } |
          Group-Object -NoElement |
          Select-Object Name, Count |
          Sort-Object Count -Descending > .\output.txt


      Thank you, any help would be greatly appreciated.

    • #234361
      Participant
      Topics: 5
      Replies: 2384
      Points: 6,066
      Helping Hand
      Rank: Community MVP

      Logan, welcome to PowerShell.org. Please take a moment and read the very first post on top of the list of this forum: Read Me Before Posting! You’ll be Glad You Did!.

      When you post code, error messages, sample data or console output, please format it as code.
      In the “Text” view you can use the code tags “PRE“, in the “Visual” view you can use the format template “Preformatted“. You can go back edit your post and fix the formatting – you don’t have to create a new one.
      Thanks in advance.

      You actually do not need external tools like findstr. You should use native PowerShell cmdlets like Select-String instead, as you already did in your second pipeline step. You may (re-)read the help for Select-String, including the examples, to learn how to use it. You can provide input files for Select-String as well. Please share a few more lines from your log file to play with, your regex pattern, and a few examples of the desired output.
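
      For instance, a capture group can pull out just the address instead of the whole rest of the line. A rough, untested sketch, assuming the raw log really reads DisplayName=<address>, as your pattern suggests:

      Select-String -Path .\logfile.log* -Pattern 'DisplayName=([^,]+),' -AllMatches |
          ForEach-Object { $_.Matches } |
          ForEach-Object { $_.Groups[1].Value } |   # the captured address only
          Group-Object -NoElement |
          Sort-Object Count -Descending |
          Select-Object Name, Count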

    • #237691
      Participant
      Topics: 1
      Replies: 7
      Points: 32
      Rank: Member

      Hi Olaf,

      Thank you for your response, I updated my example as you suggested.  Thank you for any ideas you may have to add first and last occurrences.

    • #237778
      Participant
      Topics: 5
      Replies: 2384
      Points: 6,066
      Helping Hand
      Rank: Community MVP

      It may be over the top, but your data look like structured data… like CSV data. So you may treat them as such, like this:

      $ImportedData = 
      Import-Csv -Path D:\sample\sample.log -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
      Select-Object -Property @{
          Name       = 'DateTime'
          Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
      }, ID, Info,
      @{
          Name       = 'DisplayName'
          Expression = { ($_.DisplayName -split '=')[1].trim(',') }
      },
      @{
          Name       = 'ExAddress'
          Expression = { ($_.ExAddress -split '=')[1].trim(',') }
      },
      @{
          Name       = 'SmtpAddress'
          Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
      }

      Now that you have the data in a variable you can much more easily sort, group or count them however you like:

      $ImportedData |
      Format-Table -AutoSize

      Output would look like this:

      DateTime            ID      Info DisplayName              ExAddress SmtpAddress
      --------            --      ---- -----------              --------- -----------
      2020-06-04 15:06:53 [12655] INFO [email protected]           [email protected]
      2020-06-04 15:06:54 [12653] INFO [email protected]           [email protected]       
      2020-06-04 15:06:56 [12651] INFO [email protected]           [email protected]       
      2020-06-04 15:10:14 [8276]  INFO [email protected]           [email protected]
      2020-06-04 15:11:01 [6800]  INFO [email protected]           [email protected]       
      2020-06-04 15:16:58 [11340] INFO [email protected]           [email protected]       
      2020-06-04 15:16:59 [11340] INFO [email protected]           [email protected]

      I’d recommend saving the output as CSV as well.
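
      For example (the output path here is just a placeholder):

      $ImportedData | Export-Csv -Path D:\sample\parsed.csv -NoTypeInformation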

    • #237829
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      Olaf, bravo. I wasn’t aware Import-Csv was so powerful. This is an excellent answer.

    • #237847
      Participant
      Topics: 1
      Replies: 7
      Points: 32
      Rank: Member

      Hi Olaf,

      I am trying to grab the first time it occurred, the last time, and the count for each address. The idea is to first see how big of an issue that user is causing, see when it started, and see if it’s still happening. I was using Group-Object to get the counts, but on its own that only leaves me with the unique names and counts.

      Wanted output something like this:

      First Occurrence            Last Occurrence             Display name             Count
      2020-06-04 15:06:53 [12655] 2020-06-04 15:06:56 [12651] [email protected]     3
      2020-06-04 15:10:14 [8276]  2020-06-04 15:10:14 [8276]  [email protected]     1
      2020-06-04 15:11:01 [6800]  2020-06-04 15:11:01 [6800]  [email protected]     1
      2020-06-04 15:16:58 [11340] 2020-06-04 15:16:58 [11340] [email protected]     1
      2020-06-04 15:16:59 [11340] 2020-06-04 15:16:59 [11340] [email protected]     1

      Thanks for your help

    • #237850
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      Olaf has provided your data as an object that you can now filter/group/calculate to your heart’s content.

      $ImportedData | Group-Object -Property displayname | 
          Select-Object -Property @{N="First Occurrence"; E={$_.group.datetime | sort | select -First 1}},
                                  @{N="Last Occurrence" ; E={$_.group.datetime | sort | select -Last 1}},
                                  Name,
                                  Count
      

      Output

      First Occurrence    Last Occurrence     Name                     Count
      ----------------    ---------------     ----                     -----
      2020-06-04 15:06:53 2020-06-04 15:06:56 [email protected]     3
      2020-06-04 15:10:14 2020-06-04 15:10:14 [email protected]     1
      2020-06-04 15:11:01 2020-06-04 15:11:01 [email protected]     1
      2020-06-04 15:16:58 2020-06-04 15:16:58 [email protected]     1
      2020-06-04 15:16:59 2020-06-04 15:16:59 [email protected]     1
      
      • #237853
        Participant
        Topics: 1
        Replies: 7
        Points: 32
        Rank: Member

        Got it, thank you both for your time. This is exactly what I was looking for.

    • #237856
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      I was bored…

      New sample data

      2020-06-04 15:06:53 [1255] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:06:54 [1653] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:06:56 [12651] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:10:14 [8276] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:11:01 [680] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:16:58 [1340] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:16:59 [11340] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:16:57 [1308] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 15:16:57 [19971] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 11:08:01 [6800] INFO [email protected], ExAddress=, [email protected]
      2020-06-04 19:08:01 [6033] INFO [email protected], ExAddress=, [email protected]
      

      New script

      $ImportedData = Import-Csv -Path D:\sample\sample.log -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
          Select-Object -Property @{
          Name = 'DateTime'
          Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
          }, ID, Info,
          @{
          Name = 'DisplayName'
          Expression = { ($_.DisplayName -split '=')[1].trim(',') }
          },
          @{
          Name = 'ExAddress'
          Expression = { ($_.ExAddress -split '=')[1].trim(',') }
          },
          @{
          Name = 'SmtpAddress'
          Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
          }
      
      $lookup = $ImportedData | Group-Object -Property displayname -AsHashTable -AsString
      
      $lookup.Keys | foreach {
          $oldest = $lookup[$_].datetime | sort | select -First 1
          $newest = $lookup[$_].datetime | sort | select -Last 1
          $newestID = $lookup[$_] | ? datetime -eq $newest | select -ExpandProperty ID
          $oldestID = $lookup[$_] | ? datetime -eq $oldest | select -ExpandProperty ID
      
          [pscustomobject]@{
              "First Occurrence" = "{0} {1}" -f $oldest,$oldestID
              "Last Occurrence"  = "{0} {1}" -f $newest,$newestID
              Displayname        = $_
              Count              = $lookup[$_].Count
          }
      } | sort -Property count -Descending
      

      New output

      First Occurrence            Last Occurrence             Displayname              Count
      ----------------            ---------------             -----------              -----
      2020-06-04 11:08:01 [6800]  2020-06-04 19:08:01 [6033]  [email protected]     3
      2020-06-04 15:06:53 [1255]  2020-06-04 15:06:56 [12651] [email protected]     3
      2020-06-04 15:16:57 [19971] 2020-06-04 15:16:59 [11340] [email protected]     2
      2020-06-04 15:16:57 [1308]  2020-06-04 15:16:58 [1340]  [email protected]     2
      2020-06-04 15:10:14 [8276]  2020-06-04 15:10:14 [8276]  [email protected]     1
      
    • #237880
      Participant
      Topics: 1
      Replies: 7
      Points: 32
      Rank: Member

      This does work great; however, I have an average of 120 MB of log files to scan, and it seemed to take about 5 minutes per 20 MB of logs. Any ideas how to optimize?

      I also switched to the below to be able to use multiple input files.

      $ImportedData = get-content .\sample.log* | ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
      
    • #237937
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      If you remove the sort at the end does that speed it up?

    • #237940
      Participant
      Topics: 5
      Replies: 2384
      Points: 6,066
      Helping Hand
      Rank: Community MVP

      I also switched to the below to be able to use multiple input files.

      You can provide more than one file at a time for Import-Csv. Either with an array you filled up in advance or directly like this:

      Import-Csv -Path (Get-ChildItem -Path .\sample*.log) 
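
      If the Get-ChildItem objects don’t bind cleanly to -Path, expanding the full paths into an array first is the safer variant of the “array filled up in advance” option (a sketch, untested):

      $files = (Get-ChildItem -Path .\sample*.log).FullName
      Import-Csv -Path $files -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress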

      … however, I have an average of 120 MB of log files to scan, and it seemed to take about 5 minutes per 20 MB of logs. Any ideas how to optimize?

      PowerShell is quite slow when it comes to file system operations. If the running time really matters you could use a [System.IO.StreamReader] to speed up the import of the data. But that would be less comfortable than pure PowerShell, of course. 😉

      Here you have something to start reading: https://stackoverflow.com/questions/44462561/system-io-streamreader-vs-get-content-vs-system-io-file
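
      A bare-bones reader loop might look like this (just a sketch: the path is assumed and the per-line parsing is left out):

      $reader = [System.IO.StreamReader]::new('D:\sample\sample.log')
      try {
          while ($null -ne ($line = $reader.ReadLine())) {
              # parse $line here instead of piping everything through ConvertFrom-Csv
          }
      }
      finally {
          $reader.Dispose()
      }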

    • #237946
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      Here, see if this helps.

      $lookup = Get-Content .\sample.log* |
          ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
              Select-Object -Property @{
                  Name = 'DateTime'
                  Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
                  }, ID, Info,
                  @{
                  Name = 'DisplayName'
                  Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                  },
                  @{
                  Name = 'ExAddress'
                  Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                  },
                  @{
                  Name = 'SmtpAddress'
                  Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                  } | Group-Object -Property displayname -AsHashTable -AsString
      
      $lookup.keys | Foreach-Object {
          $rows = $lookup[$_]
      
          $oldest = $rows.datetime | sort | select -First 1
          $newest = $rows.datetime | sort | select -Last 1
          $newestID = $rows | ? datetime -eq $newest | select -first 1 -ExpandProperty ID
          $oldestID = $rows | ? datetime -eq $oldest | select -first 1 -ExpandProperty ID
      
          [pscustomobject]@{
              "First Occurrence" = "{0} {1}" -f $oldest,$oldestID
              "Last Occurrence"  = "{0} {1}" -f $newest,$newestID
              Displayname        = $_
              Count              = $rows.Count
          }
      } | sort -Property count -Descending
      
    • #237955
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      If you can afford to put it all in memory, you can try it this way.

      $script = {
          Get-Content .\sample.log* |
              ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
                  Select-Object -Property @{
                      Name = 'DateTime'
                      Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
                      }, ID, Info,
                      @{
                      Name = 'DisplayName'
                      Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                      },
                      @{
                      Name = 'ExAddress'
                      Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                      },
                      @{
                      Name = 'SmtpAddress'
                      Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                      } | Group-Object -Property displayname
       }
      foreach($group in (& $script))
      {
          $oldest = $group.group.datetime | sort | select -First 1
          $newest = $group.group.datetime | sort | select -Last 1
          $newestID = $group.group | ? datetime -eq $newest | select -first 1 -ExpandProperty ID
          $oldestID = $group.group | ? datetime -eq $oldest | select -first 1 -ExpandProperty ID
      
          [pscustomobject]@{
              "First Occurrence" = "{0} {1}" -f $oldest,$oldestID
              "Last Occurrence"  = "{0} {1}" -f $newest,$newestID
              Displayname        = $group.name
              Count              = $group.Count
          }
      }
      

      Take care.

    • #237958
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      Or this slightly simplified version of the previous one:

      $script = {
          Get-Content .\sample.log* |
              ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
                  Select-Object -Property @{
                      Name = 'DateTime'
                      Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
                      }, ID, Info,
                      @{
                      Name = 'DisplayName'
                      Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                      },
                      @{
                      Name = 'ExAddress'
                      Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                      },
                      @{
                      Name = 'SmtpAddress'
                      Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                      } | Group-Object -Property displayname
       }
      foreach($group in (& $script))
      {
          $oldest = $group.group | sort datetime | select -First 1
          $newest = $group.group | sort datetime | select -Last 1
      
          [pscustomobject]@{
              "First Occurrence" = "{0} {1}" -f $oldest.datetime,$oldest.id
              "Last Occurrence"  = "{0} {1}" -f $newest.datetime,$newest.id
              Displayname        = $group.name
              Count              = $group.Count
          }
      }
      
    • #237995
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      You can provide more than one file at a time for Import-Csv. Either with an array you filled up in advance or directly like this:

      Import-Csv -Path (Get-ChildItem -Path .\sample*.log)

      This did not seem to work for me, Olaf.

    • #238004
      Participant
      Topics: 1
      Replies: 7
      Points: 32
      Rank: Member

      Unfortunately it’s still too slow for that particular set of logs. It does work for another set where I am paring the data down first.

      Checking other forums, a common suggestion is to use Microsoft.VisualBasic.FileIO.TextFieldParser, though I lack the VB background to do this. If I find a way to get it working or improve the performance I’ll post an update.
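
      For what it’s worth, TextFieldParser can be called straight from PowerShell, no VB required. A rough, untested sketch, with the path assumed:

      Add-Type -AssemblyName Microsoft.VisualBasic
      $parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser('D:\sample\sample.log')
      $parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
      $parser.SetDelimiters(' ')
      while (-not $parser.EndOfData) {
          $fields = $parser.ReadFields()   # string[] of the space-separated columns
      }
      $parser.Close()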

    • #238016
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      Yeah, PowerShell isn’t the fastest with large text files. May need another tool for faster speeds. The .NET methods aren’t much different from Get-Content -Raw. Also, the Get-Date seemed redundant and was called for each record. This should speed it up some.

      $script = {
          Get-Content .\sample.log* -raw |
              ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
                  Select-Object -Property @{
                      Name = 'DateTime'
                      Expression = { '{0} {1}' -f $_.Date,$_.Time }
                      }, ID, Info,
                      @{
                      Name = 'DisplayName'
                      Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                      },
                      @{
                      Name = 'ExAddress'
                      Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                      },
                      @{
                      Name = 'SmtpAddress'
                      Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                      } | Group-Object -Property displayname
       }
      foreach($group in (& $script))
      {
          $oldest = $group.group | sort datetime | select -First 1
          $newest = $group.group | sort datetime | select -Last 1
      
          [pscustomobject]@{
              "First Occurrence" = "{0} {1}" -f $oldest.datetime,$oldest.id
              "Last Occurrence"  = "{0} {1}" -f $newest.datetime,$newest.id
              Displayname        = $group.name
              Count              = $group.Count
          }
      }
      
    • #238019
      Participant
      Topics: 5
      Replies: 2384
      Points: 6,066
      Helping Hand
      Rank: Community MVP

      If I find a way to get it working or improve the performance I’ll post an update.

      Hmmm … does performance really matter? In my experience it matters less often than you might think, especially when the task is going to run automatically on a schedule in the background. 😉

      If performance matters you should measure it to find out where the time actually goes. Otherwise you might optimize an already speedy part of your code. You can use Measure-Command to time the individual parts of your script.
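
      For instance, to time just the import step (path assumed):

      Measure-Command {
          Import-Csv -Path D:\sample\sample.log -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress
      } | Select-Object TotalSeconds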

      Please have a look at my last answer. Quite often PowerShell’s file system operations consume a lot of time and there are some ways to circumvent this.

      This did not seem to work for me, Olaf.

      I don’t know what to say. It just works for me as intended. Did you notice – it’s Get-ChildItem inside the parentheses – not Get-Content!! 😉

    • #238025
      Participant
      Topics: 5
      Replies: 2384
      Points: 6,066
      Helping Hand
      Rank: Community MVP

      What came to my mind just right now – do you run this code locally on the computer where the log files are, or remotely against a network share? And what version of PowerShell do you use? I noticed a distinct difference in performance between v5.1 and the current v7.0.2.
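
      You can check the version with:

      $PSVersionTable.PSVersion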

    • #238124
      Participant
      Topics: 1
      Replies: 7
      Points: 32
      Rank: Member

      Yeah, PowerShell isn’t the fastest with large text files. May need another tool for faster speeds. The .NET methods aren’t much different from Get-Content -Raw. Also, the Get-Date seemed redundant and was called for each record. This should speed it up some.


      Yes this is a bit faster.

      What came to my mind just right now – do you run this code locally on the computer where the log files are, or remotely against a network share? And what version of PowerShell do you use? I noticed a distinct difference in performance between v5.1 and the current v7.0.2.

      v5.1; I will test with the new version. The logs would be local. Speed is a requirement as it would be run manually for multiple sets of data, on logs that are not always accessible.


    • #238196
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      The majority of the time is spent reading and grouping the content, which of course requires all the data to be available. I had fun playing with this and I thought I’d share the code I used to create the test files as well as a class for filtering/sorting the group data. I’m not sure if it’s any faster than my previous suggestion, just yet another approach. You could also take each file and run them in background jobs/runspaces and then group on the collection of their output. Like Olaf said though, if it’s running on a schedule would it matter if it took some time? I was getting about 5 minutes for 4 x 11MB files with my previous suggestion. With the following code I got 48 seconds for the 4 x 5.75MB files it creates.
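
      To sketch the background-jobs idea (untested, paths assumed): parse each file in its own job, then group the combined output once:

      $jobs = Get-ChildItem -Path C:\Temp\Sample*.csv | ForEach-Object {
          Start-Job -ArgumentList $_.FullName -ScriptBlock {
              param($path)
              Import-Csv -Path $path -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress
          }
      }
      $rows = $jobs | Receive-Job -Wait -AutoRemoveJob
      $rows | Group-Object -Property DisplayName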

      Create the test files

      function get-randomdate {
          $num = Get-Random(-3..-365)
          (get-date).AddDays($num).AddHours($num).AddMinutes($num).AddSeconds($num).ToString('yyyy-MM-dd HH:mm:ss')
      }
      
      function get-randomnum {
          Get-Random(3000..15000)
      }
      
      function get-randomemail{
          "Emailaddress$(get-random(1..12))@domain.com"
      }
      
      function generateline{
      @"
      $(get-randomdate) [$(get-randomnum)] INFO DisplayName=$(get-randomemail), ExAddress=, [email protected]
      "@
      }
      
      foreach($num in 1..4)
      {
          1..50000|foreach{
              generateline
          }| Set-Content "c:\temp\Sample$num.csv"
      }
      

      Sorting class

      class SortGroup
      {
          [string]$FirstOccurrence
          [string]$LastOccurrence
          [string]$DisplayName
          [int]$Count
      
          [string]Sort([object]$subgroup)
          {
              # Sort ascending once, then take the earliest and latest entries;
              # this also works when the group contains a single row.
              $sorted = $subgroup | sort datetime
              $oldest = $sorted | select -First 1
              $newest = $sorted | select -Last 1
              $this.FirstOccurrence = $oldest.datetime,$oldest.id
              $this.LastOccurrence = $newest.datetime,$newest.id
              Return $this
          }
      
          SortGroup([object]$group)
          {
              $this.Count = $group.count
              $this.DisplayName = $Group.name
              $this.Sort($group.group)
          }
      }
      

      Read data in, group, and sort/arrange/output.

      get-content c:\temp\sample*.csv -raw |
          ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
                  Select-Object -Property @{
                      Name = 'DateTime'
                      Expression = { '{0} {1}' -f $_.Date,$_.Time }
                      }, ID, Info,
                      @{
                      Name = 'DisplayName'
                      Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                      },
                      @{
                      Name = 'ExAddress'
                      Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                      },
                      @{
                      Name = 'SmtpAddress'
                      Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                      }  | Group-Object -Property displayname | Foreach{[SortGroup]::new($_)}
      
    • #238424
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      I must say I am really impressed with classes in PowerShell. This code would process the sample files I made in 1.1–2 minutes. The original suggestion I provided took 16 minutes and my last suggestion took ~6 minutes. This is a pretty significant improvement if you ask me. There is still more speed you can gain by running these in parallel/background jobs if needed, but this is very reasonable.

      Class LogParser {
      
          static [object]ProcessFile($file)
          {
              return [system.io.file]::ReadAllText($file) |
                  ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
                      foreach{[FormatLine]::new($_)}
          }
      }
      
      Class FormatLine {
      
          [datetime]$Datetime
          [string]$Displayname
          [string]$ExAddress
          [string]$SmtpAddress
          [string]$ID
      
          FormatLine($line)
          {
              $this.Datetime = '{0} {1}' -f $line.Date,$line.Time
              $this.ID = $line.id
              $this.Displayname = ($line.DisplayName -split '=')[1].trim(',')
              $this.ExAddress = ($line.ExAddress -split '=')[1].trim(',')
              $this.SmtpAddress = ($line.SmtpAddress -split '=')[1].trim(',')
          }
      }
      
      class SortGroup
      {
          [string]$FirstOccurrence
          [string]$LastOccurrence
          [string]$DisplayName
          [int]$Count
      
          [string]Sort([object]$subgroup)
          {
              # Sort ascending once, then take the earliest and latest entries;
              # this also works when the group contains a single row.
              $sorted = $subgroup | sort datetime
              $oldest = $sorted | select -First 1
              $newest = $sorted | select -Last 1
              $this.FirstOccurrence = $oldest.datetime,$oldest.id
              $this.LastOccurrence = $newest.datetime,$newest.id
              Return $this
          }
      
          SortGroup([object]$group)
          {
              $this.Count = $group.count
              $this.DisplayName = $Group.name
              $this.Sort($group.group)
          }
      }
      
      Get-ChildItem -Path C:\Temp\Sample*.csv | %{ [LogParser]::ProcessFile($_) } | 
          Group-Object -Property displayname | %{ [SortGroup]::new($_) }
      

      Just for the record, I also tested the LogParser class with Get-Content -Raw and it really was about the same as ReadAllText().

    • #238448
      Participant
      Topics: 1
      Replies: 7
      Points: 32
      Rank: Member

      Wow, nice work Doug, huge speed improvement. I’ve honestly never used classes at all; I’ll need to read up on this to use in future.

      Something kinda odd: if there are any lines that don’t match the data structure outlined, the script fails to run. I had 2 entries that my parsed-down log missed.

      Example: 2019-07-17 14:40:13 [4303] INFO [email protected], ExAddress=, [email protected], Type=Sender
      foreach : You cannot call a method on a null-valued expression.
      At line:8 char:17
      + foreach{[FormatLine]::new($_)}
      + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      + CategoryInfo : InvalidOperation: (:) [ForEach-Object], RuntimeException
      + FullyQualifiedErrorId : InvokeMethodOnNull,Microsoft.PowerShell.Commands.ForEachObjectCommand

    • #238454
      Participant
      Topics: 3
      Replies: 431
      Points: 1,533
      Helping Hand
      Rank: Community Hero

      I would check for spaces at the beginning of the line. Any spaces there caused errors for me.
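
      For example, one defensive tweak inside ProcessFile (just a sketch) is to drop rows whose DisplayName field doesn’t contain an '=', so a malformed or shifted line can’t hand a null to .trim():

      static [object]ProcessFile($file)
      {
          return [system.io.file]::ReadAllText($file) |
              ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
                  Where-Object { $_.DisplayName -match '=' } |   # skip rows that don't parse
                      foreach{ [FormatLine]::new($_) }
      }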
