Parsing Log Files and Extracting Multiple Strings


    by omega.red at 2013-01-14 22:52:41

    [PowerShell v2 used]

    [01/09/2013][15]....[Setting custom HTTP header variable: 'HTTP_mail=email.1@email.com from]
    [01/07/2013][11]....[Setting custom HTTP header variable: 'HTTP_mail=email.2@email.com from]

    I would like to extract the dates and email addresses from log files and output that info to a text file. (Examples of lines within a log file are above)

    Example Output – Line 1: 01/09/2013 email.1@email.com
    Example Output – Line 2: 01/07/2013 email.2@email.com

    The following one-liner was successful in splitting lines and extracting email addresses, but I am looking for more optimal solutions.
    The cmdlet (get-content) is much too slow and consumes large amounts of memory due to putting all log file contents in memory.


    get-content "\\logs\Trace*" |
    foreach-object {$_.split("HTTP_")} | Select-String -Pattern "mail=" |
    out-file "\\logs\all-names.txt"

    Starting with the (select-string) cmdlet is much faster, but I need to extract both dates and email addresses. What am I overlooking?
    select-string -path Trace.log.1 -pattern "mail=" | out-file -filepath names.txt
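For reference, Select-String can return both values if the pattern uses named capture groups, because each result object carries a Matches property. The following is only a sketch: the function name Select-MailEntry is hypothetical, and the pattern is an assumption inferred from the two sample lines above, so it may need adjusting for real log files.

```powershell
# Hypothetical helper; the default pattern is an assumption based on the sample lines.
function Select-MailEntry {
    param(
        [string]$Path,
        [string]$Pattern = '^\[(?<Date>\d{2}/\d{2}/\d{4})\].*HTTP_mail=(?<Mail>\S+)'
    )
    Select-String -Path $Path -Pattern $Pattern | ForEach-Object {
        # Matches[0] is the first regex match on the line; Groups holds the named captures.
        $groups = $_.Matches[0].Groups
        '{0} {1}' -f $groups['Date'].Value, $groups['Mail'].Value
    }
}

# Example call (paths taken from the question):
# Select-MailEntry -Path '\\logs\Trace*' | Out-File -FilePath '\\logs\all-names.txt'
```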

    by DonJ at 2013-01-15 14:48:09

    You're not overlooking anything; you're just running up against two different limitations in those commands. You might need to write your own script instead of using a one-liner. That way you can read one line from a file at a time (using, perhaps, a .NET Framework file reader class), match it against both of your patterns, create your output, and then read the next line.
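A minimal sketch of that script-based approach, using System.IO.StreamReader to read one line at a time. The function name Get-MailEntryFromReader is hypothetical, and the pattern is an assumption inferred from the sample lines in the question. Note that .NET resolves relative paths against the process working directory, so a full path is safest.

```powershell
# Hypothetical helper; the default pattern is an assumption based on the sample lines.
function Get-MailEntryFromReader {
    param(
        [string]$Path,
        [string]$Pattern = '^\[(?<Date>\d{2}/\d{2}/\d{4})\].*HTTP_mail=(?<Mail>\S+)'
    )
    $reader = New-Object System.IO.StreamReader($Path)
    try {
        # ReadLine returns $null at end of file; empty lines come back as ''.
        while ($null -ne ($line = $reader.ReadLine())) {
            if ($line -match $Pattern) {
                # For a single string, -match fills $Matches with the named groups.
                '{0} {1}' -f $Matches['Date'], $Matches['Mail']
            }
        }
    }
    finally {
        # Release the file handle even if an error occurs mid-read.
        $reader.Close()
    }
}
```

Only the current line is held in memory, which is the point of reading the file this way.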

    by nohandle at 2013-01-16 01:23:10

    [quote="DonJ"]That way you can read one line from a file (using, perhaps a .NET Framework file reader class) at a time[/quote]
    Hi Don, I know you hate processing strings in PowerShell 🙂 But using a .NET reader is unnecessary. Get-Content serves one line to the pipeline at a time by default (the behavior is controlled by the ReadCount parameter, which defaults to 1).
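The effect of ReadCount can be checked on a small throwaway file. This is only an illustrative sketch; the temporary file and its five lines are made up for the demonstration.

```powershell
# Create a small scratch file with five lines (made-up content).
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value 'one', 'two', 'three', 'four', 'five'

# Default (-ReadCount 1): every pipeline object is a single string.
$single = @(Get-Content -Path $tmp | ForEach-Object { $_.GetType().Name })

# -ReadCount 2: every pipeline object is a batch of up to two lines.
# @($_) is used so the count is also correct if a batch arrives as a single item.
$batched = @(Get-Content -Path $tmp -ReadCount 2 | ForEach-Object { @($_).Count })

Remove-Item -Path $tmp

"Objects with default ReadCount: $($single.Count)"
"Objects with -ReadCount 2:      $($batched.Count)"
```

With a batch size above 1, the script block after the pipe sees far fewer objects than there are lines in the file.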

    OP: You are right that Select-String is not the ideal cmdlet for this, as it outputs the whole line that matched. The individual matches are there, but they are buried pretty deep in the properties of the output object; using just the match operator and a simple regex is a bit easier.
    Get-Content -Path |
    foreach {
    #extract the data
    if ($_ -match "\[(?\d{2}/\d{2}/\d{4})\]\[(?

    }
    Date       Time     Mail
    ----       ----     ----
    01/09/2013 15:51:58 email.1@email.com
    01/07/2013 11:51:58 email.2@email.com
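The general technique in this reply, matching each line with -match and named capture groups and then reading the values from $Matches, might look roughly like the sketch below. It is not the original code from the post: the function name ConvertTo-MailEntry is hypothetical, the pattern is an assumption inferred from the sample lines in the question, and the Time column is left out because the sample lines do not show the full time format. New-Object PSObject is used so the sketch stays compatible with PowerShell v2.

```powershell
# Hypothetical helper; the default pattern is an assumption based on the sample lines.
function ConvertTo-MailEntry {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line,
        [string]$Pattern = '^\[(?<Date>\d{2}/\d{2}/\d{4})\].*HTTP_mail=(?<Mail>\S+)'
    )
    process {
        # On a single string, -match returns True/False and fills $Matches.
        if ($Line -match $Pattern) {
            New-Object PSObject -Property @{
                Date = $Matches['Date']
                Mail = $Matches['Mail']
            } | Select-Object Date, Mail   # fix the column order
        }
    }
}

# Example call (paths taken from the question):
# Get-Content -Path '\\logs\Trace*' | ConvertTo-MailEntry |
#     ForEach-Object { '{0} {1}' -f $_.Date, $_.Mail } |
#     Out-File -FilePath '\\logs\all-names.txt'
```

Emitting objects rather than strings keeps the formatting decision for the end of the pipeline.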

    by omega.red at 2013-01-27 20:45:01

    This works great; I am able to parse log files and extract the info I need. It can consume a lot of system memory, but it is functional.
    After experimenting with the "-readcount" parameter, I noticed I get different results. My addition is in bold.

    For example:
    Without "-readcount", I get all 1766 lines and the exact results I expected.
    With "-readcount 1000", I get only two results.
    I understand that with "-readcount 1000", PowerShell sends the first 1000 lines down the pipeline as one batch, then the remaining lines.
    Why does this parameter give only two results rather than the 1766 I should get?

    Get-Content -Path -readcount 1000 |
    foreach {
    #extract the data
    if ($_ -match "\[(?\d{2}/\d{2}/\d{4})\]\[(?

    by nohandle at 2013-01-27 22:55:32

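    That is because with -ReadCount 1000 each object in the pipeline is a whole group of lines rather than a single line. Given a group, the match operator returns the matching lines instead of True/False, so the if block runs once per group, not once per line. Two groups -> two results (assuming there is at least one match in each group.)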
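One way to keep a larger ReadCount and still get one result per line is to loop over each batch before matching. This is a sketch under assumptions: the function name Get-BatchedMailEntry is hypothetical, and the pattern is inferred from the sample lines in the question.

```powershell
# Hypothetical helper; the default pattern is an assumption based on the sample lines.
function Get-BatchedMailEntry {
    param(
        [string]$Path,
        [int]$ReadCount = 1000,
        [string]$Pattern = '^\[(?<Date>\d{2}/\d{2}/\d{4})\].*HTTP_mail=(?<Mail>\S+)'
    )
    Get-Content -Path $Path -ReadCount $ReadCount | ForEach-Object {
        # $_ is a batch of lines here, so match each line individually.
        foreach ($line in $_) {
            if ($line -match $Pattern) {
                '{0} {1}' -f $Matches['Date'], $Matches['Mail']
            }
        }
    }
}
```

The inner foreach restores per-line matching, so $Matches is populated for every matching line regardless of the batch size.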
