Formatting Command Line Output

This topic contains 5 replies, has 4 voices, and was last updated by Joakim Svendsen 3 years, 9 months ago.

  • #10709

    Christopher Corbett
    Participant

    I have a legacy command line application that returns each line as an object, such as the following. I am trying to format this output so I can insert it into a CSV with headers. My original thought was to grab all the lines using the blank line as a separator. A few things throw me for a loop: 1) the number of objects is not consistent, and 2) the last value (the comment) gets placed into a new object instead of on the same line as its label.

    Any thoughts on how I should proceed? Perhaps it's just Friday and I cannot think straight! I have been at this for a few hours now.

    –begin output–
    Path: c:\test\test\
    Created: 2013-09-12 10:13:09 -0500 (Thu 12 Sep 2013)
    File: somefile.txt
    Comment (1 Line):
    12345 – Test Comment

    Path: c:\test\test2\
    Created: 2013-09-12 10:14:09 -0500 (Thu 12 Sep 2013)
    File: somefile2.txt
    Comment (1 Line):
    67890 – Test Comment 2

    Path: c:\test\test3\
    Created: 2013-09-12 10:15:09 -0500 (Thu 12 Sep 2013)
    File: somefile2.txt
    Comment (1 Line):
    09876 – Test Comment 3

    Path: c:\test\test2\
    Created: 2013-09-12 10:14:09 -0500 (Thu 12 Sep 2013)
    File: somefile2.txt

    Path: c:\test\test2\
    File: somefile2.txt
    –end output–
    The CSV would look similar to:

    Path,Created,File,Comment
    c:\test\test,2013-09-12 10:13:09 -0500 (Thu 12 Sep 2013),somefile.txt,12345 – Test Comment

    By no means am I trying to get someone to do all the work for me, just a push in the right direction. (Unless of course you want to, LoL.)
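For reference, a minimal sketch (PowerShell 3.0 or later, for [pscustomobject]) of the CSV side of the problem: once each record is an object, ConvertTo-Csv and Export-Csv take the header row from the property names, so the real work is turning the text into objects. The record below is typed in by hand from the first block of the sample output, with the dash written as a plain hyphen.

```powershell
# The first record from the sample output, typed in by hand.
$record = [pscustomobject]@{
    Path    = 'c:\test\test'
    Created = '2013-09-12 10:13:09 -0500 (Thu 12 Sep 2013)'
    File    = 'somefile.txt'
    Comment = '12345 - Test Comment'
}

# Property names become the header row. Export-Csv -Path <file> -NoTypeInformation
# writes the same lines to a file instead of returning them.
$csv = $record | ConvertTo-Csv -NoTypeInformation
$csv
```

The first line of $csv is the quoted header row, and the second is the quoted values in the same column order.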

  • #10710

    Don Jones
    Keymaster

    You're probably going to have to go with regular expressions, and use a capturing expression. Give me a sec, I'll work up an example.

  • #10712

    Don Jones
    Keymaster

    BTW, http://www.powershelladmin.com/wiki/Powershell_regular_expressions#Example_-_Named_Captures is a handy reference for this.

    Assuming $out contains your output:

    PS C:\> $out
    Path: c:\test\test2
    File: somefile.txt
    Path: c:\test3\test
    File: otherfile.txt
    

    You can do something like this:

    PS C:\> $out -match "Path:\s(?<path>[:\w\\]*)"
    True
    PS C:\> $matches
    
    Name                           Value
    ----                           -----
    path                           c:\test\test2
    0                              Path: c:\test\test2
    
    
    PS C:\> $matches.path
    c:\test\test2
    

    The "?<path>" part creates a named capture group called "path", which holds whatever the pattern inside the (parens) matches; afterwards you read it as $matches.path. You'll have to do some playing around, I expect, but when it comes to string parsing, regex is usually the way.
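A caveat worth knowing before applying this to real tool output: the example relies on $out being a single string. Output captured from an .exe is usually an array of lines, and on an array -match acts as a filter, returning the matching lines instead of True and not populating $matches. A minimal sketch, assuming the four sample lines above arrive as such an array, that matches one line at a time instead:

```powershell
# The four sample lines as an array, the way capturing an .exe's output returns them.
$out = @(
    'Path: c:\test\test2'
    'File: somefile.txt'
    'Path: c:\test3\test'
    'File: otherfile.txt'
)

# On an array, -match filters: this returns the two "Path:" lines, not True.
$pathLines = $out -match '^Path:'

# On a single string, -match returns True/False and fills $Matches,
# so loop over the lines and read the named capture after each hit.
$paths = foreach ($line in $out) {
    if ($line -match 'Path:\s(?<path>[:\w\\]*)') {
        $Matches.path
    }
}
$paths
```

$paths ends up holding c:\test\test2 and c:\test3\test.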

  • #10713

    Christopher Corbett
    Participant

    Thank you for the expedient reply, Don. I will give it a shot. =)

  • #10714

    Dave Wyatt
    Moderator

    Parsing this kind of text output is annoying, but doable. I'd start with something like this:

    $properties = @{
        Path = $null
        File = $null
        Created = $null
    }
    
    $currentObject = New-Object psobject -Property $properties
    
    $output = SomeCommandLineTool.exe
    
    $(
        foreach ($line in $output -split '\r?\n')
        {
            if ($line -notmatch '^\s*(.+?)\s*:\s*(.+)$')
            {
                Write-Debug "'$line' does not match pattern."
                continue
            }
    
            $propertyName = $matches[1]
            $value = $matches[2]
    
            Write-Debug "PropertyName: '$propertyName', Value: '$value'"
    
            if (-not $properties.ContainsKey($propertyName))
            {
                continue
            }
    
            if ($null -ne $currentObject.$propertyName)
            {
                # This is the second time we've encountered this property, so it must be a new object.
                Write-Output $currentObject
    
                $currentObject = New-Object psobject -Property $properties
            }
    
            # Note:  this doesn't do any processing of the values, such as converting Created to a DateTime object.  It just takes the strings
            # as they appeared in the command's output, and sticks them into the CSV file.
            $currentObject.$propertyName = $value
        }
    
        # Outputting the last object in the file
        Write-Output $currentObject
    ) |
    Export-Csv -Path .\output.csv -NoTypeInformation
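A variant of the loop above, as a sketch only. It assumes PowerShell 3.0 or later (for [pscustomobject] literals, which keep the columns in a fixed order, unlike a plain hashtable passed to New-Object), feeds in a trimmed copy of the sample data as a here-string instead of the hypothetical SomeCommandLineTool.exe, and adds handling for the comment. The original loop skips the comment because "Comment (1 Line):" has nothing after the colon and the comment text itself has no colon. The comment handling here assumes single-line comments.

```powershell
# Sample data standing in for the real tool's output (two blocks from the first post).
$output = @'
Path: c:\test\test\
Created: 2013-09-12 10:13:09 -0500 (Thu 12 Sep 2013)
File: somefile.txt
Comment (1 Line):
12345 - Test Comment

Path: c:\test\test2\
File: somefile2.txt
'@

# Properties we care about.
$propertyNames = 'Path', 'Created', 'File', 'Comment'

function New-Record {
    # [pscustomobject] literals keep their property order (PowerShell 3.0+).
    [pscustomobject]@{ Path = $null; Created = $null; File = $null; Comment = $null }
}

$current = New-Record
$expectComment = $false

$records = @(
    foreach ($line in $output -split '\r?\n') {
        if ($expectComment) {
            # The line after "Comment (n Line):" holds the comment text itself.
            $current.Comment = $line.Trim()
            $expectComment = $false
            continue
        }
        if ($line -match '^\s*Comment\s*\(') {
            $expectComment = $true
            continue
        }
        if ($line -notmatch '^\s*(.+?)\s*:\s*(.+)$') { continue }

        $name = $Matches[1]
        $value = $Matches[2]
        if ($propertyNames -notcontains $name) { continue }

        if ($null -ne $current.$name) {
            # Seeing a property a second time means a new record has started.
            $current
            $current = New-Record
        }
        $current.$name = $value
    }
    # Emit the last record, which no later "Path:" line flushed out.
    $current
)

# Export-Csv -Path .\output.csv -NoTypeInformation would write the same lines to a file.
$csv = $records | ConvertTo-Csv -NoTypeInformation
$csv
```

The second record comes out with an empty Created column, since that block has no "Created:" line.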
    
  • #11162

    Joakim Svendsen
    Participant

    I happen to be the author of the article Don linked to, and came across this thread during one of the rare occasions when I check Google Webmaster Tools. Thanks for the "acknowledgement". 🙂 My reply is a bit late, however.

    First, I'll start by saying that this is quite an imprecise request for help. The posted data shows some illogical inconsistencies, such as c:\test\test2\somefile2.txt appearing three times with less and less data: first the comment is missing, then the "Created" field is missing. This isn't even commented on! But worry not, I have accounted for this broken behaviour as well, at least partially. Named captures are indeed handy here. Such files will simply get duplicate entries, with less info for the "broken" occurrences. In the code comments I made some notes about what you might want to do instead.

    Second, I made quite a few assumptions when writing this code and regexp. One is that there will always be a "Path" and a "File" entry. Another is that no comment contains two newlines in a row, since I split on multiple newlines. That last one might bite you, I suspect. If so, keep in mind that you should try to anchor on "Path:", which seems like the safe choice given your (quite broken ;-)) test data.
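Anchoring on "Path:" can be done with a zero-width split, so that a blank line inside a comment no longer ends a record. A minimal sketch under two assumptions: the sample text is made up for illustration, and no comment line itself starts with "Path:". Multi-line comments are joined with a semicolon, as in the code below.

```powershell
# Hypothetical sample: the first comment contains a blank line, which would
# break a split on two or more newlines.
$text = @'
Path: c:\test\test\
File: somefile.txt
Comment (3 Lines):
first line

second line
Path: c:\test\test2\
File: somefile2.txt
'@
$text = $text -replace '\r', ''   # normalise CRLF to LF

# Zero-width split right before every line that starts with "Path:".
# Regex.Split yields an empty first element here, so drop blank chunks.
$blocks = $text -split '(?m)^(?=Path:)' | Where-Object { $_.Trim() }

$records = foreach ($block in $blocks) {
    $comment = $null
    if ($block -match '(?s)Comment\s*\([^)]*\)\s*:[^\n]*\n(.*)') {
        # Join the non-empty comment lines with a semicolon.
        $comment = ($Matches[1].Trim() -split '\n+') -join ';'
    }
    [pscustomobject]@{
        Path    = if ($block -match '(?m)^Path:\s*(.+)$') { $Matches[1] } else { $null }
        Created = if ($block -match '(?m)^Created:\s*(.+)$') { $Matches[1] } else { $null }
        File    = if ($block -match '(?m)^File:\s*(.+)$') { $Matches[1] } else { $null }
        Comment = $comment
    }
}
$records
```

The first record's Comment becomes "first line;second line", and the second record has no comment.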

    The text file I'm attaching is based on the CSV generated by parsing the original poster's exact data, as pasted from the web browser into notepad.exe. Despite not accounting for multiple newlines in a row within a comment, I did make sure to handle multi-line comments: their lines are joined with a semicolon in the Comment field.

    The regexp could have been written differently; playing with ".", (?s), "\r\n", ".+?" and so on gives many workable variations. I chose something halfway sane here. Initially I wrote it without stripping out \r, but stripping it turned out to be a lot easier than the alternatives (otherwise the [^\n]+ captures keep a trailing \r).
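To see why stripping \r helps, here is a minimal sketch using a cut-down version of the named-capture pattern below on one made-up record with CRLF line endings. Because [^\n]+ only stops at the newline, the carriage return ends up inside the captured value unless it is removed first.

```powershell
# One record with Windows (CRLF) line endings, built explicitly so the \r is
# present regardless of how this script file itself is saved.
$crlfText = "Path: c:\test\test\`r`nFile: somefile.txt`r`n"

# Cut-down version of the named-capture pattern used in the code below.
$pattern = "Path:\s+(?'Path'[^\n]+)\s*File:\s+(?'File'[^\n]+)"

# Without stripping \r, [^\n]+ swallows the carriage return at the end of the line.
$null = $crlfText -match $pattern
$pathWithCR = $Matches['Path']

# After removing \r, the capture ends cleanly at the end of the line.
$null = ($crlfText -replace '\r', '') -match $pattern
$pathWithoutCR = $Matches['Path']

$pathWithCR.Contains("`r")      # True
$pathWithoutCR.Contains("`r")   # False
```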

    Anyway, here's the code I wrote that works for all your test data, pasted verbatim from the browser into notepad:

    Set-StrictMode -Off
    $SingleLineData = (Get-Content -Raw -Path E:\temp\old-output.txt) -join "`n"
    
    # This makes things a lot easier in the regex later (strip out \r).
    $SingleLineData = $SingleLineData -replace '\r', ''
    
    # Consider broken data with repeated elements... Use hash keys that represent path + file.
    # Couldn't be bothered now... To handle the utterly broken test data, and the possibility
    # of a path+file without "Created" and "Comment" appearing before an entry with one or both
    # of those, you'd want to store an object in the hash and to look up to see if properties
    # are already populated using some if statements. Won't bother.
    #$AlreadyDone = @{}
    
    # Probably should consider the case of comments with two newlines in a row.
    # That would require different logic that anchors on "Path:" which seems safe
    # even given this utterly broken test data...
    # I copied and pasted the output from my browser to notepad, and the test data had
    # \r\n for some newlines. I figure this might be an artifact carried over from the
    # actual output data, so beware of that.
    #@(
    foreach ($BlockText in $SingleLineData -split "\n{2,}") {
        if ($BlockText -imatch "^\s*Path:\s+(?'Path'[^\n]+)\s*(?:Created:\s+(?'Created'[^\n]+))?\s*File:\s+(?'File'[^\n]+)\s*(?s:Comment\s+\([^)]+\)\s*:[^\n]*\n(?'Comment'.+))?") {
            New-Object PSObject -Property @{
                Path    = $Matches['Path']
                Created = $Matches['Created']
                File    = $Matches['File']
                Comment = $Matches['Comment'] -replace '[\r\n]+', ';'
            }
        }
    }
    #) | Export-Csv -Encoding UTF8 -NoType -Path old-output.csv
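The comments in the code above describe, but deliberately skip, one way to handle the repeated c:\test\test2\somefile2.txt entries: key a hash on path + file and only fill in properties that are still empty. A minimal sketch of that idea, assuming the parsed objects are already collected in $records (typed in by hand here, with the sparsest entry arriving first) and that Path and File are always present, as the regexp requires:

```powershell
# Hypothetical parsed records for one file, arriving in the worst order:
# the sparsest entry first.
$records = @(
    [pscustomobject]@{ Path = 'c:\test\test2\'; Created = $null; File = 'somefile2.txt'; Comment = $null }
    [pscustomobject]@{ Path = 'c:\test\test2\'; Created = '2013-09-12 10:14:09 -0500 (Thu 12 Sep 2013)'; File = 'somefile2.txt'; Comment = '67890 - Test Comment 2' }
    [pscustomobject]@{ Path = 'c:\test\test2\'; Created = '2013-09-12 10:14:09 -0500 (Thu 12 Sep 2013)'; File = 'somefile2.txt'; Comment = $null }
)

# Ordered dictionary keyed on path + file, so output keeps first-seen order.
$byFile = [ordered]@{}
foreach ($record in $records) {
    $key = '{0}|{1}' -f $record.Path, $record.File
    if (-not $byFile.Contains($key)) {
        $byFile[$key] = $record
        continue
    }
    $existing = $byFile[$key]
    foreach ($name in 'Created', 'Comment') {
        # Only fill gaps; never overwrite a value that is already populated.
        if ($null -eq $existing.$name -and $null -ne $record.$name) {
            $existing.$name = $record.$name
        }
    }
}

$merged = @($byFile.Values)
$merged
```

With the code above, you could assign the foreach loop's output to $records instead of piping it to Export-Csv, then export $merged.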
    

    Cheers.

    -Joakim
