
October 11, 2013 at 11:00 am

I have a legacy command-line application that returns each line as an object, such as the following. I am trying to format this output so I can insert it into a CSV with headers. The original thought was to grab all the lines using the blank line as a separator. A few things throw me for a loop: 1) the number of objects is not consistent, and 2) the last object gets placed into a new object, as opposed to on the same line.

Any thoughts on how I should proceed? Perhaps it's just Friday, and I cannot think straight! I have been at this for a few hours now.

–begin output–
Path: c:\test\test\
Created: 2013-09-12 10:13:09 -0500 (Thu 12 Sep 2013)
File: somefile.txt
Comment (1 Line):
12345 – Test Comment

Path: c:\test\test2\
Created: 2013-09-12 10:14:09 -0500 (Thu 12 Sep 2013)
File: somefile2.txt
Comment (1 Line):
67890 – Test Comment 2

Path: c:\test\test3\
Created: 2013-09-12 10:15:09 -0500 (Thu 12 Sep 2013)
File: somefile2.txt
Comment (1 Line):
09876 – Test Comment 3

Path: c:\test\test2\
Created: 2013-09-12 10:14:09 -0500 (Thu 12 Sep 2013)
File: somefile2.txt

Path: c:\test\test2\
File: somefile2.txt
–end output–
The CSV would look similar to:

"Path","Created","File","Comment"
"c:\test\test","2013-09-12 10:13:09 -0500 (Thu 12 Sep 2013)","somefile.txt","12345 – Test Comment"

By no means am I trying to get someone to do all the work for me... just a push in the right direction. (Unless of course you want to LoL)

October 11, 2013 at 11:06 am

You're probably going to have to go with regular expressions, and use a capturing expression. Give me a sec, I'll work up an example.

October 11, 2013 at 11:14 am

BTW, http://www.powershelladmin.com/wiki/Powershell_regular_expressions#Example_-_Named_Captures is a handy reference for this.

Assuming $out contains your output:

PS C:\> $out
Path: c:\test\test2
File: somefile.txt
Path: c:\test3\test
File: otherfile.txt

You can do something like this:

PS C:\> $out -match "Path:\s(?<path>[:\w\\]*)"
True
PS C:\> $matches

Name                           Value
----                           -----
path                           c:\test\test2
0                              Path: c:\test\test2


PS C:\> $matches.path
c:\test\test2

The "?<path>" creates a named capture expression, which is everything inside the (parens). So whatever the parens match gets captured under that name. You'll have to do some playing around, I expect, but when it comes to string parsing, regex is usually the way.
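
One thing to keep in mind: -match against a single string only records the first match in $matches. To pull out every Path/File pair, something along these lines might work. It is only a sketch: it assumes the output is one string, that File always follows directly after Path (as in the trimmed sample above), and the names $pattern and $records are made up for illustration.

```powershell
# Sample text standing in for the tool's output (assumed to be a single string).
$out = "Path: c:\test\test2`nFile: somefile.txt`nPath: c:\test3\test`nFile: otherfile.txt"

# Two named captures, 'path' and 'file'. (?m) lets ^ match at the start of each line.
$pattern = '(?m)^Path:\s(?<path>[^\r\n]*)\r?\nFile:\s(?<file>[^\r\n]*)'

# [regex]::Matches returns every match, not just the first one.
$records = foreach ($m in [regex]::Matches($out, $pattern)) {
    New-Object psobject -Property @{
        Path = $m.Groups['path'].Value
        File = $m.Groups['file'].Value
    }
}

$records
```

Each match becomes one object, so the result can be piped on to Export-Csv.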

October 11, 2013 at 11:17 am

Thank you for the expedient reply, Don. I will give it a shot. =)

October 11, 2013 at 11:20 am

Parsing this kind of text output is annoying, but doable. I'd start with something like this:

$properties = @{
    Path = $null
    File = $null
    Created = $null
}

$currentObject = New-Object psobject -Property $properties

$output = SomeCommandLineTool.exe

$(
    foreach ($line in $output -split '\r?\n')
    {
        if ($line -notmatch '^\s*(.+?)\s*:\s*(.+)$')
        {
            Write-Debug "'$line' does not match pattern."
            continue
        }

        $propertyName = $matches[1]
        $value = $matches[2]

        Write-Debug "PropertyName: '$propertyName', Value: '$value'"

        if (-not $properties.ContainsKey($propertyName))
        {
            continue
        }

        if ($null -ne $currentObject.$propertyName)
        {
            # This is the second time we've encountered this property, so it must be a new object.
            Write-Output $currentObject

            $currentObject = New-Object psobject -Property $properties
        }

        # Note:  this doesn't do any processing of the values, such as converting Created to a DateTime object.  It just takes the strings
        # as they appeared in the command's output, and sticks them into the CSV file.
        $currentObject.$propertyName = $value
    }

    # Outputting the last object in the file
    Write-Output $currentObject
) |
Export-Csv -Path .\output.csv -NoTypeInformation
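
A small note on the script above: the objects are built from a hashtable, so the order of the properties (and therefore of the CSV columns) is not guaranteed. If the column order matters, piping through Select-Object before the export is one option. A minimal sketch, with a hypothetical sample object $obj, and ConvertTo-Csv used in place of Export-Csv only so the header line can be inspected without writing a file:

```powershell
# Hypothetical sample object, built the same way as in the script above.
$obj = New-Object psobject -Property @{
    Path    = 'c:\test\test\'
    File    = 'somefile.txt'
    Created = '2013-09-12 10:13:09 -0500 (Thu 12 Sep 2013)'
}

# Select-Object fixes the property order, and with it the column order.
$csv = $obj | Select-Object Path, Created, File | ConvertTo-Csv -NoTypeInformation

$csv[0]   # header line
$csv[1]   # data line
```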

November 1, 2013 at 1:44 pm

I happen to be the author of the article Don linked to, and came across this article during one of the rare occasions where I check the Google WebMaster tools. Thanks for the "acknowledgement". 🙂 My reply is a bit late, however.

First, I'll start by saying that this is a quite imprecise request for help. The data posted shows severe illogical inconsistencies, such as C:\test\test2\somefile2.txt appearing three times, with less and less data. First the comment is missing, then the "Created" field is missing. This is not even commented on? But worry not, I have accounted for this broken behaviour as well – at least partially... Named captures are indeed handy here. There will simply be duplicate entries made for such files, with less info for the "broken" occurrences. I made some notes about what you might want to do instead in the comments in the code itself.

Second, I made quite a few assumptions when writing this code and regexp. One being that there will always be a "Path" and "File" entry. Another being that there are no comments with two newlines in a row, since I split on multiple newlines. That last one might bite you, I suspect. If so, keep in mind that you should try to anchor on "Path:", which seems like the safe choice given your (quite broken ;-)) test data.
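
For what it's worth, anchoring on "Path:" could look roughly like the following. This is a sketch under assumptions, not tested against the real tool: the sample text in $text is made up (its first comment contains a blank line), and it assumes no comment line ever starts with "Path:".

```powershell
# Made-up sample: the first record's comment contains a blank line.
$text = @'
Path: c:\test\test\
File: somefile.txt
Comment (2 Lines):
first line

second line

Path: c:\test\test2\
File: somefile2.txt
'@ -replace '\r', ''

# Split in front of every line that starts with "Path:" (zero-width lookahead),
# then drop any empty piece the split leaves at the start.
$blocks = @($text -split '(?m)(?=^Path:)' | Where-Object { $_.Trim() })

foreach ($block in $blocks) {
    if ($block -match '(?m)^Path:\s+(?<Path>.+)$') {
        "Record for $($Matches['Path'])"
    }
}
```

Because the split no longer depends on blank lines, the comment with the empty line stays inside its own record.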

The text file I'm attaching is based on CSV generated from parsing the initial poster's exact data as pasted from the web browser into notepad.exe. Another thing I made sure to do, despite not accounting for multiple newlines in a row in a comment, is to handle potential multi-line comments; they will be joined with a semicolon in the comment field.

The regexp could have been written differently; especially playing with ".", (?s), "\r\n", ".+?", etc. will all work in many different ways. I chose something halfway sane here. Initially I wrote it without stripping out \r, but doing that was apparently simply a lot easier than the other options.

Anyway, here's the code I wrote that works for all your test data, pasted verbatim from the browser into notepad:

Set-StrictMode -Off
# -Raw already returns the whole file as one string, so no -join is needed.
$SingleLineData = Get-Content -Raw -Path E:\temp\old-output.txt

# This makes things a lot easier in the regex later (strip out \r).
$SingleLineData = $SingleLineData -replace '\r', ''

# Consider broken data with repeated elements... Use hash keys that represent path + file.
# Couldn't be bothered now... To handle the utterly broken test data, and the possibility
# of a path+file without "Created" and "Comment" appearing before an entry with one or both
# of those, you'd want to store an object in the hash and to look up to see if properties
# are already populated using some if statements. Won't bother.
#$AlreadyDone = @{}

# Probably should consider the case of comments with two newlines in a row.
# That would require different logic that anchors on "Path:" which seems safe
# even given this utterly broken test data...
# I copied and pasted the output from my browser to notepad, and the test data had
# \r\n for some newlines. I figure this might be an artifact carried over from the
# actual output data, so beware of that.
#@(
foreach ($BlockText in $SingleLineData -split "\n{2,}") {
    if ($BlockText -imatch "^\s*Path:\s+(?'Path'[^\n]+)\s*(?:Created:\s+(?'Created'[^\n]+))?\s*File:\s+(?'File'[^\n]+)\s*(?s:Comment\s+\([^)]+\)\s*:[^\n]*\n(?'Comment'.+))?") {
        New-Object PSObject -Property @{
            Path    = $Matches['Path']
            Created = $Matches['Created']
            File    = $Matches['File']
            Comment = $Matches['Comment'] -replace '[\r\n]+', ';'
        }
    }
}
#) | Export-Csv -Encoding UTF8 -NoType -Path old-output.csv
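
The hash-based merge described in the comments above might look roughly like this. It is only a sketch: the three records in $records are hand-written stand-ins for what the loop above would emit, the "|" separator in the key is an arbitrary choice, and it assumes that entries with the same path and file really do describe the same thing.

```powershell
# Hand-written stand-ins for parsed records; the sparse entry deliberately comes first.
$records = @(
    New-Object psobject -Property @{ Path = 'c:\test\test2\'; File = 'somefile2.txt'; Created = $null; Comment = $null }
    New-Object psobject -Property @{ Path = 'c:\test\test2\'; File = 'somefile2.txt'; Created = '2013-09-12 10:14:09 -0500 (Thu 12 Sep 2013)'; Comment = $null }
    New-Object psobject -Property @{ Path = 'c:\test\test2\'; File = 'somefile2.txt'; Created = '2013-09-12 10:14:09 -0500 (Thu 12 Sep 2013)'; Comment = '67890 - Test Comment 2' }
)

$AlreadyDone = @{}
foreach ($r in $records) {
    # Key on path + file.
    $key = '{0}|{1}' -f $r.Path, $r.File

    if (-not $AlreadyDone.ContainsKey($key)) {
        $AlreadyDone[$key] = $r
        continue
    }

    # Seen before: only fill in the properties that are still empty.
    foreach ($name in 'Created', 'Comment') {
        if (-not $AlreadyDone[$key].$name) {
            $AlreadyDone[$key].$name = $r.$name
        }
    }
}

$AlreadyDone.Values
```

With the sample records this leaves a single entry per path and file, carrying whatever "Created" and "Comment" values turned up in any of the occurrences.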

Cheers.

-Joakim