Powershell Regex and Excel

This topic contains 11 replies, has 3 voices, and was last updated by Profile photo of Wilfredo Perez Wilfredo Perez 2 years, 2 months ago.

  • Author
    Posts
  • #25422
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    I have a user who gets a file via email and has to take certain parts from the email and insert it into Excel. So for example: the email come is with the following info:

    User: John Doe
    Purchase: Laptop
    Cost: 1,300.00

    For now I am saving the contents of the email into a text file and using Regex to get the pattern and save the value into excel. The problem is that it I have more than one user who purchased a computer, it just reads to first user and that's it.

    This is the code I am using
    #start Excel

    Add-Type -AssemblyName Microsoft.Office.Interop.Excel
    $excel = New-Object -ComObject Excel.Application
    $excel.Visible = $true
    $workbook = $excel.Workbooks.add()
    $sheet1 = $workbook.Worksheets.Item("Sheet1")
    $sheet1.Name = "Week ending in April"
    $sheet2 = $workbook.Worksheets.Item("Sheet2")
    $sheet2.Name = "April 2015"
    $sheet3 = $workbook.Worksheets.Item("Sheet3")
    $sheet3.Name = "Total by Funds"

    #Add Heading Names
    $sheet1.Cells.item(1,1) = 'User'
    $sheet1.Cells.item(1,2) = 'Purchase'
    $sheet1.Cells.item(1,3) = 'Cost'

    #Format Spreadsheet
    $sheet1.Columns.Item("A").columnwidth="25"
    $sheet1.Columns.Item("B").columnwidth="15"
    $sheet1.Columns.Item("C").columnwidth="15"

    $range = $excel.Range("A1", "O1")
    $range.font.bold=$true
    #$range.Columns("A1", "O1").HorizonTalAlignment = -4108

    #$range.ColumnWidth ="30"

    $introw = 2

    # Start reading the file**************************************************************************************

    $file = Get-Childitem C:\Test\FinancialStatements\*.* -Include *.txt
    foreach ($file in $files)
    {

    $sheet1.Cells.item($intRow,1) = gc -Path $file | Select-String -Pattern 'User: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $sheet1.Cells.item($intRow,2) = gc -Path $file | Select-String -Pattern 'Purchase: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $sheet1.Cells.item($intRow,3) = gc -Path $file | Select-String -Pattern 'Cost: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
    }

  • #25423
    Profile photo of Matt McNabb
    Matt McNabb
    Participant

    Looks like you're running everything against the second row. You'll need to increment the $introw variable in each iteration of the foreach loop:

    foreach ($file in $files)
    {
        $sheet1.Cells.item($intRow,1) = gc -Path $file | Select-String -Pattern 'User: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $sheet1.Cells.item($intRow,2) = gc -Path $file | Select-String -Pattern 'Purchase: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $sheet1.Cells.item($intRow,3) = gc -Path $file | Select-String -Pattern 'Cost: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $introw++
    }
    
  • #25430
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    I do have $introw++ at the end. It did not copy over

  • #25444
    Profile photo of Matt McNabb
    Matt McNabb
    Participant

    Can you edit your OP so we can see exactly what the script looks like?

    Also, can you use the code formatting tags so that we can read this a little easier? It's in the top ribbon of the post editor – just put all your code between the 'pre' tags.

  • #25458
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    I would parse the file into a Powershell object and then just select the first line using Select-Object. There is probably a regex that could capture the data, but this works too. Simply split the lines at the semi-colon. If it's USER it is the start of new record, so initialize the hashtable. If it's COST, it's the last part of a record, so generate an object and return to $result. Anything in the middle, add it to the hashtable:

    $content = Get-Content C:\Users\Rsimmers\Desktop\test.txt.txt
    
    $results = foreach ($line in $content) {
        $arr = $line -split ":"
        switch ($arr[0].ToUpper()){
            "USER" {
                $props = @{}
                $props.Add($arr[0].Trim(),$arr[1].Trim())
            }
            "COST" {
                $props.Add($arr[0].Trim(),$arr[1].Trim())
                New-Object -TypeName PSObject -Property $props
            }
            default {
                $props.Add($arr[0].Trim(),$arr[1].Trim())
            }
        }
    }
    
    $results | Select User, Purchase, Cost | Format-Table -AutoSize
    

    Output:

    User        Purchase Cost    
    ----        -------- ----    
    John Doe    Laptop   1,300.00
    Sally Smith Desktop  800.00 
    

    Then you would just need to do something like this:

    foreach ($record in $results | Select -First 1) {
    
        $sheet1.Cells.item($intRow,1) = $record.User
        $sheet1.Cells.item($intRow,2) = $record.Purchase
        $sheet1.Cells.item($intRow,3) = $record.Cost
    }
    
  • #25467
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    I have three text file in a dir and I am using this code to parse each file

    $files = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
    foreach($file in $files)
    {
    
    Get-Content $file | Select-String -Pattern 'Transaction Type: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    Get-Content $file | Select-String -Pattern 'Settlement Date: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    Get-Content $file | Select-String -Pattern 'CUSIP Number: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
    }
    

    **It reads all three text files

    But when try to write to excel, it only read one text file and closes.

    $fileDir = Get-ChildItem C:\Test\FinancialStatements\
    foreach($file in $files)
    {
        $files = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
        $sheet1.Cells.item($intRow,1) = Get-Content -Path $file | Select-String -Pattern 'User Name: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value} 
        $sheet1.Cells.item($intRow,2) = Get-Content -Path $file | Select-String -Pattern 'Purchase: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $sheet1.Cells.item($intRow,3) = Get-Content -Path $file | Select-String -Pattern 'Cost:: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $introw ++
        
    }
    
  • #25472
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    Debug one thing at a time. First you need to determine if your for loop is returning the information you expect and each file is being enumerated, so try this:

    $files = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
    foreach($file in $files) {
        "Processing {0}" -f $file.Name
        $TransType = Get-Content $file | Select-String -Pattern 'Transaction Type: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $SettlementDate = Get-Content $file | Select-String -Pattern 'Settlement Date: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $CUSIP = Get-Content $file | Select-String -Pattern 'CUSIP Number: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
    
        "Transaction Type: {0}" -f $TransType
        "Settlement Date:  {0}" -f $SettlementDate
        "CUSIP Number: {0}" -f $CUSIP
    }
    
    

    Are you seeing each file you expect and the expected results?

  • #25473
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    Rob,

    I can see that it's reading all three text files beacuse of the info it writes to the console. The problem is when I start to write to excel, it only looks at one text file and closes. It does not finish reading the text file and moves to the next text file.

    Thanks
    Freddy

  • #25479
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    The code you posted above looks incorrect, it should look like this:

    #$fileDir = Get-ChildItem C:\Test\FinancialStatements
    $intRow = 0
    $files = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
    foreach($file in $files) {  
        $sheet1.Cells.item($intRow,1) = Get-Content -Path $file | Select-String -Pattern 'User Name: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value} 
        $sheet1.Cells.item($intRow,2) = Get-Content -Path $file | Select-String -Pattern 'Purchase: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $sheet1.Cells.item($intRow,3) = Get-Content -Path $file | Select-String -Pattern 'Cost:: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $intRow ++
    }
    
  • #25480
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    Rob,
    Can you try the below script.

    
    #start Excel
    
    Add-Type -AssemblyName Microsoft.Office.Interop.Excel
    $excel = New-Object -ComObject Excel.Application
    $excel.Visible = $true
    $workbook = $excel.Workbooks.add()
    $sheet1 = $workbook.Worksheets.Item("Sheet1")
    $sheet1.Name = "Week ending in April"
    $sheet2 = $workbook.Worksheets.Item("Sheet2")
    $sheet2.Name = "April 2015"
    $sheet3 = $workbook.Worksheets.Item("Sheet3")
    $sheet3.Name = "Total by Funds"
    
    #Add Heading Names
    $sheet1.Cells.item(1,1) = 'Week Ending Date'
    $sheet1.Cells.item(1,2) = 'Trade Date'
    $sheet1.Cells.item(1,3) = 'Cusip'
    $sheet1.Cells.item(1,4) = 'SSID'
    $sheet1.Cells.item(1,5) = 'Fund Name and Class'
    #$sheet1.Cells.item(1,6) = 'Class'
    $sheet1.Cells.item(1,6) = 'Affiliate'
    $sheet1.Cells.item(1,7) = 'Type of Trade'
    $sheet1.Cells.item(1,8) = 'Subscriptions'
    $sheet1.Cells.item(1,9) = '# of Sub Trades'
    $sheet1.Cells.item(1,10) = 'Redemptions'
    $sheet1.Cells.item(1,11) = '# of Red Trades'
    $sheet1.Cells.item(1,12) = 'Net Flows'
    $sheet1.Cells.item(1,13) = 'Dealer'
    $sheet1.Cells.item(1,14) = '# of Trades'
    $sheet1.Cells.item(1,7).HorizonTalAlignment = -4108
    #$sheet1.Cells.Item(1,1).font.bold=$true
    
    #Format Spreadsheet
    $sheet1.Columns.Item("A").columnwidth="25"
    $sheet1.Columns.Item("B").columnwidth="15"
    $sheet1.Columns.Item("C").columnwidth="15"
    $sheet1.Columns.Item("D").columnwidth="10"
    $sheet1.Columns.Item("E").columnwidth="60"
    $sheet1.Columns.Item("F").columnwidth="25"
    $sheet1.Columns.Item("G").columnwidth="15"
    $sheet1.Columns.Item("H").columnwidth="20"
    $sheet1.Columns.Item("I").columnwidth="20"
    $sheet1.Columns.Item("J").columnwidth="20"
    $sheet1.Columns.Item("K").columnwidth="20"
    $sheet1.Columns.Item("L").columnwidth="15"
    $sheet1.Columns.Item("M").columnwidth="30"
    $sheet1.Columns.Item("N").columnwidth="15"
    
    $range = $excel.Range("A1", "O1")
    $range.font.bold=$true
    #$range.Columns("A1", "O1").HorizonTalAlignment = -4108
    
    #$range.ColumnWidth ="30"
    
    $introw = 2
    
    $files = Get-ChildItem C:\Test\*.* -Include *.txt 
    $txtFile = gc $files.
    Foreach ($file in $files)
    {
        Foreach($txtfile in $file){
               $sheet1.Cells.item($intRow,2) =  Get-Content $txtfile | Select-String -Pattern 'CUSIP Number: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
        $introw++
        $txtFile 
        }
    
    }
    

    It give me the name of the three text files, but its not reading each text from the beginning to end. It reads the first few line than it goes to the next file and so on. I need it to read the whole text file. I think the excel part is the problem. I am trying different things.

  • #25531
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    Your logic is still not correct. I think what you are not understanding is that Get-Content returns an array of lines, not the entire text file, so you can't search for the pattern in the entire file.

    -Get the files
    –for each file
    —get-content of file
    —-for each line in content
    —–if this line has foo in it, parse with this regex

    This goes back to my original post, I would parse the files and create objects FIRST and then process with Excel. With that logic, you can also validate if files exist and content exists prior to opening excel and creating all of those sheets to put nothing in it. Re-read my first post and see how I'm checking what is in each line before I create the object.

  • #25532
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    OK, I will try it

You must be logged in to reply to this topic.