Read text file and find match

This topic contains 15 replies, has 3 voices, and was last updated by Profile photo of Rob Simmers Rob Simmers 2 years, 2 months ago.

  • Author
    Posts
  • #25637
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    Here is the code I am using

    
    $Gettext = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
    
    Foreach ($text in $Gettext){
    $txtline = Get-Content -Path $text 
       
        Foreach($line in $txtline){
        $line1 = Get-Content -Path $line | select-string -Pattern  'Transaction Type:\s(?[a-z].+)' | select -expand Matches | foreach {$_.groups["digit"].value}
        $line1
        }
    
    }
    
    

    $line1 should give me a match but instead it give me the content of the file.

  • #25643
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    You are calling Get-Content too many times:

    $Gettext = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
     
    Foreach ($text in $Gettext){
    $txtline = Get-Content -Path $text 
     
        Foreach($line in $txtline){
            $line1 = $line | select-string -Pattern  'Transaction Type:\s(?[a-z].+)' | select -expand Matches | foreach {$_.groups["digit"].value}
            $line1
        }
    }
    

    Keep in mind that if there are 3 lines and line 2 has what you are looking for, $line1 would be blank. You are overwriting $line1 for each line, so on the third line there is no match and it would write nothing.

  • #25648
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    I wanted to show you an example of search the entire contents of a file. Here is what is in the text file:

    Some Column: Some Data
    Transaction Type: 3523tergdg
    Another Column: More Data
    

    Next we can use the FileSystemObject which was popular in vbScript days to do a .ReadAll() method and get all of the content and search for a regex match:

    $objFSO = New-Object -ComObject Scripting.FileSystemObject
    $textFile = $objFSO.OpenTextFile("C:\Users\Rob\Desktop\test.txt", 1, $false)
    $textFile.ReadAll() -match "Transaction Type:\s+(.*)"
    $textFile.Close()
    
    $Matches
    
  • #25655
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    Rob,
    I would like to use the select-string in the Foreach loop. I will try your way first. So if I have 14 different matches can I use the select-string in the loop?

    Thanks
    Freddy

  • #25672
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    I did try your script and once it finds a match it exits. How can I make a loop so it can find more matches

  • #25676
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    I think you are overcomplicating this. Do all of the files have the same "headers" in them with the data separated by colons like the example file I used above? You are trying to use a regex solution which I think is overboard to just pull data out of colon-separated data. If your files had no similarity and say Transaction Type was just hidden in all of this random junk:

    Foo DataFoo DataFoo DataFoo DataFoo DataFoo
    Foo DataFoo DataFoo Data Transaction Type: 3523tergdgFoo DataFoo DataFoo DataFoo Data
    Foo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo Data
    Foo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo Data
    
    Foo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo Data
    Foo DataFoo DataFoo DataFoo DataFoo DataFoo Data
    

    In the above, I would use a regex pattern, which is what you are doing in Select-String. However, if your files have the same headers with a colon to separate data, then I would really consider something like this:

    #Get all text files
    $results = Get-ChildItem C:\Users\rsimmers\Desktop\* -Include *.txt -File | foreach {
        #Get the content of the file
        $content = Get-Content $_.FullName
        #Create a blank hash table for properties
        $props = @{}
        #Add a property with the File Name
        $props.Add("File Name", $_.Name)
        foreach($line in $content) {
            #Split the line at the colon to make an array
            $arrLine = $line.Split(":")
            #Add the header ($arrLine[0]) and data ($arrLine[1]) into a hash table
            $props.Add(($arrLine[0]).Trim(), ($arrLine[1]).Trim())
        }
       #Create a new object with the properties and store in $results
       New-Object -TypeName PSObject -Property $props
    }
    
    $results | Format-Table -AutoSize
    

    I placed 3 text files with the same headers with different data on my desktop and run the script would produce something like this:

    File Name Another Column Transaction Type Some Column   
    --------- -------------- ---------------- -----------   
    test.txt  More Data      3523tergdg       Some Data     
    test1.txt Foo Data       35w46235ythdg    Different Data
    test2.txt Data Foo       o68rg352ty34tyr  Junk Data   
    

    Now that the files are parsed, you know what file the data came from, you can do whatever you want with it. If you are convinced you have to use regex, to get multiple matches you would do something like this:

    $objFSO = New-Object -ComObject Scripting.FileSystemObject
    $textFile = $objFSO.OpenTextFile("C:\Users\rsimmers\Desktop\test.txt", 1, $false)
    $content = $textFile.ReadAll()
    $transaction = ([regex]"Transaction Type:\s+(.*)").match($content).groups[1].value
    $anothercolumn = ([regex]"Another Column:\s+(.*)").match($content).groups[1].value
    $textFile.Close()
     
    $transaction
    $anothercolumn
    

    There are a lot of ways to do things. If you really want help, you need to be specific with what you are trying to do and the challenges you are having. Provide examples and expected results. Tell us how you want to use the data.

  • #25681
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    Rob,

    I have 3 + text files in a folder. Each text file have the same info like Transaction Type: Purchase, Cost: 250.00, etc. I already have the code to open Excel and format the sheets. My problem is that it loops through a file once and it goes to the next text file.

    This is a sample code:

    #start Excel
    
    Add-Type -AssemblyName Microsoft.Office.Interop.Excel
    $excel = New-Object -ComObject Excel.Application
    $workbook = $excel.Workbooks.add()
    $wrksht = $workbook.Worksheets.Item("sheet1")
    $wrksht.name = "Hello"
    $excel.Visible = $true
    
    $introw = 2
    
    $files = Get-ChildItem -Path C:\Test\FinancialStatements\*.* -Include *.txt
    foreach ($txtfile in $files){
         $wrksht.Cells.item($intRow,3) = Get-Content -Path $txtfile | select-string -Pattern  'Transaction Type:\s(?[a-z].+)' | select -expand Matches | foreach {$_.groups["digit"].value}
         $introw ++
    }
    

    So if a have more than 1 Transaction Type: Purchase it will only match the first one and move to the next Text file. I think I need another foreach loop

  • #25682
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    Does this code work?

    #Get all text files
    $results = Get-ChildItem C:\Users\rsimmers\Desktop\* -Include *.txt -File | foreach {
        #Get the content of the file
        $content = Get-Content $_.FullName
        #Create a blank hash table for properties
        $props = @{}
        #Add a property with the File Name
        $props.Add("File Name", $_.Name)
        foreach($line in $content) {
            #Split the line at the colon to make an array
            $arrLine = $line.Split(":")
            #Add the header ($arrLine[0]) and data ($arrLine[1]) into a hash table
            $props.Add(($arrLine[0]).Trim(), ($arrLine[1]).Trim())
        }
       #Create a new object with the properties and store in $results
       New-Object -TypeName PSObject -Property $props
    }
     
    $results | Format-Table -AutoSize
    
  • #25683
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    This script is some what working for me:

    #start Excel
    
    Add-Type -AssemblyName Microsoft.Office.Interop.Excel
    $excel = New-Object -ComObject Excel.Application
    $workbook = $excel.Workbooks.add()
    $wrksht = $workbook.Worksheets.Item("sheet1")
    $wrksht.name = "Hello"
    $excel.Visible = $true
    
    $introw = 2
    $files = Get-ChildItem -Path C:\Test\FinancialStatements\*.* -Include *.txt
    Foreach($txtfile in $files){
       Foreach ($line in [System.IO.File]::OpenRead($txtfile)){
       $Trans = $line| select-string -Pattern 'Transaction Type:\s(?[a-z].+)' | select -expand Matches | foreach {$_.groups["digit"].value}
       $Trans
       #$wrksht.Cells.item($intRow,3) = $Trans
       $introw ++
    
       }
       
    }
    

    Sample file

    Transaction Type: Purchase
    Cost:250.00

    Transaction Type: Purchase
    Cost:250.00

    Transaction Type: Purchase
    Cost:250.00

    Instead of writing to excel line after line it skips two lines and write Transaction Type: Purchase, skips two lines and write. I would like it like so"
    Transaction Type: Purchase
    Transaction Type: Purchase
    Transaction Type: Purchase

    Not like:
    Transaction Type: Purchase

    Transaction Type: Purchase

    Transaction Type: Purchase
    .

  • #25685
    Profile photo of Simon Wåhlin
    Simon Wåhlin
    Participant

    Hi Wilfredo,

    The regular expression in your code is invalid.
    This part of the expression:

    (?[a-z].+)
    

    In a regular expression, the match of something within parenthesis is called a group.
    By default, each match of a group is assigned a number as an index. To be able to refer
    to a specific group by name you can add the following to the beginning of the group:

    ?
    

    This is what your regex almost does. Since you later on refer to the group as "digit" this is propably the regex you need:

    "Transaction Type:\s(?[a-z].+)"
    

    This will match any row containing "Transaction Type:" followed by a whitespace (like space or tab) followed by one or more of any char. The group "digit" will contain the part after the whitespace.

    Regarding the looping, if you want multiple matches from one file to appear in the same cell, you need to combine the values to one string. For example like this:

    foreach ($txtfile in $files){
         $Strings = Get-Content -Path $txtfile | 
         	select-string -Pattern  "Transaction Type:\s(?[a-z].+)" | 
         		select -expand Matches | 
         			foreach {$_.groups["digit"].value}
         $wrksht.Cells.item($intRow,3) = $Strings -join ","
         $introw ++
    }
    

    If your file contains two matching rows like this:
    Transaction Type: Purchase
    Transaction Type: Sale
    The cell would then have the value: Purchase,Sale

    If you would like to add each value to its own row, you should probably do something like this:

    foreach ($txtfile in $files){
        Get-Content -Path $txtfile | 
        	select-string -Pattern  "Transaction Type:\s(?[a-z].+)" | 
        		select -expand Matches | 
        			foreach {
        				$wrksht.Cells.item($intRow,3) = $_.groups["digit"].value
            			$intRow++
    				}
    }
    

    Note how I"ve added a line break after each pipechar (|) which I think makes the code a little easier to read.

    EDIT: Tried to get code to display properly but I give up

  • #25686
    Profile photo of Simon Wåhlin
    Simon Wåhlin
    Participant

    Ops, I just realized that the part of the regex that I thought was missing from your code was just parsed away by the HTML, sorry!

    I'll try to edit my post to get it correct.

  • #25688
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    The code you posted worked. but I need to match around 14 different words

  • #25689
    Profile photo of Simon Wåhlin
    Simon Wåhlin
    Participant

    13 alternatives to "Transaction Type"?

    Does the files you read only contain things you want or alot of other data too?
    It is hard to design a search pattern for a file without knowing the whole content of the file.

  • #25693
    Profile photo of Wilfredo Perez
    Wilfredo Perez
    Participant

    Here are the contents:

    Please see the large trade(s) below:
                                                                                                      
    Transaction Type: PURCHASE                
    Trade Amount:    250.00
    Trade Date: 05/14/15              
    Settlement Date: 05/15/15
    Fund Name and Class: ABCDEFG              
    Symbol: ABCD  
    CUSIP Number: 12345
    Transfer Agent Fund Number:  123
    State Street Fund Number: 1111
                                                    
    ________________________________________________________________________________

    Transaction Type: PURCHASE
    Trade Amount: 250.00
    Trade Date: 05/14/15
    Settlement Date: 05/15/15
    Fund Name and Class: ABCDEFG
    Symbol: ABCD
    CUSIP Number: 12345
    Transfer Agent Fund Number: 123
    State Street Fund Number: 1111
                                                     
    In some text files there is more of the some data. So I may have 5 text files with the same king of data. I need to match the Trade Date: and input the value "05/14/15" in Excel and do the same for each text file.

  • #25696
    Profile photo of Simon Wåhlin
    Simon Wåhlin
    Participant

    Ok, so each files contains many transactions and you want to break out Transaction Type and Trade Date from each transaction?

    I would suggest looking into the new cmdlet ConvertFrom-String in PowerShell V5.

    If that is not an option, try to read each file and split it in a way that each part contains just one transaction, for that part, replace : with = and use ConvertFrom-StringData to create objects.

  • #25711
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    If each transaction always starts and finishes with the same data, I would use what I posted in the other post you opened for the same thing. Understand that putting the data into Excel is trivial, the important part of this process is parsing the data and making it useable to place in Excel. Your feeding small pieces of information instead of providing examples and requirements up front, so it's really hard for anyone to actually help you. With that said, here is modified code from the other post:

    $results = foreach ($file in (Get-ChildItem -Path C:\Users\rsimmers\Desktop\* -Include *.txt -File )) {
        $content = Get-Content $file.FullName
        #Start a record number, so if there are 4 entries in a file, you would see what
        #was the first entry in the file
        $intRecordNum = 1
        foreach ($line in $content) {
            #Process any line that has a colon in it
            if ($line -like "*:*") {
                #Split the line at the colon to create an array with the header and data
                $arr = $line -split ":"
                #Create a switch with the header
                switch ($arr[0].Trim().ToUpper()){
                    "TRANSACTION TYPE" {
                        #Transaction Type is the start of a new record, so we create a blank property hashtable
                        $props = @{}
                        #Add the transaction type data
                        $props.Add($arr[0].Trim(),$arr[1].Trim())
                        #Add the file name for reference
                        $props.Add("FileName", $file.Name)
                        #Add the record number to indicate the sequence of data in the file
                        $props.Add("RecordNumber", $intRecordNum)
    
                    } #TRANSACTION TYPE
                    "STATE STREET FUND NUMBER" {
                        #State Street is the last entry of a record, so add the data
                        $props.Add($arr[0].Trim(),$arr[1].Trim())
                        #Since it's the last record, we want to actually create the object
                        #with all of the properties we've gathered.  This is returned to the $results variable
                        New-Object -TypeName PSObject -Property $props
                        #Again, this our last record, so we increment the record number
                        $intRecordNum++
                    } #STATE STREET FUND NUMBER
                    default {
                        #This is anything in between the start and finish headers, we're just creating a property for it
                        $props.Add($arr[0].Trim(),$arr[1].Trim())
                    } #default
                }#switch header name
            } #if line contains a colon
        } #foreach line
    } #foreach file
     
    $results | Select * | Format-Table -AutoSize
    

    I have 3 text files, one with two transactions, one with 4 and another with 3 using the format you posted above. When I run this against the files, I get the following:

    Symbol RecordNumber FileName  State Street Fund Number Settlement Date CUSIP Number Trade Amount Transaction Type Trade Date Fund Name and Class
    ------ ------------ --------  ------------------------ --------------- ------------ ------------ ---------------- ---------- -------------------
    ABCD              1 test.txt  1111                     05/15/15        12345        250.00       PURCHASE         05/14/15   ABCDEFG            
    ABCD              2 test.txt  1111                     05/15/15        12345        1354.00      PURCHASE         05/14/15   ABCDEFG            
    RVFE              1 test1.txt 5342                     05/15/15        463432       451.00       PURCHASE         05/14/15   REGFSFVF           
    RVFE              2 test1.txt 5342                     05/15/15        463432       4845.00      PURCHASE         05/14/15   ABCDEFG            
    RVFE              3 test1.txt 5342                     05/15/15        463432       5621.00      PURCHASE         05/14/15   REGFSFVF           
    RVFE              4 test1.txt 5342                     05/15/15        463432       250.00       PURCHASE         05/14/15   ABCDEFG            
    GFGE              1 test2.txt 8634                     05/15/15        789453       1434.00      PURCHASE         05/14/15   LIOBDMSS           
    GFGE              2 test2.txt 8634                     05/15/15        789453       3351.00      PURCHASE         05/14/15   ABCDEFG            
    GFGE              3 test2.txt 8634                     05/15/15        789453       121.00       PURCHASE         05/14/15   LIOBDMSS
    

    We now have the filename the transactions came from, the order of the transactions in the file (i.e.RecordNumber) and all of the data for each transaction. Now I highly recommend that you do NOT try to rip this apart and put it in Excel. Validate that that this code correctly parses that data in your files. You should see similar output as above. Again, no Excel, does the data parse correctly? If the data parses, I would convert this code into a function, like Get-TransactionData. Then it's a simple to get it into Excel.

You must be logged in to reply to this topic.