Read through raw text files in a directory and report out a count of unique data

Welcome Forums General PowerShell Q&A Read through raw text files in a directory and report out a count of unique data

This topic contains 6 replies, has 3 voices, and was last updated by

 
Participant
3 months, 2 weeks ago.

  • Author
    Posts
  • #109568

    Participant
    Points: 0
    Rank: Member

    Hello.  I need help.  I'm tinkering with a ps script, but I am a novice and cannot get this to work.  Here's what I need:1 –  open a directory2 –  read in the text files in the directory (*.txt)3 – open each one & do a delimiter functions.  Each text file is raw data.  It doesn't have any headers, etc.  The "|" pipe symbol is the delimiter; each field is separated by a pipe.4 – I want it to count for unique last name.5 – what's the easiest way to output that data to summarize for all files?  Is it easier in a csv output or a text file output?
    I was able to get this code to work for one text file.

    @(Import-Csv C:\itdept\494-Billing_201808010300.txt -Delimiter "|" -Header a,b,c,d,e | Select-Object -unique d,e

    The directory is like this  D:\RASLO-BILLING\
    The format of the text file is like this:

    1|2R149653||DOE|JANE||19790127|DOE|JANE||231 STREET AVE||CITY|CA|93420||805|1234567|N|18||||||v|

    2|2R149653||DOE|JANE||19790127|231 STREET AVE||CITY|CA|93420||805|1234567|||F||6230|||||||||||||||

    3|2R149653||DOE|JANE||19790127|P|136391580|||2950|BC||SIF928A56074|DOE|JANE||231 STREET AVE||CITY|CA|93420||805|1234567|F|19790127||18||||||||||INSURANCE 60007|PO BOX 60007||LOS ANGELES|CA|90060||INSURANCE 60007|(800) 227-3771||||||

    3|2R149653||DOE|JANE||19790127|S||||RAS1159|CAP||XEH903377346|DOE|JANE||231 STREET AVE||CITY|CA|93420||805|1234567|F|19790127||18||||||||||INSURANCE|PO BOX 13518||BAKERSFIELD|CA|93389||CCPN|(800)604-8752||||||

    4|2R149653||DOE|JANE||19790127|2950|3JXWVOV1|1|20180731|493|100|6230|70543||||||||||v|||WEISS|WENDY|E|892506|892506|C4|MRI SOFT TISSUE NECK W/O+W (35/40)|Procedure|1609867738|PISMO FAM PRAC|D|(805) 773-0707|N|1|136391580|494-707|2018079JNSGG2X6|

    1|2R149822||SMITH|JANET||20160919|SMITH|JANET||540 STREET AVE|#119|CITY|CA|93446||805|3694564|||||||||

    2|2R149822||SMITH|JANET||20160919|540 STREET AVE|#119|CITY|CA|93446||805|3694564|||F||2782|||||||||||||||

    3|2R149822||SMITH|JANET||20160919|P||||031809SS|MC||97814821G|SMITH|JANET||540 STREET AVE|#119|CITY|CA|93446||805|3694564|F|20160919||||||||||||INSURANCE|PO BOX 948||GOLETA|CA|93110||INSURANCE|(877) 814-1861||||||

    4|2R149822||SMITH|JANET||20160919|031809SS|3K08828P|1|20180731|493|132|2782|73100|||||||||||||CORYELL|JAMES|D|893024|893024|C4|XR RIGHT WRIST 2 VIEWS|Procedure|1427006212|CENT CST PEDS-TP|D|(805) 434-3796||0||493-702|20180800IHMY9ZH|

    1|2R149822||SMITH|JANET||20160919|SMITH|JANET||540 STREET AVE|#119|CITY|CA|93446||805|1234567|||||||||

    2|2R149822||SMITH|JANET||20160919|540 STREET AVE|#119|CITY|CA|93446||805|1234567|||F||2782|||||||||||||||

    3|2R149822||SMITH|JANET||20160919|P||||031809SS|MC||97814821G|SMITH|JANET||540 STREET AVE|#119|CITY|CA|93446||805|1234567|F|20160919||||||||||||INSURANCE|PO BOX 948||GOLETA|CA|93110||INSURANCE|(877) 814-1861||||||

    4|2R149822||SMITH|JANET||20160919|031809SS|3K08A1AE|1|20180731|493|132|2782|73100|||||||||||||CORYELL|JAMES|D|893023|893023|C4|XR LEFT WRIST 2 VIEWS|Procedure|1427006212|CENT CST PEDS-TP|D|(805) 434-3796||0||493-702|20180800IHMY9ZH|

     

     

     

  • #109750

    Participant
    Points: 93
    Rank: Member

    I think this is what you are looking for?

    $Path = 'd:\RASLOW-BILLING\*.txt'
    Import-Csv -Path $path -Delimiter '|' -Header a,b,c,d,e | Group-Object -Property d | Select-Object Name,Count
    
    • #109811

      Participant
      Points: 0
      Rank: Member

      Thanks. I will try that and see what it produces!

  • #109754

    Participant
    Points: 324
    Helping Hand
    Rank: Contributor

    Hey Gina. The key here is in order to do comparison or grouping, we need column headers. You can do it like you started, like so:

    $header = Idx,
              Hdr1,
              LastName,
              FirstName...
    

    I am lazy and I also have no idea what the data is to create a header, so we can take a line and make it a character array, find all of the pipe symbols and get a count (which is 44).

    $test = '4|2R149653||DOE|JANE||19790127|2950|3JXWVOV1|1|20180731|493|100|6230|70543||||||||||v|||WEISS|WENDY|E|892506|892506|C4|MRI SOFT TISSUE NECK W/O+W (35/40)|Procedure|1609867738|PISMO FAM PRAC|D|(805) 773-0707|N|1|136391580|494-707|2018079JNSGG2X6|'
    ($test.ToCharArray() | Where{$_ -eq '|'}).Count
    

    If there are 44 pipes, there could be a data after the last pipe, so I went with 45 generic headers:

    $header = 1..45 | foreach {'Hdr{0}' -f $_}
    

    Now I have a generic header row that is Hdr1, Hdr2 and so on up to Hdr45. Next we get 3 files, import them into a single collection so that we can start doing analysis:

    Now you can get grouped information:

    PS C:\Users\Rob> $results | Group-Object -Property hdr4 -NoElement
    
    
    Count Name                     
    ----- ----                     
        5 DOE                      
        8 SMITH                    
    
    
    
    PS C:\Users\Rob> $results | Group-Object -Property FileName -NoElement
    
    Count Name                     
    ----- ----                     
        5 file1                    
        4 file2                    
        4 file3                    
    
  • #109760

    Participant
    Points: 324
    Helping Hand
    Rank: Contributor

    Hey Gina. The key here is in order to do comparison or grouping, we need column headers. You can do it like you started, like so:

    $header = Idx,
              Hdr1,
              LastName,
              FirstName...
    

    I am lazy and I also have no idea what the data is to create a header, so we can take a line and make it a character array, find all of the pipe symbols and get a count (which is 44).

    $test = '4|2R149653||DOE|JANE||19790127|2950|3JXWVOV1|1|20180731|493|100|6230|70543||||||||||v|||WEISS|WENDY|E|892506|892506|C4|MRI SOFT TISSUE NECK W/O+W (35/40)|Procedure|1609867738|PISMO FAM PRAC|D|(805) 773-0707|N|1|136391580|494-707|2018079JNSGG2X6|'
    ($test.ToCharArray() | Where{$_ -eq '|'}).Count
    

    If there are 44 pipes, there could be a data after the last pipe, so I went with 45 generic headers:

    $header = 1..45 | foreach {'Hdr{0}' -f $_}
    

    Now I have a generic header row that is Hdr1, Hdr2 and so on up to Hdr45. Next we get 3 files, import them into a single collection so that we can start doing analysis:

    #Create a generic header array
    $header = 1..45 | foreach {'Hdr{0}' -f $_}
    #Get the files
    $files = Get-ChildItem 'C:\Scripts\*.txt'
    
    #Go thru each file and import it with the header
    $results = foreach ( $file in $files ) {
        $csv = Import-CSV -Path $file.FullName -Delimiter '|' -Header $header
        #Get all of the header information and add on the file we got the data from
        $csv | Select *, @{Name='FileName';Expression={$file.BaseName}}
    }
    
    #Assume header 4 is what we want to group
    $results | Group-Object -Property hdr4 -NoElement
    #See how many results we got per file
    $results | Group-Object -Property FileName -NoElement
    

    Now you can get grouped information:

    PS C:\Users\Rob> $results | Group-Object -Property hdr4 -NoElement
    
    
    Count Name                     
    ----- ----                     
        5 DOE                      
        8 SMITH                    
    
    
    
    PS C:\Users\Rob> $results | Group-Object -Property FileName -NoElement
    
    Count Name                     
    ----- ----                     
        5 file1                    
        4 file2                    
        4 file3                    
    
    • #110204

      Participant
      Points: 0
      Rank: Member

      It worked!  Your code helped me get it working.  I was able to also output to a csv file and email the file to the IT department.  Thank you so very much!

  • #110207

    Participant
    Points: 324
    Helping Hand
    Rank: Contributor

    Glad it worked!! Now you can show those IT guys how to use Powershell 🙂

The topic ‘Read through raw text files in a directory and report out a count of unique data’ is closed to new replies.