Extracting Data from Text File

This topic contains 12 replies, has 3 voices, and was last updated by  Michael Vincenty 7 months, 2 weeks ago.

  • Author
    Posts
  • #67813

    Michael Vincenty
    Participant

    Hello,

    Newb here and if possible i would like some assistance with a powershell script. I need to be able to read through a text file and extract 4 pieces of information from each line and export it to a text file. My current script adds and removes a bunch of regex to each line in order to remove everything I don't need but unfortunately i have to keep changing it every month.

    Example of the Text below:
    mrc_app_access_20170224.log: JOBID2680528370: MRC_MAIN.S00846s-BatchJob ended: Fri Feb 24 04:20:53 EST 2017. time used: 1248422ms ****

    The ideal output: 20170224 S00846 BatchJob 04:20:53

    The rest of the data is garbage for my purposes. I found some helpful info here https://social.technet.microsoft.com/Forums/windowsserver/en-US/77c03443-927b-43a3-a27b-77a6c8bd238c/powershell-read-only-specific-parts-of-a-line-in-a-txt-file?forum=winserverpowershell
    But I am not versed enough in powershell to make it work for my needs.

    Any powershell gurus out there know of a easy solution for me? Any and all help is appreciated.

    Thanks in advance

  • #67818

    Sam Boutros
    Participant
    $myInput = 'mrc_app_access_20170224.log: JOBID2680528370: MRC_MAIN.S00846s-BatchJob ended: Fri Feb 24 04:20:53 EST 2017. time used: 1248422ms ****'
    
    $myOutput = New-Object -TypeName PSObject -Property @{
        # Identified Date as the 8 characters preceeding '-log:'
        Date     = $myInput.Substring($myInput.IndexOf('.log:')-8,8)
        # Identified BatchJob as the 7 characters preceeding '-BatchJob'
        BatchJob = $myInput.Substring($myInput.IndexOf('-BatchJob')-7,7)
        # Identified Ended as the 8 characters that start 12 characters after 'ended:'
        Ended    = $myInput.Substring($myInput.IndexOf('ended:')+18,8)
    }
    
    $myOutput
    
    • #67837

      Michael Vincenty
      Participant

      Sam,

      Would I have to place that code into a foreach loop? If so can you show me as I am not very good at those? And also am I correct in my assumption that I can change the number in the indexof(x,x) for my needs? The location where you see batch jobs is usually usernames and I would need to capture one more variable if not more.

  • #67840

    random commandline
    Participant

    You can put this into a loop if you have multiple text files.

    $file = Get-ChildItem 'C:\users\user99\desktop\MRC Access Logs\combinedlogs.txt'
    
    $appusage = switch -Regex -File $file {
    '_(\d{8}).*MAIN.(.*)-(.*) [se].*(\d{2}:\d{2}:\d{2})' 
        {    [PSCustomObject]@{
                Date = $Matches[1]
                BatchJob = $Matches[2]
                UserName = $Matches[3]
                End = $Matches[4]}
        }
    }
    
    $appusage | Out-File "C:\users\mvincenty\desktop\MRC Access Logs\appusage.txt"
    # If you prefer a csv file
    $appusage | Export-Csv "C:\users\mvincenty\desktop\MRC Access Logs\appusage.csv" -NoTypeInformation
    
    • #67843

      Sam Boutros
      Participant

      in a loop:

      $myInput = Get-Content .\myfile.txt # Array input
      
      $myOutput = $myInput | % {
          New-Object -TypeName PSObject -Property @{
              # Identified Date as the 8 characters preceding '-log:'
              Date     = $myInput.Substring($myInput.IndexOf('.log:')-8,8)
              # Identified BatchJob as the 7 characters preceding '-BatchJob'
              BatchJob = $myInput.Substring($myInput.IndexOf('-BatchJob')-7,7)
              # Identified Ended as the 8 characters that start 12 characters after 'ended:'
              Ended    = $myInput.Substring($myInput.IndexOf('ended:')+18,8)
          }
      }
      
      $myOutput # Array output
      
    • #67848

      Michael Vincenty
      Participant

      I have modified my code to the below:

      $textfile1 = "C:\users\user99\desktop\MRC Access Logs\combinedlogs.txt"
      
      $myInput = $textfile1 # Array input
      
      $myOutput = $myInput | % {
          New-Object -TypeName PSObject -Property @{
              # Identified Date as the 8 characters preceding '-log:'
              Date     = $myInput.Substring($myInput.IndexOf('.log:')-8,8)
              # Identified Application as the 16 characters preceding '-'
              Application = $myInput.Substring($myInput.IndexOf('-')-16,16)
              # Identified Username as the 7 characters following '-'
              $start = $myInput.indexOf("-") + 1
              $end = $myInput.indexOf("d:", $start)
              $length = $end - $start
              Username = $myInput.Substring($start, $lenght)
              # Identified Time as the 8 characters preceding 'EST'
              Time    = $myInput.Substring($myInput.IndexOf('EST')-9,8)
          }
      }
      # Array output
      $myOutput  | Out-File "C:\users\mvincenty\desktop\MRC Access Logs\appusage.txt"
      

      I am getting the error below
      Exception calling "Substring" with "2" argument(s): "StartIndex cannot be less than zero.
      Parameter name: startIndex"
      At line:2 char:5
      New-Object -TypeName PSObject -Property @{
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      CategoryInfo : NotSpecified: (:) [], MethodInvocationException
      FullyQualifiedErrorId : ArgumentOutOfRangeException

      I had to change the line for the username to the above due to the dynamic range of what the username could vary. If any of it looks incorrect please advise. Below is example of the file:
      mrc_app_access_20170224.log: JOBID4218313870: MRC_MAIN.R05847-user1 started: Fri Feb 24 08:16:23 EST 2017
      mrc_app_access_20170224.log: JOBID4218477858: MRC_MAIN.R05847-user1 started: Fri Feb 24 08:16:24 EST 2017
      mrc_app_access_20170224.log: JOBID4218729567: MRC_MAIN.R00907-user23 started: Fri Feb 24 08:16:27 EST 2017
      mrc_app_access_20170224.log: JOBID4218729569: MRC_MAIN.I00150-user23 started: Fri Feb 24 08:16:27 EST 2017
      mrc_app_access_20170224.log: JOBID4218729569: MRC_MAIN.I00150-user23 ended: Fri Feb 24 08:16:27 EST 2017. time used: 15ms

      Thank you for all your assistance.

  • #67851

    random commandline
    Participant

    Check my previous post for an updated solution.

    • #67863

      Michael Vincenty
      Participant

      random commandline,

      Thanks for the help the changes in your code worked but I am getting additional data that I will need to remove. I was wondering if you could explain how your code works?

      In the text version i am seeing where the username or batchjob as user1...(ex. gdonhaus...) while in the csv it is grabbing more such as the wording "(action ...)" or "started:". I would like to modify the code you provided to grab from '-' in '-user etc' until the blank space.

      mrc_app_access_20170224.log: JOBID4279904271: MRC_MAIN.MRCView-user11111 (action: WW List) ended: Fri Feb 24 08:26:39 EST 2017. time used: 14ms

  • #67914

    random commandline
    Participant

    Ok, replace the old regular expression line with the new. I used a switch statement which reads one line at a time utilizing less memory and processes larger text files faster. The regular expression uses capture groups (parentheses) to match the parts you want. This (.*?) will match the smallest set and not be greedy (which is what happened with your last example).

    # Old
    '_(\d{8}).*MAIN.(.*)-(.*) [se].*(\d{2}:\d{2}:\d{2})'
    # New
    '_(\d{8}).*MAIN.(.*)-(.*?) .*(\d{2}:\d{2}:\d{2})' 
    
    • #67915

      Michael Vincenty
      Participant

      Awesome.

      Thanks for the help guys. This seem to have solved my issue. Now i just have to complete the script by filtering out un-needed application.

      Thanks,

    • #67923

      Michael Vincenty
      Participant

      Random Commandline,

      I have 3 questions,

      1. In the text version of the output some random username have user... after their name. Is the regex truncating the name and how can i get around this?
      2. Can I use the csv same as the text file for the following command?

      Get-Content $textfile2 | where-object {$_ -notmatch '.*mrcmenu*' -and $_ -notmatch '.*I00030*' -and $_ -notmatch '.*I00035*' -and $_ -notmatch '.*I00050*' -and $_ -notmatch '.*I00060*' -and $_ -notmatch '.*I00080*' -and $_ -notmatch '.*I00100*' -and $_ -notmatch '.*I00140*' -and $_ -notmatch '.*I00150*' -and $_ -notmatch '.*I00160*' -and $_ -notmatch '.*I00200*' -and $_ -notmatch '.*I00240*' -and $_ -notmatch '.*I00333*' -and $_ -notmatch '.*I00372*' -and $_ -notmatch '.*I00373*' -and $_ -notmatch '.*I00382*' -and $_ -notmatch '.*I00401 *' -and $_ -notmatch '.*I00402*' -and $_ -notmatch '.*I00403*' -and $_ -notmatch '.*I00442*' -and $_ -notmatch '.*I00472*' -and $_ -notmatch '.*I00642*' -and $_ -notmatch '.*I00643*' -and $_ -notmatch '.*I00712*' -and $_ -notmatch '.*I00713*' -and $_ -notmatch '.*I00732*' -and $_ -notmatch '.*I00752*' -and $_ -notmatch '.*I00753*' -and $_ -notmatch '.*I00762*' -and $_ -notmatch '.*I00772*' -and $_ -notmatch '.*I00852*' -and $_ -notmatch '.*I00863*' -and $_ -notmatch '.*I00912*' -and $_ -notmatch '.*I00913*' -and $_ -notmatch '.*I00925*' -and $_ -notmatch '.*I00932*' -and $_ -notmatch '.*I00993*' -and $_ -notmatch '.*I01052*' -and $_ -notmatch '.*I01053*' -and $_ -notmatch '.*I01072*' -and $_ -notmatch '.*I01152*' -and $_ -notmatch '.*I01182*' -and $_ -notmatch '.*I01192*' -and $_ -notmatch '.*I01222*' -and $_ -notmatch '.*I01262*' -and $_ -notmatch '.*I01263*' -and $_ -notmatch '.*I01493*' -and $_ -notmatch '.*I01494*' -and $_ -notmatch '.*I01495*' -and $_ -notmatch '.*I01496*' -and $_ -notmatch '.*I01497*' -and $_ -notmatch '.*I01515*' -and $_ -notmatch '.*I01605*' -and $_ -notmatch '.*I01615*' -and $_ -notmatch '.*I01927*' -and $_ -notmatch '.*I02007*'} | set-content 'C:\users\mvincenty\desktop\MRC Access Logs\appdataimport.txt'
      

      3. How can I shorten the code above in length, time, processing and memory? I will also need it to be flexible as i will need to add more "exclusions" to the list.

  • #67935

    random commandline
    Participant

    1&2. I recommend using csv files. You don't have to worry about the truncating issue, but this might work.

    $appusage | Format-Table -AutoSize | 
    Out-File "C:\users\mvincenty\desktop\MRC Access Logs\appusage.txt" 

    3. That is a long list. Will this work for you?

    (Get-Content $textfile2) -notmatch 'I00|mrcmenu'
    • #67936

      Michael Vincenty
      Participant

      3. Yes and no. I need some of the I's as some are reports the rest are option list. However I could use the S00. Would it be faster if I just used csv format instead?

You must be logged in to reply to this topic.