Parsing a text file with tab as delimiter

This topic contains 18 replies, has 5 voices, and was last updated by  AloneOnTheEdge 4 weeks, 1 day ago.

  • Author
    Posts
  • #82460

    AloneOnTheEdge
    Participant

    Hello team,
    I have a log file which i need to parse, i don't need all the details but need only few. below is how my file will look like.

    T=CEMTDR1864393_10/09/2017 S=CEMTDR1864393_10/09/2017 R=TDRALRT01 2017/09/08 02:32:52 1
    T=000284447_10/09/2017 S=000284447_10/09/2017 R=ITNEALRT01 2017/09/05 01:23:24 1
    T=C14288015_10/18/2017 S=C14288015_10/18/2017 R=ITNECASE01 2017/08/03 01:27:06 1
    T=SAM1291734_10/18/2017 S=SAM1291734_10/18/2017 R=1ZFAALRT01 2016/11/14 10:18:51 1

    1st column = String that starts after T= and ends before "_"
    2nd column = Date that is starting after "_" in the first string
    3rd column = String that starts after R=.

    Rest everything has to be ignored and the output should be written to a new file. Output will be like
    CEMTDR1864393 10/09/2017 TDRALRT01
    000284447 10/09/2017 ITNEALRT01

    Please let me know how this can be acheived using powershell. I am new to it and do not have much information

  • #82463

    Olaf Soyk
    Participant

    Hi, this forum is not about writing scripts for you. It's more about to answer questions about Powershell scripting. Do you have any specific question about a particular issue you have with Powershell scripting?

  • #82466

    JohnM
    Participant

    In PowerShell v5 there is a command ConvertFrom-String which can parse text files based off a template you create. A very good reason to upgrade to v5.

    Here are some links for more information
    https://foxdeploy.com/2015/01/13/walkthrough-part-two-advanced-parsing-with-convertfrom-string/

    http://blogs.msdn.com/b/powershell/archive/2014/10/31/convertfrom-string-example-based-text-parsing.aspx

    http://www.lazywinadmin.com/2014/09/powershell-convertfrom-string-and.html

  • #82481

    Don Jones
    Keymaster

    You could also use Import-CSV and just specify `t as the delimiter.

    • #82492

      AloneOnTheEdge
      Participant

      when i use
      Import-Csv file_path -Delimiter "`t" it just repeats the log file contents. I just need 3 columns

  • #82484

    Rick
    Participant

    It looks like your log file already has delimiters you can work with. How about:

    Get-Content  | % {
    $field1 = $_.Split("=")[1].Split("_")[0]
    $field2 = $_.Split("=")[1].Split("_")[1].Replace(" S","")
    $field3 = $_.Split("=")[3].Split(" ")[0]
    
    Write-output "$field1 $field2 $field3"
    }
    
  • #82498

    JohnM
    Participant

    Not sure I would use anything but ConvertFrom-String going forward. Very powerful tool This is what I got based upon your initial post.

    
    
    
    $template = @'
    T={First*:CEMTDR1864393}_{Date1:10/09/2017} S=CEMTDR1864393_10/09/2017 R={Third:TDRALRT01} 2017/09/08 02:32:52 1
    T={First*:000284447}_{Date1:10/09/2017} S=000284447_10/09/2017 R={Third:ITNEALRT01} 2017/09/05 01:23:24 1
    '@
    
    
    gc .\x.TXT | cfs -TemplateContent $template 
    

    Results are as follows:

    First Date1 Third
    —– —– —–
    CEMTDR1864393 10/09/2017 TDRALRT01
    000284447 10/09/2017 ITNEALRT01
    C14288015 10/18/2017 ITNECASE01
    SAM1291734 10/18/2017 1ZFAALRT01

  • #82501

    JohnM
    Participant
    This is the content of x.txt 
    
    T=CEMTDR1864393_10/09/2017 S=CEMTDR1864393_10/09/2017 R=TDRALRT01 2017/09/08 02:32:52 1
    T=000284447_10/09/2017 S=000284447_10/09/2017 R=ITNEALRT01 2017/09/05 01:23:24 1
    T=C14288015_10/18/2017 S=C14288015_10/18/2017 R=ITNECASE01 2017/08/03 01:27:06 1
    T=SAM1291734_10/18/2017 S=SAM1291734_10/18/2017 R=1ZFAALRT01 2016/11/14 10:18:51 1
    
    • #82505

      AloneOnTheEdge
      Participant

      Thank you John. I see you are hard-coding the values. I just provided a sample and number of records vary from 500 – 1000 or even more. I just want to extract 3 columns as mentioned in my initial post

      1st column = String that starts after T= and ends before "_"
      2nd column = Date that is starting after "_" in the first string
      3rd column = String that starts after R=

      Thanks again for trying to help me

  • #82502

    JohnM
    Participant

    Results are as shown below. (hope the font is non-proportional and this all lines up)

    
    First         Date1      Third     
    -----         -----      -----     
    CEMTDR1864393 10/09/2017 TDRALRT01 
    000284447     10/09/2017 ITNEALRT01
    C14288015     10/18/2017 ITNECASE01
    SAM1291734    10/18/2017 1ZFAALRT01
    
    • #82510

      AloneOnTheEdge
      Participant

      Perfect. It worked like magic..Thank you so much.
      I have couple of questions
      1. gc .\x.txt : how can i specify absolute file path? though it worked in my case.
      2. is it possible to supress the headers? i redirected the output and there were trailing spaces along with headers which i want to get rid off

    • #82513

      JohnM
      Participant

      It should work for 1 or more lines. What I have hard coded was just the template to apply to your text file. Please see the results I posted in a separate post.

    • #82516

      AloneOnTheEdge
      Participant

      Hi John,
      probably you replied before reading my post.your script worked and it is what i exactly wanted. Thank you so much again. however, i have 2 more questions

      1. gc .\x.txt : how can i specify absolute file path? though it worked in my case.
      2. is it possible to supress the headers? i redirected the output and there were trailing spaces along with headers which i want to get rid off

    • #82517

      JohnM
      Participant

      Just specify the path you want after the GC (get-content) command Get-Content C:\somedir\someotherdir\somefile.txt

      use format table with hide headers option.

      gc .\x.TXT | cfs -TemplateContent $template | ft First, Date, Third -HideTableHeaders

      But you probably don't want to use Format-Table if you wanted to do something with these items. Probably want to pipe to some sort of output file (out-file, export-csv, or do something with the individual items)

      @'# this hides the headers '@
      gc .\x.TXT | cfs -TemplateContent $template | ft First, Date, Third -HideTableHeaders
      
      @'# do something with a value -- replace Write-Output with your code/command '@
      gc .\x.TXT | cfs -TemplateContent $template | % {$_.First | Write-Output }
      
      @'# Create a CSV file to import somewhere else '@
      gc .\x.TXT | cfs -TemplateContent $template | export-csv -Path .\x.csv
      
    • #82520

      AloneOnTheEdge
      Participant

      Thank you again John. however when i redirect the output using "> c:\results.txt" first line is empty and at the end of the file 3 lines are empty. is it possible to get rid of those blank lines? I mean can we trim those lines and have only required output? I willbe using this output to insert into an oracle table and it will be easy when there are no trailing lines and spaces

    • #82528

      JohnM
      Participant

      Don't be afraid of using get-help in PowerShell. If I were importing to any database I would use a CSV file.

      gc .\x.TXT | cfs -TemplateContent $template | export-csv -Path .\x.csv

      Not sure about the whitespace issues....while you can redirect with > PowerShell has a bunch of different ways to write to a file. Maybe use Out-File ? But my first recommendation is to use Export-CSV as shown above.

    • #82720

      AloneOnTheEdge
      Participant

      Thanks John. when i use ".csv" prefixed zeroes will be truncated and only non numeric values are retained. for ex,000052520 will be 52520. However I am good with the output. Thank you so much for your time and helping me out. I spent a lot of time on this but in vain. you were my saviour.

You must be logged in to reply to this topic.