Learning to parse data. Question.

This topic contains 10 replies, has 3 voices, and was last updated by Gorstag 10 months, 3 weeks ago.

  • #59257

    Gorstag
    Participant

    Hello,

    I am attempting to parse data from a flat log file, pull out the relevant information, and turn the remaining relevant information into something useful. I am pretty new to this and have no programming background.

    The following is my basic scenario. I have a flat file with lines that look similar to the following:

    A|B|C|D|E
    A|C|D|E
    A|B|C|D|E
    A|B|C|D|E
    A|C|D|E

    I only want to pull objects from the file that contain the full A|B|C|D|E. So I wrote a regex that searches for pipe, B, pipe and assigned it to a variable. I then run Get-Content against the file and use Select-String with the regex to filter for the objects I want. So far, so good. Now I run into problems.

    I have this single string of data that is a MatchInfo. I tried exporting it to CSV, thinking I could then re-import the results as an array and add headers, but the CSV output is garbage. I then converted it to a string: more garbage results. I tried Out-File too, with different garbage results.

    Right now I am trying to figure out how to output the filtered content to a "clean" CSV (or even keep it in memory and split the content on | one-to-one into an array with "tables").

    I don't want anyone to write a full script for me (this is purposely a learn-as-I-go project), but I am completely stuck after tinkering with this for about half a waking day.

  • #59259

    Olaf Soyk
    Participant

    So – show your code and we will be pleased to help you.

  • #59263

    Gorstag
    Participant

    Okay, here is how far I have gotten.

    $importlogfiles = ".\*.log"
    
    $ErrorActionPreference= 'silentlycontinue'
    
    #Regex to locate static value on lines that contain value.
    $reg1 = "\|\bB\b\|"
    
    #Grabs all content from all files
    $content = Get-Content -path $importlogfiles
    
    #Processing content to obtain matched lines.
    $regmatch = Select-String -InputObject $content -Pattern $reg1 -AllMatches
    
    #Incomplete below.  Idea here would perhaps be to form some sort of a loop to feed the values into each column once I figure out how to manipulate the data from above in a way that makes sense to me.
    
    #$Col1 = "Col1";
    #$Col2 = "Col2";
    #$Col3 = "Col3";
    #$Col4 = "Col4";
    #$Col5 = "Col5";
    
    #$test = New-Object PSObject -Property @{ FirstValue = $Col1; SecondValue = $Col2; ThirdValue = $Col3; FourthValue = $Col4; FifthValue = $Col5 }
    #Export-Csv -InputObject $test -Path .\test.csv -NoTypeInformation
    

    The idea was described above.
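A note on why the Select-String call above returns a single MatchInfo: when an array is passed through -InputObject, Select-String treats the whole collection as one combined object, whereas piping the lines in tests each line on its own. The following is only a minimal sketch, assuming the sample lines from the first post stand in for the real log files; `$lines` is an illustrative name, not part of the original script.

```powershell
# Sample lines from the first post; a stand-in for the Get-Content output.
$content = 'A|B|C|D|E', 'A|C|D|E', 'A|B|C|D|E', 'A|B|C|D|E', 'A|C|D|E'
$reg1 = '\|\bB\b\|'

# Piping sends one line at a time, so each matching line gets its own MatchInfo.
$regmatch = $content | Select-String -Pattern $reg1

# The Line property of a MatchInfo holds the original text of the matching line.
$lines = $regmatch | ForEach-Object { $_.Line }
$lines
```

With plain strings back in hand, the lines can be fed to the CSV cmdlets without the MatchInfo wrapper getting in the way.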

  • #59268

    Ron
    Participant

    $content | where-object {$_ -match $reg1} | export-CSV filepath -notype

    That should give you the subset of lines that match the regex and save it.

    Assuming that data can now be parsed by Import-CSV, you can import it and create headers with the -header parameter.

    • #59271

      Gorstag
      Participant

      Thanks, but now you are running into the stuff that is confusing the hell out of me.

      Doing that outputs a file that has headers and matching non-useful values, and none of the original content:
      PSPath, PSParentPath, PSChildName, PSDrive, PSProvider, ReadCount, Length

      If I turn it into a string, I get the header:
      Length
      and an int value (89 in this case)
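The behaviour described here follows from how the CSV cmdlets work: they write out the properties of each object, not its text, and a plain string exposes only Length (Get-Content additionally attaches PSPath and similar note properties). A small sketch of the difference, assuming a single sample line and the hypothetical column names c1 to c5; ConvertTo-Csv is used instead of Export-Csv only so the result stays in memory.

```powershell
$line = 'A|B|C|D|E'

# A string goes in, so the only property that comes out is Length.
$asString = $line | ConvertTo-Csv -NoTypeInformation

# Split the line on the pipe and build an object with one property per field.
$fields = $line -split '\|'
$row = [pscustomobject]@{
    c1 = $fields[0]; c2 = $fields[1]; c3 = $fields[2]
    c4 = $fields[3]; c5 = $fields[4]
}

# Now the CSV carries the actual field values.
$asObject = $row | ConvertTo-Csv -NoTypeInformation

$asString
$asObject
```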

  • #59274

    Gorstag
    Participant

    Okay, made progress

    
    $t1 = Get-Content -path $importlogfiles | Where-Object {$_ -match $reg1} 
    $csvcontent = $t1 | ConvertFrom-Csv -Delimiter '|'
    $csvcontent | Export-Csv -NoTypeInformation -Path .\test.csv
    
    

    This gives me a CSV with the expected number of rows with the expected content.

    • #59277

      Ron
      Participant

      Yours should work, except that your input does not have a header (I assume), so ConvertFrom-Csv eats the first line and uses it as the header.

      $csvcontent = $t1 | ConvertFrom-Csv -Delimiter '|' -header c1,c2,c3,c4,c5
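The effect of the missing header can be seen by counting the objects that come back. This is a minimal sketch, assuming three already-filtered sample lines; `$noHeader` and `$withHeader` are illustrative names.

```powershell
# Three filtered lines, none of which is a header row.
$t1 = 'A|B|C|D|E', 'A|B|C|D|E', 'A|B|C|D|E'

# Without -Header the first line is consumed as column names, leaving two rows.
$noHeader = @($t1 | ConvertFrom-Csv -Delimiter '|')

# With -Header every line is kept as data, giving three rows.
$withHeader = @($t1 | ConvertFrom-Csv -Delimiter '|' -Header c1, c2, c3, c4, c5)

"Rows without -Header: $($noHeader.Count)"
"Rows with -Header:    $($withHeader.Count)"
```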
      
  • #59275

    Ron
    Participant

    Try this and adapt as needed.

    $l = 'A|B|C|D|E','A|C|D|E','A|B|C|D|E','A|B|C|D|E','A|C|D|E'
    $l|where-object {$_ -match '\|\bB\b\|'}|out-file u:\ron\testc5.txt
    import-csv u:\ron\testc5.txt -header c1,c2,c3,c4,c5 -Delimiter "|"|ft -auto
    
    • #59280

      Gorstag
      Participant

      Thanks. Adding -Header to the $csvcontent line removes some steps and gives the same results.

      So this leaves me with one other question (obviously this example was super simplified). In the real content there are multiple blocks of the same data types, but at variable lengths, with only "some" of the columns matching. My idea was to pull each grouping out into its own CSV with the correct column names describing the data. I was then hoping to import each CSV into a single array that holds all of the data from the original file in the correct columns, so that I could write "select"-type statements to parse the data into useful results: how many of X error when accessing Y file, how many access attempts to X path from Y IP address, etc.

      So the question is: if I make, let's say, two CSVs (I will test it myself if the answer is yes) that look something like this:

      CSV1 columns:
      A,B,C,D,E,F,G,I,J,M,N

      CSV2 columns:
      A,D,E,G,J,M,P

      Can they be imported into the same array so that the "null" values for the columns that don't exist all line up properly? Like:

      A,B,C,D,E,F,G, ,I,J, , ,M,N, , ,
      A, , ,D,E, ,G, , ,J, , ,M, , ,P,
      
  • #59281

    Ron
    Participant

    You can merge them, but you have to create the combined object first.

    $a=''|select a,c
    $a.a = 1
    $a.c = 3
    
    $b=''|select b,d
    $b.b = 2
    $b.d = 4
    
    # Doesn't work: the table's columns come from the first object, so b and d are not shown
    $c=@()
    $c+=$a
    $c+=$b
    $c|ft -auto
    
    # Must create all properties first
    $c=@()
    $c+=$a|select a,b,c,d
    $c+=$b
    $c|ft -auto
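When the column sets are not known in advance, the same idea can be generalised by first collecting every property name that appears and then selecting that full list on every row. This is only a sketch under assumptions: the two objects below are made-up stand-ins for rows imported from two CSVs, and `$columns` and `$merged` are illustrative names.

```powershell
# Stand-ins for rows imported from two CSVs with different columns.
$csv1 = [pscustomobject]@{ A = 1;  B = 2;  D = 4 }
$csv2 = [pscustomobject]@{ A = 10; D = 40; P = 160 }
$rows = @($csv1, $csv2)

# Gather every property name seen on any row, in first-seen order.
$columns = $rows | ForEach-Object { $_.PSObject.Properties.Name } | Select-Object -Unique

# Select-Object creates each requested property; ones a row lacks come back as $null.
$merged = $rows | Select-Object -Property $columns

$merged | Format-Table -AutoSize
```

Because every object in `$merged` now carries the same set of properties, Format-Table and Export-Csv line the columns up, with empty cells where a row had no value.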
    
    • #59283

      Gorstag
      Participant

      Thanks a bunch. Will get to tinkering 🙂
