Delete all text before a particular XML tag in PowerShell

This topic contains 18 replies, has 3 voices, and was last updated by  camelCreed 6 days, 3 hours ago.

  • #84104

    Rahul
    Participant

    Hi all,

    I have a requirement to remove all the text that precedes a particular XML tag. I also don't want any junk characters left in the file once this conversion happens.

    Sample XML file: sample.xml

    As depicted in my sample.xml file, I want to create a new XML file in a different path from sample.xml, with all the text prior to tag . deleted. My target XML would be as below:

    Or, in other words, I want my target XML file to contain everything between the tags and

  • #84107

    Rahul
    Participant

    Sample XML file: sample.xml

    As depicted in my sample.xml file, I want to create a new XML file in a different path from sample.xml, with all the text prior to tag . deleted. My target XML would be as below:

    Or, in other words, I want my target XML file to contain everything between the tags  and  
    
  • #84109

    Rahul
    Participant

    Instead of my file starting with line 1 and line 2, I want PowerShell to trim off line 1 and line 2 so that I am left with just line 3 in all my XML files.

    line 1 –
    line 2-
    line 3-

  • #84122

    Olaf Soyk
    Participant

    Did you try

    Get-Content -Path 'Your Sample XML File' | Select-Object -Skip 2

    ?

    • #84124

      Rahul
      Participant

      Hi Olaf,

      Thanks for your reply!

      This will not work because the XML files are not always formatted the same way. Sometimes this spans two different lines;
      at other times it is all on the same line. So -Skip 2 will not work.

      I am looking for a program that can trim off all the text in an XML file before a particular keyword and write the result to a new file in a separate directory.

      Thanks
      Rahul Kumar

    • #84136

      camelCreed
      Participant

      Hey,

      Here is one way. Probably not the most elegant, and it does not export the new file at the end, but I imagine you can sort that part out.

      #grab your file
      $file = Get-Content -Path C:\MyScripts\myFile.txt
      
      #join the array of lines into a single string, keeping the line breaks
      $oneLine = $file -join "`r`n"
      
      #index your keyword here (IndexOf returns -1 if the keyword is not found)
      $keyword = $oneLine.IndexOf("line 3")
      
      #grab both sides - before and after your keyword
      $beforeKeyword = $oneLine.Substring(0,$keyword)
      
      #here is the string you want. export it somehow.
      $afterKeyword = $oneLine.Substring($keyword)

      If you see this, Olaf, please show me the method you mentioned.

      Thank you

      *EDIT*

      I just found this Where() method, and it is awesome. You can use 'SkipUntil', if you have a keyword to use. Like this...

      #grab your file
      $file = Get-Content -Path C:\MyScripts\myFile.txt
      
      #set your keyword
      $keyword = "my keyword"
      
      #use the Where() method with a scriptblock to match on $keyword, and skip everything in the collection until keyword is found
      $keepAfterKeyword = $file.Where({$_ -match $keyword}, 'SkipUntil')
      

      Pretty awesome.
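
      One caveat, sketched under assumptions: 'SkipUntil' works on whole lines, so if the keyword shares a line with text that should go, that text survives. The snippet below also trims the first kept line. The sample lines and the <root> tag are made up for illustration, and String.Contains is used instead of -match so the keyword is not treated as a regular expression. Like the code above, it relies on the Where() method.

```powershell
# Hypothetical sample lines; '<root>' stands in for the real tag
$lines   = @('junk before', 'more junk <root>', '  <child/>', '</root>')
$keyword = '<root>'

# Keep everything from the first line that contains the keyword
$kept = @($lines.Where({ $_.Contains($keyword) }, 'SkipUntil'))

# Drop the text that precedes the keyword on that first kept line
if ($kept.Count -gt 0) {
    $kept[0] = $kept[0].Substring($kept[0].IndexOf($keyword))
}

$kept
```

      With real files, $lines would come from Get-Content as in the example above.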

    • #84179

      Rahul
      Participant

      Hi

      Thanks for your reply!

      I think I am unable to post code here. I have XML files where I have to remove everything prior
      to a particular tag. The way the XML files are created, that DTD tag has no fixed position.
      It can be on line 1 or line 2 or line 3, so we cannot always rely on line numbers. If we
      can remove everything in the file prior to that specific tag and write the contents to a new file,
      then that should be okay.

      Thanks

    • #84182

      camelCreed
      Participant

      I am not referencing any line numbers. I am using Get-Content and looking for a keyword. That is what you are asking to do.

      Post the code you are using. Thanks

  • #84130

    Olaf Soyk
    Participant

    If you know this particular key word you just have to search for it and delete everything in front of it. What's the actual problem?

    • #84140

      Rahul
      Participant

      Thanks, Let me try this.

    • #84142

      camelCreed
      Participant

      Hi Olaf,

      I imagine the problem to be that Rahul does not know how to do what you are suggesting. Please have a look at my methods above and share yours. We can all learn something :).

      Thank you

    • #84176

      Rahul
      Participant

      Skip until is not working.

      error below:

      Method invocation failed because [System.Object[]] doesn't contain a method named 'Where'.
      At line:6 char:33
      + $keepAfterKeyword = $file1.Where <<<< ({$_ -match $keyword}, 'SkipUntil')
          + CategoryInfo          : InvalidOperation: (Where:String) [], RuntimeException
          + FullyQualifiedErrorId : MethodNotFound

    • #84178

      camelCreed
      Participant

      Where is your code?

    • #84185

      Rahul
      Participant
      
      
      $file_temp = "C:\DTD_R2_RAW"
      $xml_in = "C:\DTD_R2_REM"
      $file_archive="C:\D2_RAW_ARCHIVE"
      
      $xml_files = Get-ChildItem $file_temp *.XML 
      
      if($xml_files)
      {
          foreach ($file in $xml_files){
              $file1 = Get-Content -Path $file
              $keyword = ""
              $keepAfterKeyword = $file1.Where({$_ -match $keyword}, 'SkipUntil')
              cat $keepAfterKeyword | sc $xml_in\$file
          }
      }
      
      
    • #84190

      camelCreed
      Participant

      There are some strange things going on with your code. Look at this example and try to make it work for you. This works great for me. First create a directory to hold all of your output files. Then look at this...

      #set your directory
      $file_temp = "C:\DTD_R2_RAW"
      
      #grab your files
      $xml_files = Get-ChildItem $file_temp *.XML -Recurse
      
      #designate your keyword
      $keyword = "my keyword"
      
      #create your new 'keep' folder
      New-Item -ItemType Directory C:\DTD_R2_RAW\Keep
      
      #if there are files, do something...
      if ($xml_files) {
      
          #for each file, skip all lines until one matches the keyword, then output everything from that line onward
          ForEach ($x in $xml_files) {
      
              #FullName is the complete path, so files found by -Recurse in subfolders are read correctly
              $file = Get-Content -Path $x.FullName
              
              $keep = $file.Where({$_ -match $keyword}, 'SkipUntil') | Out-File C:\DTD_R2_RAW\keep\$($x.name)  
      
          }
      }
    • #84214

      Rahul
      Participant

      Thanks for extending your help, much appreciated!

      My PowerShell version is 2, and apparently the Where() method is not available there.
      Is there any workaround, please?

    • #84241

      camelCreed
      Participant

      Try using | Where-Object instead, or update PowerShell. You should be on version 5, or later if it is available.

      You could also try the more long-winded approach I gave first...

      #grab your file
      $file = Get-Content -Path C:\MyScripts\myFile.txt
      
      #join the array of lines into a single string, keeping the line breaks
      $oneLine = $file -join "`r`n"
      
      #index your keyword here (IndexOf returns -1 if the keyword is not found)
      $keyword = $oneLine.IndexOf("line 3")
      
      #grab both sides - before and after your keyword
      $beforeKeyword = $oneLine.Substring(0,$keyword)
      
      #here is the string you want. export it somehow.
      $afterKeyword = $oneLine.Substring($keyword)

      But if I were you, I would update your Windows Management Framework.
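
      If updating is not an option, here is a rough sketch of the IndexOf idea that avoids the Where() method, which was added in PowerShell 4. Note that Where-Object filters item by item and has no 'SkipUntil' mode, so it is not a drop-in replacement. The function name Get-TextFromKeyword and the <root> tag are made up for illustration, and I have not run this on version 2 myself.

```powershell
# Returns the part of $Text that starts at the first occurrence of $Keyword,
# or $null when the keyword is not present.
function Get-TextFromKeyword {
    param([string]$Text, [string]$Keyword)

    # Ordinal comparison matches the exact characters, ignoring culture rules
    $index = $Text.IndexOf($Keyword, [System.StringComparison]::Ordinal)
    if ($index -lt 0) { return $null }
    return $Text.Substring($index)
}

# In-memory demo; '<root>' is a hypothetical stand-in for the real tag
$sample = "line 1 junk`r`nline 2 junk <root>`r`n  <child/>`r`n</root>"
$result = Get-TextFromKeyword -Text $sample -Keyword '<root>'
$result
```

      For a real file, the text could be read with [System.IO.File]::ReadAllText and the result written with [System.IO.File]::WriteAllText, both given full paths. Because the whole file is one string, it does not matter which line the keyword is on.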

    • #84215

      Rahul
      Participant

      Hi,

      Thanks for your reply, I appreciate your help.

      I am getting the error below; apparently the Where() method will not work with my version of PowerShell.

      Mode                LastWriteTime     Length Name
      ----                -------------     ------ ----
      d----        14-Nov-17  10:00 AM            Keep
      Method invocation failed because [System.Object[]] doesn't contain a method named 'Where'.
      At C:\MYLAN\PROJECT\ARGUS_UPGRADE\BFC\EMA_Rule_Increased_Files\CODE\PROCESSING\Camel.ps1:33 char:28
      + $keep = $file.Where <<<< ({$_ -match $keyword}, 'SkipUntil') | Out-File

      Thanks

    • #84187

      Rahul
      Participant

      Hi,

      Thanks for your inputs. One more point worth mentioning: I am trying to load the XML files
      into Oracle via SQL*Loader. I am not sure why most of the files fail with the error below.
      Apparently it is junk characters. Is there any way to deal with this?

      Record 4: Rejected - Error on table TABLE_XML, column XMLDATA.
      ORA-31011: XML parsing failed
      ORA-19202: Error occurred in XML processing
      LPX-00210: expected '<' instead of '¿'
      Error at line 1
      ORA-06512: at "SYS.XMLTYPE", line 5
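
      One possible cause, and this is only a guess from the error text: the file starts with a byte-order mark (BOM), whose last UTF-8 byte shows up as '¿' when read as a single-byte character. In Windows PowerShell, Out-File writes UTF-16 with a BOM by default, which could produce files the loader does not expect. The sketch below strips a leading BOM character from already-decoded text and encodes the result as UTF-8 without a BOM. The function name Remove-LeadingBom and the sample string are made up for illustration.

```powershell
# Removes a leading U+FEFF (the BOM as it appears in decoded text), if any
function Remove-LeadingBom {
    param([string]$Text)
    return $Text.TrimStart([char]0xFEFF)
}

# Hypothetical sample: a BOM character followed by XML
$sample  = [string][char]0xFEFF + '<root/>'
$cleaned = Remove-LeadingBom -Text $sample

# UTF8Encoding($false) means "do not emit a BOM"
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
$bytes = $utf8NoBom.GetBytes($cleaned)

# The first byte is now '<' (0x3C) rather than a BOM byte
'{0:X2}' -f $bytes[0]
```

      To write a file this way, [System.IO.File]::WriteAllText accepts the text and the $utf8NoBom encoding along with a full path. If the junk characters turn out to be something other than a BOM, this will not help.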
