Script to find/replace multiple strings in multiple text files using PowerShell

This topic contains 8 replies, has 3 voices, and was last updated by Sriram Prabhakar 2 years, 3 months ago.

  • #18164
    Sriram Prabhakar
    Participant

    I am new to scripting and to PowerShell. I have been studying lately and am trying to build a script to find/replace text in a batch of text files (150 to 200 files, each containing code of no more than 10,000 lines; sample DOG0001.g attached). I would like to keep the FindString and ReplaceString as variables, since there are multiple values, which can in turn be read from a separate CSV file.

    I have come up with the code below, which is functional, but I would like to know whether it is the optimal solution for the requirement. I would also like to keep FindString and ReplaceString regular-expression compatible in the script, since I want to find/replace patterns as well.

    Sample contents of Input.csv (the number of entries in the CSV may vary from 50 to 1500) and a sample text file in which text needs to be replaced are attached.

    [b]The Code[/b]

        $Iteration = 0
        $FDPATH = 'D:\opt\HMI\Gfilefind_rep'
        #& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf

        # Collect the full paths of the .g files and the find/replace pairs
        $GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
        $FindReplaceList = Import-Csv -Path $FDPATH\Input.csv

        foreach ($Graphic in $GraphicsList) {
            Write-Host "Processing Find Replace on : $Graphic"
            foreach ($item in $FindReplaceList) {
                # Rewrite the whole file once per find/replace pair
                Get-Content $Graphic |
                    ForEach-Object { $_ -replace "$($item.FindString)", "$($item.ReplaceString)" } |
                    Set-Content ($Graphic + ".tmp")
                Remove-Item $Graphic
                Rename-Item ($Graphic + ".tmp") $Graphic
                $Iteration = $Iteration + 1
                Write-Host "String Replace Completed for $($item.ReplaceString)"
            }
        }
    

    I have gone through posts here and in other forums such as Stack Overflow and gathered valuable input, on which the code above is based.

    To summarize:

    [ol][li][b]I would like to know if the above code can be optimized for execution, since I feel it takes a long time to run.[/b][/li]
    [li]I would like to report the number of iterations carried out in the loop. I was able to write the current iteration number to the console, but couldn't figure out how to capture the output of Measure-Command in a variable that could then be used in a Write-Host command.[/li]
    [li]I would like to know whether it is possible to report the number of replacements made at the end of execution.[/li]
    [li]I would also like to display the time taken for the code to run, once it completes (see the sketch below for roughly what I have in mind).[/li][/ol]
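
    For points 2 and 4, something along these lines is what I imagine, though I have not verified it and the variable names are just placeholders:

        # Rough sketch (unverified): Measure-Command returns a TimeSpan object,
        # which can be captured in a variable and reported once the loops finish
        $elapsed = Measure-Command {
            # ... the find/replace loops would run in here ...
        }
        Write-Host "Completed $Iteration iterations in $($elapsed.TotalSeconds) seconds"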

    Thanks for taking the time to read this query. I much appreciate your support!

  • #18172
    Don Jones
    Keymaster

    Write-Host is rarely optimal :).

    Something about the file names you used prevented the attachments from working, sorry.

    You can possibly do everything you want, although counting the number of replacements will make this a lot more complex. It's usually easier to just tackle one problem at a time... Is there one thing you'd like to start with?
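
    (If you do eventually want that count, one rough, untested idea is to tally regex matches before each replace, along these lines, with placeholder variable names; but I would set it aside until the main loop is where you want it.)

        # Untested sketch: count how many times the pattern occurs before replacing it
        $totalReplacements += [regex]::Matches($text, $item.FindString).Count
        $text = $text -replace $item.FindString, $item.ReplaceString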

  • #18176
    Sriram Prabhakar
    Participant

    Thanks for the reply, Mr. Jones.

    From what you said, I deduce that I would need to drop the Write-Host from my loop. Noted. But with such a large volume of work, how do you suggest I monitor progress?

    [b]My primary objective is to optimize the loop for efficient (fast and functional) execution[/b]. The others, like reporting the time taken, iterations completed and replacements made, would be good to have, but they are secondary.

    I hope I am making my requirement clear. Thanks again for taking the time to help me out with my query.

  • #18177
    Don Jones
    Keymaster

    Well, for a huge file, there's not a ton of optimization you can really do. A foreach loop is often faster than the ForEach-Object cmdlet, but it requires more memory because the entire input has to be in RAM. In terms of just the find and replace loop, it's probably memory-optimized already.
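
    To illustrate the difference, here's a rough sketch with placeholder $find/$replace variables, not tested against your files:

        # Pipeline with ForEach-Object: streams line by line, low memory use, usually slower
        Get-Content $Graphic |
            ForEach-Object { $_ -replace $find, $replace } |
            Set-Content ($Graphic + ".tmp")

        # foreach statement: the whole file is read into RAM first, usually faster
        $lines = Get-Content $Graphic            # entire file held in memory
        $result = foreach ($line in $lines) {
            $line -replace $find, $replace
        }
        $result | Set-Content ($Graphic + ".tmp")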

    How large is the input file?

  • #18179
    Sriram Prabhakar
    Participant

    Typically, the code in each text file runs from 2,500 to 10,000 lines (sample attached in the original post). I will be handling 100 to 150 text files at a time on average; in rare cases it might go up to 300 or more.

    The Input.csv file, which contains the find/replace strings, will have anywhere between 50 and 1,500 entries (sample attached in the original post).

    Glad to hear from you that the loop is already memory-optimized! That's a relief.

  • #18194
    Don Jones
    Keymaster

    That's a good-sized file. I don't expect you're going to be able to get it to run much faster without getting extremely complex. Although if anyone else has suggestions, I'm sure they'll jump in.

  • #18198
    Sriram Prabhakar
    Participant

    Thanks for the insight, Mr. Jones! I much appreciate your support.

    As suggested, I'll remove the Write-Host from the nested loop and try the Write-Progress cmdlet. [b]I hope Write-Progress does not drag down the loop's performance the way Write-Host does.[/b]

    In the meantime, I'll keep checking this space in case there is a better way to go about it.

  • #18280
    Tim Pringle
    Participant

    Hey Sriram,

    There's quite a bit of disk activity going on that can slow things down, so this can be cut down a bit.

    I did some testing, and even with a dummy file of 10,000 rows of 500 characters each, PowerShell is able to read the text file fully into a single string variable. Using Out-String lets us cast the content to a string whilst maintaining its format, removing the need for the additional Get-Content -> Set-Content pass for every search and replace.

    I took your maximum parameters (number of files, and lines in the CSV and graphic files) and created dummy files from the ones you had provided. After some alterations, the revised script on my T440s is able to process your maximums in just over 8 minutes, including the use of a progress meter. Of course, processing time can vary based on content and the number of read/write operations.

    Hope this helps.

    
    
    $fdPath = 'c:\data\test'
    $graphicsFiles = Get-ChildItem -Path $fdPath\*.txt | ForEach-Object FullName
    $findReplaceList = Import-Csv -Path $fdPath\Input.csv

    $totalItems = $graphicsFiles.Count
    $currentRow = 0
    foreach ($graphicFile in $graphicsFiles)
    {
        $currentRow += 1
        Write-Progress -Activity "Processing record $currentRow of $totalItems" -Status "Progress:" -PercentComplete (($currentRow / $totalItems) * 100)

        # Read the whole file into a single string so every replacement happens in memory
        [string] $txtGraphicFile = Get-Content $graphicFile | Out-String

        foreach ($findReplaceItem in $findReplaceList)
        {
            $txtGraphicFile = $txtGraphicFile -replace "$($findReplaceItem.FindString)", "$($findReplaceItem.ReplaceString)"
        }

        # Write the file back out once, after all replacements are done
        $txtGraphicFile | Set-Content $graphicFile
    }

  • #18378
    Sriram Prabhakar
    Participant

    This was precisely the solution I was looking for, Mr. Pringle!

    Nothing more, nothing less. Works perfectly fine.

    There is a minor glitch, though: at the end of every replaced file, I find an extra newline (\r\n) being added. (I use the Compare plugin in Notepad++ to compare the two text files.) This is an inconvenience, as I need to go and delete this extra line in all the files.

    The code works perfectly, and performance is much faster than I had anticipated, but I am not able to understand why this newline is being added at the end of the file. Kindly help me understand.
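
    In case it is relevant, I wonder whether reading the file with Get-Content -Raw (instead of piping through Out-String) and writing it with Set-Content -NoNewline would avoid the extra line. This is just a guess on my part that I have not verified, and -NoNewline needs PowerShell 5.0 or later:

        # Guess at a workaround (not verified): -Raw reads the whole file as one string
        # without Out-String's trailing newline, and -NoNewline stops Set-Content from
        # appending its own newline at the end (PowerShell 5.0+)
        $txtGraphicFile = Get-Content $graphicFile -Raw
        foreach ($findReplaceItem in $findReplaceList)
        {
            $txtGraphicFile = $txtGraphicFile -replace $findReplaceItem.FindString, $findReplaceItem.ReplaceString
        }
        Set-Content -Path $graphicFile -Value $txtGraphicFile -NoNewline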

    Thanks again for the support, Mr. Pringle. You made my day!
