PowerShell – Delete line from a text file if it contains a certain string


  • This topic has 18 replies, 4 voices, and was last updated 1 month ago.
    • #278826 – Andy (Member)

      Hello everyone,

      I have the script below, which deletes any line that contains a certain string. It works fine, but it is extremely slow because file2 is big (over 50MB). To improve performance, how do I modify it to:

      1. Delete only one line on the first match (I don't know if this will improve performance)?

      2. Avoid saving file2 on every iteration, which may be causing the performance issue?

      Other ideas for improving performance would be greatly appreciated. Thank you.

       

      foreach ($string in (Get-Content 'C:\strings.txt')) {
          (Get-Content 'C:\file2.csv') -notmatch $string | Set-Content 'C:\file2.csv'
      }

    • #278871 – Logan (Contributor)

      You are on the right track with not writing the changes to the file in the loop. Instead, save the changes to a variable and write them out once at the end. Also, read your file content into a variable so you don't have to keep opening the file on each iteration of the loop. Finally, using .NET speeds things up dramatically compared to the cmdlets. It's not noticeable for small tasks, but as the files get larger it makes a big difference!
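      A sketch of those three changes together, assuming the same file paths as the original post (the escaping is an addition so each string is treated literally rather than as regex):

          # Read each file once with .NET instead of on every loop iteration
          $strings = [System.IO.File]::ReadAllLines('C:\strings.txt')
          $content = [System.IO.File]::ReadAllLines('C:\file2.csv')

          # Filter the in-memory copy: drop every line containing one of the strings
          foreach ($string in $strings) {
              $content = $content -notmatch [regex]::Escape($string)
          }

          # Write the result back out once at the end
          [System.IO.File]::WriteAllLines('C:\file2.csv', $content)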

      • #279057 – Andy (Member)

        Thank you, but the script only deletes the matched string. I would like to remove the entire line when a partial string is matched.

        E.g., string = abc

        line to be deleted = abc.def

        Thank you for any further assistance.

      • #279153 – Logan (Contributor)

        I am glad you found my suggestion helpful. I apologize for misunderstanding your objective. If you want to remove the whole line rather than just the match, then I think Select-String is your best bet.
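        Something along these lines (a sketch, reusing the paths from the original post):

            # Read the big file once
            $content = Get-Content 'C:\file2.csv'

            foreach ($string in (Get-Content 'C:\strings.txt')) {
                # -NotMatch returns the lines that do NOT contain the pattern;
                # .Line pulls the plain text back out of the MatchInfo objects
                $content = ($content | Select-String -Pattern $string -NotMatch).Line
            }

            $content | Set-Content 'C:\file2.csv'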

      • #279249 – Andy (Member)

        Hi Logan, switching to $Content = $Content | Select-String -Pattern $string -NotMatch works, but it is now back to slow :). Any other suggestions? Thanks.

    • #278895 – Matt (Major Contributor)

      You could try Select-String, which is usually faster than Get-Content:
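      The code sample didn't survive the page capture; here is a sketch of the approach being described, reusing the original paths. Select-String opens the file itself, so the content never passes through Get-Content, and the whole pattern list goes in at once:

          $patterns = Get-Content 'C:\strings.txt'

          # Write to a new file: the source is still being read while the pipeline runs
          Select-String -Path 'C:\file2.csv' -Pattern $patterns -NotMatch |
              ForEach-Object Line |
              Set-Content 'C:\file2_filtered.csv'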

      • #279951 – Logan (Contributor)

        OK, so how about we create a string to use with -replace rather than returning all the lines that don't match? I did some simple testing, and this seems to be faster than using the -NotMatch parameter of Select-String. Does this work any faster in your use case?
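        Logan's example was also lost in the capture; a sketch of the idea as described, with the original paths assumed. All the search strings are joined into one alternation, and a single -replace pass over the file (read as one string with -Raw) deletes the matching lines:

            $escaped = Get-Content 'C:\strings.txt' | ForEach-Object { [regex]::Escape($_) }
            # (?m) anchors ^ at each line start; \r?\n? consumes the line break
            $pattern = '(?m)^.*(?:{0}).*\r?\n?' -f ($escaped -join '|')

            $text = Get-Content 'C:\file2.csv' -Raw
            $text -replace $pattern | Set-Content 'C:\file2.csv' -NoNewline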

      • #279987 – Andy (Member)

        Hello Logan, that seems a little faster, but still very slow. A 1.5MB strings.txt and a 44MB file2.csv take over 2 hours to complete. I have files that are much larger than that, so it would take days :(. If all else fails, I will use SQL scripting as a workaround. Thank you for your help thus far.

      • #279996 – Matt (Major Contributor)

        Just wondering if you've tested the code I posted? Get-Content could be the bottleneck, and both of Logan's examples use it. You don't need Get-Content with Select-String, and skipping it can give a performance increase.

        Just a quick example with a 2.5MB file:
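        The example itself was lost in the capture; a sketch of the comparison being made, with assumed paths and Measure-Command for timing:

            $patterns = Get-Content 'C:\strings.txt'

            # Feeding the content through the pipeline line by line
            Measure-Command {
                Get-Content 'C:\file2.csv' | Select-String -Pattern $patterns -NotMatch
            }

            # Letting Select-String open the file itself is usually noticeably faster
            Measure-Command {
                Select-String -Path 'C:\file2.csv' -Pattern $patterns -NotMatch
            }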

      • #280020 – Andy (Member)

        Hello Matt, yes, I tested your script too, and it appears to be a little faster, but still too slow. Maybe it has something to do with constantly reading and writing the file. Perhaps I will use SQL, as it is better at managing larger data sets. Thank you.

      • #280110 – Matt (Major Contributor)

        There is an in-between method that might work for you: it doesn't involve SQL (as a server) but does use OleDB, which may be worth investigating since you're working with CSV files.

        Chrissy LeMaire (founder of dbatools) has some good articles on handling large CSV files.

        This script is probably a good starting point for what you're trying to do (in theory, it's just a case of modifying the query on line 43 🙂):

        https://blog.netnerds.net/2015/01/quickly-find-duplicates-from-csv-using-powershell/

        I just ran a test using the 2 million sales records file here:

        Downloads 18 – Sample CSV Files / Data Sets for Testing (till 5 Million Records) – Sales

        which is 2 million rows and 238MB; it took just under 5 minutes to find the 385,000+ duplicate rows.

      • #280308 – Andy (Member)

        Looks complicated but promising.  I will check it out.  Thanks.

      • #280353 – Matt (Major Contributor)

        It's not too bad. Here's how I modified it to get it working for your use case.

        What I did was build an array using the strings I want to filter against. Admittedly, this is a much smaller list than yours and I'm not sure how well it will scale, but you could split it into multiple queries.

        Relevant lines 42 & 43:

            $filters = Get-Content E:\temp\files\countries.txt
            $q = $filters -join "', '"

        This makes a single string to use in the query that looks like this:

            country1', 'country2', 'country3', 'country4

        The outer quotes, which are missing here, are added in the query itself.

        I then ran a select on all rows where the value of the countries column is not in the list $q. The column is referenced as F2 because I couldn't get column names working, so I stuck with the defaults provided by OleDB.

        Relevant line 46:

            $sql = "SELECT * FROM [$tablename] WHERE F2 NOT IN ('$q')"

        Once finished, the filtered data is in the datatable referenced by $dt, so you can just export it as normal:

            $dt | Export-Csv E:\Temp\Files\FilteredData.csv -NoTypeInformation

        Now, as I said, my filter list was much smaller than yours, but the data file was 25MB (I made a subset of the example file previously posted). It took just 12.9 seconds to get 47,000 rows (from 200,000) into $dt.

        My modified script is below. Full credit for this goes to Chrissy LeMaire; my edits are no more than tinkering.

        Note: I can't get VS Code to use the x86 shell, so run this in the ISE or a terminal.
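        (The script itself was lost in the page capture. The sketch below is reconstructed from the lines quoted above and the linked article, so treat the paths, file names, and table name as assumptions; the ACE text driver addresses file2.csv as the table file2#csv.)

            # Folder containing the CSV; the text driver treats the folder as the database
            $dir       = 'E:\temp\files'
            $tablename = 'file2#csv'
            $filters   = Get-Content 'E:\temp\files\countries.txt'
            $q         = $filters -join "', '"

            # HDR=No means the columns get the default names F1, F2, F3, ...
            $connstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$dir;" +
                          "Extended Properties='text;HDR=No;FMT=Delimited';"

            $conn = New-Object System.Data.OleDb.OleDbConnection($connstring)
            $conn.Open()

            # Keep only the rows whose second column is not in the filter list
            $sql     = "SELECT * FROM [$tablename] WHERE F2 NOT IN ('$q')"
            $cmd     = New-Object System.Data.OleDb.OleDbCommand($sql, $conn)
            $adapter = New-Object System.Data.OleDb.OleDbDataAdapter($cmd)
            $dt      = New-Object System.Data.DataTable
            [void]$adapter.Fill($dt)
            $conn.Close()

            $dt | Export-Csv 'E:\Temp\Files\FilteredData.csv' -NoTypeInformation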

         

      • #280533 – Andy (Member)

        Trying it now... thanks.

      • #280359 – Matt (Major Contributor)

        Andy, I just posted example code which should point you in the right direction, but it's been lost to the spam filter. I have requested that it be released, so please check back for it.

      • #280017 – Logan (Contributor)

        Incorporating Matt's recommendation to skip loading the file content into memory and instead pass the path to Select-String directly does appear to be quite a bit faster. Another thing I noticed in one of Matt's posts is providing the entire collection of strings as the pattern rather than looping through it, which also reduces the processing time. With his suggestions in mind, this looks like it's about as fast as I can get it. Good call on skipping loading the content into memory, Matt!

        I added the -SimpleMatch parameter to get past potential hang-ups with regex tokens appearing in the string collection passed to -Pattern. It has the same effect as escaping each string with the static [regex]::Escape() method.
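        Logan's code was lost in the capture, but a sketch of the combined approach as described (paths assumed) looks like this:

            $patterns = Get-Content 'C:\strings.txt'

            # -Path skips Get-Content entirely, the whole string collection goes in as
            # one pattern set, and -SimpleMatch treats each string literally
            Select-String -Path 'C:\file2.csv' -Pattern $patterns -NotMatch -SimpleMatch |
                ForEach-Object Line |
                Set-Content 'C:\file2_filtered.csv'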

      • #280311 – Andy (Member)

        Thanks, Logan. I think this will do, even though the speed seems to be about the same. I will just let it run overnight :). Thanks again for your assistance.

    • #278961 – Community Hero

      You could also use a switch statement.
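      A sketch of that idea, with assumed paths: switch -File streams the file line by line instead of loading it whole, and -Regex tests each line against one combined pattern:

          $escaped = Get-Content 'C:\strings.txt' | ForEach-Object { [regex]::Escape($_) }
          $pattern = $escaped -join '|'

          $kept = switch -Regex -File 'C:\file2.csv' {
              $pattern { continue }   # line contains one of the strings: drop it
              default  { $_ }         # otherwise keep it
          }

          $kept | Set-Content 'C:\file2.csv'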

    • #278973 – Andy (Member)

      Thank you everyone for responding. I ended up using Logan's suggestion, which is much quicker than what I had.
