PowerShell Performance: The += Operator (and When to Avoid It)

In PowerShell, there are always many different ways to accomplish a given task. Sometimes these different options offer trade-offs in performance and code clarity: faster execution at the expense of higher memory usage (or vice versa), or better performance at the expense of code that isn’t as easy to read. Depending on how much data you need to process, the differences between options may not really matter, and you can pick whatever is most aesthetically pleasing. However, if your script needs to scale well with large data sets, you’ll want to know how to make sure your script isn’t wasting a lot of CPU time or memory. This article, possibly the first in a series, touches on one such performance “gotcha”: using the += operator on strings or arrays.

Every so often, I see a blog post or script posted online containing code that looks something like this:
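A minimal sketch of that pattern follows. It assumes a 10-iteration counter loop, and the variable names ($outputString and $outputArray) are illustrative rather than taken from any particular script:

```powershell
# Sketch of the slow pattern: both a string and an array grow with +=.
# $outputString and $outputArray are illustrative names.
$outputString = ''
$outputArray  = @()

for ($i = 1; $i -le 10; $i++) {
    # Each += builds a brand-new string holding the old text plus one line.
    $outputString += "Line $i`r`n"

    # Each += builds a brand-new array holding the old elements plus one.
    $outputArray += "Element $i"
}

$outputArray.Count    # returns 10
```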

They may not be appending to both a string and an array in the same block, but this example illustrates both ideas at once. When the loop only executes 10 times (or even 1,000 times), the performance of this block of code isn’t so bad: it runs in less than 100 milliseconds on my computer when I change the loop limit to 1,000. When I bump it up to 10,000, though, it takes over 5 seconds to run, and it only gets worse from there: 12.5 seconds for 15,000 elements, 26 seconds for 20,000 elements, and so on. The execution time grows roughly with the square of the element count (quadratically), not linearly: doubling the count from 10,000 to 20,000 made the run about five times slower. In other words, this code does not scale well at all.
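As a rough check on those numbers, assume run time grows with the square of the loop count, t(n) ≈ c·n², and calibrate c from the 10,000-element run (about 5 seconds):

```latex
c \approx \frac{5\ \mathrm{s}}{(10^4)^2} = 5 \times 10^{-8}\ \mathrm{s},
\qquad
t(1.5 \times 10^4) \approx c \cdot 2.25 \times 10^8 = 11.25\ \mathrm{s},
\qquad
t(2 \times 10^4) \approx c \cdot 4 \times 10^8 = 20\ \mathrm{s}
```

That simple model lands close to the measured 12.5 and 26 seconds, whereas a linear model calibrated the same way would predict only 7.5 and 10 seconds.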

The reason for this is that strings and arrays in the .NET Framework cannot be resized or appended to in place: strings are immutable, and arrays have a fixed size. Every one of those += operators forces .NET to create a new array or string, copy the contents of the original into it (plus the one new line or array element), and discard the original. As the string or array grows, each new copy takes longer and longer to complete.
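One way to see this in a console is to keep a second reference to an array or string, apply +=, and check whether both variables still point to the same object. This is a small illustration, not part of the original benchmark:

```powershell
# Arrays are fixed-size in .NET, so += cannot grow them in place.
$original  = @(1, 2, 3)
$sameArray = $original            # second reference to the same array object

$original.IsFixedSize             # True: the array cannot be resized

$original += 4                    # builds a new 4-element array and rebinds $original

[object]::ReferenceEquals($original, $sameArray)   # False: $original is a new object
$sameArray.Count                  # 3: the old array is untouched
$original.Count                   # 4

# Strings are immutable, so += on a string also returns a new object.
$text     = 'abc'
$sameText = $text
$text    += 'd'
[object]::ReferenceEquals($text, $sameText)        # False
```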

The .NET Framework offers classes to address both of these performance problems. Instead of appending to strings directly, you can use System.Text.StringBuilder. As an alternative to arrays, you can use either System.Collections.ArrayList or System.Collections.Generic.List[T]. In a PowerShell script, the difference between those two usually doesn’t matter; in the next example, I’ll use List. It requires me to specify the type of elements the list will contain, but it will perform better than ArrayList in some situations (and since this is a performance post, I may as well use the best option).
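A minimal sketch of these classes in PowerShell follows; the sample strings are arbitrary. List[T] takes its element type in square brackets, and because StringBuilder.Append and ArrayList.Add both return a value that PowerShell would otherwise write to the pipeline, those calls are cast to [void]:

```powershell
# StringBuilder: appends into an internal buffer that grows as needed.
$stringBuilder = New-Object System.Text.StringBuilder
[void]$stringBuilder.AppendLine('first line')   # Append/AppendLine return the builder itself
[void]$stringBuilder.Append('second line')
$text = $stringBuilder.ToString()               # build the final string once, at the end

# Generic List: the element type goes in square brackets.
$list = New-Object 'System.Collections.Generic.List[string]'
$list.Add('alpha')                              # List[T].Add returns nothing
$list.Add('beta')

# ArrayList: untyped; Add returns the new element's index, so discard it.
$arrayList = New-Object System.Collections.ArrayList
[void]$arrayList.Add('alpha')
[void]$arrayList.Add(42)

# Convert back to a plain array if a caller needs one.
$array = $list.ToArray()
```

The [void] casts matter in functions: without them, the builder objects and ArrayList indexes would become part of the function’s output.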

Here’s how you can test the performance of the original example code, and compare it to the performance of StringBuilder and List:
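A sketch of such a comparison using Measure-Command follows, with two assumptions: the loop bodies and variable names are illustrative rather than the original code, and the iteration counts are kept small so it finishes quickly (the timings reported in this post came from 20,000 iterations of the += version and 1,000,000 of the StringBuilder / List version):

```powershell
# Iteration counts are parameters; raise them to reproduce a larger test.
$slowCount = 2000
$fastCount = 100000

$plusEquals = Measure-Command {
    $outputString = ''
    $outputArray  = @()
    for ($i = 1; $i -le $slowCount; $i++) {
        $outputString += "Line $i`r`n"    # new string every iteration
        $outputArray  += "Element $i"     # new array every iteration
    }
}

$builderAndList = Measure-Command {
    $stringBuilder = New-Object System.Text.StringBuilder
    $list = New-Object 'System.Collections.Generic.List[string]'
    for ($i = 1; $i -le $fastCount; $i++) {
        [void]$stringBuilder.Append("Line $i`r`n")   # appends into a growable buffer
        $list.Add("Element $i")                      # amortized constant-time append
    }
    $outputString = $stringBuilder.ToString()
}

'Using += operators:'
$plusEquals | Format-List TotalMilliseconds
'Using StringBuilder and List:'
$builderAndList | Format-List TotalMilliseconds
```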

In this case, the code clarity hasn’t suffered at all, in my opinion. $list.Add and $stringBuilder.Append are both very clear in their meaning, and just as easy to read as the += operator.

Notice that I snuck in a difference in scale there: the “+=” block only had to process 20,000 elements, while the StringBuilder / List block was cranked up to a million. The results?

Using += operators:
TotalMilliseconds : 26024.1599

Using StringBuilder and List:
TotalMilliseconds : 8334.3011

Even though they had to process 50 times more data, the StringBuilder and List classes did the job in less than a third of the time.
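Normalizing the two timings above by the number of elements each run processed makes the gap easier to see. This only reuses the reported numbers:

```powershell
# Timings reported above, in milliseconds, and the element counts behind them.
$plusEqualsMs = 26024.1599
$plusEqualsN  = 20000
$builderMs    = 8334.3011
$builderN     = 1000000

$perElementSlow = $plusEqualsMs / $plusEqualsN   # about 1.3 ms per element
$perElementFast = $builderMs / $builderN         # about 0.0083 ms (8.3 microseconds) per element

# How many times cheaper each element was with StringBuilder and List.
[math]::Round($perElementSlow / $perElementFast, 1)   # 156.1
```

Because the cost of += grows with the size of the string or array, that per-element average only describes a 20,000-element run; with larger inputs the gap would be wider still.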

4 thoughts on “PowerShell Performance: The += Operator (and When to Avoid It)”

  1. Pingback: Another take on using the += operator | The Powershell Workbench

  2. Allen Neil

    Thank you for this. I had a script running for over an hour processing strings from a file. Switching to a list reduced run time to under 10 seconds. Great post 🙂

  3. George

    I believe this has some potential to help me out with an issue we just experienced. I have a script that reads in the directory and searches for known ransomware patterns (robocopy is used to get past the directory/filename length limitations of gci).

    $RegExPatterns = '\.locky$',
                     '\.cry$'    # CryLocker

    $scanroot = '\\myfileserver\targetfolder'
    $scanfolders = Get-ChildItem -Path $scanroot -Directory
    $foundmatch = @()    # start with an empty array so += always appends
    foreach ($fold in $scanfolders) {
        # /l lists files without copying; Select-String keeps only matching paths
        $foundinfolder = robocopy.exe $fold.FullName 'c:\somefakedir' /e /l /fp /nc /ns /njh /njs /r:0 /w:0 |
            Select-String -Pattern $RegExPatterns

        if (-not $foundinfolder) {
            Write-Host "no pattern match for: $($fold.FullName)" -ForegroundColor Green
            Get-Date | Write-Host -ForegroundColor Green
        }
        else {
            Write-Host "Pattern match for: $($fold.FullName)" -ForegroundColor Red -BackgroundColor White
            Get-Date | Write-Host -ForegroundColor Red -BackgroundColor White
            $foundmatch += $foundinfolder
        }
    }

    This has worked great when there have been a handful of files, but when the number of files in a directory was in the thousands, it would basically lock up.

    Is it possible to use a List (and its .Add method) with output from robocopy?

    Best Regards!

    1. George

      It’s when there are thousands of matches that it locks up, not thousands of files in a directory. Just to clear that up.
