Make PowerShell script run faster

  • #91868

    Hil
    Participant

    I need to evaluate the lines in a file. Each line is made up of loan data: price, credit score, etc. I need to filter on these values and, if a line passes the filters, write it to an output file.

    The problem is the file has 400,000+ lines,
    each line has 20+ values to filter on,
    and this part of the script cannot take more than 60 seconds.
    Any ideas how I can make the code faster?

    Here is the code:

    $File_NotesDownload = "C:\temp\NotesDownload.csv"
    $File_NotesGood = "C:\temp\NotesGood.txt"
    if (Test-Path $File_NotesGood) { Remove-Item $File_NotesGood }
    $Start = Get-Date

    $Notes = Get-Content $File_NotesDownload
    foreach ($Note in $Notes)
    {
        $Note = $Note -replace '"',''

        # Skip any line whose first field is not a numeric loan id (e.g. a header line)
        $Loan_Id = ($Note -split ',')[0].Trim() -as [int]
        if (($Loan_Id -is [int]) -eq $false) {Continue}

        # Cast the numeric fields, otherwise -lt/-gt/-ne compare them as strings
        $Loan_Status     = ($Note -split ',')[5].Trim()
        $Ask_Price       = [double]($Note -split ',')[6].Trim()
        $Markup_Discount = [double]($Note -split ',')[7].Trim()
        $YTM             = [double]($Note -split ',')[8].Trim()
        $DaysSinceLastPayment = ($Note -split ',')[9].Trim()
        $CreditScoreTrend  = ($Note -split ',')[10].Trim()
        $FicoEndRange      = ($Note -split ',')[11].Trim()
        $NeverLate         = ($Note -split ',')[13].Trim()
        $Loan_Class        = ($Note -split ',')[14].Trim()
        $Loan_Maturity     = [int]($Note -split ',')[15].Trim()
        $Interest_Rate     = [double]($Note -split ',')[17].Trim()
        $RemainingPayments = ($Note -split ',')[18].Trim()

        if ($Loan_Maturity -ne 36) {Continue}
        if (($Interest_Rate -lt 4) -or ($Interest_Rate -gt 25)) {Continue}
        if ($Ask_Price -gt 20) {Continue}
        if ($Markup_Discount -gt -0.01) {Continue}
        if ($YTM -lt 10) {Continue}
        if ($CreditScoreTrend -ne "UP") {Continue}   # was $CredScoretrend, an undefined variable

        $NotesList += $Note + "`r`n"
    }
    $NotesList | Out-File $File_NotesGood

    $End = Get-Date
    $TotalTime = New-TimeSpan -Start $Start -End $End
    $TotalTimeSecs = $TotalTime.TotalSeconds   # TotalSeconds, not Seconds (the 0-59 component)
    $TotalTimeSecs
    
  • #91871

    postanote
    Participant

    Notes on this topic —

    There are things (commands, functions, APIs, modules, constructs, etc.) in any language that are just slow, and then there are things you introduce into your code which make it slow.

    The former means things like native cmdlets/functions/operations. Loops are just slow by design, especially if you are reading in large data. Some cmdlets are slow by design as well, for example the web cmdlets.
    Improvements in the core language occur over time. Take a look at PowerShell Core (PowerShell v6) as an example. PSCore is not a replacement for Windows PowerShell; lots of things were left out to make it cross-platform capable.

    The latter is all about you and what you are doing. Large files are going to be slow, period. Loops on large files are going to be slow.
    Get-Content loads the entire file into memory (once loaded, it then acts on it), and if you don't have enough, you will run into performance issues and failures.

    If you have large files, don't use Get-Content; use the .NET libraries (StreamReader) instead. There are lots of articles on the web about how to use the .NET libraries and/or Get-Content to read large files in chunks, so as not to tax memory and to improve performance.

    So, by example...

    The Get-Content cmdlet does not perform as well as a StreamReader when dealing with very large files. You can read a file line by line or in a batch using Get-Content or StreamReader. The .Net StreamReader is still faster.

    System.IO.File.ReadLines()
    It returns all the lines of a file, but lets you begin iterating over them immediately, which means it does not have to store the entire contents in memory.

    https://msdn.microsoft.com/en-us/library/dd383503.aspx
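
    For instance, a minimal sketch (the path is the same placeholder used below):

    # ReadLines() hands back one line at a time, so processing starts
    # before the whole file has been read into memory
    foreach ($line in [System.IO.File]::ReadLines('C:\A-Very-Large-File.txt')) {
        # Process $line here...
    }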

    $path = 'C:\A-Very-Large-File.txt'
    $r = [IO.File]::OpenText($path)
    while ($r.Peek() -ge 0) {
        $line = $r.ReadLine()
        # Process $line here...
    }
    $r.Dispose()

    # Some performance comparisons:

    Measure-Command {Get-Content .\512MB.txt > $null}

    Total Seconds: 49.4742533

    Measure-Command {
        $r = [IO.File]::OpenText('512MB.txt')
        while ($r.Peek() -ge 0) {
            $r.ReadLine() > $null
        }
        $r.Dispose()
    }

    Total Seconds: 27.666803

    Using Get-Content with batching

    $path = "E:\Documents\Projects\ESPS\Dev\DataFiles\DimProductionOrderOperation"
    $infile = "14SEP11_ProdOrderOperations.txt"
    $outfile = "PROCESSED_14SEP11_ProdOrderOperations.txt"

    $batch = 1000

    # -ReadCount $batch sends arrays of 1000 lines down the pipeline instead
    # of single lines; the -match pattern below is only a placeholder filter
    Get-Content "$path\$infile" -ReadCount $batch | ForEach-Object {
        $_ -match 'your-pattern-here' | Out-File "$path\$outfile" -Append
    }

    Here is one of the better articles on the topic.
    http://www.happysysadm.com/2014/10/reading-large-text-files-with-powershell.html

    • #91889

      Hil
      Participant

      Thanks postanote,
      Your script worked great. Originally this part took anywhere from several minutes to maybe an hour.
      With your script I managed to get it down to 9-12 seconds!

    • #91904

      postanote
      Participant

      No worries, glad it helped. I have to deal with large log files daily, so I get your pain on this one, especially when trying to get results from such large files to folks who have little to no patience.

    • #92323

      Hil
      Participant

      I have another problem with the speed.
      As I add more lines to $NotesList, the script takes longer and longer. I believe .NET is duplicating the entire array in memory, adding the new item, and deleting the old copy (https://blogs.technet.microsoft.com/ashleymcglone/2017/07/12/slow-code-top-5-ways-to-make-your-powershell-scripts-run-faster/). Is there a way around it?

      $File_Path = "C:\Temp\"
      $File_NotesDownload = $File_Path  + "SecondaryMarketAllNotes.csv"
      $File_NotesEval     = $File_Path  + "SecNotesEval.txt"
      
      $r = [IO.File]::OpenText($File_NotesDownload)
      while ($r.Peek() -ge 0) 
          {
           $Note = $r.ReadLine()
          $NotesList += $Note
          }
      $NotesList | out-file $File_NotesEval
      $r.Dispose()
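
      The usual way around it, per the article linked above, is to collect the lines in a List[string], which grows in place instead of being copied on every add (or to skip accumulating altogether and write each line out as it is read). A minimal sketch of the List approach:

      $NotesList = [System.Collections.Generic.List[string]]::new()
      $r = [IO.File]::OpenText($File_NotesDownload)
      while ($r.Peek() -ge 0)
      {
          # List.Add appends in place; no array copy per line
          $NotesList.Add($r.ReadLine())
      }
      $r.Dispose()
      [IO.File]::WriteAllLines($File_NotesEval, $NotesList)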
      
  • #91886

    Matt Bloomfield
    Participant

    Have a read of some Chrissy LeMaire articles; she has blogged on techniques for increasing performance when working with CSV files.

    Specifically, I think the "Streaming a 32-bit OleDbConnection" section in the second article, which describes performing a SELECT on a CSV file using a WHERE clause, will be of use to you. You'll just need to modify it to write to a file instead of a SQL database.
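
    The gist, as a rough sketch (this assumes the ACE OLEDB text provider is installed; the file, column, and filter values are only illustrations, not from the thread):

    # The Data Source is the folder; the CSV file name acts as the table name
    $connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\temp\;" +
               "Extended Properties='text;HDR=Yes;FMT=Delimited'"
    $conn = New-Object System.Data.OleDb.OleDbConnection($connStr)
    $conn.Open()
    $cmd = $conn.CreateCommand()
    $cmd.CommandText = "SELECT * FROM [NotesDownload.csv] WHERE YTM >= 10"
    $reader = $cmd.ExecuteReader()
    while ($reader.Read()) {
        # Write the selected row to the output file instead of a SQL database
    }
    $conn.Close()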

  • #91888

    Sam Boutros
    Participant

    Please post a dozen or so lines as a representative sample of what a line in this file looks like. Make sure to remove/replace any PII.

  • #91892

    Hil
    Participant

    Thanks Matt. Great tip.
