Make powershell script run faster


This topic contains 7 replies, has 4 voices, and was last updated by Hil (Participant) 9 months, 2 weeks ago.

  • #91868
    Hil

    Participant
    Points: 1
    Rank: Member

    I need to evaluate the lines in a file. Each line is made up of loan data: price, credit score, etc. I need to filter on these values and, if a line passes the filters, write it to an output file.

    The problem is the file has 400,000+ lines,
    each line has about 20+ values to filter on,
    and this part of the script cannot take more than 60 seconds.
    Any ideas how I can make the code faster?

    Here is the code:

    $File_NotesDownload = "C:\temp\NotesDownload.csv"
    $File_NotesGood = "C:\temp\NotesGood.txt"
    if (Test-Path $File_NotesGood) { remove-item $File_NotesGood }
    $Start = Get-Date
    
    $Notes = Get-Content $File_NotesDownload
    foreach($Note in $Notes) 
       {
        $Note = $Note -replace '"',''
        $Loan_Id = ($Note -split ',')[0].trim()
        $Loan_Id = $Loan_Id -as [int]
        if (($Loan_Id -is [int])  -eq $false) {Continue}
    
        $Loan_Status     = ($Note -split ',')[5].trim()
        $Ask_price       = ($Note -split ',')[6].trim()
        $Markup_Discount = ($Note -split ',')[7].trim()
        $YTM             = ($Note -split ',')[8].trim()
        $DaysSinceLastPayment  = ($Note -split ',')[9].trim()
        $CreditScoreTrend  = ($Note -split ',')[10].trim()
        $FicoEndRange      = ($Note -split ',')[11].trim()
        $NeverLate         = ($Note -split ',')[13].trim()
        $Loan_Class        = ($Note -split ',')[14].trim()
        $Loan_Maturity     = ($Note -split ',')[15].trim()
        $Interest_Rate     = ($Note -split ',')[17].trim()
        $RemainingPayments = ($Note -split ',')[18].trim()
    
        # Cast to numbers first: the split values are strings, and string
        # comparisons like '9' -gt 20 compare lexicographically, not numerically.
        if ([int]$Loan_Maturity -ne 36) {Continue}
        if (([double]$Interest_Rate -lt 4) -Or ([double]$Interest_Rate -gt 25)) {Continue}
        if ([double]$Ask_price -gt 20) {Continue}
        if ([double]$Markup_Discount -gt -0.01) {Continue}
        if ([double]$YTM -lt 10) {Continue}
        if ($CreditScoreTrend -ne "UP") {Continue}
    
        $NotesList += $Note + "`r`n"
        }
    $NotesList | out-file $File_NotesGood
    
    $End = Get-Date
    $TotalTime = New-Timespan -Start $Start -End $End
    $TotalTimeSecs = $TotalTime.TotalSeconds
    $TotalTimeSecs
    
  • #91871

    Participant
    Points: 206
    Helping Hand
    Rank: Participant

    Notes on this topic —

    There are things (commands, functions, APIs, modules, constructs, etc.) in any language that are just slow, and then there are things you introduce into your code that make it slow.

    The former means things like native cmdlets/functions/operations. Loops are just slow by design, especially if you are reading in large data. Some cmdlets are slow by design, for example the web cmdlets.
    Improvements in the core language occur over time. Take a look at PowerShell Core (PowerShell v6) as an example. PSCore is not a drop-in replacement for Windows PowerShell: lots of things were left out to make it cross-platform capable.

    The latter is all about you and what you are doing. Large files are going to be slow, period. Loops over large files are going to be slow.
    Get-Content loads the entire file into memory before acting on it, and if you don't have enough memory you will run into performance issues and failures.

    If you have large files, don't use Get-Content, use the .Net libraries (StreamReader) instead. There are lots of articles on this topic on the web about how to use the .Net libraries and or Get content to read large files in chunks in order to not tax memory and improve performance.

    So, by example...

    The Get-Content cmdlet does not perform as well as a StreamReader when dealing with very large files. You can read a file line by line or in a batch using Get-Content or StreamReader. The .Net StreamReader is still faster.

    System.IO.File.ReadLines()
    returns all the lines of a file, but lets you begin iterating over them immediately, which means it does not have to hold the entire file contents in memory.

    https://msdn.microsoft.com/en-us/library/dd383503.aspx
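    As a minimal sketch of that approach (the function name and the pattern parameter are placeholders I've added for illustration, not from the thread), ReadLines can drive a line filter like the one the OP needs:

```powershell
# Lazily iterate a file with [System.IO.File]::ReadLines; only one line is
# held in memory at a time, so very large files don't exhaust RAM.
function Get-MatchingLines {
    param([string]$Path, [string]$Pattern)
    foreach ($line in [System.IO.File]::ReadLines($Path)) {
        if ($line -match $Pattern) { $line }   # emit matching lines to the pipeline
    }
}
```

    You would replace the single `-match` test with your own set of field checks.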

    $path = 'C:\A-Very-Large-File.txt'
    $r = [IO.File]::OpenText($path)
    while ($r.Peek() -ge 0) {
        $line = $r.ReadLine()
        # Process $line here...
    }
    $r.Dispose()

    # Some performance comparisons:

    Measure-Command {Get-Content .\512MB.txt > $null}

    Total Seconds: 49.4742533

    Measure-Command {
        $r = [IO.File]::OpenText('512MB.txt')
        while ($r.Peek() -ge 0) {
            $r.ReadLine() > $null
        }
        $r.Dispose()
    }

    Total Seconds: 27.666803

    Using Get-Content leveraging batching

    $path = "E:\Documents\Projects\ESPS\Dev\DataFiles\DimProductionOrderOperation"
    $infile = "14SEP11_ProdOrderOperations.txt"
    $outfile = "PROCESSED_14SEP11_ProdOrderOperations.txt"

    $batch = 1000
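    The batching loop itself was not included above; a sketch of how it might look, wrapped in a function for reuse (the function name and pattern parameter are my placeholders, but `-ReadCount` is a real Get-Content parameter):

```powershell
# Filter a large text file in batches rather than one line at a time.
# -ReadCount makes Get-Content emit arrays of $Batch lines, which greatly
# reduces per-object pipeline overhead.
function Select-MatchingLines {
    param(
        [string]$InPath,
        [string]$OutPath,
        [string]$Pattern,
        [int]$Batch = 1000
    )
    Get-Content -Path $InPath -ReadCount $Batch |
        ForEach-Object {
            # $_ is an array of up to $Batch lines; -match filters the batch
            $_ -match $Pattern
        } |
        Add-Content -Path $OutPath
}
```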

    Here is one of the better articles on the topic.
    http://www.happysysadm.com/2014/10/reading-large-text-files-with-powershell.html

    • #91889
      Hil

      Participant
      Points: 1
      Rank: Member

      Thanks Postanote,
      Your script worked great. Originally I believe I received about several minutes to maybe an hour.
      With your script I managed to get it down to 9- 12 seconds !!!!

    • #91904

      Participant
      Points: 206
      Helping Hand
      Rank: Participant

      No worries, glad it helped. I have to deal with large log files daily, so I get your pain on this one, especially when trying to get results from such large files for folks who have little to no patience.

    • #92323
      Hil

      Participant
      Points: 1
      Rank: Member

      I have another problem with the speed.
      As I add more lines to $NotesList, the script takes longer and longer. I believe .NET is duplicating the entire array in memory, adding the new item, and then deleting the old copy (see https://blogs.technet.microsoft.com/ashleymcglone/2017/07/12/slow-code-top-5-ways-to-make-your-powershell-scripts-run-faster/). Is there a way around it?

      $File_Path = "C:\Temp\"
      $File_NotesDownload = $File_Path  + "SecondaryMarketAllNotes.csv"
      $File_NotesEval     = $File_Path  + "SecNotesEval.txt"
      
      $r = [IO.File]::OpenText($File_NotesDownload)
      while ($r.Peek() -ge 0)
          {
           $Note = $r.ReadLine()
           $NotesList += $Note
          }
      $NotesList | out-file $File_NotesEval
      $r.Dispose()
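      A common workaround for the array-copy problem (this is my sketch, not an answer from the thread; the function name is a placeholder) is to collect lines in a System.Collections.Generic.List[string], whose Add() appends in place instead of copying the whole array on every +=:

```powershell
# Collect lines in a List[string]; Add() is amortized O(1), unlike += on a
# PowerShell array, which copies the entire array on each append.
function Copy-FilteredNotes {
    param([string]$InPath, [string]$OutPath)
    $notes = [System.Collections.Generic.List[string]]::new()
    $r = [System.IO.File]::OpenText($InPath)
    while ($r.Peek() -ge 0) {
        $notes.Add($r.ReadLine())
    }
    $r.Dispose()
    # WriteAllLines writes the whole collection in one call
    [System.IO.File]::WriteAllLines($OutPath, $notes)
}
```

      Even faster is to skip the in-memory collection entirely and write each selected line straight to a StreamWriter as you go.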
      
  • #91886

    Participant
    Points: 4
    Rank: Member

    Have a read of some Chrissy LeMaire articles; she has blogged on techniques for increasing performance when working with CSV files.

    Specifically, I think the 'Streaming a 32-bit OleDbConnection' section in the second article, which describes performing a SELECT with a WHERE clause against a CSV file, will be of use to you. You'll just need to modify it to write to a file instead of a SQL database.

    Quickly Find Duplicates in Large CSV Files using PowerShell

    High-Performance Techniques for Importing CSV to SQL Server using PowerShell

  • #91888

    Participant
    Points: 59
    Rank: Member

    Please post a dozen or so lines as a representative sample of what a line in this file looks like. Make sure to remove/replace any PII.

  • #91892
    Hil

    Participant
    Points: 1
    Rank: Member

    Thanks Matt. Great tip

The topic ‘Make powershell script run faster’ is closed to new replies.