Workflow work scope question

Forums › General PowerShell Q&A › Workflow work scope question

Viewing 3 reply threads
  • Author
    Posts
    • #200750
      Pj
      Participant
      Topics: 10
      Replies: 42
      Points: -1
      Rank: Member

      To preface, I have a working script as is, but because it is searching 300,000 to 400,000 lines across 4 separate files, the time to complete the operation is not desirable.  What I would like as an end result is for the foreach match operation to scan all 4 files simultaneously, and I'm not sure whether this should be termed parallel operation via workflow or threading via scriptblock and job operations.  I have tinkered with both, and the workflow sample below is the closest I've come to what I am trying to achieve, except I cannot seem to get the results back for additional work.  Judging by the operation time I can presume the work is being done, since it is comparable to the original operation if I limit it to just one of the four files, but it would seem I need some direction to move forward on this.

      
      $SawComp = Get-Content "C:\Log\Complete.log"
      $SawComp1 = Get-Content "C:\Log\Complete.log.1"
      $SawComp2 = Get-Content "C:\Log\Complete.log.2"
      $SawComp3 = Get-Content "C:\Log\Complete.log.3"
      $PrinterReport = "C:\WIP\PrinterReport.htm"
      $PrintErrors = @()
      $SawCombinedLogs = $SawComp + $SawComp1 + $SawComp2 + $SawComp3
      
      If (Test-Path $PrinterReport) {Remove-Item -Path $PrinterReport -Force}
      workflow test
      {
      param($SawCombinedLogs)
      
      foreach -parallel ($_ in $Using:SawCombinedLogs )
      {
      if ($_ -match "Part( Started -| Complete)|PRT(0004|0005|0009)|ENG(0029|0037)|OPR0078|WaitingForPrinterTrigger") { $Output = $_}
      $PrintErrors += $Output | Out-String
      }
      }
      
      $PrintErrors
      
      
      • This topic was modified 2 months ago by Pj.
    • #200807
      Participant
      Topics: 12
      Replies: 1489
      Points: 1,987
      Helping Hand
      Rank: Community Hero

      You’re not executing the workflow; try something like this:

      workflow test {
          param($files)
      
          $results = foreach -parallel ($file in $files) {
              $content = Get-Content $file
      
              if ($content -match 'file') { $file }
          }
      
          $results
      }
      
      $files = 'C:\Scripts\file1.txt', 'C:\Scripts\file2.txt'
      
      test $files
      
    • #200882
      Pj
      Participant
      Topics: 10
      Replies: 42
      Points: -1
      Rank: Member

      I assume the last line where you call “test” is like calling a function, which I have tinkered with, but I am still not getting any results in $PrintErrors.  Also, why are you putting the file list variable after test?  I’m afraid I’m not following how to apply this; can you explain what you’re doing in this example that’s different, besides calling the workflow name?

    • #200897
      Participant
      Topics: 12
      Replies: 1489
      Points: 1,987
      Helping Hand
      Rank: Community Hero

      If you have 2 files, we’ll say 100k rows for each file, and you do this:

      $file1 = Get-Content 'C:\Scripts\file1.txt'
      $file2 = Get-Content 'C:\Scripts\file2.txt'
      
      $file1 + $file2
      

      Get-Content creates an array of lines, and using + joins the $file1 and $file2 content into a single 200k-element array, which means you are processing rows from both files as one big array. Let’s start with a typical/normal loop: process each file, then process each row.

      One file at a time:

      $files = 'C:\Scripts\file1.txt', 'C:\Scripts\file2.txt'
      
      $results = foreach ($file in $files ) {
          $content = Get-Content $file
      
          foreach ($row in $content) {
              if ($row -match 'file') {$row}
          }
      }
      
      $results
      

      Parallel processing:

      You want to process multiple files at once, so the process is the same; you’re just adding -parallel:

      workflow test {
          param($files)
      
          $results = foreach -parallel ($file in $files) {
              $content = Get-Content $file
      
              foreach ($row in $content) {
                  if ($row -match 'file') {$row}
              }
          }
      
          $results
      }
      
      $files = 'C:\Scripts\file1.txt', 'C:\Scripts\file2.txt'
      
      test $files
      

      Use Measure-Command to see how long each approach takes. There is a lot of information on using PowerShell to process large files; I recommend you do some research to find the best approach to read a single file. Even processing these 4 huge files is going to take a lot of memory, so look at approaches like these:

      https://stackoverflow.com/questions/9439210/how-can-i-make-this-powershell-script-parse-large-files-faster
      Optimizing Performance of Get-Content for Large Files
      https://community.spiceworks.com/scripts/show/3651-large-file-parser-in-powershell
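
      As one example of the kind of technique those links cover, reading in batches with Get-Content's -ReadCount parameter is usually much faster than the default one-line-at-a-time pipeline. A minimal sketch, reusing the log path and part of the pattern from the original script:

      ```powershell
      # Read the log in 1000-line batches instead of single lines.
      # Applying -match to an array returns only the matching elements,
      # so each batch is filtered in one operation.
      $hits = Get-Content 'C:\Log\Complete.log' -ReadCount 1000 |
          ForEach-Object { $_ -match 'OPR0078|WaitingForPrinterTrigger' }

      $hits.Count
      ```

      This is only a sketch of the batching idea, not a drop-in replacement for the full script; the batch size is worth experimenting with.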

      Once you have found the best approach for a single file, then you can look at doing parallel operations.
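
      For the timing comparison, Measure-Command wraps a scriptblock and returns a TimeSpan. A minimal sketch comparing the sequential loop and the workflow version above (the file paths are placeholders, and it assumes the 'test' workflow has already been defined):

      ```powershell
      $files = 'C:\Scripts\file1.txt', 'C:\Scripts\file2.txt'

      # Time the plain sequential per-file loop
      $sequential = Measure-Command {
          foreach ($file in $files) {
              foreach ($row in (Get-Content $file)) {
                  if ($row -match 'file') { $row }
              }
          }
      }

      # Time the parallel workflow version
      $parallel = Measure-Command { test $files }

      'Sequential: {0:N2} s' -f $sequential.TotalSeconds
      'Parallel:   {0:N2} s' -f $parallel.TotalSeconds
      ```

      Note that workflows have significant startup overhead, so the parallel version only wins once the per-file work is large enough to amortize it.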

      • This reply was modified 1 month, 4 weeks ago by Rob Simmers.