Workflow

Tagged: 

This topic contains 1 reply, has 2 voices, and was last updated by  Don Jones 3 weeks, 5 days ago.

  • Author
    Posts
  • #75517

    iain Barnetson
    Participant

    I've got a workflow that I intended to run a script on multiple servers, and in multiple databases on each of them, at the same time.
    What I've found is that it runs on one server at a time and only in 2 databases at a time.
    I'm looking for advice, suggestions, to get this working as I intended, if that's possible.

    #requires -version 3.0
    cls
    
    Import-Module sqlps -DisableNameChecking;
    set-location c:
    
    WorkFlow Run-CarrierInfo
    {
        param(
            [Parameter(Mandatory=$true)]
            [string[]]$Servers, 
            [Parameter(Mandatory=$true)]
            [string]$FilePath
        )
    
        foreach -parallel ($Server in $Servers)
        { foreach -parallel ($database in Invoke-SqlCmd -ServerInstance $Server -Query "SELECT name FROM sys.databases WHERE database_id > 4 AND name NOT LIKE 'reporting_%' AND name NOT IN ('db1','db2','db3','db4')") 
            { 
                Write-Output $($database.name)
                $Results = Invoke-SqlCmd -ServerInstance $server -Database $($database.name) -InputFile $FilePath -querytimeout ([int]::MaxValue)
                Write-Output $Results
            } 
        }
    } 
    
    
    
    $StopWatch = [system.diagnostics.stopwatch]::StartNew()
    
    $carrierInfoquery = "X:\Scripts\CarrierInfo\carrierInfo.sql"
    $Servers = "srv1","srv2"
    $carrierInfoReport = "X:\Reports\CarrierInfo\$((GET-DATE).ToString("yyyyMMdd"))_carrierInfo.csv"
    $r = Run-CarrierInfo -Servers $servers -FilePath $carrierInfoquery | Select -Property SERVER_NAME,ENVIRONMENT,CONTROL_ID,PROCESS_ID,DESCRIPTION,VENDOR,OCCURANCE,EXPORT_FILENAME,EXPORT_NAME,IS_BUILDER_FILE 
    $r | Export-Csv $carrierInfoReport -NoTypeInformation -Append
    
    $StopWatch.Stop()
    Write-Output "Elspsed minutes for v1 are: $($StopWatch.Elapsed.TotalMinutes)"
    
    
  • #75719

    Don Jones
    Keymaster

    Yyyyyeah.

    So, you may just be running into a limitation of what the .NET stuff underlying Invoke-SqlCmd is willing to do. But I'm not even sure you needed to do this in Workflow.

    Let me ask a few qualifying questions.

    Is PowerShell Remoting set up in your environment? If so, it'd be much easier to simply send the SQL command to the actual servers, via Invoke-Command. Or, keep your workflow, but instead of enumerating servers yourself, use the common -PSComputerName parameter that all Workflows get for free. That'll push the workflow out to the servers themselves, which can then attempt to parallelize the database piece.

    I'll also note that there's little value in saving the results to a variable, only to immediately write-out that variable. You're consuming unnecessary memory, which is harsh in a parallel situation.

    Similarly, saving everything to $r, only to export that, is wasted RAM. Just pipe directly to Export-CSV.

    Also, your results are going to be a bit corrupted, because you're outputting each database name as a string to the pipeline, and then loading in object results. That creates inconsistent objects in the pipeline, which Export-Csv won't handle as gracefully as you might like. I'd try to get the database name as part of the object output, as in:

    Select -Property @{n='DATABASE_NAME';e=$database.name}},SERVER_NAME,ENVIRONMENT,CONTROL_ID,PROCESS_ID,DESCRIPTION,VENDOR,OCCURANCE,EXPORT_FILENAME,EXPORT_NAME,IS_BUILDER_FILE 
    

    The rules on the -parallel switch aren't well-understood. We know there's some resource-based throttling, but we don't know the rules. Keep in mind that Workflow isn't PowerShell per se; it's being handed off to WWF, which has its own rule set. It could be that the memory you're consuming (much of it unnecessarily, I think) is having an impact; it could be that it won't matter one way or another. WWF is a bit opaque; it's one reason I don't like Workflow.

    WWF parallelism isn't thread-based, which means underlying restrictions – like whether or not .NET is willing to open a bunch of connections at once – can also be a limitation. Moving to jobs for parallelism, for example, might bypass that by getting things out into different processes.

You must be logged in to reply to this topic.