Help with workflows and parallel loops

This topic contains 13 replies, has 3 voices, and was last updated by  Max Kozlov 2 months, 2 weeks ago.

  • Author
    Posts
  • #76789

    David Flores
    Participant

    So I just recently started investigating PowerShell workflows and loops with the -parallel switch. My first thought, when I started reading about it was: "This is great. I should be able to write a script that queries every domain controller in our environment for a user's 'lastlogon' attribute, and instead of waiting for every DC to respond (we've got about 80 in our environment) query every DC, essentially simultaneously."

    However, my first attempt at a script didn't work out the way I wanted. Here's what I tried:

    —————————begin script—————————————–

    workflow querydcs {
        param([string[]]$computers)
        foreach -parallel ($computer in $computers)
        {
            Get-ADUser "MyUserAccount" -Server $computer -Properties lastlogon |
                Select-Object @{Name="DC"; Expression={$computer}}, lastlogon
        }
    }

    $computers = Get-ADDomainController -Filter * | Select-Object -ExpandProperty Name

    querydcs $computers

    ——————————————–end script—————————————————-

    (Note: for simplicity's sake, in this example I'm not converting the lastlogon attribute to human readable form)
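    (If you do want it human-readable: the raw value is a Windows FILETIME, so a sketch of the conversion, assuming $user holds a result of Get-ADUser -Properties lastlogon, would be:)

    ```powershell
    # lastlogon is a FILETIME: 100-nanosecond intervals since 1601-01-01 UTC
    [datetime]::FromFileTime($user.lastlogon)
    ```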

    So the script appears to work as intended, EXCEPT that my output only includes the "lastlogon" value from the Select statement and not the "DC" value, which should just be the value of the looping variable.

    Output looks something like:

    DC:

    LastLogon: 5/10/2017 10:50:23 PM

    PSComputerName: localhost

    PsSourceJobInstanceID: hexhexhexh-hex-hex-hex-hexhexhex

    Clearly there's a fair amount going on under the hood that I don't understand here (why are PSComputerName and PSSourceJobInstanceId being returned, for instance?). My biggest gripe, though, is that knowing the last time a user authenticated against some random DC is significantly less useful than also knowing which DC he authenticated against. So what am I doing wrong here?

  • #76801

    Don Jones
    Keymaster

    Yeah, a LOT. Workflows aren't run by PowerShell. They're run by WWF (Windows Workflow Foundation), and the rules are entirely different.

    Before we dive into this, have you considered the lastLogonTimestamp value instead, which is replicated across DCs? So you only have to query one?
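
    If that would work for you, a minimal sketch (assuming the RSAT AD module is loaded, and with "MyUserAccount" standing in for the real SamAccountName):

    ```powershell
    # lastLogonTimestamp is replicated, so any single DC will answer
    Get-ADUser "MyUserAccount" -Properties lastLogonTimestamp |
        Select-Object Name,
            @{Name="LastLogonTimestamp"; Expression={[datetime]::FromFileTime($_.lastLogonTimestamp)}}
    ```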

    • #76837

      David Flores
      Participant

      Thanks, Don. In 99% of cases I use lastLogonTimestamp, but there are cases where HR wants to know precisely the last time a user touched the network (say an employee was terminated but the account wasn't disabled on time), and lastLogonTimestamp is only updated roughly every 14 days. It's also a pretty unreliable value, as it can be tripped by merely running an "effective permissions" check against the user on some random ACL (something that SharePoint does for large collections of users from time to time)!

    • #76852

      David Flores
      Participant

      OK, so this works:

      ——————————–Script————————-

      workflow querydcs {
          param([string[]]$computers)
          foreach -parallel ($computer in $computers)
          {
              InlineScript {
                  Get-ADUser "MyUser" -Server $using:computer -Properties lastlogon |
                      Select-Object @{Name="DC"; Expression={"$using:computer"}},
                                    @{Name="LastLogon"; Expression={[datetime]::FromFileTime($_.lastLogon)}}
              }
          }
      }

      $computers = Get-ADDomainController -Filter * | Select-Object -ExpandProperty Name
      querydcs $computers

      —————————————-end script—————————–

      But it's pretty slow. I expected this script to return results from all 80 DCs in roughly the time it takes to query a single DC, but it doesn't feel like that. It almost feels like I'm running a simple, serial loop rather than a bunch of tasks in parallel. Now, due to IPsec rules and distance, we do have some machines that take a while to respond; I wonder if the whole task is merely taking as long as the slowest machine.

  • #76840

    Max Kozlov
    Participant

    You can try using the workflow's built-in -PSComputerName parameter:

    workflow querydc { param($user) Get-ADUser $user -Properties lastlogon }
    querydc -PSComputerName $dclist -user $username
    
    • #76849

      David Flores
      Participant

      Thing is: I'm trying to associate the lastlogon value with a specific domain controller. That code snippet doesn't look like it would do this.

    • #76878

      Max Kozlov
      Participant

      Did you actually try it, or is that just your assumption?
      Do you understand what changes when the -PSComputerName parameter is used?

      That code can get lastlogon for a specific user from a list of domain controllers in parallel (as you wanted, via workflow), because it executes the command on the DCs themselves, just like Don's Invoke-Command. This is not about 'how to choose the latest timestamp from the many returned'.
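
      To see that association, a sketch (assuming $dclist and $username from the snippet above):

      ```powershell
      # Remoting stamps every result with PSComputerName, so the DC comes back for free
      querydc -PSComputerName $dclist -user $username |
          Select-Object PSComputerName,
              @{Name="LastLogon"; Expression={[datetime]::FromFileTime($_.lastlogon)}}
      ```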

    • #77001

      David Flores
      Participant

      OK, Max, I get you. Unfortunately, your script won't work for me because it can only be run under Domain Admin credentials (due to logon restrictions on DCs). I need a script that can query a DC but doesn't have to run on the DC.

    • #77011

      Max Kozlov
      Participant

      So, because Get-ADUser doesn't seem to be thread safe, I can suggest only the ADSI + RSJob way

      something like

      $dcs | Start-RSJob {
         $dchost = $_
         $ds = New-Object System.DirectoryServices.DirectorySearcher
         $ds.SearchRoot = [adsi]"LDAP://$($dchost)/DC=yourcorp,DC=com"
         $ds.Filter = "(ANR=YourUserName)"
         [void]$ds.PropertiesToLoad.Add('lastlogon')
         $ds.FindAll() | ForEach-Object {
            [PSCustomObject]@{
              DC        = $dchost
              # the property comes back as a one-element collection holding a raw FILETIME
              LastLogon = [datetime]::FromFileTime($_.Properties['lastlogon'][0])
            }
         }
      } | Wait-RSJob | Receive-RSJob
      
    • #77038

      David Flores
      Participant

      Thanks, Max. I may look into this. To be honest I was asking the question not so much because I needed to solve that particular problem but because I wanted to better understand the capabilities and limitations of workflows and looping with the parallel switch. That particular problem was just one I thought would make sense to tackle with a workflow.

      I'll have to look into RSJob separately and see how it might be a useful tool; it's the first I've heard of it.

    • #77071

      Max Kozlov
      Participant

      When Workflow first appeared, I dreamed it would help me with many things,
      but in fact I now use it only in old scripts that I don't want to change, "because it works".

      IMHO it can be used only in fire-and-forget scenarios on the local machine, and with the -PSComputerName parameter on remote ones.

      PoshRSJob (https://github.com/proxb/PoshRSJob) or Invoke-Parallel – this is my choice.

  • #76854

    Don Jones
    Keymaster

    So, here's your problem.

    InlineScript – whether explicit, or when used implicitly around a PowerShell command for which a Workflow Activity isn't available – is always going to launch a new PowerShell instance, and essentially be its own scope. Many of Workflow's alleged advantages evaporate when you don't have a Workflow Activity and are instead working with InlineScript. WWF itself will start to throttle parallelism, using an invisible and uncontrollable algorithm, if you start to suck down too much RAM or CPU – and launching multiple PowerShell processes will certainly tip into that at some point. Each process, in your case, also has to load the AD module, which is non-trivial.
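
    If you do stay on Workflow, PowerShell 4.0 and later at least let you set that ceiling yourself instead of trusting WWF's invisible algorithm. A sketch:

    ```powershell
    workflow querydcs {
        param([string[]]$computers)
        # -throttlelimit caps how many iterations run at once (PowerShell 4.0 and later)
        foreach -parallel -throttlelimit 16 ($computer in $computers)
        {
            # ... per-DC query here ...
        }
    }
    ```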

    That's why I initially asked if you were hell-bent on using Workflow, or if you just wanted this to work quickly. Workflow is literally the most complex way to do essentially anything, and it involves a completely different execution rule set and environment. It only looks like PowerShell on the surface.

    For example, if your domain controllers have Remoting enabled, this would be _vastly_ easier if you just used Invoke-Command, which also offers parallelism, tracks which machine a result came from, and works entirely inside PowerShell. And when I say "vastly," I mean, like, "one line of code." Maybe two.

    A lot of people get... "tricked" into Workflow. I get it. It's shiny, and the docs make a lot of big promises. But for what Workflow was supposed to accomplish, the actual implementation was about the worst way Microsoft could have gotten there. Heck, the underlying WWF is basically deprecated, which tells you how committed the .NET team is to it.

  • #76857

    Don Jones
    Keymaster

    E.g.,...

    Invoke-Command -ComputerName (Get-ADDomainController -Filter * | Select-Object -ExpandProperty Name) `
                   -ScriptBlock {
        Get-ADUser "MyUser" -Properties lastlogon |
            Select-Object @{Name="LastLogon"; Expression={[datetime]::FromFileTime($_.lastLogon)}}
    }

    The idea being that this runs ON each DC, locally, as if you'd logged into the console. Remoting will automatically add a PSComputerName to each return value, so you'll know which DC returned which value. And they'll run in parallel, and you can control the -ThrottleLimit for that (it defaults to 32). Because each DC is querying itself, it ends up with a really fast connection to... well, to itself.

    On DCs running Windows Server 2012 or later, this will just work by default. Earlier versions (back to 2003) would need PowerShell v2 or later, and would need Enable-PSRemoting run to enable Remoting, which is the same plumbing Workflow's -PSComputerName parameter would have been using.
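
    And if what you ultimately want is the single most recent logon across all DCs, a sketch that reduces the remoted results (again with "MyUser" as a stand-in account name):

    ```powershell
    $dcs = Get-ADDomainController -Filter * | Select-Object -ExpandProperty Name
    $results = Invoke-Command -ComputerName $dcs -ThrottleLimit 64 -ScriptBlock {
        Get-ADUser "MyUser" -Properties lastlogon
    }
    # Highest raw FILETIME wins; PSComputerName tells you which DC it came from
    $results | Sort-Object lastlogon -Descending | Select-Object -First 1 |
        Select-Object PSComputerName,
            @{Name="LastLogon"; Expression={[datetime]::FromFileTime($_.lastlogon)}}
    ```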

    • #76861

      David Flores
      Participant

      Thanks, Don. Helpful stuff here. I'd hate to bring a server to its knees because I was running 10,000 instances of PowerShell!

      You're right. Workflows do seem to promise unlimited wealth and power (not to mention getting repetitive tasks done more quickly) but I'll be mindful of their limitations.
