PowerShell Great Debate: Piping in a Script


Take a look at this:

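# version 1
Get-Content computers.txt |
ForEach-Object {
  # Query each computer named in the file; $_ is the current line
  $os = Get-WmiObject Win32_OperatingSystem -ComputerName $_
  $bios = Get-WmiObject Win32_BIOS -ComputerName $_
  # osversion and biosserial are example properties; use whichever fields you need
  $props = @{computername=$_;
             osversion=$os.Version;
             biosserial=$bios.SerialNumber}
  New-Object PSObject -Property $props
}

# version 2
$computers = Get-Content computers.txt
foreach ($computer in $computers) {
  $os = Get-WmiObject Win32_OperatingSystem -ComputerName $computer
  $bios = Get-WmiObject Win32_BIOS -ComputerName $computer
  # osversion and biosserial are example properties; use whichever fields you need
  $props = @{computername=$computer;
             osversion=$os.Version;
             biosserial=$bios.SerialNumber}
  New-Object PSObject -Property $props
}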

These two snippets do the same thing. The first uses a more "pipeline" style approach, and I've personally never felt the urge to do that in a script. Probably habit - I come from the VBScript world, so a construct like foreach($x in $y) is natural for me. I've seen folks get into that "pipeline" approach inside a script and get into trouble, and if I'm scripting I often prefer to use the more formal, structured approach of the version 2 snippet.
What're your thoughts? For me, version 1 has some downsides - forcing yourself into that pipeline structure can be limiting, and I find the approach in version 2 to be more readable and a bit easier to follow. Frankly, I'm never a fan of having to mentally track what's in $_.
(Which brings up a sidebar: I tend to evaluate a script's goodness based on how well I can understand what it does without running it. That's a common criterion, in fact, and one I personally think helps with debugging as well as maintaining scripts.)

Anyway... discuss!

13 Responses to " PowerShell Great Debate: Piping in a Script "

  1. JBLewisMN says:

    I used to use the piping method (It was new and exciting!), but since migrating to Exchange 2010, I’ve shifted to the more structured method, because with the implicit remoting in the Exchange Management Shell, you can’t pipe one exchange cmdlet into another.
    I’m not saying I’m totally consistent, though!

  2. I tend to use the first example, or in general try to define things as a variable as much as possible when writing a script. However, there are times when I will pipe simple things, such as when selecting a dataset from SQL. I will be guilty of doing $DataSet.Tables[0] | Where-Object {$_.Employee_ID -like "2234*"}

  3. tomstrike says:

    I wish I’d gotten far enough through your book to have an opinion Don. But I do look forward to following this discussion. And maybe after a month of lunches I’ll return with an opinion. 🙂

  4. Mike Shepard says:

    I rarely ever use foreach-object loops like this in a script. The number of lines of code is the same, but you are stuck with $_ instead of $computer (in this example). I do use pipelines, of course, but not for stuff like this.

  5. Poshoholic says:

    It depends on the scenario (as do most things in programming).
    foreach (the keyword) is faster than ForEach-Object (the cmdlet), because the introduction of a pipeline comes with a performance penalty. But foreach (the keyword) requires the entire collection that it is processing to be loaded in memory, and objects will not be processed until that collection is loaded in memory. ForEach-Object (the cmdlet) will process objects as soon as they are sent to it from the pipeline, and it doesn’t have as large a memory footprint, but the overall runtime will always be longer. If you’re writing something that is outputting object data to screen (console or a GUI), you may want to use ForEach-Object because you’ll start getting data faster and you don’t have to wait until the entire collection comes back just to see data (useful if you want to Ctrl+C the command). If you’re writing a script where the collection isn’t very large and you just want it to run as fast as possible though, always use the foreach keyword instead, especially when you already have the collection in memory.
    The foreach variants aside, I always favor avoiding pipelining in scripts unless there is a use case where I want to see data more quickly on screen or where I am dealing with a very large set of data. In the ad-hoc shell though I’ll use ForEach-Object almost exclusively because it is easier to work with in an ad-hoc manner.
    Kirk out.
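
The trade-off described in this comment can be tried out with a small sketch. It uses integers as stand-in data rather than computer names (an assumption, so it runs without any remote machines), and the variable names are made up for illustration. Timings will differ by machine and PowerShell version, so no particular numbers are claimed here.

```powershell
# Stand-in data: integers instead of computer names (an assumption for this sketch).
$items = 1..1000

# foreach keyword: $items is fully in memory before the loop body runs.
$viaKeyword = foreach ($item in $items) { $item * 2 }

# ForEach-Object cmdlet: each object is handled as it arrives from the pipeline.
$viaCmdlet = $items | ForEach-Object { $_ * 2 }

# Because the pipeline streams, a downstream command can stop after a few objects.
$firstThree = $items | ForEach-Object { $_ * 2 } | Select-Object -First 3

# Rough timing comparison; absolute numbers vary by machine and PowerShell version.
$keywordTime = Measure-Command { foreach ($item in $items) { $null = $item * 2 } }
$cmdletTime  = Measure-Command { $items | ForEach-Object { $null = $_ * 2 } }
"foreach: {0:N2} ms, ForEach-Object: {1:N2} ms" -f $keywordTime.TotalMilliseconds, $cmdletTime.TotalMilliseconds
```

Both loops produce the same values; the difference lies in when objects are processed and how long the whole run takes.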

  6. Piotr says:

    in interactive commandline (which is 99% of my powershell usage – every day, 8 hours a day I have powershell.exe open and execute one-liners) I always use command | % {loop} and in scripts I use foreach ONLY when I’m 100% sure the collection unwinding thing will not make me run out of memory. Don’t want that scheduled task to fail only because there’s too much data to fit in already crowded RAM.

  7. Matthew Marchese says:

    I actually prefer the pipeline method for several reasons.
    1) PowerShell is the first scripting language I have learned and I actually found that $_ was easier to understand vs the old VBScript way. The idea of creating a new variable within the foreach parenthesis was confusing to me (mostly because I had learned how to use the pipeline first).
    2) I find the idea of creating an extra variable unneeded since the pipeline handles all of it for me. If I want to explicitly store the piped in value to a variable I’ll do it within the loop for use later on in the loop. I’ve trained myself to know that when I see $_ that I need to look at where it is coming from to understand what information is being piped into it. If I see a variable named $computers and then underneath it a variable called $computer I actually find that MORE confusing because I then have to differentiate between the two variables themselves.
    I’m just weird like that but since I learned PowerShell first I guess my mind finds it easier to process $_ rather than ($computer in $computers).

  8. I have no scripting background and I am not fluent enough in PowerShell to say which I prefer; however, I can logically follow the second approach better than the first. I mean, I don’t fully understand what value is assigned to $_, which doesn’t help me much when trying to reproduce it.

  9. Rob Campbell says:

    Echoing Kirk, Foreach is much faster than Foreach-Object. In addition to that, I’ll add that you cannot construct a nested loop using only Foreach-Object. At least one of the loop constructs must be a Foreach.
    When working with large data sets (large enough that holding all the data in memory is impractical or impossible) Foreach cannot replace Foreach-Object, but I’ve found that a filter can be used instead, and can produce much better performance.
    (Which will probably lead to another debate about functions vs filters.)
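
For readers who have not met filters, here is a minimal sketch of the idea. The filter name Get-NameInfo, the NameLength property, and the sample names are all hypothetical; the only point is that a filter's body runs once per pipeline object, so it streams the way ForEach-Object does. Whether it is faster in a given script is something to measure rather than assume.

```powershell
# Hypothetical filter: the body runs once per pipeline object, with that object in $_.
filter Get-NameInfo {
    New-Object PSObject -Property @{
        Name       = $_
        NameLength = $_.Length
    }
}

# Hypothetical input standing in for a large stream of names.
$info = 'server1', 'db02', 'web003' | Get-NameInfo

# Show one line per object produced by the filter.
$info | ForEach-Object { '{0} has {1} characters' -f $_.Name, $_.NameLength }
```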

  10. jkavanagh58 says:

    I prefer version 2 as well. I might use a pipeline in a script for sorting an array or something along those lines, but in general I will use ForEach because IMHO it is easier to write as well as read.

  11. Dave Wyatt says:

    I use foreach loops if the collection is already in memory, and ForEach-Object otherwise (to avoid storing a potentially large list in memory, even if only temporarily).
    Interesting to see the difference in performance of ForEach-Object versus a filter, though. I’ll keep that in mind for future scripts.

  12. Jaykul says:

    I think, reading the responses, that in this great debate the example is leading the discussion away from the question.
    Reading comments, it’s clear that:
    * People don’t like it when you use $_ a lot (you could fix that by assigning to a named variable)
    * People don’t like ForEach-Object in general (it’s slow and forces the use of $_)
    But it’s not clear to me that people don’t like the pipeline.
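
The "assign to a named variable" idea from the first bullet might look like the sketch below. The computer names and port numbers are invented for illustration. Capturing $_ at the top of the script block keeps the pipeline style while giving the current item a readable name, and it also keeps the outer item reachable inside a nested ForEach-Object, where $_ is rebound to the inner object.

```powershell
# Hypothetical sample data for illustration only.
$computers = 'server1', 'server2'
$ports = 80, 443

$pairs = $computers | ForEach-Object {
    $computer = $_    # capture the pipeline object under a descriptive name
    $ports | ForEach-Object {
        # Here $_ is the port; $computer still refers to the outer item.
        "${computer}:$_"
    }
}

$pairs
```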