Coding style (pipelines and script blocks)

This topic contains 4 replies, has 3 voices, and was last updated by Profile photo of Dave Wyatt Dave Wyatt 3 years, 9 months ago.

  • Author
    Posts
  • #9920
    Profile photo of Dave Wyatt
    Dave Wyatt
    Moderator

    I'm trying to decide how to write certain kinds of pipelines to achieve the best clarity, while not changing the functionality. Specifically, when using ForEach-Object in the middle of a pipeline. Here's some example code I posted in another thread:

    Import-Csv -Path test.csv |
    ForEach-Object {
        $emailAddress = Get-MailBox $_ | Select-Object -ExpandProperty EmailAddresses | Select-Object -First 1 -ExpandProperty SmtpAddress
    
        if ($emailAddress -ne $null)
        {
            Add-Member -InputObject $_ -MemberType NoteProperty -Name EmailAddress -Value $emailAddress -PassThru
        }
    } |
    Export-Csv -Path new.csv -NoTypeInformation
    

    The last two lines ('} |' followed by another cmdlet) look ugly to me. It's easy to overlook that the end of the ForEach-Object block is being piped to that second command, rather than the next cmdlet just being executed after the ForEach block completes. Even putting the Export-Csv command on the same line as the closing brace and pipe character doesn't seem like much of an improvement (and breaks the pattern of starting a new line after each pipe symbol in the rest of the pipeline).

    You could write a filter (or function), and call that from inside the pipeline instead (maybe separating the code that adds the EmailAddress property from the code that eliminates rows for users where the email address is null, for further clarity enhancements), something like this:

    # I couldn't decide on a name for a filter that both adds the email address column and
    # removes records that don't contain an email address.  Sort of violates the "one command,
    # one task" guideline.
    
    Import-Csv -Path test.csv |
    SomeUsefulFilterName |
    Export-Csv -Path new.csv -NoTypeInformation
    
    # Separating the tasks makes the filter easier to name, and the pipeline easier to understand at a glance
    # (though at a slight performance cost):
    
    Import-Csv -Path test.csv |
    Add-EmailAddressProperty |
    Where-Object { $_.EmailAddress -ne $null } |
    Export-Csv -Path new-csv -NoTypeInformation
    

    Both of those are definitely easier to read than the first code, but you'd have to go hunting for your filter / function if you wanted to read all of the actual code involved in the pipeline, instead of having it all right there at a glance.

    On a side note, the filter will get much better performance than ForEach-Object, if that's a concern at some point. Right now I'm just concerned with code clarity, though. If you had to take over maintenance or development of some PowerShell code that you've never seen before, which style would you prefer to read? Do you like having all of the code in the ForEach-Object block right there with the rest of the pipeline (even if it looks a bit ugly in the middle of the pipeline like that), or do you prefer moving that code out into a filter somewhere, and having a more tidy pipeline calling it?

  • #9921
    Profile photo of Poshoholic
    Poshoholic
    Member

    I use ForEach-Object in the middle of pipelines all the time. My preference is to use the backtick and prefix the line with the pipe though, because it makes for easier to read code. For example:

    Import-Csv -Path test.csv `
        | ForEach-Object {
            $emailAddress = Get-MailBox $_ | Select-Object -ExpandProperty EmailAddresses | Select-Object -First 1 -ExpandProperty SmtpAddress
    
            if ($emailAddress -ne $null) {
                Add-Member -InputObject $_ -MemberType NoteProperty -Name EmailAddress -Value $emailAddress -PassThru
            }
        } `
        | Export-Csv -Path new.csv -NoTypeInformation
    

    I find the pipe at the beginning of the line tells me exactly what that line is doing, vs. a pipe at the end of the line which requires me to look at the previous line to know what is going on. Others don't like the backtick, but it does make for nice visual style in scripts.

  • #9922
    Profile photo of Poshoholic
    Poshoholic
    Member

    Oh, and as an added plus, if you try to invoke a line starting with a pipe, PowerShell raises an error, so if you forget a backtick your script simply won't run. If, on the otherhand you forget a pipe at the end of the line, your script may try to run and depending on the command used it may actually work, with unpredictable behaviour.

  • #9924
    Profile photo of Charles Downing
    Charles Downing
    Participant

    So, is this for a script or are you trying to keep it to a one-liner? If it's a script/function that you're building, I'd say expand it to multiple lines using intermediate variables and the foreach construct instead of ForEach-Object. That makes for longer code, but it's easier to understand and easier to comment.

    $accountsAdded = @()
    
    $accounts = Import-Csv -Path test.csv
    foreach ($accounts in $accounts) {
        $emailAddress = Get-MailBox $account | 
    		Select-Object -ExpandProperty EmailAddresses | 
    		Select-Object -First 1 -ExpandProperty SmtpAddress
    
        if ($emailAddress -ne $null) {
            $accountsAdded += Add-Member -InputObject $account -MemberType NoteProperty -Name EmailAddress -Value $emailAddress -PassThru
    	}
    }
    
    Export-Csv -Path new.csv -NoTypeInformation -InputObject $accountsAdded
    

    If you're shooting for a one-liner or you just really want to use the pipeline, (it's Posh, so you feel like you have to, right?), make use of whitespace. If you indent all of the lines after the Import-Csv, it may make all the difference in the world! (I'd also break the Get-Mailbox line into multiple lines broken by a pipe to keep with your "standard")

    Import-Csv -Path test.csv | 
    	ForEach-Object {
    	    $emailAddress = Get-MailBox $_ | 
    			Select-Object -ExpandProperty EmailAddresses | 
    			Select-Object -First 1 -ExpandProperty SmtpAddress
    
    	    if ($emailAddress -ne $null)
    	    {
    	        Add-Member -InputObject $_ -MemberType NoteProperty -Name EmailAddress -Value $emailAddress -PassThru
    	    }
    	} |
    	Export-Csv -Path new.csv -NoTypeInformation
    

    Personally, I would tighten everything up and write it like this, but I can see how that would be a little harder to read:

    Import-Csv -Path test.csv | ForEach-Object {
    	    $emailAddress = Get-MailBox $_ | Select-Object -ExpandProperty EmailAddresses | Select-Object -First 1 -ExpandProperty SmtpAddress
    
    	    if ($emailAddress -ne $null) {
    	        Add-Member -InputObject $_ -MemberType NoteProperty -Name EmailAddress -Value $emailAddress -PassThru
    	    }
    	} | Export-Csv -Path new.csv -NoTypeInformation
    

    Btw, if you're using v3 or newer, the -PassThru parameter on Add-Member isn't necessary.

  • #9930
    Profile photo of Dave Wyatt
    Dave Wyatt
    Moderator

    I probably should have mentioned this in the original post, but I try to avoid the first approach you mentioned (saving the results of each command to a temporary variable, then using the foreach loop). While the code is more readable, the functionality is slightly different. When you're working with very large result sets, saving the entire set to a variable can cause problems with excessive memory utilization. Using the pipeline can avoid this problem by streaming objects one at a time, never storing the whole list in memory.

    I tend to write all of my code to be v2-compatible, out of habit. At least until my company gets around to upgrading the version of PowerShell on our 5000 or so Windows 2008 servers. 🙂

You must be logged in to reply to this topic.