Search
Generic filters
Exact matches only
Filter by Custom Post Type

Executing LINQ Queries in PowerShell - Part 1

Greetings PowerShellers!

Lately, I've been itching to write something up on Microsoft’s Language-Integrated Query (LINQ). You've likely encountered it if you've done any development in C#. LINQ is an incredibly powerful querying tool for performing look-ups, joins, ordering, and other common tasks on large data sets. We have a few similar cmdlets built into PowerShell, but other than the '.Where()' method on collection objects nothing that comes close to the speed at which LINQ operates.

To dig into this topic, we're going to have to do a quick high level overview of a couple of other .NET staples often encountered in the C# world. You see, unlike most .NET methods which accept object types like integers, strings, and the like, LINQ uses static extension methods which only accept delegate object types.

What are delegates? In application development, there is an occasional need for objects within memory to communicate with each other for things such as "button click events." To address this, the Windows API uses function pointers to create callback functions which then report back to other functions in your applications. Within the .Net Framework, these are called delegates.

Delegates are objects that point to another method, or possibly many methods, by storing three key pieces of information: the address of the method on which it makes calls, the parameters (if any) of this method, and the return type (if any) of this method. With this information, a delegate object is able to invoke these methods dynamically at runtime, either synchronously or asynchronously. With this information, a delegate object is able to invoke these methods dynamically at runtime, either synchronously or asynchronously.

A simple example of this in C# looks like this:

using System;
namespace SimpleDelegate
{
    //Delegate declaration
    public delegate void PrintMessage(string msg);
    // Create a class with the method to bind to the delegate
    public class MessagePrinter
    {
        public static void PrintLine(string msg)
        { 
            Console.WriteLine(msg); 
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PrintMessage delegate object that
            // "points to" MessagePrinter.PrintLine().
            PrintMessage p = new PrintMessage(MessagePrinter.PrintLine);
            p("Hi Animatronio!");
            Console.ReadLine();
        }
    }
}

Clearly, in this example the use of delegates is not necessary. I'm just trying to frame up how they would be declared and subsequently called. To simplify all of the above, Microsoft has created two generic delegate definitions. For delegates with no output, we can use Action<> and for delegates with output, we can use Func<>. These two beauties are what give us PowerShellers access to LINQ. Today we're going to use Func<> because we want output. The syntax for doing so looks like this:

[Func[int,int]]$Delegate = { param($i); return $i + 1 }

Let's break this down left to right:

  1. Declare Func<>
  2. Tell it the type of parameter(s) to expect. In this case we're passing a single integer parameter.
  3. Tell it the type of output to produce, again an integer will be returned.
  4. Name the delegate variable.
  5. Define the delegate with a scriptblock. We're just doing a very simple addition step on the parameter and returning the output.

And now we've finally arrived at the meat of this article. Let's initialize a mock dataset with ~2 million objects to play with:

$Dataset = @()
0..1000 | Foreach-Object { $Dataset += (Get-Verb)[(Get-Random -Maximum 98)] }
0..10 | ForEach-Object {$Dataset += $Dataset}

Next we'll measure how long it takes to filter down to only the objects which equal "Get" using Where-Object on three different Windows Server OS's running on the same Azure compute instances:

Measure-Command { ($Dataset | Where-Object Verb -eq "Use") }
# 2008 R2: TotalSeconds : 23.3399981
# 2012 R2: TotalSeconds : 61.7634027
# 2016   : TotalSeconds : 18.0190367

Now let's do the same query using LINQ:

[Func[object,bool]] $Delegate = { param($v); return $v.verb -eq "Use" }
Measure-Command { [Linq.Enumerable]::Where($Dataset,$Delegate) }
# 2008 R2: TotalSeconds : 11.3967464
# 2012 R2: TotalSeconds : 25.6511816
# 2016   : TotalSeconds : 12.8999417

As you can see, in the older operating systems, LINQ is over twice as fast (also, what's the deal with 2012 R2??). In 2016, it's only about 50% faster. But of course, calling '.Where()' directly on the object is still by far the fastest way to filter on a dataset:

Measure-Command { $Dataset.Where( {$_.Verb -eq "Use"}) }
# 2008 R2: TotalSeconds : 5.5102392
# 2012 R2: TotalSeconds : 17.5893828
# 2016   : TotalSeconds : 6.1834444

Initially I had suspected it was translating the scriptblock as an anonymous function and tapping into the LINQ extension method behind the scenes, but Bruce Payette set me straight. According to Bruce, It's using a very low level API to invoke the scriptblock. Source code is here.

So if '.Where()' is so much faster, why did I bother writing this? I wanted to open with a familiar concept. The true power in LINQ comes from its SQL-like ability to aggregate and manipulate data. In the next blog, we'll take a look at grouping data, using joins, and why that's awesome.

Until then, happy tinkering!

-Eli

Recommend0 recommendationsPublished in PowerShell for Admins, Tips and Tricks

  1. js said on May 23, 2018

    5.0 in 2012 R2 says this:

    PS C:\Windows\system32> [Func[int][int]]$Delegate = { param($i); return $i + 1 }
    At line:1 char:12
    + [Func[int][int]]$Delegate = { param($i); return $i + 1 }
    + ~~~
    Unexpected token 'int' in expression or statement.
    + CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : UnexpectedToken

  2. js said on May 23, 2018

    Hmm, I figured out how to declare it, but I can't get it to run.

    PS C:\users\js> [Func[int,int]]$Delegate = { param($i); return $i + 1 }
    PS C:\users\js> [int[]]$dataset = 1..10
    PS C:\users\js> [Linq.Enumerable]::Where($Dataset,$Delegate)
    Cannot find an overload for "Where" and the argument count: "2".
    At line:1 char:1
    + [Linq.Enumerable]::Where($Dataset,$Delegate)
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (:) [], MethodException
    + FullyQualifiedErrorId : MethodCountCouldNotFindBest

    • Great catch js. I updated the post and added that comma in. As for what you're trying to do, you're running into an issue because the delegate you're passing to [Linq.Enumerable]::Where() isn't filtering anything, it just increments a value passed to it. If you use my example further down of which delegate to pass and do something like this:
      [Func[object,bool]] $Delegate = { param($v); return $v.verb -eq "Use" }

      That should get you better results. The Where() method takes in a delegate which returns a bool to do its filtering. You can see the method definition here:
      https://msdn.microsoft.com/en-us/library/bb534803(v=vs.110).aspx

      It can be a bit difficult to interpret if you're not used to dealing with C#, but the important bit for our purposes is the parameters. The first is the source list of objects, the second is a delegate which follows this type definition:
      Func[TSource, Boolean]

  3. js said on May 24, 2018

    Ok, I got it to work, even if it doesn't make much sense, lol.

    [int[]]$delegate = 1..10
    [Func[int,bool]]$Delegate = { param($i); return $i + 1 }
    [Linq.Enumerable]::Where($Dataset,$Delegate)
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

Skip to toolbar