February 9, 2016 at 4:35 pm

Hi PowerShell Community,

I've got a script that frequently creates hashtables from collections. I used to do this "by hand" until I realized that Group-Object already provides this functionality through its -AsHash parameter. I've replaced some of the "by hand" code with Group-Object calls and I've realized that it's no longer as fast.

My question is, am I using the cmdlet wrong and causing this performance hit? If I'm not, I'm also wondering how the call to Group-Object (a cmdlet presumably written in C#) could be slower than the "by hand" PowerShell code.

I've already done a bit of investigating myself by writing a script that creates hashtables using both methods ("by hand" and Group-Object) and timing each. I've found that Group-Object is only slightly slower when the number of keys in the hash table is ~5000 or lower. However, once you get to something like 10,000 keys the difference in performance is staggering and Group-Object takes much longer.

The lowdown on the Gist script:
Just dot-source it and run Compare-HashCreation with the required parameters.

EXAMPLE:
Compare-HashCreation -NumValues 50000 -NumKeys 1000

This will create a list of 50,000 tuples of the form (Num, "foobar"), where Num is a random number from 0 to 999. It will then create two hashtables, one per method, using the Num property of each tuple as the hashtable key.
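For readers without the Gist, here is a rough sketch of the kind of comparison described above. It is not the Gist's code: the variable names ($items, $manual, $grouped) and the use of a Stopwatch are assumptions made for illustration, and the "by hand" loop is only one plausible way to build the hashtable.

```powershell
# Hypothetical test data: 50,000 objects whose Num is a random key from 0 to 999.
$items = foreach ($i in 1..50000) {
    [pscustomobject]@{ Num = (Get-Random -Minimum 0 -Maximum 1000); Text = 'foobar' }
}

# "By hand": one hashtable lookup per item, appending to a growable list per key.
$swManual = [System.Diagnostics.Stopwatch]::StartNew()
$manual = @{}
foreach ($item in $items) {
    if (-not $manual.ContainsKey($item.Num)) {
        $manual[$item.Num] = New-Object 'System.Collections.Generic.List[object]'
    }
    $manual[$item.Num].Add($item)
}
$swManual.Stop()

# Group-Object: the same grouping, returned as a hashtable.
$swCmdlet = [System.Diagnostics.Stopwatch]::StartNew()
$grouped = $items | Group-Object -Property Num -AsHashTable
$swCmdlet.Stop()

'By hand:      {0} ms' -f $swManual.ElapsedMilliseconds
'Group-Object: {0} ms' -f $swCmdlet.ElapsedMilliseconds
```

Raising the -Maximum value increases the number of distinct keys, which is the variable the timings above are reported to be sensitive to.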

Gist:

Thanks, Garrett

February 9, 2016 at 4:40 pm

I apologize; your account is flagged in the global WordPress system as a spam originator, and so your many posts on this topic have all been held. I've released this one.

February 9, 2016 at 4:42 pm

At a guess, I'd attribute this to the way .NET itself handles hash tables and arrays generally, meaning when you add an element to one, it more or less has to re-create the entire array. As the array grows progressively larger, that process obviously takes longer and longer.

February 9, 2016 at 4:42 pm

Thank you so much! Sorry for spamming but I spent like 2 hrs crafting this post so I didn't want it to go un-posted.

February 9, 2016 at 5:01 pm

Thanks for your thoughts, Don. I thought that at first too, but I found that the hashtable values returned by Group-Object are actually of a Collection type:

[16:53:49] PS> (dir | Group-Object Mode -AsHashTable -AsString).Values | % { $_.GetType() }

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Collection`1                             System.Object
True     True     Collection`1                             System.Object

[16:54:01] PS>

I'm not 100% positive about this, but I don't think the Collection objects should run into any expensive array-copying problems when they grow.
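A small sketch of the distinction being drawn here, assuming the Collection`1 in the output above is System.Collections.ObjectModel.Collection<T>. A PowerShell array is fixed-size, so += builds a new array every time, whereas Collection<T>.Add appends to the same object in place. The variable names are made up for the example.

```powershell
# A PowerShell array is fixed-size: += allocates a new array and copies the old one.
$array = @(1, 2, 3)
$arrayBefore = $array
$array += 4
$arrayReplaced = -not [object]::ReferenceEquals($arrayBefore, $array)

# Collection<T> grows in place: Add() mutates the same object.
$collection = New-Object 'System.Collections.ObjectModel.Collection[object]'
$collectionBefore = $collection
$collection.Add(1)
$collection.Add(2)
$collectionSame = [object]::ReferenceEquals($collectionBefore, $collection)

"Array replaced by +=:       $arrayReplaced"
"Collection kept after Add:  $collectionSame"
```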

February 9, 2016 at 5:06 pm

Interesting... I've never looked at the Group-Object cmdlet's code before, and it behaves a bit oddly in this method (decompiled with ILSpy):

The first bit of code (based on the result of TryGetValue) is what you'd expect of code that adds to a dictionary. What's interesting is the "for" loop in the else block, which iterates over all of the groups instead of using a dictionary-based lookup. I'm not sure why that code needs to be there, but that is definitely the sort of thing that could make it take a long time to execute if you're dealing with a large data set.

// Microsoft.PowerShell.Commands.GroupObjectCommand
internal static void DoGrouping(OrderByPropertyEntry currentObjectEntry, bool noElement, List<GroupInfo> groups, Dictionary<object, GroupInfo> groupInfoDictionary, OrderByPropertyComparer orderByPropertyComparer)
{
	if (currentObjectEntry != null && currentObjectEntry.orderValues != null && currentObjectEntry.orderValues.Count > 0)
	{
		object key = PSTuple.ArrayToTuple(currentObjectEntry.orderValues.ToArray());
		GroupInfo groupInfo = null;
		if (groupInfoDictionary.TryGetValue(key, out groupInfo))
		{
			if (groupInfo != null)
			{
				groupInfo.Add(currentObjectEntry.inputObject);
				return;
			}
		}
		else
		{
			bool flag = false;
			for (int i = 0; i < groups.Count; i++)
			{
				if (orderByPropertyComparer.Compare(groups[i].GroupValue, currentObjectEntry) == 0)
				{
					groups[i].Add(currentObjectEntry.inputObject);
					flag = true;
					break;
				}
			}
			if (!flag)
			{
				GroupObjectCommand.tracer.WriteLine(string.Format(CultureInfo.InvariantCulture, "Create a new group: {0}", new object[]
				{
					currentObjectEntry.orderValues
				}), new object[0]);
				GroupInfo groupInfo2 = noElement ? new GroupInfoNoElement(currentObjectEntry) : new GroupInfo(currentObjectEntry);
				groups.Add(groupInfo2);
				groupInfoDictionary.Add(key, groupInfo2);
			}
		}
	}
}
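To put a number on why the "for" loop in the else block matters, here is a simplified model of that pattern written in PowerShell. It is not the cmdlet's code: it only mimics the shape "dictionary miss, then scan every existing group before adding a new one", with integers standing in for groups and $keyCount as a made-up parameter. Every new key scans all groups created so far, so the number of comparisons grows quadratically with the number of distinct keys.

```powershell
$keyCount = 2000   # hypothetical number of distinct keys

$lookup = New-Object 'System.Collections.Generic.Dictionary[int,int]'
$groups = New-Object 'System.Collections.Generic.List[int]'
$comparisons = 0

foreach ($key in 0..($keyCount - 1)) {
    if (-not $lookup.ContainsKey($key)) {
        # Dictionary miss: fall back to a linear scan over all existing groups.
        $found = $false
        for ($i = 0; $i -lt $groups.Count; $i++) {
            $comparisons++
            if ($groups[$i] -eq $key) { $found = $true; break }
        }
        if (-not $found) {
            $groups.Add($key)
            $lookup.Add($key, 1)
        }
    }
}

# Each new key scanned every group before it: 0 + 1 + ... + (keyCount - 1).
"Distinct keys: $keyCount"
"Comparisons:   $comparisons"
```

With a plain dictionary lookup and no fallback scan, the same loop would perform no list comparisons at all, which is consistent with the slowdown appearing only once the number of keys gets large.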

February 9, 2016 at 5:06 pm

Man, I tried to edit bc I forgot the Gist link and it got marked as spam again 🙁

February 9, 2016 at 6:31 pm

Hey Dave, thanks for your reply. I kinda see what you're talking about but I'll need a bit more time to digest exactly what's going on in the code you pasted.

Somewhat unrelated: I'd never heard of ILSpy, but I just downloaded it bc it seems pretty useful. However, I'm not sure how you navigated to the Group-Object code. Could you explain how you did that?