Author Posts

November 6, 2013 at 10:38 am

Currently, I pull data from AD (approx. 25,000 records), DFS Namespace 1 (approx. 13,000 records), and DFS Namespace 2. I have a function that loops through each AD user, does a lookup in DFS Namespaces 1 and 2, and builds a result PSObject. As requirements have continually changed, I've added more foreach loops and if blocks for each user to be processed, so a script that used to run in 45 minutes now takes about 5 hours.

I'm looking for ways to make the script more efficient. For instance, if I build each data object with a common attribute (e.g. SamAccountName), has anyone attempted a join across multiple datasets on that attribute? I've seen blogs like http://blogs.msdn.com/b/powershell/archive/2012/07/13/join-object.aspx, but those approaches still take a while to join. Is it possible to convert the datasets to DataTables and do a SQL-style join (like LINQ) to get a final result set? I'm just curious whether any of the experts have worked on something like this and what was found to be the most efficient way to build a final result with all of the data joined together.

I can provide code, but I'm really asking whether someone has had a similar scenario, measured the different options, and reached a consensus on the best approach. Thanks

November 6, 2013 at 12:39 pm

When the data set is small enough that everything fits in memory, I build hashtables for this purpose. A lookup on a hashtable key is very fast. For example:

$adUsers = @{}

Get-ADUser -Filter * |
ForEach-Object {
    $adUsers[$_.SamAccountName] = $_
}

# Later:

$adUser = $adUsers[$someSamAccountName]
if ($null -ne $adUser)
{
    # do stuff
}

If the data's too large for that, I'd probably just use PowerShell to push it all into a SQL database, which already handles joins and full-text indexing, rather than trying to reinvent that particular wheel myself.
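If you go the database route, a minimal sketch of the load step using SqlBulkCopy might look like the following. The server, database, and table names are hypothetical placeholders, and the schema is assumed to already exist:

```powershell
# Load AD users into a DataTable, then bulk-copy the rows to SQL Server.
# Connection string and table name below are hypothetical examples.
$table = New-Object System.Data.DataTable
[void]$table.Columns.Add('SamAccountName', [string])
[void]$table.Columns.Add('DistinguishedName', [string])

Get-ADUser -Filter * | ForEach-Object {
    [void]$table.Rows.Add($_.SamAccountName, $_.DistinguishedName)
}

$connectionString = 'Server=SQL01;Database=Staging;Integrated Security=True'
$bulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy($connectionString)
$bulkCopy.DestinationTableName = 'dbo.AdUsers'
$bulkCopy.WriteToServer($table)
$bulkCopy.Close()
```

Once AD and both DFS extracts are loaded into their own tables, a single T-SQL JOIN on SamAccountName produces the combined result set.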

November 6, 2013 at 12:46 pm

I was actually writing the code to try option 2, as I agree with not reinventing the wheel: write to SQL and do a query with joins. I've never used a hash table for lookups, so I may try rewriting the code using a hashtable versus a | Where-Object search on a PSObject collection just to do some measuring. I didn't want to spend time rewriting code if there was something blatantly simple I was missing. Thanks for the prompt response.
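For the measuring step, a quick sketch comparing the two lookup styles could look like this. $allUsers and $namesToFind stand in for your existing PSObject collection and the keys you search for; they are assumptions, not part of the original code:

```powershell
# Time repeated Where-Object scans against hashtable key lookups.
# $allUsers and $namesToFind are assumed to already exist.
$scan = Measure-Command {
    foreach ($name in $namesToFind) {
        $allUsers | Where-Object { $_.SamAccountName -eq $name }
    }
}

# Build the index once, then look up by key.
$index = @{}
foreach ($u in $allUsers) { $index[$u.SamAccountName] = $u }
$lookup = Measure-Command {
    foreach ($name in $namesToFind) {
        $null = $index[$name]
    }
}

'Where-Object: {0:N0} ms; Hashtable: {1:N0} ms' -f $scan.TotalMilliseconds, $lookup.TotalMilliseconds
```

The difference grows with data size: each Where-Object call scans the whole collection, while a hashtable lookup is effectively constant time once the index is built.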

November 6, 2013 at 2:23 pm

I don't know if this will help, but I had an issue where I had very large CSV files (tens of thousands of records) that I needed to "join". Key lookup methods were too time-consuming, so I wrote this:
http://gallery.technet.microsoft.com/Join-CSV-A-simple-SQL-c2f92b60

Cheers

David Z

November 7, 2013 at 4:05 pm

[quote=11228]I don't know if this will help, but I had an issue where I had very large CSV files (tens of thousands of records) that I needed to "join". Key lookup methods were too time-consuming, so I wrote this:
http://gallery.technet.microsoft.com/Join-CSV-A-simple-SQL-c2f92b60
Cheers
David Z
[/quote]

Your script is very similar to the algorithm used by Compare-Object (sorting the two collections and enumerating them together). It still holds everything in memory while that's being done, though. For really large datasets (ones that would exceed PowerShell's memory limits), it's more of a pain.
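For reference, that sort-merge idea (sort both collections on the key, then walk them in step instead of scanning one collection for every element of the other) can be sketched like this; the input variables and property names are hypothetical:

```powershell
# Merge-join two collections on SamAccountName.
# $adUsers and $dfsLinks are assumed input collections.
$left  = @($adUsers  | Sort-Object SamAccountName)
$right = @($dfsLinks | Sort-Object SamAccountName)

$i = 0; $j = 0
$joined = while ($i -lt $left.Count -and $j -lt $right.Count) {
    $cmp = [string]::Compare($left[$i].SamAccountName, $right[$j].SamAccountName)
    if     ($cmp -lt 0) { $i++ }       # left key is smaller; advance left
    elseif ($cmp -gt 0) { $j++ }       # right key is smaller; advance right
    else {
        # Keys match: emit a combined record, advance both sides.
        [pscustomobject]@{
            SamAccountName = $left[$i].SamAccountName
            AdUser         = $left[$i]
            DfsLink        = $right[$j]
        }
        $i++; $j++
    }
}
```

This makes one pass over each sorted collection, but as noted, both collections (and the sorted copies) still have to fit in memory.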