Do While and Do Until through Array Items

This topic contains 0 replies, has 1 voice, and was last updated by Profile photo of Forums Archives Forums Archives 5 years, 6 months ago.

  • Author
    Posts
  • #5759

    by Matt Hitchcock at 2013-04-03 19:28:19

    Hi

    I'm having some difficulty with manipulating some information into a text file. Basically, I receive some file in one format that I need to evaluate and convert into a different format through an automated script. The CSV file starts out as:

    EmployeeID,AD Account UPN,Firstname,Lastname
    001,mh@domain1.com,Matt,Hitchcock
    001,matty@domain2.com,Matt,Hitchcock
    001,matthew@domain3.com,Matt,Hitchcock
    001,hitch@domain4.com,Matt,Hitchcock
    001,hitch@domain5.com,Matt,Hitchcock

    EmployeeID is not a unique value, basically this is for a migration project where users have accounts in different domains. I need to re-work this into a format that Quest Quick Connect can use to stamp employee numbers onto the accounts in different Forests, so I end up with:

    EmployeeID,Domain1UPN,Domain2UPN,Domain3UPN,Firstname,Lastname,AllOtherAccounts
    001,mh@domain1.com,matty@domain2.com,matthew@domain3.com,Matt,Hitchcock,"hitch@domain4.com,hitch@domain5.com"

    The AllAccounts column exists because all accounts not matched to one of the domains we are looking for is populated into a multi-valued attribute for reference.

    The issue I am having is that I while my script works fine to import the file, build an object for each employee, identify what we want and put the object into an array to output to a CSV at the end, the import file I receive is now 111,000 lines long as is taking an immense amount of time to process.

    My logic so far is (roughly explained) as follows:
    1. Import the CSV File into an array $Records
    2. Loop through the Array and for each employee number not in the arry $UniqueEmployeeIDs, add it (so we get a count of unique employees)
    3. For each $UniqueEmployeeID, go through the $Records array and where an object has the same employee ID, pull it into the $CurrentEmployee array to work with it
    4. Build a new Object for the User and create attributes with values i.e. Name = Domain1UPN, Value = mh@domain1.com. etc.
    5. Add the new Object to the $ProcessedObjects array and move to the next $UniqueEmployeeID and repeat
    6. Output $ProcessedObjects into a CSV

    Step 3 is where I hit the problem, I am using this on what is now a 111,000 array! (it was fine when it was smaller!):
    [Array]$CurrentEmployee = $Records | Where-Object {$_.EmployeeID -like $UniqueEmployeeID}

    I have the $UniqueEmployeeIDs array sorted into Numerical order. Is there a way that I can find the first matching object in the array, when I hit a match, start adding the objects to the $CurrentEmployee array then once I hit an object that does not match i.e. the employeeID is greater, stop processing and move onto the next $UniqueEmployeeID? Rather than evaluating every object in the array?

    It's difficult to share the script as it has a lot of environment specific information which I can't really post online, I am working to change it all so I can post it, but in the meantime I guess I wanted to ask is this possible? To go through an array until you match an object then do something until they objects stop matching and move onto the next? I am thinking Do While and Do Until's would make this possible but just can't see the code in my head!

    I am thinking something like the following but am yet to try:

    foreach ($i in $Records) {

    while ($i.EmployeeID -notlike $Employee.EmployeeID) {
    Continue
    Do Until ($i.EmployeeID -like $Employee.EmployeeID) {
    $CurrentEmployee += $i
    Do Until ($i.EmployeeID -notlike $Employee.ID)
    }
    }
    }

    Thanks

    by mjolinor at 2013-04-03 20:41:24

    I'd be tempted to forget the import-csv and just deal with it as text lines, splitting the data fields out of the text.

    If it's sorted data, you should be able to trigger the output of a completed object and starting the creation of a new one based on the EmployeeID changing from the last line that was read.

    by Matt Hitchcock at 2013-04-03 22:46:31

    Thanks for the reply. Not importing the CSV sounds better but I don't completely follow what you mean, how would I do what you're suggesting?

    by nohandle at 2013-04-04 03:08:05

    [quote="Matt Hitchcock"]Not importing the CSV sounds better but I don't completely follow what you mean, how would I do what you're suggesting?[/quote]
    I suppose this is what he means:
    If the data are already sorted by unique identifier, you are just a step away from grouping the data to create bulks from which you can generate the result -> effectively skipping the "get me list of unique employee ids and then create arrays of user data by looking them up in the huge table". (in this case the best candidate for uniques identifier is obviously the employee ID).

    In attempt to implement this solution I came up with this pipeline, where i dumbly follow the structure of the data (rely on the data being properly sorted). It processes 150000 lines in about 25 seconds 🙂
    $delimiter = ";"

    Get-Content .\book1.csv | foreach -Begin {
    $skipHeader = $true
    $userIdLength = 3
    $userData = @()
    } -Process {
    #using first to skip header
    if (-not ($skipHeader))
    {
    $lineCurrent = $_
    #to avoid as much searching as possible I rely highly on the structure of the data
    $userIdCurrent = $lineCurrent.Substring(0,$userIdLength)

    #create groups based on user id
    if ($userIdCurrent -eq $userIdPrevious)
    {
    $userData += $lineCurrent
    }
    else
    {
    #output the group as array
    if ($userData) {,$userData}

    $userData = @()
    $userIdPrevious = $userIdCurrent
    $userData += $lineCurrent
    }

    }
    else
    {
    $skipHeader = -not $skipHeader
    }
    }-End {
    #make sure you output the last group
    ,$userData
    } |
    foreach {
    $groupCurrent = $_
    $EmployeeId,$Domain1UPN,$Firstname,$Lastname = $groupCurrent[0] -split $delimiter,$null,'simplematch'
    $Domain2UPN = ($groupCurrent[1] -split $delimiter,3,'simplematch')[1]
    $Domain3UPN = ($groupCurrent[2] -split $delimiter,3,'simplematch')[1]

    #process rest of the groups to all other accounts
    $groupCurrentCount = $groupCurrent.Count
    if ($groupCurrentCount -gt 3)
    {
    $allOtherAccountsTemp = @()
    for ($index = 3; $index -lt $groupCurrentCount; $index++)
    {
    $allOtherAccountsTemp += ($groupCurrent[$index] -split $delimiter,3,'simplematch')[1]
    }
    $AllOtherAccounts = $allOtherAccountsTemp -join ','
    }
    "$EmployeeID,$Domain1UPN,$Domain2UPN,$Domain3UPN,$Firstname,$Lastname,`"$AllOtherAccounts`""

    $EmployeeId = $Domain1UPN = $Domain2UPN = $Domain3UPN = $Firstname =$Lastname = $AllOtherAccounts = $allOtherAccountsTemp =@()

    }

    Of course more logic can be placed in the second part (second foreach) if needed.

    note: When I tested the runtime I piped the output to out-Null.

    by nohandle at 2013-04-04 03:18:08

    This is my input:
    PS C:\users\ja07602\desktop> Get-Content .\book1.csv
    EmployeeID;AD Account UPN;Firstname;Lastname
    001;mh@domain1.com;Matt;Hitchcock
    001;matty@domain2.com;Matt;Hitchcock
    001;matthew@domain3.com;Matt;Hitchcock
    001;hitch@domain4.com;Matt;Hitchcock
    001;hitch@domain5.com;Matt;Hitchcock
    002;mh@domain1.com;Matt;Hitchcock
    002;matty@domain2.com;Matt;Hitchcock
    002;matthew@domain3.com;Matt;Hitchcock
    002;hitch@domain4.com;Matt;Hitchcock
    002;hitch@domain5.com;Matt;Hitchcock
    003;mh@domain1.com;Matt;Hitchcock
    003;matty@domain2.com;Matt;Hitchcock
    003;matthew@domain3.com;Matt;Hitchcock
    003;hitch@domain4.com;Matt;Hitchcock
    003;hitch@domain5.com;Matt;Hitchcock

    This is my output:
    001,mh@domain1.com,matty@domain2.com,matthew@domain3.com,Matt,Hitchcock,"hitch@domain4.com,hitch@domain5.com"
    002,mh@domain1.com,matty@domain2.com,matthew@domain3.com,Matt,Hitchcock,"hitch@domain4.com,hitch@domain5.com"
    003,mh@domain1.com,matty@domain2.com,matthew@domain3.com,Matt,Hitchcock,"hitch@domain4.com,hitch@domain5.com"

    by mjolinor at 2013-04-04 04:15:13

    Another possible method – not tested but should work.

    $inputfile = $outputfile = $UserObj = $null

    $NewUser = # script block to clean up and output user object and set up for next user
    {

    if ($UserObj){
    $UserObj.AllOtherAccounts = $UserObj.AllOtherAccounts -replace ',$',''
    $UserObj
    }

    $UserObj =
    [PSCustomObject]@{
    EmployeeID = $null
    Domain1UPN = $null
    Domain2UPN = $null
    Domain3UPN = $null
    Firstname = $null
    Lastname = $null
    AllOtherAccounts = $null
    }
    $DomainCount = 1
    }

    &{
    Get-Content $inputfile |
    select -skip 1 |

    foreach {
    $parts = $_.split(',')
    if ($parts[0] -ne $UserObj.EmployeeID) {.$NewUser}

    Switch ($DomainCount)
    {
    1 {
    $UserObj.EmployeeID = $parts[0]
    $UserObj.Domain1UPN = $parts[1]
    $UserObj.Firstname = $parts[2]
    $UserObj.Lastname = $parts[3]
    $DomainCount++
    Break
    }

    2 {
    $UserObj.Domain2UPN = $parts[1]
    $DomainCount++
    Break
    }

    3 {
    $UserObj.Domain3UPN = $parts[1]
    $DomainCount++
    Break
    }

    4 {
    $UserObj.AllOtherAccounts += $parts[1] + ','
    }
    }
    }

    .$NewUser # output last object

    } | export-csv $outputfile -notype

    by nohandle at 2013-04-04 05:03:14

    Looks very elegant. 🙂

    but it does not work for me.

    Stepping through the code, unfortunately I get several error.
    Error regarding the new user label.
    After removing it, I get few errors about calling method on null-values expression originating from the newuser scriptblock.

    This does not look right.
    $UserObj.Domain1UPN = $patrs[1]
    $UserObj.Domain1UPN = $parts[2]

    and this in what I get in the output csv:
    False,"False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Object","7"
    also the object content does not look right but maybe I just broke your code:
    Name Value
    ---- -----
    AllOtherAccounts
    Firstname Hitchcock
    Domain1UPN Matt
    Lastname
    Domain2UPN Matt
    Domain3UPN Matt
    EmployeeID 001

    by mjolinor at 2013-04-04 06:12:42

    [quote]
    Looks very elegant. 🙂
    [/quote]
    Thatnks. That's what I was going for.

    You're right, that Domain1UPN is busted.

    There may be some more detail bugs besides that. As I said, I didn't have time to test that yet, but wanted something the OP could easily follow and debug/modify.

    I've got another script I need to go work on (one of those ASAP deals), but I patched the obvious bugs. I'll come back and do some testing later if there's still issues.

    by nohandle at 2013-04-04 06:54:38

    [quote="mjolinor"]$UserObj.Domain1UPN = $patrs[1][/quote]
    Still a typo there.

    by mjolinor at 2013-04-04 09:30:24

    OK. Finally got a chance to get back to this.
    Should work better now.

    by nohandle at 2013-04-04 09:49:19

    Gave it quick test and the output is still like this:
    "False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+V
    alueCollection","System.Object","7"

    by mjolinor at 2013-04-04 10:00:20

    I'm using this for inputdata:

    EmployeeID,AD Account UPN,Firstname,Lastname
    001,mh@domain1.com,Matt,Hitchcock
    001,matty@domain2.com,Matt,Hitchcock
    001,matthew@domain3.com,Matt,Hitchcock
    001,hitch@domain4.com,Matt,Hitchcock
    001,hitch@domain5.com,Matt,Hitchcock
    002,mh@domain1.com,Matt,Hitchcock
    002,matty@domain2.com,Matt,Hitchcock
    002,matthew@domain3.com,Matt,Hitchcock
    002,hitch@domain4.com,Matt,Hitchcock
    002,hitch@domain5.com,Matt,Hitchcock
    003,mh@domain1.com,Matt,Hitchcock
    003,matty@domain2.com,Matt,Hitchcock
    003,matthew@domain3.com,Matt,Hitchcock
    003,hitch@domain4.com,Matt,Hitchcock
    003,hitch@domain5.com,Matt,Hitchcock

    And getting this result:

    EmployeeID : 001
    Domain1UPN : mh@domain1.com
    Domain2UPN : matty@domain2.com
    Domain3UPN : matthew@domain3.com
    Firstname : Matt
    Lastname : Hitchcock
    AllOtherAccounts : hitch@domain4.com,hitch@domain5.com

    EmployeeID : 002
    Domain1UPN : mh@domain1.com
    Domain2UPN : matty@domain2.com
    Domain3UPN : matthew@domain3.com
    Firstname : Matt
    Lastname : Hitchcock
    AllOtherAccounts : hitch@domain4.com,hitch@domain5.com

    EmployeeID : 003
    Domain1UPN : mh@domain1.com
    Domain2UPN : matty@domain2.com
    Domain3UPN : matthew@domain3.com
    Firstname : Matt
    Lastname : Hitchcock
    AllOtherAccounts : hitch@domain4.com,hitch@domain5.com

    Here's the raw content of the output file:
    "EmployeeID","Domain1UPN","Domain2UPN","Domain3UPN","Firstname","Lastname","AllOtherAccounts"
    "001","mh@domain1.com","matty@domain2.com","matthew@domain3.com","Matt","Hitchcock","hitch@domain4.com,hitch@domain5.com"
    "002","mh@domain1.com","matty@domain2.com","matthew@domain3.com","Matt","Hitchcock","hitch@domain4.com,hitch@domain5.com"
    "003","mh@domain1.com","matty@domain2.com","matthew@domain3.com","Matt","Hitchcock","hitch@domain4.com,hitch@domain5.com"

    by nohandle at 2013-04-04 10:06:10

    Then something is broken on my side probably. Hope it works for the OP. Thanks for lecture in code elegance 🙂

    by mjolinor at 2013-04-04 10:10:20

    I was going more for "example" than "lecture", and hoping I came up with a good example.

    by Matt Hitchcock at 2013-04-08 00:35:06

    Thanks for the input, there is some great stuff I can use here, I'll have a play and update with how I get on.

    Appreciate the help.

    by Matt Hitchcock at 2013-04-10 02:10:43

    Hi

    Just wanted to say thanks, this worked great! I modified it a bit after it has gathered all lines for a single user to run all of the UPN's against a list of naming conventions to determine the domain that they belong to and whether the account has been verified (there is a prefix in the import file) and it works like an absolute charm. 111,000 rows of data re-worked into an importable CSV in just a few minutes!

    Thanks for your help, it was a much more efficient way of doing it.

    by nohandle at 2013-04-10 02:23:10

    Great to hear back from you. 🙂
    Which approach you used btw mjolinor's or mine?

    by Matt Hitchcock at 2013-04-10 03:14:55

    [quote="mjolinor"]Another possible method – not tested but should work.

    $inputfile = $outputfile = $UserObj = $null

    $NewUser = # script block to clean up and output user object and set up for next user
    {

    if ($UserObj){
    $UserObj.AllOtherAccounts = $UserObj.AllOtherAccounts -replace ',$',''
    $UserObj
    }

    $UserObj =
    [PSCustomObject]@{
    EmployeeID = $null
    Domain1UPN = $null
    Domain2UPN = $null
    Domain3UPN = $null
    Firstname = $null
    Lastname = $null
    AllOtherAccounts = $null
    }
    $DomainCount = 1
    }

    &{
    Get-Content $inputfile |
    select -skip 1 |

    foreach {
    $parts = $_.split(',')
    if ($parts[0] -ne $UserObj.EmployeeID) {.$NewUser}

    Switch ($DomainCount)
    {
    1 {
    $UserObj.EmployeeID = $parts[0]
    $UserObj.Domain1UPN = $parts[1]
    $UserObj.Firstname = $parts[2]
    $UserObj.Lastname = $parts[3]
    $DomainCount++
    Break
    }

    2 {
    $UserObj.Domain2UPN = $parts[1]
    $DomainCount++
    Break
    }

    3 {
    $UserObj.Domain3UPN = $parts[1]
    $DomainCount++
    Break
    }

    4 {
    $UserObj.AllOtherAccounts += $parts[1] + ','
    }
    }
    }

    .$NewUser # output last object

    } | export-csv $outputfile -notype[/quote]

    Thanks for this, will try to test it also. It looks a lot nice than mine does 🙂

    by Matt Hitchcock at 2013-04-10 03:16:47

    I modified yours (nohandle), I was going to test both and see what worked but I ran through yours and saw straight away how I needed to modify it so just went with it 🙂

You must be logged in to reply to this topic.