Check for duplicate value in CSV file

This topic contains 8 replies, has 4 voices, and was last updated by Profile photo of Dan Potter Dan Potter 1 year, 11 months ago.

  • Author
    Posts
  • #26305
    Profile photo of David Schmidtberger
    David Schmidtberger
    Participant

    I am looking for a quick way to check if a value is duplicated in the csv after i import it.

    the script i'm workin with is related with disabling active directory accounts and we recieve data in a similar format

    status,networkid
    update,e11111
    terminate,e12345
    update,e12345

    as the script processes the information, i need to check if the network id is duplicated anywhere else in the file (essentially do not process the terminate if it exists anywhere else in the column)

    (i do recieve a large amount of other information in the file, but really am only concerned with the networkid and the status)

    i'm sure this is doable, just can't get my mind wrapped around it

  • #26314
    Profile photo of Will Anderson
    Will Anderson
    Keymaster

    Hey there David,

    Don Jones actually posted an article on this on Hey! Scripting Guy some time back. It's a good read. Long and short, here's an example:

    $CSV = Import-Csv C:\scripts\Network.csv
    
    $CSV.networkid | Group-Object | Where-Object {$_.Count -gt 1}

    Give that a shot and see if it works for you. Here's the article if you'd like to read it:
    http://blogs.technet.com/b/heyscriptingguy/archive/2008/01/31/how-can-i-use-windows-powershell-to-retrieve-the-non-unique-items-in-a-list.aspx

  • #26321
    Profile photo of David Schmidtberger
    David Schmidtberger
    Participant

    well that definately tells me if objects are duplicated, what i really need is to be able to check that value as it processes hundreds of records. i have managed to get something cobbled together that tells me if the id is duplicated. but not really efficient to use as a check to see if to proceed on each line

    $path         = Split-Path -parent $MyInvocation.MyCommand.Definition
    $input  = $path + "\*.csv"
    $csv = Import-Csv $input
    
    $duplicate = $CSV.networkid | Group-Object | Where-Object {$_.Count -gt 1}
    
    foreach ($user in $csv)
    {
    if ($user.status -like "*TERMINATION*")
    {
    	foreach ($dupe in $duplicate)
    	{
    
    	if ($dupe.NAME -like $user.networkid)
    
    	{
    	Write-Host "$($user.networkid) set for termination with 2 records"
    	}
    	else
    	{
    	}
    	}
    	
    }
    else
    {
    }
    }
    
  • #26327
    Profile photo of Dan Potter
    Dan Potter
    Participant

    $csv |group networkid |select name,count | ? {$_.count -gt 1}

  • #26333
    Profile photo of Dan Potter
    Dan Potter
    Participant

    or this?

    $csv |group networkid | ? {$_.count -gt 1} |select -ExpandProperty group

  • #26336
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    In this scenario you have above, my assumption is you would NOT terminate the user because there was an update, so take a look at something like this:

    #Get your CSV and remove any exact duplicates
    $csv = Import-CSV C:\Temp\test.csv | Select * -Unique
    
    #Create a collection of all update1s
    $terminations = $csv | Where{$_.Status -eq "terminate"}
    #Create a collection of updates
    $updates = $csv | Where{$_.Status -ne "terminate"}
    
    #Process your updates
    foreach ($update in $updates){
        "Processing {0} for {1}" -f $update.Status, $update.NetworkID
        # - CODE FOR UPDATE - #
        # Check if there are any other updates for the same network ID
        $termCheck = $terminations | Where{$_.NetworkID -eq $update.NetworkID -and $_.Status -eq "terminate"}
        if ($termCheck) {
            #There is a termination record, so..
            "Found {0} terminate request employee {1}, removing from terminations" -f @($termCheck).Count, $update.networkid
            #Update terminations collection and remove the network ID with a terminate status
            $terminations = $terminations | Where{$_.NetworkID -ne $update.NetworkID -and $_.Status -eq "terminate"}
        }
    }
    
    #Now process terminations...
    if ($terminations) {
        foreach ($term in $termination) {
            "Processing {0} for {1}" -f $term.Status, $term.NetworkID
            # - CODE FOR TERMS - #
        }
    }
    else {
        "No terminations to process"
    }
    

    Output:

    Processing update for e11111
    Processing update for e12345
    Found 1 terminate request employee e12345, removing from terminations
    No terminations to process
    
  • #26337
    Profile photo of David Schmidtberger
    David Schmidtberger
    Participant

    Thanks rob, that is similar to how i was thinking i might have to do it.

    what i was hoping for was a relatively simple true/false check on each id being duplicated as the existing script processes. (written by someone else years ago)
    due to changes in back-end data we have to perform this new check.

  • #26340
    Profile photo of Rob Simmers
    Rob Simmers
    Participant

    You could simplify it even more if you didn't want to log that someone was remove from the term pool, but I would assume it would be logged somewhere. Update the post as resolved if this works for you.

  • #26344
    Profile photo of Dan Potter
    Dan Potter
    Participant

    Then not duplicated and should be terminated.

    $csv |group networkid | ? {($_.count -eq 1) -and (($_.group).status -eq 'terminate')}

You must be logged in to reply to this topic.