by davehook at 2012-09-26 01:52:39
I have a spreadsheet which has duplicate entries in column a. Column a is eventually used in a SQL table but is primary key therefore i cant have any duplicates. I cant find an example script that shows how to remove the row where there is a duplicate entry in column a only. Can you help?
by DonJ at 2012-09-26 08:07:34
This isn't something I think PowerShell will do an excellent job at – I'm not sure any scripting language could. I think this is something you should probably consider coding within Excel itself. At least, that's the approach I'd try, first. Failing that, I'd probably do something in SQL Server Integration Services, since they're more specifically meant for this kind of thing.
The *easiest* approach, if you decided to go the PowerShell route, would be to get the spreadsheet sorted on that column. Then you'd open it up and go through each row, one at a time. For each row, you'd look to see if the next row had the same value (which, since you sorted them, it would if there was a duplicate). Then you'd remove that row.
Although, you know... another way to think about this: SQL Server simply won't let you import duplicate primary keys. So... just leave the duplicate entries in there. Let SQL Server spew error messages on those duplicates. It'll get over it.
by RichardSiddaway at 2012-09-27 04:04:08
If I was doing this I think I'd look at exporting the data to a csv. You may then be able to sort on the primary key column and remove duplicates.
Working inside Excel will be very slow.
How are you going to decide which duplicated row should be removed? is the data identical?
by davehook at 2012-10-01 08:52:36
Thanks Richard, Im working on your idea