Marketing_East & Marketing_East2 are identical in all but name.
I'd like to find a script that would automate the search through each row of my CSV file to identify instances where there are "fuzzy" duplicate groups like the example.
I'm still getting the hang of PowerShell scripting, to be honest, and my efforts so far have been directed more towards "getting" the information than "processing" it.
I searched, but I'm having a tough time finding examples of searching through a CSV row by row for what I would call "fuzzy duplicates" (groups or items that are spelled nearly alike and differ only a little at either the beginning or end of the field).
Are there any example scripts out there or does anyone have an example they can share?
by DonJ at 2013-02-22 04:34:28
Nothing I've seen. This is a pretty tough task, because the shell doesn't have any native functionality to do this. You'll essentially have to make a collection of every group name, and then enumerate that and perform some kind of wildcard comparison against the other names. It might be easier to load them into a SQL Server database, since you could then take advantage of SQL-side comparisons like SOUNDEX(), which is explicitly a fuzzy (phonetic) comparison. PowerShell doesn't have anything native that's quite like it.
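For what it's worth, here is a rough sketch of the "collect every name, then compare" approach in plain PowerShell. It is only one way to read "fuzzy": two names count as near-duplicates when one is the other plus a few extra characters at the beginning or end. The column name GroupName, the function name Find-FuzzyDuplicate, the -MaxExtra parameter, and the sample rows are all made up for illustration, and it assumes PowerShell 3.0 or later.

```powershell
# Sample data standing in for: $rows = Import-Csv .\groups.csv
# The 'GroupName' column and the file name are assumptions, not from the thread.
$rows = @'
GroupName
Marketing_East
Marketing_East2
Marketing_West
Sales_North
'@ | ConvertFrom-Csv

# Hypothetical helper: reports pairs where the longer name is the shorter
# name plus at most $MaxExtra characters added at the start or the end.
function Find-FuzzyDuplicate {
    param(
        [string[]]$Name,
        [int]$MaxExtra = 2
    )

    # Drop blanks and exact duplicates before comparing
    $unique = @($Name | Where-Object { $_ } | Sort-Object -Unique)
    $ignoreCase = [System.StringComparison]::OrdinalIgnoreCase

    for ($i = 0; $i -lt $unique.Count; $i++) {
        for ($j = $i + 1; $j -lt $unique.Count; $j++) {
            # Order the pair so $short is never longer than $long
            if ($unique[$i].Length -le $unique[$j].Length) {
                $short = $unique[$i]; $long = $unique[$j]
            } else {
                $short = $unique[$j]; $long = $unique[$i]
            }

            # Too many extra characters: not "nearly alike"
            if (($long.Length - $short.Length) -gt $MaxExtra) { continue }

            if ($long.StartsWith($short, $ignoreCase) -or
                $long.EndsWith($short, $ignoreCase)) {
                [pscustomobject]@{ Name = $short; SimilarTo = $long }
            }
        }
    }
}

$pairs = @(Find-FuzzyDuplicate -Name $rows.GroupName)
$pairs | Format-Table -AutoSize
```

This compares every name with every other name, so it gets slow on very large lists, and it will not catch differences in the middle of a name. For that you would need something like an edit-distance function or the SQL route.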
by notarat at 2013-02-22 05:08:54
Thanks for the response(!) even though it was a confirmation of my fears...
I'm even less adept at SQL than PowerShell, lol. I guess I'll be buying some SQL books this weekend, haha