edit many CSV files

This topic contains 6 replies, has 2 voices, and was last updated by Profile photo of Dave Nicholls Dave Nicholls 4 months, 2 weeks ago.

  • Author
    Posts
  • #47449
    Profile photo of Dave Nicholls
    Dave Nicholls
    Participant

    i need some guidance. i have hundreds of CSV files i need to work with–they have some bad data we have to manually clean up prior to an import–as we visually verify the bad data 🙁

    first, i wanted to get all of the bad data into one file so i wouldnt have to sort and edit hundreds of them. working based on the "date" column, i got everything with an empty date or a bad date of "00-00-0000" copied into a new file with this:

    $csvPath = "\\server\share\CSV Files"
    $outpath = "H:\Documents\Scripts\csv_admin_empty.csv"
    $colHeader = 'Date'
    
    Get-ChildItem $csvPath -Recurse -Include *.csv | ForEach-Object { import-csv $_.FullName | Where-Object {$_.$colHeader -eq '' -or $_.$colHeader -match "00-00-0000"}} | Export-Csv -Path $outpath -NoTypeInformation
    

    second, i want to now remove all rows with bad date info from the existing CSV files–that way we wont import bad data/duplicates, and i can update the CSV file i just made with good data for import.

    but...i am still learning powershell and cant get what i want. i have tried a few iterations of this but i get a blank file, or i get my file "updated" where every empty column looks like: ,"", instead of just ,, /facepalm

    so im looking for some guidance on how im processing this wrong. ive done some reading and found some examples but im not understanding something still. i dont think the -ne and -notmatch in this are matching correctly–if i import a single file with that pattern, then export it to a new file, it hasnt ignored the "" or "00-00-0000" rows like i want.

    $csvPath = "H:\Documents\Scripts\csv"
    $outpath = "H:\Documents\Scripts"
    $colHeader = 'Date'
    
    Get-ChildItem $csvPath -Recurse -Include *.csv | ForEach-Object { import-csv $_.FullName | Where {$_.$colHeader -ne '' -or $_.$colHeader -notmatch "00-00-0000"} | Export-Csv $_.fullname -NoTypeInformation }
    

    simplified sample CSV data

    site,id,Date,Form Id,Doc ID,DOB,Pat Loc,FileType,apptla
    DPL,,1/26/2006, Records,100052,,,5,CRMI
    DPL,,, Records,100052,,,5,CRMI
    DPL,,1/27/2006, Records,100006,,,5,CRMI
    DPL,,00-00-0000, Records,100028,,,5,CRMI
    DPL,,00-00-0000, Records,100028,,,5,CRMI

    any help would be appreciated.
    thanks
    -dave

  • #47453
    Profile photo of Craig Duff
    Craig Duff
    Participant

    Powershell will quote fields with the csv cmdlets; even empty fields. If your application doesn't like the quotes, then the only way around that is to write your own Export-Csv or post process the data to remove quotes.

    Provided you're confident there are no double quotes that should be in the file:

    $colHeader = 'Date'
    
    $data = @"
    site,id,Date,Form Id,Doc ID,DOB,Pat Loc,FileType,apptla
    DPL,,1/26/2006, Records,100052,,,5,CRMI
    DPL,,, Records,100052,,,5,CRMI
    DPL,,1/27/2006, Records,100006,,,5,CRMI
    DPL,,00-00-0000, Records,100028,,,5,CRMI
    DPL,,00-00-0000, Records,100028,,,5,CRMI
    "@
    
    $data | 
    ConvertFrom-Csv | #Up to this point simulate Import-Csv
    Where {
        $_.$colHeader -ne '' -or 
        $_.$colHeader -notmatch "00-00-0000"
    } |
    ConvertTo-Csv -NoTypeInformation |
    ForEach-Object {
        $_.Replace('"','')
    } # | pipe to a cmdlet like Set-Content or Out-File

    Also you can't set-content to a file that is open in the pipeline, so you either have to use temporary files or import the data first then loop over that apart from the pipeline. That way the file handle gets closed for editing.

    ForEach($file in (Get-ChildItem #...#) {
    $data = $file | Import-Csv
    
    $data | #where-object etc... | set-content
    }
    • This reply was modified 4 months, 3 weeks ago by Profile photo of Craig Duff Craig Duff.
    • This reply was modified 4 months, 3 weeks ago by Profile photo of Craig Duff Craig Duff.
    • #47506
      Profile photo of Dave Nicholls
      Dave Nicholls
      Participant

      thank you, that is very helpful!

      i decided i can just output the files in another folder with the same name– it makes sense that it cant work on them as they are open.

      in a test group [4 files] takes a moment, so i think its processing them all with:

      $csvPath = "H:\Documents\Scripts\csv"
      $colHeader = 'Date'
      
      ForEach($file in (Get-ChildItem $csvPath -Recurse -Include *.csv)) 
      {
      $name = (gi $file).name
      $outpath = "H:\Documents\Scripts\csv\updated\$name"
      $data = Import-Csv $file}
      $data | Where-object {
          $_.$colHeader -ne '' -or 
          $_.$colHeader -notmatch "00-00-0000"
      } | ConvertTo-Csv -NoTypeInformation |
      ForEach-Object {
          $_.Replace('"','')
      } | set-content $outpath
      

      i think that where-object bit is giving me a fit. i am only testing on 4 of the 800+ files. that block just outputs one of them. 3 of 4 files have criteria in 'date' that are blank or 00-00-0000 so something isnt right–id like to get them all back, and only have 3 of the come out different.

      the file that has "" date entries gets output–but the rows with "" date entries are still in the file

      files with 00-00-0000 date entries do not get output at all.

      re: "Powershell will quote fields with the csv cmdlets; even empty fields. "

      why doesnt the output of my OP command have quotes in it? is it just because it was basically copying rows and not doing anything else? not that important but i will keep in mind that it quotes everything in the future. thank you for pointing that out.

    • #47508
      Profile photo of Craig Duff
      Craig Duff
      Participant

      Try moving the closing brace } that is at the end of the Import-Csv line to after Set-Content. I think that the logic in the where-object ought to be -and instead of -or.

      • This reply was modified 4 months, 3 weeks ago by Profile photo of Craig Duff Craig Duff.
    • #47528
      Profile photo of Dave Nicholls
      Dave Nicholls
      Participant

      aha! i understand why moving the } gave me all the files–i had tried enclosing the output similarly before; i should have tried again 🙂

      going from -or to -and worked for the match. im not sure i understand why, since -or worked fine for the first round where i was copying the data out i assumed it would work this way as well. ill have to read better about the operators /assumptions, grr

      thanks so much!
      -dave

    • #47543
      Profile photo of Craig Duff
      Craig Duff
      Participant

      There are two comparisons each. Your first set was positive, -eq and -match. The second is negative, -ne and -notmatch. That is where it changes.

      Take an illustration trying to compare on your name. Assume $name is 'Dave'.

      $name -eq 'Dave' -or $name -eq 'John'
      # becomes
      $True -or $False
      # becomes
      $True

      So if your name is Dave or John it'll be true. Now reverse it for trying to find if the name isn't Dave or John, but try to use -or

      $name -ne 'Dave' -or $name -ne 'John'
      # becomes
      $False -or $True
      # because 'Dave' -ne 'Dave' is false
      # but 'Dave' -ne 'John' is True
      # so the whole statement is
      $True

      So in the case of multiple negative comparisons you need -and, like:

      $name -ne 'Dave' -and $name -ne 'John'
      #becomes
      $false -and $true
      # since both have to be $true for a result of $true
      # it becomes:
      $false
      • This reply was modified 4 months, 2 weeks ago by Profile photo of Craig Duff Craig Duff.
  • #47638
    Profile photo of Dave Nicholls
    Dave Nicholls
    Participant

    that is a good explanation, and something i am likely to forget. i'll definitely make a note of it.

    thanks for your help
    -dave

You must be logged in to reply to this topic.