Code Efficiency – CSV Trim and Replace Operations

Welcome Forums General PowerShell Q&A Code Efficiency – CSV Trim and Replace Operations

Viewing 2 reply threads
  • Author
    Posts
    • #209799
      Participant
      Topics: 2
      Replies: 2
      Points: 8
      Rank: Member

      Hi

      My code below finds all CSVs recursively in a directory, loads located CSVs, converts the CSVs into TSVs, trims first and end double quotes and then removes all double quotes between a tab space.

      My Question:

      Is there a way to make the code more efficient? I can’t seem to find a way to remove the ForEach-Object loop. Please help. Thanks. (For some reason the accent symbol does not work to display the code below)

      $content = "C:\somedir"
      
      $files = Get-ChildItem -Path $content -Recurse -Filter *.csv | 
      	Select-Object -Property *, @{
      		expression ={
      			Write-Host $_.FullName
      			$source = $_.FullName
      			$dest = $_.DirectoryName + "\" + $_.BaseName + ".tsv"
      			
      			Import-Csv -Path $source | 
      			convertto-csv -NoTypeInformation -Delimiter "`t" | 
      			% { 
      				ForEach-Object {
      					$_.Trim('"') |
      					% { $_ -replace (""t"" ,"t") }
      				}
      			} | Set-Content "$dest"
      		}
      	}
      
      • This topic was modified 2 months, 3 weeks ago by vertigo88858.
      • This topic was modified 2 months, 3 weeks ago by vertigo88858.
    • #209871
      Participant
      Topics: 4
      Replies: 82
      Points: 251
      Helping Hand
      Rank: Contributor

      Which ForEach-Object loop are you trying to eliminate?  I count 3 of them in your code (line 12, 13, 15).  % is an alias for foreach-object.  Without samples of your csv, I think you might be better working with the raw data that multiple conversions (import-csv then  convertto-csv).  I’m not sure if my example below is more efficient, but you could use measure-command to see for sure.  Either way I find yours a little hard to read with nested loops, but that is just a preference.

      $content = ".\documents"
      $files = Get-ChildItem -Path $content -Recurse -Filter *.csv
      
      foreach ($item in $files)
      {
         (Get-Content $item.fullname -Raw) -replace ',',"`t" -replace '"','' |
             Set-Content -Path "$($item.fullname).tsv"
      }
    • #209895
      Participant
      Topics: 12
      Replies: 1623
      Points: 2,565
      Helping Hand
      Rank: Community Hero

      Loops are traditionally avoided if possible, but here are a couple of options. This is the CSV to mess with:

      $csv = @"
      "EmployeeId","FirstName","LastName","Manager","Password","Username"
      "tst00001","Angela","Tester","rsimmers","<RANDOM>","atester"
      "tst00002","Angelo","Tester","rsimmers","<RANDOM>","atester"
      "tst00003","Bob","Tester","rsimmers","<RANDOM>","btester"
      "tst00004","Chris","Tester","rsimmers","<RANDOM>","ctester"
      "tst00005","Derek","Tester","rsimmers","<RANDOM>","dtester"
      "@
      

      This would parse using Import-CSV, so if there are “rogue” double qoutes in the content, this is parsing it as a CSV and then gluing it together with string builder. While this is using for loops, there is no Replace operations, it’s just parse and glue it together:

      $csv = Import-Csv -Path 'C:\Scripts\temp.csv'
      
      $myNewCSV = [System.Text.StringBuilder]::new()
      
      #Header Row
      foreach ($row in $csv) {
          [void]$myNewCSV.Append("{0}rn" -f ($row.PSObject.Properties.Name -join ';'))
          break
      }
      
      #Data Row
      foreach ($row in $csv) {
          [void]$myNewCSV.Append("{0}rn" -f ($row.PSObject.Properties.Value -join ';'))
      }
      
      $myNewCSV.ToString() #Set Content
      

      Output:

      EmployeeId;FirstName;LastName;Manager;Password;Username
      tst00001;Angela;Tester;rsimmers;<RANDOM>;atester
      tst00002;Angelo;Tester;rsimmers;<RANDOM>;atester
      tst00003;Bob;Tester;rsimmers;<RANDOM>;btester
      tst00004;Chris;Tester;rsimmers;<RANDOM>;ctester
      tst00005;Derek;Tester;rsimmers;<RANDOM>;dtester
      
      #Measure Command
      Days              : 0
      Hours             : 0
      Minutes           : 0
      Seconds           : 0
      Milliseconds      : 15
      Ticks             : 152401
      TotalDays         : 1.76390046296296E-07
      TotalHours        : 4.23336111111111E-06
      TotalMinutes      : 0.000254001666666667
      TotalSeconds      : 0.0152401
      TotalMilliseconds : 15.2401
      

      Another approach is using Get-Content -Raw, which should bring the content in as raw content and not parse lines. This is available Powershell 4+.

      $content = Get-Content -Path 'C:\Scripts\temp.csv' -Raw
      $content.Replace('"','')
      

      Output:

      PS C:\Users\rasim> $content.Count
      1
      
      PS C:\Users\rasim> $content = Get-Content -Path 'C:\Scripts\temp.csv' -Raw
      $content.Replace('"','')
      EmployeeId,FirstName,LastName,Manager,Password,Username
      tst00001,Angela,Tester,rsimmers,<RANDOM>,atester
      tst00002,Angelo,Tester,rsimmers,<RANDOM>,atester
      tst00003,Bob,Tester,rsimmers,<RANDOM>,btester
      tst00004,Chris,Tester,rsimmers,<RANDOM>,ctester
      tst00005,Derek,Tester,rsimmers,<RANDOM>,dtester
      
      
      PS C:\Users\rasim> Measure-Command {
      $content = Get-Content -Path 'C:\Scripts\temp.csv' -Raw
      $content.Replace('"','')
      }
      
      
      Days              : 0
      Hours             : 0
      Minutes           : 0
      Seconds           : 0
      Milliseconds      : 12
      Ticks             : 127013
      TotalDays         : 1.47005787037037E-07
      TotalHours        : 3.52813888888889E-06
      TotalMinutes      : 0.000211688333333333
      TotalSeconds      : 0.0127013
      TotalMilliseconds : 12.7013
      
Viewing 2 reply threads
  • You must be logged in to reply to this topic.