Compare columns between 2 files and delete non common columns and data

Welcome Forums General PowerShell Q&A Compare columns between 2 files and delete non common columns and data

Viewing 5 reply threads
  • Author
    Posts
    • #246396
      Participant
      Topics: 1
      Replies: 1
      Points: 13
      Rank: Member

      Hello,

      I have a bunch of files in folder A and their corresponding metadata files in folder B. I want to loop though the data files and check if the columns are the same in the metadata file. If the columns in both match, no action to is to be taken. If Data file has more columns than metadata file, then those columns should be deleted from incoming data file.

      ex

      Data file is ps_job.dat

      “empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”
      “1”|”Tom”|”10″|”11111″|”1000″|”M”
      “2”|”Ann”|”20″|”22222″|”2000″|”F”

      Meta data file is ps_job_metadata.dat

      “empid”|”name”|”zipcode”|”salary”

      I want my output to be

      “empid”|”name”|”zipcode”|”salary”

      “1”|”Tom”|”11111″|”1000″
      “2”|”Ann”|”22222″|”2000″

    • #246402
      Participant
      Topics: 5
      Replies: 2413
      Points: 6,215
      Helping Hand
      Rank: Community MVP

      k, welcome to Powershell.org. Please take a moment and read the very first post on top of the list of this forum: Read Me Before Posting! You’ll be Glad You Did!.

      When you post code, error messages, sample data or console output format it as code, please.
      In the “Text” view you can use the code tags “CODE“, in the “Visual” view you can use the format template “Preformatted“. You can go back edit your post and fix the formatting – you don’t have to create a new one.
      Thanks in advance.

      This forum is for scripting questions rather than script requests. We do not write customized and ready to use scripts or solutions on request.

      What have you tried so far? We expect you to make an own attempt to get your task done or to solve your problem. If you have done so already please document here what exactly you have done and show your code. Then we probably might be able to help you step further.

    • #246405
      Participant
      Topics: 5
      Replies: 2413
      Points: 6,215
      Helping Hand
      Rank: Community MVP

      When you crosspost the same question at the same time to different forums you should at least post links to the other forums along with your question to avoid people willing to you help making their work twice or more.

      https://stackoverflow.com/questions/63198313/compare-columns-between-2-files-and-delete-non-common-columns-using-powershell

      Thanks

    • #246444
      Participant
      Topics: 1
      Replies: 1
      Points: 13
      Rank: Member

      $outputFile = “C:\Users\Abdul\Desktop\Script_test\ps_job_mod.dat”

      $sample = Import-Csv -Path “C:\Users\Abdul\Desktop\Script_test\ps_job.dat” -Delimiter ‘|’

      $metadataLine = Get-Content -Path “C:\Users\Abdul\Desktop\Script_test\ps_job_metadata.txt” -First 1

      $desiredColumns = $metadataLine.Split(“|”).Replace(“`””,””)

      $sample | select $desiredColumns | Export-Csv $outputFile -Encoding UTF8 -NoTypeInformation -Delimiter ‘|’

    • #246459
      Participant
      Topics: 4
      Replies: 480
      Points: 1,753
      Helping Hand
      Rank: Community Hero

      Well at least he shared his solution from his cross posting. I was able to edit his formatting there. Oh well here is just another way in case someone needs it. I’m going to guess it’s much faster than the select-object method on very large datasets.

      Please note the phrase BACKTICKGOESHERE – replace it with the backtick character. I’m not sure how to escape it in this forum.

      $jobfile    = "ps_job.dat"
      $metafile   = "ps_job_metdata.dat"
      $outputfile = "some_file.csv"
      
      $meta = ((Get-Content $metafile -First 1 -Encoding UTF8) -split '\|').Trim('"')
      
      Class ColumnSelector : System.Collections.Specialized.OrderedDictionary {
      
          Select($line,$meta)
          {
              $meta | foreach{$this.add("$_",(iex "BACKTICKGOESHERE$line.$_"))}
          }
          ColumnSelector($line,$meta)
          {
              $this.select($line,$meta)
          }
      }
      
      import-csv $jobfile -Delimiter '|' | 
          foreach{[pscustomobject]([columnselector]::new($_,$meta))} | 
              Export-CSV $outputfile -Encoding UTF8 -NoTypeInformation -Delimiter '|'
      

      Output

      Get-Content $outputfile
      "empid"|"name"|"zipcode"|"salary"
      "1"|"Tom"|"11111"|"1000"
      "2"|"Ann"|"22222"|"2000"

      There has to be a simpler solution.

    • #246780
      Participant
      Topics: 0
      Replies: 8
      Points: 39
      Rank: Member

      Provided you want to keep those curly quotes and your code page supports all the characters, you can do the following:

      # Create array of properties delimited by |
      $headers = (Get-Content .\ps_job_metadata.dat -Encoding UTF8) -split '\|'
      Import-Csv ps_job.dat -Delimiter '|' -Encoding utf8 | Select-Object $headers

       

Viewing 5 reply threads
  • You must be logged in to reply to this topic.