Need to match column data across CSV's.

Welcome Forums General PowerShell Q&A Need to match column data across CSV's.

This topic contains 13 replies, has 3 voices, and was last updated by

 
Participant
1 week, 2 days ago.

  • Author
    Posts
  • #172871

    Participant
    Topics: 8
    Replies: 17
    Points: 127
    Rank: Participant

    I need to match data from CSV1 and pull matching data from CSV2 according to Value 2 in CSV1.

    I can easily set the values, but am unsure of the logic I'd use to find the Value2 of Value1 in CSV1 , and accordingly search CSV2 for a matching Value2 next to Value1, then pull Value4 and pipe it to CSV1 on that row.

     

     

    Example CSV:

    CSV1

    Value1   |   Value 2

    1                   2

    1                   3

    1                   2

     

    CSV2

    Value1 | Value 2 | Value 3

    1               2                4

    1               3                 5

    1               2                 4

     

    I need to move the matching Value3 from CSV2 to CSV1 for each row.   (Value 3 is updated constantly, so simply merging the CSV files is not an option.)

    Value 1 is a large number, again a variable so I need to match that with its corresponding Value2 in CSV1, find the matching value1 in CSV with a matching value2, then export value3 from that row in CSV2, and append to the row in CSV1.

     

    I'll first set the values in CSV1 – right? Then what? *Head scratch*

    import-csv $CSVPath |
    foreach-object {
    $Value1 = $_.Column1
    $Value2 = $_.Column2
    }

     

  • #172906

    Participant
    Topics: 1
    Replies: 1529
    Points: 2,587
    Helping Hand
    Rank: Community Hero

    Hmmm ... I don't know if others here understand better what you're trying to do but in both of your CSV files the first and the third row are identical. How do you tell the difference between the two? Shouldn't they have a unique attribute?

    And could you please try to post less white space? Thanks.

  • #172937

    Participant
    Topics: 9
    Replies: 423
    Points: 676
    Helping Hand
    Rank: Major Contributor

    Please provide a set of representative data with less abstraction (use actual column headers not value1,value2 with actual data pointing in the example to what the script needs to achieve)

  • #172942

    Participant
    Topics: 8
    Replies: 17
    Points: 127
    Rank: Participant

    Sorry, only double spacing is allowed in my editor for some reason. The first and third rows are in fact separate data sets. Value 1 will be a variable between 0 and lets say 999,999. Hence I'm not doing an "if -eq" cause that would be nuts. Value 3 will have maybe 3 potential values, 5 at most. So the data I posted above is in fact representative, and not abstract. For argument's sake, however, let's do this instead.

    Value 1 | Value 2

    976543        2

    976545        3

    976544        2

    976543         1

    CSV2:

    Value1 | Value2 | Value3

    976543      2              4

    976545      3               5

    976544      2               4

    976543      1                0

    976543      3                1

    976544      5                6

    I need to match value 1 in CSV 2, where Value 1 and Value 2 match, then export the Value 3 data for that row, and append to a new column in CSV1.

    CSV 2 contains a very specific number of Value 1 rows, 1-5 for each possible variant of value 2 for each value 1. So let's say 976543 has 4 entries. One of those has 2 as value 2. I need the value 3 on that row.   976545 has 3 possible entries. I need the value 3 from the row in CSV2 that contains a 3 as value2.

    Our value 3 is in fact dependent on value 2, but the data is broken up in such a way we must always match value 1 AND 2 to ensure we get the correct value 3. To simplify this, imagine value 1 is an item number, say , and we're matching upc, or barcode or something against other values. Does that make more sense?

     

     

     

  • #173014

    Participant
    Topics: 1
    Replies: 1529
    Points: 2,587
    Helping Hand
    Rank: Community Hero

    Sorry, only double spacing is allowed in my editor for some reason.

    You should have formatted it as code! 😉

    I'm still pretty unsure if I got what you need but if I got it right you could start with something like this:

    $CSVData01 = @'
    "Value1", "Value2"
    976543, 2
    976545, 3
    976544, 2
    976543, 1
    '@ | Convertfrom-Csv
    $CSVData02 = @'
    "Value1","Value2","Value3"
    976543,2,4
    976545,3,5
    976544,2,4
    976543,1,0
    976543,3,1
    976544,5,6
    '@ | Convertfrom-Csv
    
    foreach ($DataSet01 in $CSVData01) {
        $DataSet02 = $CSVData02 | Where-Object { $DataSet01.Value1 -eq $_.Value1 -and $DataSet01.Value2 -eq $_.Value2 }
        [PSCustomObject]@{
            Value1 = $DataSet01.Value1
            Value2 = $DataSet01.Value2
            Value3 = $DataSet02.Value3
        }
    }
  • #173176

    Participant
    Topics: 8
    Replies: 17
    Points: 127
    Rank: Participant

    Would this work instead? I am working with 40,000 and 500 rows, respectively.

    `$CSV1 =gc import-csv C:\path\to\CSV1\`
    `$array1=@()`
    
    `$csv2= import-csv C:\path\to\csv2\`
    `$array2=@()`
    
    `$CSV1 | foreach ($DataSet01 in $array1) {`
    `$DataSet02 = $array2 | Where-Object { $DataSet01.Value1 -eq $_.Value1 -and $DataSet01.Value2 -eq $_.Value2 }`
    `[PSCustomObject]@{`
    `Value1 = $DataSet01.Value1`
    `Value2 = $DataSet01.Value2`
    `Value3 = $DataSet02.Value3`
    `}`
    
    `} | out-file C:\path\to\final\csv.csv -notypeinformation -append`

    -Edit, no it does not work.
    When I DO get output, the new CSV contains (Note: I have modified "Value" columns to their actual names. No actual values are writing.)

    #TYPE System.Management.Automation.PSCustomObject		
    Value1	Value2	Value3
    
  • #173191

    Participant
    Topics: 1
    Replies: 1529
    Points: 2,587
    Helping Hand
    Rank: Community Hero

    Why do you enclose every single line in Mosquito poop? (`) 😉
    What should the code line #1 in your code do in your opinion?
    Do you debug your own code? (run it line by line in the ISE or in VSCode?
    Please explain for me whatfor you use the code line #2 and #5?
    In your code line #7 you pipe something to a foreach statement. That does not make sense at all.
    Please do yourself a favor make a step back and start with learning the very basics of Powershell from scratch. You save your self from a lot of wasted time and frustration.
    Did you run my code example? If you ran it – did it do what you expect it to do?

  • #173194

    Participant
    Topics: 8
    Replies: 17
    Points: 127
    Rank: Participant

    I'm not sure how to be more clear on this. I need to match two fields in one CSV against another CSV, pull a value from the matched row, and write that third value to another CSV. When I run this, I expect to be able to write each line out (Matched values with third value on the same line) , or to append it to another CSV (The third value only for each row.)

    I found an issue while debugging due to formatting on one of my CSV's. That was resolved. I am piping the array into a foreach as foreach statements are stored and processed as arrays anyways, so it should be able to handle it. Is that not the case? Correct me if wrong.

    Your code example is not useful,

      as it manually defines values that are not static

    , and were only being used as examples. The process you used, however, I can recognize, and was attempting to repurpose it to fit my needs. You seem to be misunderstanding the fact that I am not going to enter 40,000 values into my code daily to run this, it is reading them from a CSV. Your example neither took that into account nor demonstrated a clear example of it. HOWEVER, I did see you putting the objects into arrays, which is a good idea, therefore I ran with it.
    But thank you for the advice on going back to the basics. That's why I'm posting, and asking for help, after all. No?

    By the way, My code has mosquito poop due to me trying to figure out how to use pre tags. That's the format leftovers of switching from format->code to pre tags. Mosquito poop leftovers.

  • #173197

    Participant
    Topics: 8
    Replies: 17
    Points: 127
    Rank: Participant

    The gist is this – I am being as clear as possible.
    1. Export SQL Table to CSV. Search data for 2 items, match those against another CSV. Get value from matched line. Write matched line value(s) to another CSV. These are big CSV's. CSV's have roughly 50 columns each, and one has 40,000 plus rows.
    2.Avoid entering value data into script. Data is in flux, and will change daily. Avoid entering 40,000 lines of values into script daily.

    I found an issue while debugging due to formatting on one of my CSV's. That was resolved. I am piping the array into a foreach as foreach statements are stored and processed as arrays anyways, so it should be able to handle it. Is that not the case? Correct me if wrong.
    The process you used, however, I can recognize, and was attempting to repurpose it to fit my needs. You seem to be misunderstanding the fact that I am not going to enter 40,000 values into my code daily to run this, it is reading them from a CSV. Your example neither took that into account nor demonstrated a clear example of it. HOWEVER, I did see you putting the objects into arrays, which is a good idea, therefore I ran with it.
    But thank you for the advice on going back to the basics. That's why I'm posting, and asking for help, after all. No? 😉

    By the way, My code has mosquito poop due to me trying to figure out how to use pre tags. That's the format leftovers of switching from format->code to pre tags. Mosquito poop leftovers! 🙂

    Also, gc is a garbage collection command. It's cleaning up before executing this portion of my script. I'm running this as a single job in a much larger process, so I take care to clean up after each section/at the beginning of each section. That's a placeholder, and should not have been in my response example. It's not "Get-Content" as you may be assuming, and was not run as such.

  • #173200

    Participant
    Topics: 1
    Replies: 1529
    Points: 2,587
    Helping Hand
    Rank: Community Hero

    There is a distinct difference between a foreach statement and the Foreach-Object cmdlet. You cannot pipe something to a foreach statement (what you did in your code).

    Do your CSV files have 40,000 columns or rows? I understood that one CSV file has 2 columns and the second one has 3. If yes – do they have headers as well? Could you post these header names without revealing any secrets?

  • #173203

    Participant
    Topics: 8
    Replies: 17
    Points: 127
    Rank: Participant

    1. Gotcha. Should I use a foreach-object on my array, then?

    2. The first CSV has exactly 50 columns, and 44,780 rows. This will vary (Row count, not column count.) Headers are present, yes. We are using headers on column B and column K as value1 and value2 respectively. These headers are reading correctly in $array1 when called, as are the other 48 headers.
    Second CSV has ~17 rows, and only 837 columns. We are matching $CSV1.B and $CSV1.K against $CSV2.B and $CSV2.E , respectively. Once matched, we look on the line of the match in $CSV2 to obtain $CSV2.K < —this is what we want. We then want to write $CSV2.K to another place, preferably append it to a new column in a CSV with the original $CSV1 content. -FYI , I have split the multi-sheet CSV2 into a set of files, and am adding them to an array together, as you can see on line 3./Edit no, you cannot see that as the wrong code was pasted. LOL. Here it is.

    $CSV1 = import-csv C:\Users\blah\SQLExport.csv
    $array1=@()
    
    $csv2= Import-Csv -Path  (Get-ChildItem -Path C:\Users\blah\groupedCSVs\ -Filter '*.csv').FullName
    $array2=@()
    
     #Change to foreach-object?    $csv1 | foreach {
        $DataSet02 = $csv2 | Where-Object { $CSV1.ItemNumber -eq $_.ItemNumber -and $CSV1.SoldQty -eq $_.Sold }
        [PSCustomObject]@{
            Value1 = $CSV1.ItemNumber
            Value2 = $CSV1.SoldQty
            Value3 = $CSV2.TargetData
        }
    }
    
    $Csv1 | export-csv C:\Users\blah.csv
    
    
    #Manual runs for testing -ignore please.
    Clear-Variable csv1
    Clear-Variable csv2
    $array1.clear
    $array2.clear
    clear-variable array1
    clear-variable array2   
    
    
    I'm pretty sure my issue is with matching for CSV2.K , and then writing it out where I want it. 
     $csv1 | export-CSV

    is a placeholder, and is not being used. I am piping the end of the foreach statement to export-CSV, and it is writing out a psautomation object, as well as value1, value2, value3. This is improved from where we were, as it now looks to be iterating, which is good, but I'm unsure of why it's writing out the object itself instead of the value. EX:
    #TYPE System.Management.Automation.PSCustomObject
    Value1 Value2 Value3
    System.Object[] System.Object[] System.Object[]
    System.Object[] System.Object[] System.Object[]
    System.Object[] System.Object[] System.Object[]

  • #173233

    Participant
    Topics: 1
    Replies: 1529
    Points: 2,587
    Helping Hand
    Rank: Community Hero

    1. Gotcha. Should I use a foreach-object on my array, then?

    2. The first CSV has exactly 50 columns, and 44,780 rows. This will vary (Row count, not column count.) Headers are present, yes. We are using headers on column B and column K as value1 and value2 respectively. ............

    Where should we know all this??? Read your first post again and compare it to your last one – that's a complete different task. :-/
    You could try a calculated property like this ...

    $CSV1 = Import-Csv -Path 'C:\Users\blah\CSVFile1.csv'
    $CSV2 = Import-Csv -Path 'C:\Users\blah\CSVFile2.csv'
    
    $CSV1 |
        Select-Object -Property *,
            {
                Name = 'AdditionalColumn';
                Expression = {
                    $CurrentRow = $_
                    ($CSV2 | 
                        Where-Object {
                            $CurrentRow.ColumnB -eq $_.ColumnB -and
                            $CurrentRow.ColumnK -eq $_.ColumnE
                        }).ColumnK
                }
            }

    In your last post ...
    Lines #2 and #5 are completely unnecessary ... drop them! ... and do not do this anymore in the future. 😉
    In line #4 you use a query to provide the CSV file for the Import-CSV. This query might return more than one CSV file. Are you aware of this?

  • #173245

    Participant
    Topics: 8
    Replies: 17
    Points: 127
    Rank: Participant

    1. Remove the Arrays? Alright, I can use the variables only. For anyone who wants to know why this is, I actually looked into this, and by default import-csv inputs the data into an array, so there is no need. (This is better to know.)

    2. Yes, I am aware it will return -ALL- CSV files in the directory. This is intended. My static data (CSV2) is split into multiple CSV's. I have also verified it returns all data, as intended. This was done during my development of the script as searching across multiple sheets was proving annoying.

    My first post contains my goal. "Value 1 is a large number, again a variable so I need to match that with its corresponding Value2 in CSV1, find the matching value1 in CSV(2) with a matching value2, then export value3 from that row in CSV2, and append to the row in CSV1."     I am sorry if it was difficult to understand, I am not good with CSV functions in powershell.

  • #173248

    Participant
    Topics: 8
    Replies: 17
    Points: 127
    Rank: Participant

    No need to re-invent the wheel.
    On the lighter side of things, the code it improves on is hilariously notated. Enjoy reading the Github repository!
    https://www.powershellgallery.com/packages/Join-Object/2.0.1

You must be logged in to reply to this topic.