Importing CSV and Removing Data/Cells

Welcome Forums General PowerShell Q&A Importing CSV and Removing Data/Cells

This topic contains 15 replies, has 4 voices, and was last updated by

 
Participant
3 months ago.

  • Author
    Posts
  • #138714

    Participant
    Topics: 4
    Replies: 16
    Points: 30
    Rank: Member

    Hello!

    It's been a while since I've posted but here I am again looking for help!

    I have been tasked with somehow creating a way to import a CSV that has more than 2000 lines and scrubbing the data from it that isn't needed.

    The CSV Looks something like this:

    "Plugin","Plugin Name","Family","Severity","IP Address","Protocol","Port","Exploit?","Repository"

    "111111","Ping the remote host","Port scanners","Info","X.X.X.X","TCP","0","No","Individual Scan"

    Then has data that ends up coming in different strings such as:

    Object UUID :00000000000-0000-0000-000000000000
    UUID : xxx-xxxx-xxxx-xxxxx-xxxxxx version 1.0
    Description : Unknown RPC service
    Annotation : Impl friendly name
    Type : Local RPC service
    Named pipe : LRPC-1234567

    Then further down has webaddresses and hash numbers that start with 0x, and all of this data always shows up in the Plugin row.

    The script I have somewhat works, but I find that when I try to go in and remove data, or look at the CSV in notepad the data I thought was removed actually isn't, but doesn't show up if I open it in CSV.  I have tried using get-content, but it exports the CSV unformatted, and I've tried using other comparisons such as -notlike and -notmatching but they don't seem to work either.

     

    Imported/Exported Variables
    $CSVData = Import-Csv -Path "C:\Test.csv" 
    $MatchingExportFile = "C:\RemovedItems.csv"
    $NonMatchingExportFile = "C:\NewOutput.csv"
    
    $MatchingData = @()
    $NonMatchingData = @()
    
    #Strings to search and remove from excel - Possibility to Add More
    $RegexStrings = '*Object UUID*', '*UUID*', '*Description*','*Annotation*','*Type*','*senssvc*','*Named*','*Windows*','*0x*','*3A*','*navigation*','*Value*','*+ CGI*','*Methods*','*Argument*','*- https*','*static*','*Attached*'
    #Headers
    $headerSelection = "Plugin", "Plugin Name","Family","IP Address","Port","Exploit","DNS Name","NetBIOS Name","Plugin Text","Synopsis","Description","Solution","Check Type","Version"
    
    Foreach ($Row in $CSVData) {
    $MatchFound = $False
    Foreach ($TestString in $RegexStrings) {
    If ($Row.Plugin -like $TestString) {
    $MatchFound = $True
    Break
    }
    }
    If ($MatchFound) {$MatchingData += $Row}
    Else {$NonMatchingData += $Row}
    }
    
    If ($MatchingData.Count -gt 0) {
    $MatchingData | Export-Csv -Path $MatchingExportFile -Force -NoTypeInformation
    Write-Host "Matching Data exported to $MatchingExportFile"
    }
    Else {Write-Host "No matching data to export!"}
    If ($NonMatchingData.Count -gt 0) {
    $NonMatchingData | Select $headerSelection | Export-Csv -Path $NonMatchingExportFile -Force -NoTypeInformation
    Write-Host "Non-matching Data exported to $NonMatchingExportFile"
    }
    Else {Write-Host "No non-matching data to export!"}

    My goal is to import the CSV, delete the cells with the data that is not needed, and delete other blank cells that show up, then export to new CSV.  But I cannot get it to successfully do either.

    Can somebody provide some input on what I may be doing wrong?

    Thank you

  • #138820

    Moderator
    Topics: 8
    Replies: 815
    Points: 2,415
    Helping Hand
    Rank: Community Hero

    You will definitely get help, but please post this in General QnA Forum. This forum is dedicated for Pester related queries.

  • #139185

    Participant
    Topics: 4
    Replies: 16
    Points: 30
    Rank: Member

    Thanks I have moved it to the correct forum!  Hopefully I can get some help now!

  • #139197

    Participant
    Topics: 1
    Replies: 1331
    Points: 1,680
    Helping Hand
    Rank: Community Hero

    I could imagine that it would be helpful to have some sample source data and some sample targe data... I mean more than 1 line and fomratted as code ... the actual data. 😉

  • #139215

    Participant
    Topics: 8
    Replies: 1190
    Points: 639
    Helping Hand
    Rank: Major Contributor

    Here is some pseudo code, but you can probably do something like this with a Regex OR and calculated expression

    $regex = 'Object UUID|UUID|Description|Annotation|Type|senssvc|Named|Windows|0x|3A|navigation|Value|CGI|Methods|Argument|https|static|Attached'
    
    $test = @()
    $test += [pscustomobject]@{
        Plugin = 'This Annotation a string with a UUID in it'
    }
    $test += [pscustomobject]@{
        Plugin = 'This Description is awesome'
    }
    $test += [pscustomobject]@{
        Plugin = 'Nothing to match here'
    }
    $test += [pscustomobject]@{
        Plugin = 'dsdgsWindowssgafgafg'
    }
    
    $results = $test | 
               Select Plugin, 
                      @{Name='Match';Expression={$_.Plugin -match $regex}}
    

    Output:

    PS C:\> $results
    
    Plugin                                     Match
    ------                                     -----
    This Annotation a string with a UUID in it  True
    This Description is awesome                 True
    Nothing to match here                      False
    dsdgsWindowssgafgafg                        True
    

    Then you can simply filter the results and Export to CSV or whatever else you want to do:

    $results | Where{$_.Match -eq $true} | Export-CSV -Path C:\mymatches.csv -NoTypeInformation
    
  • #139216

    Participant
    Topics: 4
    Replies: 16
    Points: 30
    Rank: Member

    Thanks Rob I will see if I can test it today!

    Olaf,

    I have uploaded a sample CSV here I just make generic data however this is essentially what it looks like, but about 2000+ lines.  The data that we are looking to remove is always in the Plugin field.

  • #139219

    Participant
    Topics: 1
    Replies: 1331
    Points: 1,680
    Helping Hand
    Rank: Community Hero

    Hmmm ... am I wrong or do you want to remove all lines where you have more than 3 empty cells next to each other? That should be quite forward.

    Get-Content -Path C:\sample\sample.csv | 
        Where-Object {$_ -notlike '*,,,,,,,*'}

    ... works for me ... at least with the sample data you provided.
    Now it should be easy to use the data just like any other CSV data.

  • #139224

    Participant
    Topics: 4
    Replies: 16
    Points: 30
    Rank: Member

    You aren't wrong about that.  I do want to delete the empty lines, but I'm also trying to remove the UUID down to the named pipe, as well as the url and the lines that have the random alphanumeric combinations which occur.

  • #139227

    Participant
    Topics: 1
    Replies: 1331
    Points: 1,680
    Helping Hand
    Rank: Community Hero

    Did you try the code I posted?

    Do the original data have the format the sample data have? ... like this:

    Plugin,Plugin Name,Family,Severity,IP Address,Protocol,Port,Expl,Repository,MAC Address,DNS Name,NetBIOS Name,Plugin Text,Discovered,Last Observed,Frameworks,Synopsis,Description,Solution
    12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,,,,Plugin Output: Output Goes Here,Jan-19,Jan-19,,Synopsis Goes Here,Description Here,n/a
    12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,,,,Plugin Output: Output Goes Here,Jan-19,Jan-19,,Synopsis Goes Here,Description Here,n/a
    12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,,,,Plugin Output: Output Goes Here,Jan-19,Jan-19,,Synopsis Goes Here,Description Here,n/a
    ,,,,,,,,,,,,,,,,,,
    : senssvc,,,,,,,,,,,,,,,,,,
    Object UUID : 00000000-0000-0000-0000-000000000000,,,,,,,,,,,,,,,,,,
    UUID : abcd-1234-abcd-1234,,,,,,,,,,,,,,,,,,
    Description : Words Usually Follow,,,,,,,,,,,,,,,,,,
    Annotation : Something goes here,,,,,,,,,,,,,,,,,,
    Named pipe : Words,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,
    12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,,,,Plugin Output: Output Goes Here,Jan-19,Jan-19,,Synopsis Goes Here,Description Here,n/a
    12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,,,,Plugin Output: Output Goes Here,Jan-19,Jan-19,,Synopsis Goes Here,Description Here,n/a
    ,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,
    0000xxx000023231231,,,,,,,,,,,,,,,,,,
    0000xxx000023231231,,,,,,,,,,,,,,,,,,
    0000xxx000023231231,,,,,,,,,,,,,,,,,,
    0000xxx000023231231,,,,,,,,,,,,,,,,,,
    0000xxx000023231231,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,
    12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,,,,Plugin Output: Output Goes Here,Jan-19,Jan-19,,Synopsis Goes Here,Description Here,n/a
    12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,,,,Plugin Output: Output Goes Here,Jan-19,Jan-19,,Synopsis Goes Here,Description Here,n/a
    ,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,
    http://www.worldwideweb.com,,,,,,,,,,,,,,,,,,
    http://www.worldwideweb.com,,,,,,,,,,,,,,,,,,
    http://www.worldwideweb.com,,,,,,,,,,,,,,,,,,
    http://www.worldwideweb.com,,,,,,,,,,,,,,,,,,
  • #139231

    Participant
    Topics: 1
    Replies: 1331
    Points: 1,680
    Helping Hand
    Rank: Community Hero

    OK, this one here should even work when the original data is not valid CSV.

    Get-Content -Path C:\sample\sample.csv | 
        Where-Object {$_ -match '([a-zA-z0-9\.\s]+,){9,}'}

    It looks for at least 9 cells with something in it.

  • #139654

    Participant
    Topics: 4
    Replies: 16
    Points: 30
    Rank: Member

    Olaf,

     

    I tried what you put, the extra data seems to be coming from the plugin text.  I am not sure why It is moving it into the plugin column though.  When I ran your code it looked like it deleted all the plugin numbers from the column.

  • #139674

    Participant
    Topics: 1
    Replies: 1331
    Points: 1,680
    Helping Hand
    Rank: Community Hero

    Keith,

    hmmm ... what does that mean? Did it solve your problem? If not – what exactly is your problem? You may post again some example data and some of the data in the format you desire.

  • #139773

    Participant
    Topics: 8
    Replies: 1190
    Points: 639
    Helping Hand
    Rank: Major Contributor

    If you are just trying to find the rows that have a no data vs "full" data row vs only plugin, then you can do that with basic filtering. Using the example Olaf was using, you could do this:

    $csv = Import-CSV -Path C:\Plugin.csv
    $csv.Count
    
    #Remove null plugins
    $csv2 = $csv | Where{$_.'Plugin'}
    $csv2.Count
    
    #Get all of the rows where there is more than just the Plugin
    $csv3 = $csv2 | Where{$_.'Plugin Name'}
    $csv3.Count
    
    #Get all of the rows where there is only Plugin value
    $csv4 = $csv2 | Where{!($_.'Plugin Name')}
    $csv4.Count
    

    Output:

    All rows count: 29
    Count after removing NULL: 22
    Count of full row: 7
    Count of rows with only a Plugin value: 15
    

    If you just want the full rows, you can just do it with a one-liner:

    $csv = Import-CSV -Path C:\Plugin.csv |
           Where{$_.'Plugin Name'}
    
  • #139891

    Participant
    Topics: 4
    Replies: 16
    Points: 30
    Rank: Member

    So I am trying to delete the cells in Row A, that don't have an actual plugin number.  I found yesterday that the data I am trying to delete is actually coming from the Plugin Text cells and appending to the Plugin cell (row A), due to meeting the 32k excel/csv character limit.  This is all coming from a Nessus Security Center report, which is a known issue with exporting these reports to a csv.

  • #139909

    Participant
    Topics: 1
    Replies: 1331
    Points: 1,680
    Helping Hand
    Rank: Community Hero

    Excuse me, Keith, I don't mean to offend you, but is it possible that our replies overwhelm you? If our code examples did not do what you expected – what was wrong?

  • #140233

    Participant
    Topics: 4
    Replies: 16
    Points: 30
    Rank: Member

    No offense taken Olaf.  I initially thought the code worked, but that was because I ran the code with only a few headers selected.  When I ran the code with all of the headers selected, the Plugin Text column is where the data is generating from and reaching the 32k CSV limit and therefore appending it to the Plugins section.  I believe this is just an issue with the Nessus client reports itself and may be able to be fixed by working through the client with a Tenable rep.  I am going to attempt to contact them however and see if they can help.  I appreciate your help

  • #140284

    Participant
    Topics: 0
    Replies: 1
    Points: 13
    Rank: Member

    Get the (good) entries as objects while sending them to a file

    
    $objects = Get-Content -Path .\Bad.csv |
    Where-Object {$_ -notlike '*,,,,,,,,,,,,,,,,,,*'} |
    Tee-Object -FilePath Good.csv |
    ConvertFrom-Csv
    
    

The topic ‘Importing CSV and Removing Data/Cells’ is closed to new replies.

denizli escort samsun escort muğla escort ataşehir escort kuşadası escort