Import-CSV data to SQL with Save-ReportData

This topic contains 5 replies, has 2 voices, and was last updated by Don Jones 2 years, 2 months ago.

  • Author
    Posts
  • #19058
    SF
    Participant

    Hi

    I'm trying to take some disk stats/info from csv files with Import-Csv and then use Save-ReportData from the SQLReporting module in "Creating Historical and Trend Reports with PowerShell and SQL Server" to store the data in an MSSQL database.

    I've attached a sample of how the csv file looks. And here is the code I have as a starting point just to see if I can get it to work:

    $Path = "M:\Scripts\Reports\mdisk.csv"
    
    $CSV = Import-Csv -Path $Path -Header "Collected", "MDisk", "ReadIOPS", "WriteIOPS", "ReadBlocks", "WriteBlocks",
                                          "ReadExternalResponseTime", "ReadQueuedResponseTime",
                                          "WriteExternalResponseTime", "WriteQueuedResponseTime",
                                          "PeakReadExternalResponseTime", "PeakReadQueuedResponseTime",
                                          "PeakWriteExternalResponseTime", "PeakWriteQueuedResponseTime"
    
    #$CSV = $CSV | select -Skip 1
    
    
     $properties = @{'Collected'=$CSV.Collected;
                     'MDisk'=$CSV.MDsik;
                     'ReadIOPS'=$CSV.ReadIops;
                     'WriteIOPS'=$CSV.WriteIOPS;
                     'ReadBlocks'=$CSV.ReadBlocks;
                     'WriteBlocks'=$CSV.WriteBlocks;
                     'ReadExternalResponseTime'=$CSV.ReadExternalResponseTime;
                     'ReadQueuedResponseTime'=$CSV.ReadQueuedResponseTime;
                     'WriteExternalResponseTime'=$CSV.WriteExternalResponseTime;
                     'WriteQueuedResponseTime'=$CSV.WriteQueuedResponseTime;
                     'PeakReadExternalResponseTime'=$CSV.PeakReadExternalResponseTime;
                     'PeakReadQueuedResponseTime'=$CSV.PeakReadQueuedResponseTime;
                     'PeakWriteExternalResponseTime'=$CSV.PeakWriteExternalResponseTime;
                     'PeakWriteQueuedResponseTime'=$CSV.PeakWriteQueuedResponseTime}
    
        $obj = New-Object -TypeName PSObject -Property $properties
        $obj.PSObject.TypeNames.Insert(0,'Report.MDiskInfo')
     
            Write-Output $obj
    

    This is the error message I'm getting at this point:

    "Exception calling "ExecuteNonQuery" with "0" argument(s): "String or binary data would be truncated.
    The statement has been terminated."
    At C:\Users\xyz\Documents\WindowsPowerShell\Modules\SQLReporting\SQLReporting.psm1:64 char:13
    + $cmd.ExecuteNonQuery()
    + ~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : SqlException"

    I've tried to manually edit the headers in the csv file to the names I have in the code above, and I have tried having only one column in the csv file as well, but I get the same error. So I'm obviously missing something and doing this wrong. I've tried with a full SQL 2012 SP2 server and a local SQL Express.

    I've run the Collect-DiskSpaceInfo.ps1 that's included in the ebook, and that works well. From the verbose output I think at least part of the problem is that it's trying to put, for example, all of the timestamp/collected values into the same value/row. I'm not sure what to try or how to proceed with this; any help, tips or guidance is appreciated.

  • #19060
    Don Jones
    Keymaster

    You've tried a lot of stuff, but I don't think any of those things were possible culprits :).

    Usually, that error from SQL Server means a query tried to insert data that was too large for the table column – meaning data would have been truncated, or lost. So it refuses to do it. In other words, you're trying to put five pounds of apples into a one-pound bucket, and SQL Server ain't having it.

    When the SQLReporting module creates a table, it makes some guesses about how big the data is. For a lot of situations, it guesses right – for others, it guesses wrong, and sets up a column that's too small. For example, numeric types are always created as integers – so if you're trying to put in any decimal data, it'll fail with this error. Similarly, I think it creates strings with a max of 255 characters, so anything longer will fail.

    Use something like SQL Server Management Studio to look at the MDiskInfo table. Expand the Columns container, and you'll see the data types for each column. You may need to tweak some of them to hold whatever data you're putting in there. Without seeing the exact data you're generating with your script, I can't be certain, but this is the most likely cause.
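
    Just as a rough sketch (this assumes you have Invoke-Sqlcmd from the SQLPS module handy, and the server, database, and column names here are placeholders you'd swap for your own), you can check the column sizes and widen one from PowerShell, too:

    # Sketch only -- swap in your own server, database, and column names.
    $server = '.\SQLEXPRESS'
    $db     = 'MyDBName'
    
    # See what data type and length each column ended up with
    Invoke-Sqlcmd -ServerInstance $server -Database $db -Query "SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'MDiskInfo'"
    
    # Widen a column that turned out too small for the incoming data
    Invoke-Sqlcmd -ServerInstance $server -Database $db -Query "ALTER TABLE dbo.MDiskInfo ALTER COLUMN [MDisk] NVARCHAR(4000)"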

    I'll also note that your code has a typo – "MDsik" instead of "MDisk" in one spot.

    And, you could probably simplify this a lot.

    Import-CSV whatever.csv | ForEach { $_.psobject.typenames.insert(0,'Report.MDiskInfo'); $_ } | Save-ReportData -Local MyDBName
    

    Something along those lines. The object coming from Import-CSV should be perfectly useful, it just needs the type name changed. Assuming, of course, you take care of the column data type and size problem first.

    I'll offer an observation, too: I'm not sure that storing disk response times has any useful purpose. Those response times are only valuable when you look at a set of them taken over a short period of time. Simply grabbing a single point in time doesn't really tell you anything useful. And, if the plan is to repeatedly query those counters in a short period of time, and log all that to SQL Server, you're probably going to impose a lot more overhead than you realize. Those disk counters in particular aren't low-impact.

  • #19064
    SF
    Participant

    Thanks for the explanation and feedback Don, really appreciated.

    Just a little background info:

    The disk counters are coming from our SAN; I've implemented a little tool called svcmon: https://www.ibm.com/developerworks/community/blogs/svcmon/?lang=en

    The tool collects the disk counters from the SAN and puts them into a PostgreSQL database. The only way (I know of) to get the data out is to run a perl script that is included in svcmon; you specify which date and for how long (in minutes) you want the disk counters/info for, and it spits out .csv files with the counters for you.

    Running the "report command" and asking for counters for a week will generate a couple of csv files, where some are ~80-90 MB, so this is probably not a good solution, as you're saying, but... well, I wanted to test it anyway 🙂 and now I'm just curious what it is I'm doing wrong.

    I've checked the table columns in Management Studio and they are all (nvarchar(255), null). And if I understand this correctly: if none of the cells in the csv have more than 255 characters in them, shouldn't they "fit"? Or do they need to be integers if the cells only hold numbers, etc.?
    If I'm not mistaken, if you look at the attached csv file, none of the cells have decimals or are more than 255 characters.

    This is what it looks like at first when I run the command:

    VERBOSE: Connection string is Server=Server1\SQLEXPRESS;Database=svcmon;Trusted_Connection=True;
    VERBOSE: Table name is MDiskInfo
    VERBOSE: SELECT COUNT(*) AS num FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo' AND LOWER(TABLE_NAME) = 'mdiskinfo'
    DEBUG: Tested for table
    DEBUG: CREATE TABLE dbo.[MDiskInfo] ([Collected] NVARCHAR(255),[MDisk] NVARCHAR(255),[PeakReadExternalResponseTime] NVARCHAR(255),[PeakReadQueuedResponseTime] NVARCHAR(255),[PeakWriteExternalResponseTime] NVARCHAR(
    255),[PeakWriteQueuedResponseTime] NVARCHAR(255),[ReadBlocks] NVARCHAR(255),[ReadExternalResponseTime] NVARCHAR(255),[ReadIOPS] NVARCHAR(255),[ReadQueuedResponseTime] NVARCHAR(255),[WriteBlocks] NVARCHAR(255),[Writ
    eExternalResponseTime] NVARCHAR(255),[WriteIOPS] NVARCHAR(255),[WriteQueuedResponseTime] NVARCHAR(255))
    DEBUG: CREATE NONCLUSTERED INDEX [idx_Collected] ON [MDiskInfo]([Collected])
    VERBOSE: INSERT INTO [MDiskInfo] ([Collected],[MDisk],[PeakReadExternalResponseTime],[PeakReadQueuedResponseTime],[PeakWriteExternalResponseTime],[PeakWriteQueuedResponseTime],[ReadBlocks],[ReadExternalResponseTime
    ],[ReadIOPS],[ReadQueuedResponseTime],[WriteBlocks],[WriteExternalResponseTime],[WriteIOPS],[WriteQueuedResponseTime]) VALUES('Timestamp 2014-09-22 09:09:07 2014-09-22 09:09:07 2014-09-22 09:09:07 2014-09-22 09:09:
    07 2014-09-22 09:09:07 2014-09-22 09:09:07 2014-09-22 09:09:07 

    this goes on for a while until I got the error I mentioned.

    It feels as if it's trying to put multiple timestamp values into the same field. I'm just guessing here and not sure if this is what it should do/look like, but when comparing with Collect-DiskSpaceInfo it doesn't look right:

    VERBOSE: Connection string is Server=Server1\SQLEXPRESS;Database=PS;Trusted_Connection=True;
    VERBOSE: Table name is DiskSpaceInfo
    VERBOSE: SELECT COUNT(*) AS num FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo' AND LOWER(TABLE_NAME) = 'diskspaceinfo'
    DEBUG: Tested for table
    VERBOSE: INSERT INTO [DiskSpaceInfo] ([Collected],[ComputerName],[DeviceID],[FreeSpace],[Size]) VALUES('2014-09-24 16:34:41','localhost','C:','17036972032','85529194496')

    Here I see commas, and the values are inside single quotes. When I run my code, the first comma I see comes after it has gone through all of the timestamps in the verbose output and before it starts with the mdisks; also, the comma seems to be the only thing inside the quotation marks, like this:

    09:04:07 2014-09-22 09:05:07 2014-09-22 09:06:07 2014-09-22 09:07:07 2014-09-22 09:08:07','MDisk mdisk0 mdisk1 mdisk10 mdisk11

    Again, not sure how it actually should look, but it seems wrong, and at the same time it seems as if it's something pretty stupid that I'm doing wrong.

  • #19065
    Don Jones
    Keymaster

    Ah.

    Yeah, so, the CREATE TABLE command is correct. Given that, obviously, the CSV file delimited the numeric values in quotes, so Import-CSV spat them out as strings. My module saw strings, so it made nvarchar() columns. Which means you won't be able to do math (sum, avg, etc.) as easily, which might have been the whole point.
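
    If you do get a clean CSV at some point, a rough sketch like this would turn those strings back into numbers before the module sees them, so it creates columns you can sum and average. The property names here are guesses from your headers, and since the module creates numeric properties as integer columns, this only works if the counters really are whole numbers (you said they are):

    # Sketch only -- property names are guessed from the CSV headers; adjust to match.
    # Casting to [int] means the module will create integer columns instead of nvarchar(255).
    Import-Csv .\mdisk.csv |
        Select-Object Collected, MDisk,
            @{n='ReadIOPS';  e={[int]$_.ReadIOPS}},
            @{n='WriteIOPS'; e={[int]$_.WriteIOPS}} |     # ...and so on for the other counters
        ForEach-Object {
            $_.PSObject.TypeNames.Insert(0,'Report.MDiskInfo')
            $_    # pass the modified object on down the pipeline
        } |
        Save-ReportData -Local MyDBName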

    The INSERT query is wrong, though. I don't know WTF that timestamp is supposed to be, but it's not a timestamp. That's probably what's making it vomit. It's something in the way the CSV is built, I'm guessing, and thus confusing Import-CSV.

    I have to tell you, though – the whole point of that Reporting book and module was to get data into SQL Server so you could use SQL Server Reporting Services to build fun reports. SSRS can talk directly to PostgreSQL. There's not really a need for this intermediate step, unless you're just playing around.

    But that one-line excerpt isn't looking good. If that's what's in the actual CSV file, then it's definitely not going to work with my SQL module. Either the original CSV is malformed or... that's pretty much the only option.

    A CSV file should be...

    header,header,header
    data,data,data
    "data","data","data"
    

    If your CSV file doesn't look like that, then stop right there, because there's no point in even running Import-CSV against it. It won't work.
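
    A quick way to sanity-check it (just a sketch, using the path from your first post) is to eyeball the raw lines and then look at what Import-CSV actually hands back:

    # Look at the raw text first, then at what Import-CSV parses out of it.
    Get-Content M:\Scripts\Reports\mdisk.csv -TotalCount 3
    Import-Csv M:\Scripts\Reports\mdisk.csv | Select-Object -First 2 | Format-List *

    If each row comes back as one object with one property per column, the file is usable; if everything lands in a single property, the CSV itself is the problem.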

  • #19066
    SF
    Participant

    Oooh, thanks!

    I'll try to look at SSRS with PostgreSQL in this case. I actually started to look at tools that replicate/sync a PostgreSQL database with MSSQL to see if that could be an option 🙂

    Still love the ebook(s) and this module, will try to find something else to dump into a database and play with 🙂

    Thanks again.

  • #19068
    Don Jones
    Keymaster

    And SSIS can do a scheduled data-copy from Postgres. Good luck.

You must be logged in to reply to this topic.