September 6, 2018 at 8:40 pm #111107 Participant Points: 1 Rank: Member
Was just playing with exporting directory listings to CSV and noticed a little strangeness. I hope someone can enlighten me as to why this happens.
Two commands which seem to work the same:
Get-ChildItem | Select-Object fullname,length | ConvertTo-Csv | Out-File -FilePath dir-list.csv
Get-ChildItem | Select-Object fullname,length | Export-Csv -Path dir-list2.csv
The resulting files look the same in Notepad++ and when 'cat'ed. But when the file sizes are checked the first command always creates a file about twice the size of the second command. I have opened both files in a hex editor and the larger file shows NULL (hex code 00) characters separating every character which accounts for the size difference. Why is this happening?
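For anyone who wants to check this without a hex editor, here is a minimal sketch that reports which byte-order mark (BOM), if any, a file starts with. `Get-FileBom` is a hypothetical helper name, not a built-in cmdlet, and it only recognises the UTF-8 and UTF-16 marks.

```powershell
# Hypothetical helper (not from the thread): name the byte-order mark, if any,
# at the start of a file. UTF-32 is not handled in this sketch.
function Get-FileBom {
    param([Parameter(Mandatory)][string]$Path)

    # Resolve against the PowerShell location; .NET uses its own working directory.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
        return 'UTF-8 BOM'
    }
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) {
        return 'UTF-16 LE BOM'
    }
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF) {
        return 'UTF-16 BE BOM'
    }
    return 'no BOM'
}
```

Running `Get-FileBom dir-list.csv` and `Get-FileBom dir-list2.csv` should show whether the two files start with different marks.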
September 6, 2018 at 8:56 pm #111136 Participant Points: 459 Rank: Contributor
Ran a few tests myself, and I can say that it's not the CSV cmdlets causing the difference. The difference comes from the Out-File cmdlet, and it only shows up in Windows PowerShell 5.1, not PS Core (6.1.0 RC1). Unsure of prior versions, but it looks like long-standing default behaviour rather than a bug: Out-File in Windows PowerShell writes UTF-16 ("Unicode") unless told otherwise, and PS Core changed the default to UTF-8.
Instead, I'd suggest using the Set-Content or Add-Content cmdlet.
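A minimal sketch of that suggestion, assuming the same pipeline as in the original post. `dir-list3.csv` is a made-up file name. As far as I know, Set-Content in Windows PowerShell 5.1 writes with the system's ANSI code page unless -Encoding is given, while PS Core defaults to UTF-8 without a BOM, so neither should produce the double-width file.

```powershell
# Sketch only: same pipeline as the original post, with Set-Content doing the write.
# 'dir-list3.csv' is a made-up file name for this example.
Get-ChildItem |
    Select-Object FullName, Length |
    ConvertTo-Csv -NoTypeInformation |
    Set-Content -Path dir-list3.csv
```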
September 6, 2018 at 8:57 pm #111137 Participant Points: 1,184 Rank: Community Hero
Use Notepad++ to check the encoding of the files. There you will see the difference. If you'd like them to be the same, use this:
Get-ChildItem -Exclude 'dir-list*.csv' | Select-Object fullname,length | ConvertTo-Csv -NoTypeInformation | Out-File -FilePath dir-list.csv -Encoding utf8
Get-ChildItem -Exclude 'dir-list*.csv' | Select-Object fullname,length | Export-Csv -Path dir-list2.csv -Encoding utf8 -NoTypeInformation
September 7, 2018 at 10:14 am #111170 Participant Points: 1 Rank: Member
I found out what is happening but not the why. I went back to double-check the files in Notepad++ as @olaf-soyk suggested but couldn't see any differences in the content. I did notice that Notepad++ had decided that the files had different encodings.
The smaller file was UTF-8
The larger file was UCS-2 BE BOM
I hadn't come across UCS-2 BE BOM encoding before, but a quick web search showed it to be a 16-bit encoding (two bytes per character), as opposed to UTF-8, which uses 8-bit units and needs only one byte for plain ASCII characters. I suppose it should have been obvious when I saw the extra empty chars in the hex editor!
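To make the size difference concrete, here is a small sketch that encodes the same text both ways and compares the byte counts. The sample string and variable names are made up for illustration.

```powershell
# Sketch: encode the same text both ways and compare the byte counts.
$text = 'fullname,length'   # sample text, plain ASCII, 15 characters

$utf8Bytes  = [System.Text.Encoding]::UTF8.GetBytes($text)
$utf16Bytes = [System.Text.Encoding]::Unicode.GetBytes($text)   # "Unicode" in .NET means UTF-16 LE

"UTF-8 : $($utf8Bytes.Length) bytes"    # UTF-8 : 15 bytes
"UTF-16: $($utf16Bytes.Length) bytes"   # UTF-16: 30 bytes

# First two characters as hex: in UTF-16 LE each ASCII character is followed by 00.
($utf16Bytes[0..3] | ForEach-Object { $_.ToString('X2') }) -join ' '   # 66 00 75 00
```

Those 00 bytes after every ASCII character are the NULLs that show up in the hex editor.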
Using Out-File with -Encoding utf8 gives files of equivalent size. There are still some BOM characters at the beginning of the file, though. Hope this helps someone.
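In Windows PowerShell 5.1, -Encoding utf8 always writes a BOM. If something downstream chokes on it, one possible workaround is to let .NET do the write. This is a sketch under that assumption; `dir-list-nobom.csv` and the variable names are made up.

```powershell
# Sketch: write the CSV as UTF-8 with no BOM via .NET instead of Out-File.
# 'dir-list-nobom.csv' is a made-up file name for this example.
$csvLines = Get-ChildItem -Exclude 'dir-list*.csv' |
    Select-Object FullName, Length |
    ConvertTo-Csv -NoTypeInformation

# .NET resolves relative paths against its own working directory,
# so build a full path from the current PowerShell location.
$target = Join-Path -Path (Get-Location).ProviderPath -ChildPath 'dir-list-nobom.csv'

# $false = do not emit a byte-order mark
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllLines($target, [string[]]@($csvLines), $utf8NoBom)
```

In PS Core the extra step should not be needed, since the default there is already UTF-8 without a BOM.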
September 7, 2018 at 11:09 am #111196 Participant Points: 1,184 Rank: Community Hero
BTW: That does not affect the functionality of the files. It takes a little more space, and you could save some more "exotic" characters from the Unicode table. But it will work the same as UTF-8 encoded files in common environments. 😉
September 7, 2018 at 12:53 pm #111206 Participant Points: 749 Rank: Major Contributor
Yep. I had a thread about this a little while ago. At first I thought it was unix text. PS 5's Out-File (or ">") encodes in what Notepad calls "unicode" and most other commands output in what Notepad calls "ansi". Some applications won't like it, like Infoblox (for csv import).
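If the goal is to stop PS 5's Out-File from writing "unicode" without adding -Encoding to every call, one option is a session-wide default. This is a sketch; `encoding-test.txt` is a made-up file name. As I understand the PowerShell documentation, from Windows PowerShell 5.1 on the same default also applies to ">" and ">>", and in 5.1 'utf8' still means UTF-8 with a BOM.

```powershell
# Sketch: make Out-File default to UTF-8 for the rest of the session.
# Per the PowerShell docs, from Windows PowerShell 5.1 on this also covers > and >>.
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

# 'encoding-test.txt' is a made-up file name for this example.
'fullname,length' | Out-File -FilePath encoding-test.txt
```

Putting the first line in a profile script would make it stick across sessions, at the cost of behaving differently from a stock install.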
The topic ‘Convertto-CSV and out-file vs Export-Csv’ is closed to new replies.