
July 20, 2018 at 11:12 am

I have a text block that I need to format. If I copy it into Microsoft Word and turn on paragraph marks and formatting symbols, I can see both non-breaking spaces and standard spaces. I need to replace the non-breaking spaces with commas and the …

Adventurer Amateur Mage Animal Friend AssassinBerserker

So far I have tried:

$test =  Get-Content C:\temp\Test.txt
$test =  $test.Replace([char]0xA0,',')
$test

What am I missing?

July 20, 2018 at 12:44 pm

Hmm. Hard to say for sure, but regex might have an easier time matching the character.

$nbsp = [char]0xa0
$Replaced = $Test -replace $nbsp,','

The other thing you can do is read in the file and then check what the exact character codes are with Format-Hex, so you know how PowerShell is seeing it:

$File = Get-Content 'C:\Temp\Test.txt'
$File | Format-Hex
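Piping the decoded lines to Format-Hex re-encodes the strings before dumping them, so a complementary check is to list each character's code point exactly as PowerShell holds it in memory. A minimal sketch, with a hypothetical in-memory line standing in for one line of Test.txt:

```powershell
# Hypothetical stand-in for one line returned by Get-Content
$line = "Mage$([char]0xA0)Animal Friend"

# Print each character next to its UTF-16 code point, e.g. "M U+004D"
$codes = foreach ($ch in $line.ToCharArray()) {
    '{0} U+{1:X4}' -f $ch, [int]$ch
}
$codes
```

A nbsp shows up as U+00A0 and an ordinary space as U+0020. If U+00A0 is preceded by U+00C2 (Â), the file is probably UTF-8 but was decoded with a single-byte code page. To see the raw bytes on disk instead, Format-Hex -Path 'C:\Temp\Test.txt' reads the file directly.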

July 20, 2018 at 3:45 pm

Am I correct in thinking that a plain text file cannot store a nbsp? I cannot get this to work. Should I use a different file type or encoding for the source information?

July 20, 2018 at 4:05 pm

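Whether or not it's plain text doesn't really matter; it's the encoding of the file that matters. A nbsp is a single character (U+00A0), so it shouldn't be tricky to store, but it isn't part of 7-bit ASCII: Windows-1252/Latin-1 store it as the single byte 0xA0, UTF-8 stores it as the two bytes 0xC2 0xA0, and saving as pure ASCII replaces it with '?'. So rather than saving as ASCII, check which encoding the txt file was saved in and read it back with that same encoding (e.g. Get-Content -Encoding UTF8 for a UTF-8 file) to see if there's a difference for you.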
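To make the encoding point concrete, this sketch prints how the single character U+00A0 is stored by UTF-8, Latin-1 and ASCII, then writes it to a temporary file as UTF-8 (the file name nbsp-test.txt is only for illustration) and reads it back once with the matching encoding and once with Latin-1:

```powershell
$nbsp   = [char]0xA0
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')

# Hex dump of the bytes each encoding uses for the one character U+00A0
$utf8Hex   = ([System.Text.Encoding]::UTF8.GetBytes([string]$nbsp)  | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
$latin1Hex = ($latin1.GetBytes([string]$nbsp)                       | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
$asciiHex  = ([System.Text.Encoding]::ASCII.GetBytes([string]$nbsp) | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
"UTF-8: $utf8Hex | Latin-1: $latin1Hex | ASCII: $asciiHex"

# Hypothetical temp file, written as UTF-8 without a BOM
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'nbsp-test.txt'
[System.IO.File]::WriteAllText($path, "Mage${nbsp}Animal Friend", [System.Text.UTF8Encoding]::new($false))

# Read back with the matching encoding, then with the wrong one
$matched = Get-Content -Path $path -Encoding UTF8 -Raw
$misread = [System.IO.File]::ReadAllText($path, $latin1)
Remove-Item -Path $path
```

$matched contains the original nbsp; $misread has an extra 'Â' in front of it, which is what a UTF-8 file looks like when it is decoded with the wrong single-byte encoding.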

July 21, 2018 at 4:33 am

On mobile right now so can't test, but I'm not sure regex will treat the [char] object you've assigned the way you expect. In .NET regex you should be able to use '\u00A0' to match the Unicode character U+00A0 by its code point. With the way you've assigned your variable, the pattern is just the literal character, which should match itself (a nbsp isn't a regex metacharacter), so if that fails, the character in the file probably isn't U+00A0 once it's been read in.
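A quick in-memory check that also takes the file out of the picture: it builds a hypothetical line with nbsp separators and compares the '\u00A0' escape against the literal [char] pattern. Single quotes keep the backslash escape away from PowerShell so the regex engine sees it as written:

```powershell
$nbsp = [char]0xA0
# Hypothetical sample line; assumes entries are separated by U+00A0
$line = "Adventurer${nbsp}Amateur Mage${nbsp}Animal Friend"

$viaEscape  = $line -replace '\u00A0', ','   # regex escape for code point U+00A0
$viaLiteral = $line -replace $nbsp, ','      # the [char] itself used as the pattern

$viaEscape                    # Adventurer,Amateur Mage,Animal Friend
$viaEscape -ceq $viaLiteral   # True
```

If this works but the same replace on the lines from Test.txt does not, the problem is in how the file is read rather than in the pattern.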

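To split where a lowercase letter is followed by an uppercase letter, you should be able to use the case-sensitive -csplit with zero-width lookarounds. Plain -split is case-insensitive, so "[a-z][A-Z]" would match any pair of letters, and it would also remove the two letters it matches:

"TextValue" -csplit '(?<=[a-z])(?=[A-Z])'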
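Putting both steps together on a line shaped like the sample in the question (assuming nbsp between entries, and "AssassinBerserker" being two run-together entries), this sketch produces comma-separated output by using the case-sensitive -creplace with the same lookarounds instead of splitting:

```powershell
$nbsp = [char]0xA0
# Hypothetical sample: entries separated by nbsp, two entries run together
$line = "Adventurer${nbsp}Amateur Mage${nbsp}Animal Friend${nbsp}AssassinBerserker"

# Step 1: nbsp -> comma
$step1 = $line -replace '\u00A0', ','

# Step 2: insert a comma at the empty position between a lowercase letter and
# the uppercase letter after it; the lookarounds consume no characters
$step2 = $step1 -creplace '(?<=[a-z])(?=[A-Z])', ','

$step2   # Adventurer,Amateur Mage,Animal Friend,Assassin,Berserker
$parts = $step2 -split ','
```

Ordinary spaces inside entries such as "Amateur Mage" are left alone, because the lookbehind requires a lowercase letter immediately before the uppercase one.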