Split/replace between lowercase next to uppercase


This topic contains 4 replies, has 3 voices, and was last updated by a participant 3 months, 4 weeks ago.

  • #104806

    Participant
    Points: 0
    Rank: Member

    I have a text block that I need to format. If I copy it into Microsoft Word and enable paragraph markers and formatting symbols, I can see non-breaking spaces and standard spaces. I need to replace the non-breaking spaces with commas and the

    Adventurer Amateur Mage Animal Friend AssassinBerserker

    So far I have tried:

    $test =  Get-Content C:\temp\Test.txt
    $test =  $test.Replace([char]0xA0,',')
    $test

    What am I missing?

  • #104816

    Participant
    Points: 175
    Helping Hand
    Rank: Participant

    Hmm. Hard to say for sure, but perhaps regex might have an easier time parsing the character.

    $nbsp = [char]0xa0
    $Replaced = $Test -replace $nbsp,','

    The other thing you can do is read in the file and then check what the exact character codes are with Format-Hex, so you know how PowerShell is seeing it:

    $File = Get-Content 'C:\Temp\Test.txt'
    $File | Format-Hex
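
If the hex dump is hard to read, another option is to print each character's Unicode code point. This is only a minimal sketch: the sample string and the variable names ($sample, $codes, $hasNbsp) are made up for illustration, and you would substitute a line read from your own file.

```powershell
# Hypothetical sample: 'Amateur', a non-breaking space (U+00A0), then 'Mage'
$sample = 'Amateur' + [char]0xA0 + 'Mage'

# List every character with its code point in U+XXXX form
$codes = foreach ($ch in $sample.ToCharArray()) {
    'U+{0:X4}' -f [int]$ch
}
$codes -join ' '

# A regular space shows up as U+0020, a non-breaking space as U+00A0
$hasNbsp = $codes -contains 'U+00A0'
```

If $hasNbsp comes back False for a line that shows a non-breaking space in Word, the character was probably lost or changed before PowerShell read it, which points at the file's encoding rather than the replace call.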
    • #104840

      Participant
      Points: 0
      Rank: Member

      Am I correct in thinking that a plain text file cannot store a nbsp? I cannot get this to work. Should I use a different file type or encoding for the source information?

    • #104843

      Participant
      Points: 175
      Helping Hand
      Rank: Participant

      Whether or not it's plain text doesn't really matter; it's the encoding on that file that matters. A nbsp is just a specific character code (U+00A0), so it shouldn't be tricky to store, but it sits outside the 7-bit ASCII range, so a file saved as ASCII can't hold it. You may want to save the txt file in a known encoding such as UTF-8 and read it back with a matching -Encoding to see if there's a difference for you.
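
One way to see how the encoding affects the character is to write the same line out twice and read it back. This is a rough sketch only: the sample line, the temp file, and the variable names are assumptions for the demo, not anything from the original file.

```powershell
# Hypothetical sample line containing one non-breaking space (U+00A0)
$line = 'Animal' + [char]0xA0 + 'Friend'
$path = [System.IO.Path]::GetTempFileName()   # throwaway file for the demo

# UTF-8 can represent U+00A0, so it survives the round trip
Set-Content -Path $path -Value $line -Encoding UTF8
$utf8Text = Get-Content -Path $path -Encoding UTF8
$utf8Kept = $utf8Text.IndexOf([char]0xA0) -ge 0

# ASCII is 7-bit only, so U+00A0 is replaced with '?' on write
Set-Content -Path $path -Value $line -Encoding Ascii
$asciiText = Get-Content -Path $path -Encoding Ascii
$asciiKept = $asciiText.IndexOf([char]0xA0) -ge 0

Remove-Item -Path $path
"UTF-8 kept nbsp: $utf8Kept; ASCII kept nbsp: $asciiKept"
```

The UTF-8 round trip should keep the character and the ASCII one should not. When reading an existing file, the -Encoding passed to Get-Content has to match how the file was actually saved.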

    • #104875

      Participant
      Points: 48
      Rank: Member

      On mobile right now so I can't test, but I don't think regex will understand the [char] object you've assigned. In regex, for Unicode values you should be able to use '\u00A0' to match the character. With the way that you've assigned your variable I don't think it'd work, as you'd likely just get the literal character in your variable.

      To split lowercase next to uppercase you should be able to do this:

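      # -csplit is case-sensitive (-split is not); the lookarounds split between the letters without consuming them
      "TextValue" -csplit "(?<=[a-z])(?=[A-Z])"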
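
Putting the two steps together on the sample line from the question might look like the sketch below. Which of the spaces in that line are non-breaking is an assumption here (the post doesn't show it), and the variable names are made up for illustration.

```powershell
# Sample line from the question, rebuilt with non-breaking spaces at assumed positions
$nbsp = [char]0xA0
$text = "Adventurer${nbsp}Amateur Mage${nbsp}Animal Friend${nbsp}AssassinBerserker"

# Replace each non-breaking space with a comma (\u00A0 is the regex escape for U+00A0)
$text = $text -replace '\u00A0', ','

# Insert a comma at every lowercase-to-uppercase boundary; -creplace is case-sensitive
$text = $text -creplace '(?<=[a-z])(?=[A-Z])', ','

# Split into individual entries; regular spaces inside an entry are left alone
$items = $text -split ','
$items
```

The case-sensitive operators matter here: plain -replace and -split ignore case by default, so [a-z] and [A-Z] would both match any letter.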
      
