String split on CRLF produces extra member



    by stocksp at 2012-08-21 10:46:00

    I have a string consisting of elements separated by CRLF, stored in $tmp.

    If I just do
    $lines = $tmp.Split("`r`n")

    $lines contains an extra blank entry between each pair of elements: $lines[0] is correct, $lines[1] is blank, and so on through the whole file.

    To get what I want I'm using
    $lines = $tmp -replace("`r`n", "|")
    $lines = $lines.split("|")

    Which I know is ugly...
    How can I get a 'clean' array (no blank lines) without the -replace 'hack'?

    by willsteele at 2012-08-21 11:02:44

    Due to the way the CR and LF characters are handled, this can be a little challenging. I have fought this before. An alternative may be to look at splitting on a binary character instead of an escaped character. CR is 0x0D and LF is 0x0A. Perhaps splitting on one or the other of those instead of both could help.

    $lines = $tmp -split 0x0D

    Without knowing your exact data I am guessing a bit, but this will probably work.

    by DonJ at 2012-08-21 11:15:05

    I'm curious, how did you read in the string to begin with? I ask because Get-Content, when reading a text file, will normally handle this for you, putting each line into its own object. Did you maybe query this from a Web server or something?

    by stocksp at 2012-08-21 11:28:42

    My data is very simple. It looks like this in Notepad++:

    1

    15

    14

    If I 'show symbols', the editor shows a 'CR' and 'LF' at the end of each line. Very standard stuff.

    I tried
    $lines = $tmp -split 0x0D, 0x0A

    and it 'almost' works ... a couple of the lines are mangled (missing

    's)

    I assume I'm not passing both characters correctly to -split.
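
A possible explanation for the mangled lines, sketched on the assumption that -split converts its right-hand operand to a string: the integer 0x0D would become the text "13" rather than a carriage return, and in `-split 0x0D, 0x0A` the second value would be read as the maximum number of substrings rather than as a second delimiter. Casting to [char] gives the actual control character. The sample strings below are invented for illustration.

```powershell
# Made-up sample strings, for illustration only.

# The integer 0x0D is converted to the text "13", so this splits on "13".
$intSplit = "a13b" -split 0x0D

# There is no "13" in this input, so the carriage return is left alone.
$noSplit = "a`rb" -split 0x0D

# Casting to [char] makes the delimiter an actual carriage return.
$charSplit = "a`rb" -split [char]0x0D

"{0} {1} {2}" -f $intSplit.Count, $noSplit.Count, $charSplit.Count
```

If that reading is right, any element containing the digits 13 would be cut apart, which would fit a result where only a couple of lines look wrong.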

    by stocksp at 2012-08-21 11:36:28

    DonJ,
    The data I'm working with is really nasty HTML that a program is spitting out (it's an image of a print file). I need it as a single string for removing large chunks of garbage. Once I've stripped it down to the area I'm after, then I can break it up into lines.

    by poshoholic at 2012-08-21 11:38:50

    This issue is easy to resolve once you understand what is happening behind the scenes.

    When you use the System.String Split method and you pass it "`r`n", you're calling the Char[] overload of this method. That overload takes an array of characters and splits the string on any character it finds in that array. By passing in "`r`n", it will split on "`r" and it will also split on "`n". That is why you end up with extra blank entries: an empty string sits between the "`r" and the "`n" of each pair. To fix this you need to do one of the following:

    Option A: Force it to split on entire strings, not characters.

    [script=powershell]$lines = $tmp.Split([string[]]"`r`n",'None')[/script]

    Option B: Use the regex -split operator instead.

    [script=powershell]$lines = $tmp -split "`r`n"[/script]

    I prefer option B, and personally I use a slightly modified version of it like this:

    [script=powershell]$lines = $tmp -split "`r`n|`r|`n"[/script]

    This version splits a string on the `r`n combo first, then it checks for `r by itself, and then `n by itself. I've dealt with strings with newline characters coming from enough sources to know that you don't always get `r`n as a pair of characters for newlines, so I prefer the robustness of that last technique to make sure I get the results I want no matter what the source is.
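
The difference between these approaches can be seen side by side in a small sketch. The sample strings and variable names ($byChars, $byString, $byRegex, $byAny, $mixed) are invented for illustration, and the Char[] call is written with an explicit cast so the per-character behavior does not depend on which overload a given PowerShell version picks for a plain string argument.

```powershell
# Made-up sample data: three values separated by CRLF.
$tmp = "1`r`n15`r`n14"

# Char[] split: breaks on CR and on LF separately, so an empty string
# appears between the two characters of every CRLF pair.
$byChars = $tmp.Split([char[]]"`r`n")

# Option A: String[] split treats CRLF as one delimiter.
$byString = $tmp.Split([string[]]"`r`n", 'None')

# Option B: regex -split on the two-character sequence CR LF.
$byRegex = $tmp -split "`r`n"

# Modified Option B on input with mixed line endings (CRLF, LF, CR).
$mixed = "1`r`n15`n14`r16"
$byAny = $mixed -split "`r`n|`r|`n"

"{0} {1} {2} {3}" -f $byChars.Count, $byString.Count, $byRegex.Count, $byAny.Count
```

With this sample, the Char[] split should give five entries (two of them empty), Options A and B three each, and the alternation pattern four entries for the mixed input.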

    by willsteele at 2012-08-21 11:45:49

    Ah, that second option was one I recall having seen Mjolinor use. Thanks for pointing that one out Kirk. Good approach.
