Sorting Not working as expected

This topic contains 8 replies, has 4 voices, and was last updated by  Sankhadip Roy 1 week, 1 day ago.

  • #104128

    Sankhadip Roy
    Participant

    Hi Guyz,

    I am trying to sort objects by name, but I cannot get the output in the proper order. A sample is at the bottom.

    The following are file names from a directory. I want to sort them sequentially by the number at the start of each name, but that is not happening. Any idea what to do?

    I tried all the switches of Sort-Object, with no luck. I have to sort on the name only: the files are generated at random times, so I cannot rely on the creation-time property. That is why the file names are built this way.

    Name
    ----
    1 incident.mp4
    10 Action.mp4
    11 Isolation.MP4
    12 Interogation.mp4
    13 Decision Making.MP4
    14 Final result.mp4
    2 report.mp4
    3 filing.mp4
    4 investigation.mp4
    5 report collection.mp4
    6 examine.mp4
    7 report.mp4
    8 Rep Analysis.mp4
    9 case study.mp4

    Any steps that I am missing?

    Thanks,
    Roy.

  • #104134

    Olaf Soyk
    Participant

    The numbers in front of your file names are not actually numbers; they are strings. If you want them to sort correctly, you will either have to change the names to use leading zeros, or extract the numbers and cast them to [int] before sorting.
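
    A minimal sketch of the difference between the two comparisons, assuming a small hypothetical list of names rather than the real directory:

    ```powershell
    # Hypothetical sample data: the leading numbers are part of each string.
    $names = '1 incident.mp4', '10 Action.mp4', '2 report.mp4', '9 case study.mp4'

    # Default sort compares the names as text, so '10 ...' lands before '2 ...'.
    $asText = $names | Sort-Object

    # Casting the leading token to [int] makes Sort-Object compare numeric values.
    $asNumber = $names | Sort-Object -Property { [int](($_ -split ' ', 2)[0]) }

    $asText
    $asNumber
    ```

    The cast only succeeds when every name starts with digits followed by a space; a name without such a prefix would need separate handling.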

    • #104135

      Sankhadip Roy
      Participant

      Hi Olaf,

      I agree with you about the strings. Now, my question is how the GUI shows those files in the proper order? Also, can you please elaborate on the approach you mentioned? I can take the numbers out of those strings with a substring, but how do I arrange them after that? Please explain your idea a little more.

      Thanks, n Regards,
      Roy.

  • #104156

    Olaf Soyk
    Participant

    Now, my question is how the GUI shows those files in the proper order?

    Why not simply try it?

    You can iterate over your files and extract the digits with a regex, for example. Then you format the numbers smaller than 10 with leading zeros, and the sorting will be correct even in the GUI.
    BTW: the sorting in the GUI depends on the Windows version and the corresponding setting. You can read a little more about it here: Numerical File Name Sorting vs. Classic Literal Sorting.
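
    The regex idea could look roughly like this. It is only a sketch: the sample names and the two-digit width are assumptions, and it works on strings rather than on real files.

    ```powershell
    # Hypothetical sample names.
    $names = '1 incident.mp4', '10 Action.mp4', '2 report.mp4'

    $padded = foreach ($name in $names) {
        if ($name -match '^\s*(\d+)\s+(.*)$') {
            # $Matches[1] is the run of digits, $Matches[2] is the rest of the name.
            '{0:D2} {1}' -f [int]$Matches[1], $Matches[2]
        }
        else {
            $name   # leave names without a leading number untouched
        }
    }

    # With equal-width numbers, a plain text sort gives the numeric order.
    $sorted = $padded | Sort-Object
    $sorted
    ```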

  • #104161

    postanote
    Participant

    You, or whoever is creating these files, need to rethink the file naming scheme and pad the numbers with leading zeros when the files are created, rather than being forced to deal with this afterwards. It just leads to a bunch of unnecessary string gymnastics. Just as coding and scripting have standards that should be followed, so does file naming.

    The sorting you are seeing is not a PS issue. If you opened this same list in MS Excel, you would get the same thing. Strings are sorted by their character representation, and 1 and 01 are of course different. So, to sort correctly, the leading part must have the same number of characters in every name: in your case 001..014, for example, or whatever width the maximum number requires.

    So, either these files are named properly when they are created, or you are going to have to make a far more cumbersome effort: inline code (zero padding with PadLeft on the string, or a custom format string after converting the number string to an integer), or renaming the files on disk to get what you are after.

    Also, what you posted has leading spaces. If that is really the case on the file system, rather than a bad copy / paste here, you will have to deal with that as well, using Trim() on the string. Again, a common naming taxonomy would make this moot.

    IMHO, I'd just get the owner to name the files properly on creation, or get permission to rename them if they choose not to. It will make it far easier for you to use the normal filesystem cmdlets to act on the files.
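
    If renaming on disk is allowed, a sketch along these lines might do it. Get-PaddedName is a hypothetical helper, D:\Temp is a placeholder path, and -WhatIf keeps the command in preview mode until it is removed.

    ```powershell
    # Hypothetical helper: returns the name with its leading number padded to $Width digits.
    function Get-PaddedName {
        param(
            [Parameter(Mandatory)] [string] $Name,
            [int] $Width = 4
        )
        if ($Name.Trim() -match '^(\d+)(.*)$') {
            # PadLeft adds zeros on the left until the digit run is $Width characters long.
            $Matches[1].PadLeft($Width, '0') + $Matches[2]
        }
        else {
            $Name.Trim()
        }
    }

    # Placeholder folder; -WhatIf only previews the renames.
    $folder = 'D:\Temp'
    if (Test-Path -Path $folder) {
        Get-ChildItem -Path $folder -File |
            Rename-Item -NewName { Get-PaddedName -Name $_.Name } -WhatIf
    }
    ```

    Numbers that are already at least $Width digits long are left as they are, so running it twice should not change the names again.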

    As for the string gymnastics, this is the sort of stuff you are getting yourself into if you do not address this at its root cause.

    Judging from what you posted, the names also have leading spaces, so you will have to deal with that as well.

    # Trim leading spaces
    Clear-Host
    ((Get-Content -Path '.\MusicData.txt').Trim()) | Sort-Object
    
    # Results
    
    1 incident.mp4
    10 Action.mp4
    11 Isolation.MP4
    12 Interogation.mp4
    13 Decision Making.MP4
    14 Final result.mp4
    2 report.mp4
    3 filing.mp4
    4 investigation.mp4
    5 report collection.mp4
    6 examine.mp4
    7 report.mp4
    8 Rep Analysis.mp4
    9 case study.mp4
    

    Then you can use .PadLeft or custom formatting to add leading zeros before you sort.

    # Split each line on the first space and pad the number part to 4 digits
    Clear-Host
    ((Get-Content -Path '.\MusicData.txt').Trim() -split ' ', 2) |
    ForEach-Object { if ($_ -match '^\d+$') { '{0:D4}' -f [int]$_ } } | Sort-Object
    
    # Results
    
    0001
    0002
    0003
    0004
    0005
    0006
    0007
    0008
    0009
    0010
    0011
    0012
    0013
    0014
    

    Now, the string gymnastics, which can be avoided if a standard naming convention is used, or if you just rename the files before doing filesystem stuff with them.

    # Putting it all together
    #    Collect the file names. Note I am using a file name list, since of course I don't have these files and have no real reason to create them
    #    Trim any leading spaces
    #    Parse the number in the string and pad it to 4 digits (using custom formatting vs PadLeft), so this supports 0 - 9999 files
    #    Parse the string again to keep the remaining text on the same line
    Clear-Host
    ($MediaFiles = ForEach($Line in ((Get-Content -Path '.\MusicData.txt').Trim()))
    {$Line -replace '^\d*',("{0:D4}" -f ("{0:D4}" -f ([int]($Line -split ' ',2)[0])))}) | Sort-Object
    
    # Results 
    
    0001 incident.mp4
    0002 report.mp4
    0003 filing.mp4
    0004 investigation.mp4
    0005 report collection.mp4
    0006 examine.mp4
    0007 report.mp4
    0008 Rep Analysis.mp4
    0009 case study.mp4
    0010 Action.mp4
    0011 Isolation.MP4
    0012 Interogation.mp4
    0013 Decision Making.MP4
    0014 Final result.mp4
    

    There may be far more elegant ways of dealing with this. Yet my original opinion still stands: save yourself the unnecessary headaches you are experiencing now, based on this thread thus far.

    Anyway HTH

    • #104213

      Sankhadip Roy
      Participant

      Thanks, Postanote, it really helps me.
      The thing is, these files are placed in a folder, and there are multiple folders holding the same type of files for different scenarios. Everything was fine until a request came in to put all the files from those different folders into a single folder with some predefined prefix.

      So I tried to achieve that with PowerShell. If there are more than 9 files in a folder, it creates problems.
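
      A rough sketch of that consolidation, assuming each source folder gets its own prefix. Get-PrefixedName, the prefixes, and every path here are hypothetical, and -WhatIf keeps the copy in preview mode.

      ```powershell
      # Hypothetical helper: builds the target name from a prefix and the original name,
      # padding the leading number so that more than 9 files still sort correctly.
      function Get-PrefixedName {
          param([string] $Prefix, [string] $Name, [int] $Width = 2)
          if ($Name -match '^\s*(\d+)\s+(.*)$') {
              '{0}_{1} {2}' -f $Prefix, $Matches[1].PadLeft($Width, '0'), $Matches[2]
          }
          else {
              '{0}_{1}' -f $Prefix, $Name.Trim()
          }
      }

      # Placeholder prefixes and folders.
      $sources = @{ 'CaseA' = 'D:\Source\CaseA'; 'CaseB' = 'D:\Source\CaseB' }
      $target  = 'D:\Combined'

      foreach ($prefix in $sources.Keys) {
          if ((Test-Path -Path $sources[$prefix]) -and (Test-Path -Path $target)) {
              Get-ChildItem -Path $sources[$prefix] -File | ForEach-Object {
                  $newName = Get-PrefixedName -Prefix $prefix -Name $_.Name
                  # -WhatIf only previews the copy; remove it to copy for real.
                  Copy-Item -Path $_.FullName -Destination (Join-Path $target $newName) -WhatIf
              }
          }
      }
      ```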

      Two things are not clear in the line below. Why do you use the formatting twice? And what is the meaning of the comma used after the regular expression?
      {$Line -replace '^\d*',("{0:D4}" -f ("{0:D4}" -f ([int]($Line -split ' ',2)[0])))}) | Sort-Object

      Thanks again for your contribution.

      Regards,
      Roy.

  • #104164

    Christian Sandfeld
    Participant

    There is another way, where you do not have to change the file names: use an expression for the sort.

    Sample Code:

    $files = @'
    Name
    1 incident.mp4
    10 Action.mp4
    11 Isolation.MP4
    12 Interogation.mp4
    13 Decision Making.MP4
    14 Final result.mp4
    2 report.mp4
    3 filing.mp4
    4 investigation.mp4
    5 report collection.mp4
    6 examine.mp4
    7 report.mp4
    8 Rep Analysis.mp4
    9 case study.mp4
    '@ | ConvertFrom-Csv
    
    $files | Sort-Object -Property @{ Expression = { [int]($_.Name -split ' ')[0] } }
    

    Result:

    Name                   
    ----                   
    1 incident.mp4         
    2 report.mp4           
    3 filing.mp4           
    4 investigation.mp4    
    5 report collection.mp4
    6 examine.mp4          
    7 report.mp4           
    8 Rep Analysis.mp4     
    9 case study.mp4       
    10 Action.mp4          
    11 Isolation.MP4       
    12 Interogation.mp4    
    13 Decision Making.MP4 
    14 Final result.mp4    
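
    The [int] cast in that expression fails on a name that does not start with a number. A possible variation, sketched here on plain strings with made-up sample names, sends such names to the end and breaks ties by the full name:

    ```powershell
    # Hypothetical sample names, including one without a numeric prefix and one duplicate number.
    $names = '10 Action.mp4', 'notes.txt', '2 report.mp4', '2 appendix.mp4'

    $sorted = $names | Sort-Object -Property @(
        # First key: the leading number, or the largest [int] when there is none.
        @{ Expression = { if ($_ -match '^\s*(\d+)') { [int]$Matches[1] } else { [int]::MaxValue } } },
        # Second key: the whole name, used when two names share a number.
        @{ Expression = { $_ } }
    )

    $sorted
    ```

    For objects such as the ones from ConvertFrom-Csv or Get-ChildItem, `$_.Name` would take the place of `$_` in both expressions.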
    
  • #104231

    postanote
    Participant

    @christian Sandfeld

    See, another elegant / simpler solution.
    However, @christian, @Sankhadip is pulling the names directly from disk, not from a file as I did or from the construct you are using.
    So, @Sankhadip would have to add the header 'Name' dynamically to the Get-ChildItem result, which you do not show in your sample.

    Either do this, and add Name to the variable content as the first entry, ahead of the Get-ChildItem output ...

    ($Files = (Get-ChildItem -Path D:\Temp).Name)
    

    ... or do this, and deal with the header.

    ($Files = Get-ChildItem -Path D:\Temp | Select Name)
    

    So, taking the former

    Clear-Host
    
    # Start with the CSV header, then append one file name per line
    $MediaFiles = "Name"
    
    ForEach($Line in (Get-ChildItem -Path D:\Temp -File).Name)
    {$MediaFiles = $MediaFiles + "`n$Line" }
    
    $MediaFiles
    
    # Results
    
    Name
    1 passwordchangelog.txt
    10 passwordchangelog.txt
    11 passwordchangelog.txt
    4 passwordchangelog.txt
    
    
    $MediaFiles | ConvertFrom-CSV | Sort-Object -Property @{ Expression = { [int]($_.Name -split ' ')[0] } }
    
    # Results
    
    Name                    
    ----                    
    1 passwordchangelog.txt 
    4 passwordchangelog.txt 
    10 passwordchangelog.txt
    11 passwordchangelog.txt
    

    and taking the latter

    Get-ChildItem -Path D:\Temp -File | 
    Sort-Object -Property @{ Expression = { [int]($_.Name -split ' ')[0] } } | 
    Select Name
    
    # Results
    Name                    
    ----                    
    1 passwordchangelog.txt 
    4 passwordchangelog.txt 
    10 passwordchangelog.txt
    11 passwordchangelog.txt
    

    @Sankhadip, as for...

    Two things are not clear in the line below. Why do you use the formatting twice? And what is the meaning of the comma used after the regular expression?
    {$Line -replace '^\d*',("{0:D4}" -f ("{0:D4}" -f ([int]($Line -split ' ',2)[0])))}) | Sort-Object

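    The second format call is actually redundant: the inner one already returns the padded string, so one is enough. As for the commas: the one after the regular expression separates the pattern from the replacement text for -replace, and the one in the split separates the delimiter from the maximum number of substrings to return. For more on splitting, see this article.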

    Using the Split Method in PowerShell

    Split on an array of strings with options
    The option to specify an array of strings to use for splitting a string offers a lot of possibilities. The StringSplitOptions enumeration also offers a way to control the return of empty elements. The first thing I need to do is to create a string.
    https://blogs.technet.microsoft.com/heyscriptingguy/2014/07/17/using-the-split-method-in-powershell
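
    As far as I can tell, the line in question can be reduced to a single format call. The sketch below uses a hypothetical in-memory list in place of MusicData.txt, and the comments mark what each comma separates:

    ```powershell
    # Hypothetical sample lines standing in for the file names, leading spaces included.
    $lines = ' 1 incident.mp4', ' 10 Action.mp4', ' 2 report.mp4'

    $mediaFiles = foreach ($line in $lines.Trim()) {
        # -split takes the delimiter and, after the comma, the maximum number of pieces.
        $number = ($line -split ' ', 2)[0]

        # -replace takes the pattern and, after the comma, the replacement text.
        # One format call is enough to produce the padded string.
        $line -replace '^\d+', ('{0:D4}' -f [int]$number)
    }

    $sorted = $mediaFiles | Sort-Object
    $sorted
    ```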

    See also...

    PowerShell Sort-Object gotcha's
    https://blogs.technet.microsoft.com/stefan_stranger/2013/11/13/powershell-sort-object-gotchas

    • #104257

      Sankhadip Roy
      Participant

      Thanks, Postanote and Christian,

      @christian, I saw your post, but it did not work for me as written; after the clarification from postanote and with the help of the steps mentioned, it is now working. It really is the easiest one. Thanks a lot.

      I learned a lot from all of you and from this thread. Thank you guyz. Hats off to all of you. 🙂
