Help removing random characters from multiple strings

This topic contains 7 replies, has 4 voices, and was last updated by  Matt Braithwaite 2 years, 5 months ago.

  • Author
  • #23086

    Matt Braithwaite

    I get a file sent to me by our staff in Accounting with information for users who are no longer employed. My eventual goal is to be able to import the file into a script and have it confirm that the users are disabled. However, I'm stuck on cleaning up the file that gets exported from the program that is used to keep track of employees, and am wondering if there is a way in Powershell to clean this up a bit. An example line in the text that is exported is below (confidential information is obviously cleaned up):

     1 000000       0000000000000000000000000NAME               0   SURNAME                   BOX 111                            SOMEWHERE, OUTTHERE                  
            A1A 1A1                           000000000000000000000000JOBTITLE - 00.00                        U        0000000000.00000000.00000000.00000000.00
          000                         0000000000     000000.00                                                                                                 

    Everything that is a 0 in that line is a number or letter other than 0, otherwise this would be incredibly simple. Each line also contains the exact same amount of characters, with the capital variables being the main variance in length.
    I've tried using Trim, Trimstart, and Trimend, but it seems that they need specific characters, not a numbered amount. I've tried Replace, but it also seems to need specific characters (unless I'm doing it wrong).

    Please help!


    EDIT: Out of that string, I need NAME, SURNAME, BOX (address), SOMEWHERE,OUTTHERE (city and province), A1A 1A1 (postal code), and JOBTITLE.

  • #23088

    Corey Thomas

    Without seeing the original data, it's kinda hard to interpret what this data is and what is needed. Any way you can give an example of actual data (fake actual data of course)

  • #23089

    Greg Altman

    Matt, I haven't tried with your data , but I've had success in the past with the Substring Method. Here's a link to an article ( the relevant portion is near the end)

  • #23090

    Dave Wyatt

    I'm not sure I understand the question. Do you know exactly how many characters you need to remove from the beginning and end of the string, but don't know what those characters are? If that's the case, the Substring method of the string class would probably help.

    Beyond that, I'd probably need to know more about the data and what you want to get out of it. In your example line, what do you want it to look like after you've manipulated it?

  • #23096

    Matt Braithwaite

    I have attached the file and sanitized it. Please ignore the goofy names – it's Friday 🙂
    Also, it comes out as a .blk, not .txt – however I have changed its extension so it can be uploaded.


  • #23098

    Corey Thomas

    Looks like the first, middle, and last names always start on the same index location. If that is the case, you could just strip out what you don't need.

  • #23099

    Dave Wyatt

    Looks to me like this is just a fixed-width data file. (Not sure what the ~0100 on the first line represents; you'd probably want to skip that.) Extracting data from a fixed-width file is fairly straightforward. You can use Substring() as I mentioned earlier (with an example below), or you can use ADO with a Schema.ini file (details here: , though that's an old article with VBScript code that you'd need to convert to PowerShell.)

    Example with Substring:

    # first data line from your sample file
    $line = '1 000000       000000000RP00000000000000NAME                    SURNAME                     STREET ADDRESS                     CITY, PROVINCE                      A1A 1A1                           000000000000000000000000JOB TITLE - 00.00                       U        0000000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00ACCOUNTANT           LN                          5555555555     000000.00                                                                                                                                                                                                                                                                   EES         '
    $name = $line.Substring(40,20).Trim()
    $initial = $line.Substring(60,4).Trim()
    $surname = $line.Substring(64, 28).Trim()
    # etc

    Extracting other bits of data would amount to the same thing (Substring(index, length).Trim()); you just need to figure out the starting index and length of each column.

  • #23100

    Matt Braithwaite

    Awesome. Thanks for the help!


You must be logged in to reply to this topic.