Help removing random characters from multiple strings

This topic contains 7 replies, has 4 voices, and was last updated by Profile photo of Matt Braithwaite Matt Braithwaite 1 year, 9 months ago.

  • Author
    Posts
  • #23086
    Profile photo of Matt Braithwaite
    Matt Braithwaite
    Participant

    Hi,
    I get a file sent to me by our staff in Accounting with information for users who are no longer employed. My eventual goal is to be able to import the file into a script and have it confirm that the users are disabled. However, I'm stuck on cleaning up the file that gets exported from the program that is used to keep track of employees, and am wondering if there is a way in Powershell to clean this up a bit. An example line in the text that is exported is below (confidential information is obviously cleaned up):

     1 000000       0000000000000000000000000NAME               0   SURNAME                   BOX 111                            SOMEWHERE, OUTTHERE                  
            A1A 1A1                           000000000000000000000000JOBTITLE - 00.00                        U        0000000000.00000000.00000000.00000000.00
    000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.0000000.0000
    0000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.000000
    00.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.000000000000     
          000                         0000000000     000000.00                                                                                                 
                                                                                                                                                               
           000         

    Everything that is a 0 in that line is a number or letter other than 0, otherwise this would be incredibly simple. Each line also contains the exact same amount of characters, with the capital variables being the main variance in length.
    I've tried using Trim, Trimstart, and Trimend, but it seems that they need specific characters, not a numbered amount. I've tried Replace, but it also seems to need specific characters (unless I'm doing it wrong).

    Please help!

    Thanks,
    Matt

    EDIT: Out of that string, I need NAME, SURNAME, BOX (address), SOMEWHERE,OUTTHERE (city and province), A1A 1A1 (postal code), and JOBTITLE.

  • #23088
    Profile photo of Corey Thomas
    Corey Thomas
    Participant

    Without seeing the original data, it's kinda hard to interpret what this data is and what is needed. Any way you can give an example of actual data (fake actual data of course)

  • #23089
    Profile photo of Greg Altman
    Greg Altman
    Member

    Matt, I haven't tried with your data , but I've had success in the past with the Substring Method. Here's a link to an article ( the relevant portion is near the end) https://technet.microsoft.com/en-us/library/ee692804.aspx

  • #23090
    Profile photo of Dave Wyatt
    Dave Wyatt
    Moderator

    I'm not sure I understand the question. Do you know exactly how many characters you need to remove from the beginning and end of the string, but don't know what those characters are? If that's the case, the Substring method of the string class would probably help.

    Beyond that, I'd probably need to know more about the data and what you want to get out of it. In your example line, what do you want it to look like after you've manipulated it?

  • #23096
    Profile photo of Matt Braithwaite
    Matt Braithwaite
    Participant

    I have attached the file and sanitized it. Please ignore the goofy names – it's Friday 🙂
    Also, it comes out as a .blk, not .txt – however I have changed its extension so it can be uploaded.

    Matt

  • #23098
    Profile photo of Corey Thomas
    Corey Thomas
    Participant

    Looks like the first, middle, and last names always start on the same index location. If that is the case, you could just strip out what you don't need.

  • #23099
    Profile photo of Dave Wyatt
    Dave Wyatt
    Moderator

    Looks to me like this is just a fixed-width data file. (Not sure what the ~0100 on the first line represents; you'd probably want to skip that.) Extracting data from a fixed-width file is fairly straightforward. You can use Substring() as I mentioned earlier (with an example below), or you can use ADO with a Schema.ini file (details here: https://msdn.microsoft.com/en-us/library/ms974559.aspx , though that's an old article with VBScript code that you'd need to convert to PowerShell.)

    Example with Substring:

    # first data line from your sample file
    $line = '1 000000       000000000RP00000000000000NAME                    SURNAME                     STREET ADDRESS                     CITY, PROVINCE                      A1A 1A1                           000000000000000000000000JOB TITLE - 00.00                       U        0000000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00000000.00ACCOUNTANT           LN                          5555555555     000000.00                                                                                                                                                                                                                                                                   EES         '
    
    $name = $line.Substring(40,20).Trim()
    $initial = $line.Substring(60,4).Trim()
    $surname = $line.Substring(64, 28).Trim()
    # etc
    

    Extracting other bits of data would amount to the same thing (Substring(index, length).Trim()); you just need to figure out the starting index and length of each column.

  • #23100
    Profile photo of Matt Braithwaite
    Matt Braithwaite
    Participant

    Awesome. Thanks for the help!

    Matt

You must be logged in to reply to this topic.