Grabbing last 4 characters of matches

This topic contains 15 replies, has 4 voices, and was last updated by Olaf Soyk 1 month, 1 week ago.

  • Author
    Posts
  • #55904
    Ronald Crisp
    Participant

    This is the script I have written so far:

    Select-String 'D:\Powershell\Sample.txt' -Pattern '~TRN\*1\*\w+' -AllMatches | Select-Object Matches

    This is the result:

    Matches
    -------
    {~TRN*1*10100000000*}
    {~TRN*1*10100000001*}
    {~TRN*1*10100000002*}
    {~TRN*1*10100000003*}
    {~TRN*1*10100000004*}
    {~TRN*1*10100000005*}
    {~TRN*1*10100000006*}

    What I would like is to search the whole text file, look between "~TRN*1*" and the next "*", and show only the last 4 characters found there, so the end result would be just the following:

    Matches
    -------
    0000
    0001
    0002
    0003
    0004
    0005
    0006

    Any help would be greatly appreciated, thanks!

  • #55906
    Olaf Soyk
    Participant

    Something like this?

    Get-Content -Path C:\_Temp\test\sample.txt | ForEach-Object -Process {$_ -match '~TRN\*1.*(\d{4})\*' | Out-Null ; $Matches[1]}
    
    • #55925
      Ronald Crisp
      Participant

      Thank you for the reply!

      Reading your setup has helped me get closer...

      The regex output is where I am getting stuck.

      All I want is what is in bold (the last four digits of each number). Let me know if you need more info, and thank you!

      What I currently get:
      {~TRN*1*10100000000*}
      {~TRN*1*10100000001*}
      {~TRN*1*10100000002*}
      {~TRN*1*10100000003*}
      {~TRN*1*10100000004*}
      {~TRN*1*10100000005*}
      {~TRN*1*10100000006*}

      What I want to get:

      0000
      0001
      0002
      0003
      0004
      0005
      0006

  • #55928
    Ronald Crisp
    Participant

    Making progress...

    Current script:
    Select-String 'D:\Powershell\Sample.txt' -Pattern '(?<=~TRN\*1\*)(\w+)' -AllMatches | Select-Object Matches

    Current results:

    Matches
    -------
    {10100000000}
    {10100000001}
    {10100000002}
    {10100000003}
    {10100000004}
    {10100000005}
    {10100000006}

    Now I want it to return only the last 4 characters of these numbers, for example:

    Matches
    -------
    {0000}
    {0001}
    {0002}
    {0003}
    {0004}
    {0005}
    {0006}

    • #55930
      Olaf Soyk
      Participant

      What do you get when you run my code? What does the txt file look like?

    • #55948
      Ronald Crisp
      Participant

      This is what the text file looks like (a small sample, but it is like this throughout):

      *ACH*CCP*01*111*DA*33*1234567890**01*111*DA*22*20100101~TRN*1*10100000000*1000000000~REF*EV*ETIN~DTM*405*20100101~N1*PR*NYSDOH~N3*OFFICE OF HEALTH INSURANCE PROGRAMS*CORNING TOWER, EMPIRE STATE PLAZA~N4*ALBANY*NY*122370080~PER*BL*PROVIDER SERVICES*TE*8003439000*UR*www.emedny.org~N1*PE*MAJOR MEDICAL PROVIDER*XX*9999999995~REF*TJ*000000000~LX*1~CLP*PATIENT ACCOUNT NUMBER*1*34.25*34.25**MC*1000210000000030*11~NM1*QC*1*SUBMITTED LAST*SUBMITTED FIRST****MI*LL99999L~NM1*74*1*CORRECTED LAST*CORRECTED FIRST~REF*EA*PATIENT ACCOUNT NUMBER~DTM*232*20100101~DTM*233*20100101~AMT*AU*34.25~SVC*HC:V2020:RB*6*6**1~DTM*472*20100101~AMT*B6*6~SVC*HC:V2700:RB*2.75*2.75**1~DTM*472*20100101~AMT*B6*2.75~SVC*HC:V2103:RB*5.5*5.5**1~DTM*472*20100101~AMT*B6*5.5~SVC*HC:S0580*20*20**2~DTM*472*20100101~AMT*B6*

      The results I get back from running your code:

      0580
      0580
      0580
      0580
      0580
      0580
      0580
      0580
      0580

  • #55934
    Rob Simmers
    Participant

    Olaf's code worked for me. Although I'm not sure how the regex knows to get the last 4 digits versus the first 4.

    • #55940
      Peter Jurgens
      Participant

      In Olaf's regex he uses a capture group, and the key element is the "\*" placed right after it. That requires the 4 digits captured by "(\d{4})" to come immediately before an asterisk in the sample text.

      Personally, I prefer to name my capture groups whenever I use them; you can then access them via $Matches["name"] rather than by numerical index.
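
A minimal sketch of the two points above, assuming a shortened, hypothetical one-line sample modelled on the text posted in this thread. The group name `last4` is only an illustration. It shows why a greedy `.*` on a single-line file can land on a later segment (the `0580` result reported here), and how a named group is read back through `$Matches`:

```powershell
# Hypothetical sample, shortened from the text posted in the thread.
$line = '~TRN*1*10100000000*1000000000~REF*EV*ETIN~SVC*HC:S0580*20*20**2~'

# Greedy: .* runs to the end of the line and backtracks, so the capture
# ends up on the LAST "four digits followed by *" in the whole line.
if ($line -match '~TRN\*1.*(?<last4>\d{4})\*') { $greedy = $Matches['last4'] }

# Anchored: \d* can only cross digits, so the capture stays inside the
# field that directly follows ~TRN*1*.
if ($line -match '~TRN\*1\*\d*(?<last4>\d{4})\*') { $anchored = $Matches['last4'] }

"greedy:   $greedy"     # greedy:   0580
"anchored: $anchored"   # anchored: 0000
```

The named group does not change what is matched; it only makes the lookup self-describing.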

  • #55943
    Ronald Crisp
    Participant

    This is the result I get with Olaf's code:

    0580
    0580
    0580
    0580
    0580
    0580
    0580
    0580
    0580

    The text file I have does not contain any spaces between fields; it uses '*' as the separator instead, if that makes sense. For example, here is a small segment of the text file:

    *ACH*CCP*01*111*DA*33*1234567890**01*111*DA*22*20100101~TRN*1*10100000000*1000000000~REF*EV*ETIN~DTM*405*20100101~N1*PR*NYSDOH~N3*OFFICE OF HEALTH INSURANCE PROGRAMS*CORNING TOWER, EMPIRE STATE PLAZA~N4*ALBANY*NY*122370080~PER*BL*PROVIDER SERVICES*TE*8003439000*UR*www.emedny.org~N1*PE*MAJOR MEDICAL PROVIDER*XX*9999999995~REF*TJ*000000000~LX*1~CLP*PATIENT ACCOUNT NUMBER*1*34.25*34.25**MC*1000210000000030*11~NM1*QC*1*SUBMITTED LAST*SUBMITTED FIRST****MI*LL99999L~NM1*74*1*CORRECTED LAST*CORRECTED FIRST~REF*EA*PATIENT ACCOUNT NUMBER~DTM*232*20100101~DT

    Now, this text file may contain multiple ~TRN*1* segments, and the number that follows can vary in length. I want to grab the last four digits between ~TRN*1* and the following '*' for every instance found in this text file. (Forgive me, guys, I am new to a lot of this and I am trying to understand. If I am missing something, please just let me know.) I appreciate all the help! 🙂

  • #55946
    Ronald Crisp
    Participant

    This is the result I have:
    0580
    0580
    0580
    0580
    0580
    0580
    0580
    0580
    0580

    This is a small example of the text document and how it is laid out:

    *ACH*CCP*01*111*DA*33*1234567890**01*111*DA*22*20100101~TRN*1*10100000000*1000000000~REF*EV*ETIN~DTM*405*20100101~N1*PR*NYSDOH~N3*OFFICE OF HEALTH INSURANCE PROGRAMS*CORNING TOWER, EMPIRE STATE PLAZA~N4*ALBANY*NY*122370080~PER*BL*PROVIDER SERVICES*TE*8003439000*UR*www.emedny.org~N1*PE*MAJOR MEDICAL PROVIDER*XX*9999999995~REF*TJ*000000000~LX*1~CLP*PATIENT ACCOUNT NUMBER*1*34.25*34.25**MC*1000210000000030*11~NM1*QC*1*SUBMITTED LAST*SUBMITTED FIRST****MI*LL99999L~NM1*74*1*CORRECTED LAST*CORRECTED FIRST~REF*EA*PATIENT ACCOUNT NUMBER~DTM*232*20100101~DTM*233*20100101~AMT*AU*34.25~SVC*HC:V2020:RB*6*6**1~DTM*472*20100101~AMT*B6*6~SVC*HC:V2700:RB*2.75*2.75**1~DTM*472*20100101~AMT*B6*2.75~SVC*HC:V2103:RB*5.5*5.5**1~DTM*472*20100101~AMT*B6*5.5~SVC*HC:S0580*20*20**2~DTM*472*20100101~AMT*B6*

    I just want to find the number between every ~TRN*1* and the next asterisk in the text file, and display only its last 4 digits. The number can vary in length; I just need it to always be the last 4. Thank you all so much for your assistance with this. I am a total newbie at this and trying to learn. 🙂

    • #56012
      Olaf Soyk
      Participant

      OK, now I know what's wrong. Try this:

      Get-Content -Path C:\_Temp\test\sample.txt | ForEach-Object -Process {$_ -match '~TRN\*1.*?(\d{4})\*' | Out-Null ; $Matches[1]}

      I assume your file has line breaks

    • #56039
      Ronald Crisp
      Participant

      There are no line breaks, it all runs on a single line.

      I also ran your code and the results are:

      0000

      It seems to find only the first instance of the match. How do we make it display all the matches?

      Again, thank you so much for your assistance, Olaf! 🙂
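
A hedged sketch of why only one value comes back: on a single string, `-match` stops at the first hit and fills `$Matches` once, whereas the .NET method `[regex]::Matches` returns every hit. The sample text and the variable names `$first` and `$all` are assumptions made for illustration:

```powershell
# Hypothetical single-line sample with three TRN segments of varying length.
$text = '~TRN*1*10100000000*1~REF*EV~TRN*1*10100000001*1~REF*EV~TRN*1*2020123*1~'
$pattern = '~TRN\*1\*\d*(\d{4})\*'

# -match reports only the first hit in the string.
$null = $text -match $pattern
$first = $Matches[1]

# [regex]::Matches returns all hits; Groups[1] is the capture group.
$all = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Groups[1].Value }

"first: $first"              # first: 0000
"all:   $($all -join ', ')"  # all:   0000, 0001, 0123
```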

    • #56045
      Olaf Soyk
      Participant

      I will not give up .... yet. 😉

      Try this:

      (Select-String 'D:\Powershell\Sample.txt' -Pattern '~TRN\*1\*\w+' -AllMatches | Select-Object -ExpandProperty Matches).Value | ForEach-Object -Process {$_ -match '~TRN\*1.*?(\d{4})$' | Out-Null ; $Matches[1]}

      ... and BTW: it's really weird to have a file without any line breaks. I'm just curious: where do you get this file from?

    • #56048
      Ronald Crisp
      Participant

      The results are showing now! Thank you, thank you, thank you! You have been an overwhelmingly great help, Olaf!

      To be more specific about why I was trying to figure this out: I work in ambulance billing, and I receive Electronic Remittance Advices (ERAs) that are supposed to be opened in an application called Medicare EasyPrint reader. These files come with the .EDI file extension (example.EDI). Currently I have to open these files one by one and rename them based on the insurance payor and the check numbers contained within (sometimes a single check number, other times a whole bunch of them), using only the last 4 digits of each check number. Since this is a repetitive task, I figured I could use PowerShell to automate it. Eventually I would like to run a PowerShell script that pulls that information from each EDI file and renames the file based on the matches you have assisted me so awesomely with, Olaf, and I thank you soooooo much for that!

      I didn't want to tell everyone what the bigger picture was because I didn't want someone to just make the whole thing for me, or people to think I wanted them to do this for me. I am trying to study and understand everything being used, and the code you provided gives me plenty to study, Olaf.

      Also, since these files come with the .EDI file extension, I just mass-change them to the .txt extension to make them readable, which is the sample I gave you, all on one line.

      Here is an example of an ERA.EDI file (not the best one because it only has 1 check number in it but):
      https://www.emedny.org/HIPAA/5010/5010_sample_files/835%20Sample%20(Professional%20Claims%20only-w%20payment).2014.txt

    • #56050
      Olaf Soyk
      Participant

      Ronald,

      cool ... thanks for the explanation. So we both learned something today. Great. 😀

      BTW: you don't have to rename the files to read them with PowerShell.
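
As a rough illustration of that remark: `Get-Content` reads a file as text whatever its extension, so the .EDI files can be searched in place. The temporary folder, the file name `example.EDI` and its content are assumptions made so the sketch is self-contained:

```powershell
# Build a throwaway folder holding one hypothetical .EDI file.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'era-demo'
$null = New-Item -ItemType Directory -Path $dir -Force
Set-Content -Path (Join-Path $dir 'example.EDI') -Value '~TRN*1*10100000000*1000000000~REF*EV*ETIN~'

# Read each .EDI file directly (no rename to .txt) and collect the last
# four digits of every number that follows ~TRN*1*.
$results = Get-ChildItem -Path $dir -Filter '*.EDI' | ForEach-Object {
    $text = Get-Content -Path $_.FullName -Raw
    [regex]::Matches($text, '~TRN\*1\*\d*(\d{4})\*') | ForEach-Object { $_.Groups[1].Value }
}

$results   # 0000

# Clean up the throwaway folder.
Remove-Item -Path $dir -Recurse
```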

    • #56056
      Olaf Soyk
      Participant

      Ronald,

      I've played around a little and found an even shorter one (maybe faster):

      (Select-string 'D:\Powershell\sample.txt' -pattern '~TRN\*1\*\d*(\d{4})\*' -AllMatches).Matches.Groups.value[(0..1000|%{$_*2+1})]

      But ... and it's a big BUT – it assumes that you don't have more than 1000 hits per file. If you do, you have to increase the number in the expression. 😀
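
One possible way around the fixed upper bound, sketched under the assumption that the same pattern is used: read capture group 1 from each match object instead of indexing into the flattened `Groups` list. The temporary file and its content are hypothetical stand-ins for the real sample:

```powershell
# Write a hypothetical two-segment sample to a temporary file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'trn-sample.txt'
Set-Content -Path $path -Value '~TRN*1*10100000000*1~REF*EV~TRN*1*10100000001*1~'

# Each element of .Matches is one hit; Groups[1] is its capture group,
# so no index arithmetic and no limit on the number of hits is needed.
$last4 = Select-String -Path $path -Pattern '~TRN\*1\*\d*(\d{4})\*' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

$last4   # 0000, then 0001

# Clean up the temporary file.
Remove-Item -Path $path
```

This is a little longer than the one-liner, but it behaves the same for 2 hits or 20,000.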
