# Using Powershell to parse text


• #196109
Participant
Topics: 1
Replies: 3
Points: 18
Rank: Member

Hi all,

I would like to use Powershell to parse text.

As my first project, I have a text string which contains (among other things) dates formatted as dd/mm/yyyy

I want to build a script that will find the index location of each instance where these are found in the text, and just before that index location, insert a delimiter character.

I can find all of the dates using a regex

    $Regex = [regex]"\d\d/\d\d/\d{4}"
    $Regex.Matches($content)

This produced the output of groups from which I extracted the index location for each date substring.

I then created an array variable with all of the index values.

I then converted those arrayed string values to numbers

    [array]$c = foreach ($number in $STRARRAY) { [int]::Parse($number) }

Now I need one more loop to insert a delimiter character at the location of each index.

Can anyone suggest how that command would be formed?

Thanks!

• #196136
Participant
Topics: 2
Replies: 1693
Points: 3,368
Rank: Community Hero

Welcome to Powershell.org. Please read the first pinned post on top of the list of posts of this forum.

Read Me Before Posting! You'll be Glad You Did!

When you post code you should format it as code.

As you are a PowerShell beginner I'd recommend staying with the built-in cmdlets as long as possible. It makes the code easier to read, easier to debug, and easier to maintain or extend. And some .NET methods have slightly surprising side effects compared to standard PowerShell cmdlets.

I think you're overcomplicating your task. If I got you right, and assuming your input comes from a file called text.txt and your delimiter character is "#", you can achieve your task like this:

    Get-Content -Path 'D:\sample\text.txt' | ForEach-Object {
        $_ -replace '(?=\d{2}\/\d{2}\/\d{4})', '#'
    }

You can pipe the result to whatever cmdlet or further steps as you like.
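
For anyone who still wants the index-based loop from the original question, here is a minimal sketch next to the look-ahead one-liner. The sample string and the variable names ($content, $indexes, $byIndex, $byReplace) are made up for illustration; the thread never shows the real input. The one detail that matters is inserting from the last index to the first, so that earlier positions are not shifted by the characters already inserted.

```powershell
# Hypothetical sample text; the real input from the thread is not shown.
$content = 'first 01/02/2020 second 15/03/2021 third'
$regex   = [regex]'\d\d/\d\d/\d{4}'

# Index-based approach: collect the start index of every match.
# Match.Index is already an integer, so no string-to-int conversion is needed.
$indexes = $regex.Matches($content) | ForEach-Object { $_.Index }

# Insert from the highest index down so the remaining indexes stay valid.
$byIndex = $content
foreach ($i in ($indexes | Sort-Object -Descending)) {
    $byIndex = $byIndex.Insert($i, '#')
}

# Look-ahead approach from the reply above, applied to the same string.
$byReplace = $content -replace '(?=\d{2}/\d{2}/\d{4})', '#'

$byIndex
$byReplace
```

Both variables should end up holding the same delimited string, which is a quick way to check the loop against the -replace version.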

• #196142
Participant
Topics: 10
Replies: 1381
Points: 1,509
Rank: Community Hero

If you want to insert a delimiter, such as a semi-colon, you can do this with a pretty simple process:

    $string = "this is a test 01/01/2020 testing 01/02/2020 "
    #https://www.regular-expressions.info/dates.html
    $pattern = '(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d'

    #Place a semicolon in front of match 0
    $string -replace $pattern, ';$0'

The regex matches:

    PS C:\Users\rasim> $Matches

    Name                           Value
    ----                           -----
    3                              20
    2                              01
    1                              01
    0                              01/01/2020


The whole match is group 0, so $0 in the replacement string puts the matched date back right after the semicolon, and we get the following output:

this is a test ;01/01/2020 testing ;01/02/2020


This discusses it more:

Regular Expressions are a -replace's best friend
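
A small sketch of where the $Matches table and the $0 in the replacement come from, reusing the pattern from this post. The variable names $whole, $first and $delimited are only for illustration. Note that $Matches is filled by the -match operator, not by -replace, and that the first group of this pattern only accepts 01 to 12, so for dd/mm/yyyy input the first two groups would presumably need to be swapped.

```powershell
$string  = 'this is a test 01/01/2020 testing 01/02/2020'
$pattern = '(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d'

# -match stops at the first hit and fills the automatic $Matches table.
if ($string -match $pattern) {
    $whole = $Matches[0]   # the whole match
    $first = $Matches[1]   # the first capture group
}

# -replace processes every hit; $0 in the (single-quoted) replacement
# re-inserts the matched text, so nothing is lost.
$delimited = $string -replace $pattern, ';$0'

$whole
$first
$delimited
```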

• #196157
Participant
Topics: 1
Replies: 3
Points: 18
Rank: Member

These are wonderful!

I thought the -replace operator would simply remove the pattern string and replace it with a replacement string.

But Rob's example inserted the delimiter character nicely without removing the date values.

Thank you both very much!

• #196175
Participant
Topics: 27
Replies: 739
Points: 2,013
Rank: Community Hero

Hey, I finally get to use one of those little known -replace codes for the second argument:

    'hi there 01/01/1999 hi there' -replace '\d\d/\d\d/\d{4}','#$&'
    hi there #01/01/1999 hi there

$& – substitutes a copy of the whole match

Info is buried on this page that has no connection to the "about_comparison_operators" help:
https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference#substitutions
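
A quick sketch comparing the substitution tokens mentioned in this thread. The variable names are illustrative only. One thing to watch is quoting: in a double-quoted string PowerShell would expand $0 itself before the regex engine ever sees it, so either use single quotes or escape the dollar sign with a backtick.

```powershell
$text = 'hi there 01/01/1999 hi there'
$date = '\d\d/\d\d/\d{4}'

# Single quotes keep $& and $0 literal, so the regex engine receives them.
$withAmp  = $text -replace $date, '#$&'
$withZero = $text -replace $date, '#$0'

# With double quotes (for example to expand a delimiter variable),
# escape the dollar sign with a backtick so $0 still reaches the regex engine.
$delimiter   = '#'
$withEscaped = $text -replace $date, "$delimiter`$0"

$withAmp
$withZero
$withEscaped
```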

• #196181
Participant
Topics: 2
Replies: 1693
Points: 3,368
Rank: Community Hero

> But Rob's example inserted the delimiter character nicely without removing the date values.

... and my example snippet didn't???

• #196265
Participant
Topics: 1
Replies: 3
Points: 18
Rank: Member

Sorry Olaf, yes – your example did also.

I'm trying to figure out which option in the replace command makes it work this way.

Most times a replace command does remove the selected string and totally replace it.

I was looking for the Get-Help on this and didn't get very far in discovering it.

Got any suggestions where I would look for any/all examples of its use?

Thank you again!

• #196268
Participant
Topics: 2
Replies: 1693
Points: 3,368
Rank: Community Hero

> yes – your example did also. ...

Great. Because it worked here in my tests so I was worried. Thanks. 😉

The trick in my code snippet is in the regex pattern. It's called a look-ahead and searches for a position "followed by something particular". So it actually matches the empty position just before the pattern you provide (or just after it when you use a look-behind), without consuming any characters. The -replace then has nothing to remove and only inserts the delimiter there ... which is what you wanted.
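
To make the difference concrete, here is a minimal sketch with a made-up sample string (the names $text, $date, $consumed, $before and $after are only for illustration). A plain pattern consumes the date, while a look-ahead or look-behind matches an empty position next to it, so the date survives.

```powershell
$text = 'due 01/01/2020 paid'
$date = '\d{2}/\d{2}/\d{4}'

# Plain pattern: the date itself is matched and replaced.
$consumed = $text -replace $date, '#'

# Look-ahead (?=...): matches the empty position right before a date.
$before = $text -replace "(?=$date)", '#'

# Look-behind (?<=...): matches the empty position right after a date.
$after = $text -replace "(?<=$date)", '#'

$consumed
$before
$after
```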

• #196286
Participant
Topics: 1
Replies: 3
Points: 18
Rank: Member

wonderful!

Thank you for explaining that – and Look-aheads have been something I've had difficulty getting my brain around.  I can see I'm going to have to dig into that and get a solid understanding of it.

Thanks again – I appreciate it!
