September 6, 2018 at 10:24 am

Hi specialists,

I have created a short script to generate a bunch of docx files I need for testing. It works quite well, but I have a problem entering a sample text with German umlauts: the umlauts are broken in the resulting docx. Could you guys help me with this issue?

Here is my code:

[CmdletBinding()]
param (
    [Parameter()][string]$FileNamePrefix = "Test",
    [Parameter()][int]$Count = 10
)

begin {
    Add-Type -AssemblyName "Microsoft.Office.Interop.Word"
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    $targetPath = Resolve-Path $PWD
}

process {
    for ($i = 1; $i -le $Count; $i++) {
        Write-Progress -Activity "Creating Word files. $i of $Count." -PercentComplete ($i / $Count * 100)
        $doc = $word.Documents.Add()
        $doc.TextEncoding = [Microsoft.Office.Core.MsoEncoding]::msoEncodingUTF8
        $selection = $word.Selection
        # Ten random letters (A-Z, a-z) so that each file gets distinct content.
        $rtext = -join ((65..90) + (97..122) | Get-Random -Count 10 | ForEach-Object { [char]$_ })
        $selection.TypeText("Dies ist Text $i mit zufälligem Text: $rtext")
        $selection.TypeParagraph()
        $target = Join-Path $targetPath "$FileNamePrefix-$i"
        $doc.SaveAs([object]$target.ToString())
        $doc.Close()
    }
}

end {
    $word.Quit()
}

The result in Word looks like this. The "ä" is encoded wrongly:

"Dies ist Text 1 mit zufälligem Text: lugJpDqxkB"

September 6, 2018 at 11:38 am

I tried your code and got 10 Word files containing:

" Dies ist Text 7 mit zufälligem Text: gXGmNPkIpt "

That looks correct.

This is with Word 2016.

September 6, 2018 at 11:59 am

Thank you for testing it. I'm using Word 2013, but I have a 2016 version somewhere and will check whether that really makes a difference.

September 6, 2018 at 12:43 pm

Hi Detlef, please format your code as code when posting in the forum; that makes it easier for others to understand. The link below will help you.

September 7, 2018 at 6:22 am

Just to close this topic I'm reporting the solution I found.

It seems PowerShell is unable to correctly detect the encoding of a script file that is stored as UTF-8 without a BOM.
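
For anyone who wants to see the effect in isolation, here is a minimal sketch of what probably happens to the string literal. It assumes the script file is read with a single-byte code page instead of UTF-8, which is what Windows PowerShell does with BOM-less files. Latin-1 (code page 28591) is used here only as a stand-in for the system ANSI code page; the two agree on the bytes involved. The variable names are made up for the illustration.

```powershell
# The letter is built from its code point, so this snippet itself contains
# no non-ASCII characters and is immune to the problem it demonstrates.
$original = "zuf" + [char]0xE4 + "lligem"

# UTF-8 stores U+00E4 as the two bytes C3 A4.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

# Assumption: Latin-1 stands in for the ANSI code page. It maps the bytes
# C3 and A4 to the same two characters as Windows-1252 does.
$latin1  = [System.Text.Encoding]::GetEncoding(28591)
$misread = $latin1.GetString($utf8Bytes)

# Decoding the same bytes as UTF-8 restores the original text.
$reread = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

"{0} -> {1} -> {2}" -f $original, $misread, $reread
```

The misread string contains "Ã¤" where the "ä" should be, because each of the two UTF-8 bytes is turned into a character of its own.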

I changed my script to UTF-8 with BOM, and now the strings in my docx files are correct.
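
The same change can be made without an editor. The following is only a sketch: `Convert-ToUtf8Bom` is a hypothetical helper name, and it assumes the file currently contains valid UTF-8 (with or without a BOM). It goes through the .NET `UTF8Encoding` class rather than `Set-Content -Encoding`, because the meaning of the `UTF8` value differs between Windows PowerShell and PowerShell 6 and later.

```powershell
# Hypothetical helper: re-save a file as UTF-8 with a byte order mark (BOM).
function Convert-ToUtf8Bom {
    param ([Parameter(Mandatory)][string]$Path)

    # Resolve to a full path, because .NET file methods use the process
    # working directory rather than the current PowerShell location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Assumption: the existing content is valid UTF-8.
    $text = [System.IO.File]::ReadAllText($fullPath, [System.Text.Encoding]::UTF8)

    # Passing $true makes the encoder write the EF BB BF preamble.
    $utf8WithBom = New-Object System.Text.UTF8Encoding $true
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}
```

Usage would look like `Convert-ToUtf8Bom -Path .\New-TestDocs.ps1`, where the file name is just an example. Afterwards the file starts with the bytes EF BB BF, which PowerShell uses to recognize the file as UTF-8.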