November 20, 2014 at 11:07 am

Can anyone help me convert a C# .NET program to a PowerShell cmdlet?

Has anyone used this cmdlet, ConvertFrom-PDF?

http://www.beefycode.com/post/ConvertFrom-PDF-Cmdlet.aspx

I need to scan PDFs, and I can't seem to get the source code in this blog post into a working cmdlet.

Any help would be greatly appreciated.

November 20, 2014 at 11:20 am

I've done some work with the iTextSharp libraries directly in PowerShell before. You can see an example at https://social.technet.microsoft.com/Forums/scriptcenter/en-US/086b9a8c-7e47-49ed-8e94-8f5f43f408fe/search-a-pdf-and-return-specific-text?forum=winserverpowershell . You will need to download a copy of iTextSharp.dll.

November 20, 2014 at 12:20 pm

Hi Dave, thanks for getting back to me.

I tried the Get-ReferencesFromPdf cmdlet. It didn't return any data, and there were no errors either. Any suggestions for troubleshooting this?

I do have iTextSharp.dll and created the same directory structure as in the post.

November 20, 2014 at 12:29 pm

That function was written specifically for the question posted on that thread, looking for section numbers followed by some number of lines matching ABC-*. It's not meant for you to be able to run it directly.

However, the code does show you how to use the PdfReader and PdfTextExtractor classes to pull text out of a PDF into a .NET String variable. From there, you can split it by line as in the example, or just work with the whole page text as one string; that's up to you.

Here's a more trimmed-down example that just extracts all of the text from the PDF and outputs it as a single string that you can manipulate however you want:

function Get-PdfText
{
    [CmdletBinding()]
    [OutputType([string])]
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $Path
    )

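    # Resolve relative and provider-qualified paths to a full file system path
    $Path = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($Path)

    # iTextSharp.dll must already be loaded in this session (e.g. with Add-Type -Path)
    try
    {
        $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
    }
    catch
    {
        # Rethrow so a failed open stops the function instead of continuing without a reader
        throw
    }

    try
    {
        $stringBuilder = New-Object System.Text.StringBuilder

        # iTextSharp page numbers start at 1
        for ($page = 1; $page -le $reader.NumberOfPages; $page++)
        {
            $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
            $null = $stringBuilder.AppendLine($text)
        }

        return $stringBuilder.ToString()
    }
    finally
    {
        # Release the file even if text extraction fails partway through
        $reader.Close()
    }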
}
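
Once the function returns the text, splitting it by line and filtering could look roughly like the sketch below. The sample string is only a stand-in for the output of Get-PdfText, the commented path is hypothetical, and the ABC-* pattern is borrowed from the other thread purely for illustration.

```powershell
# Stand-in for: $text = Get-PdfText -Path 'C:\Data\sample.pdf'   (hypothetical path)
$text = "Section 1`r`nABC-100`r`nABC-200`r`nSection 2`r`nXYZ-300"

# Split into lines, accepting both CRLF and LF line endings
$lines = $text -split '\r?\n'

# Keep only the lines that start with ABC-
$matchingLines = @($lines | Where-Object { $_ -like 'ABC-*' })
$matchingLines
```

If you would rather search the whole text at once, the -match operator or Select-String on $text works as well.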

November 20, 2014 at 12:51 pm

OK, I tried it, and it's still not returning a string. Am I still supposed to use

Add-Type -Path .\PdfToText\itextsharp.dll

I feel like I'm not putting this .dll in the right place.

Any other suggestions?

Thanks

November 20, 2014 at 11:20 pm

Try something like this:

[System.Reflection.Assembly]::LoadFrom('C:\Data\iTextSharp.DLL')

(after copying the file there as well, of course).

Use fully qualified paths, by the way.

Then test Dave's function. It worked well for me.
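
Before testing the function, it may help to confirm that the assembly really loaded in the current session. The following is a minimal sketch, assuming the DLL was copied to C:\Data (adjust the path to your own location); the variable names are illustrative only.

```powershell
# Assumed location; use the full path where you copied the DLL
$dllPath = 'C:\Data\iTextSharp.dll'

if (Test-Path -LiteralPath $dllPath)
{
    # Load the assembly into the current session
    Add-Type -Path $dllPath
}
else
{
    Write-Warning "DLL not found at $dllPath"
}

# -as [type] returns $null instead of throwing when the type is not loaded
$readerType = 'iTextSharp.text.pdf.PdfReader' -as [type]
$isLoaded = $null -ne $readerType
"PdfReader type available: $isLoaded"
```

If this reports False, the function has nothing to work with, so the loading step is the part to troubleshoot first.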

August 14, 2017 at 10:26 am

For some reason I cannot load the DLL as described.
Instead, I have to do it like this:

$bytes = [System.IO.File]::ReadAllBytes("c:\...\itextsharp.dll")
[System.Reflection.Assembly]::Load($bytes)
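
One possible explanation, which is only an assumption since nothing in this thread confirms it, is that Windows marked the downloaded DLL as blocked. In that case loading from a path can be refused, while loading from a byte array sidesteps the check. A sketch for clearing the mark first, using a hypothetical path:

```powershell
# Hypothetical path; replace with the full path to your copy of the DLL
$dllPath = 'C:\Data\iTextSharp.dll'
$dllExists = Test-Path -LiteralPath $dllPath

if ($dllExists)
{
    # Remove the "downloaded from the internet" mark, if the file has one
    Unblock-File -LiteralPath $dllPath

    # Then try the normal path-based load again
    Add-Type -Path $dllPath
}
else
{
    Write-Warning "DLL not found at $dllPath"
}
```

If path-based loading still fails after unblocking, the byte-array approach above remains a workable fallback.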