ConvertFrom-PDF PowerShell Cmdlet | Convert a C# .NET program to a PowerShell cmdlet

This topic contains 6 replies, has 4 voices, and was last updated by Jørgen Guldmann 4 months ago.

  • #20720

    H Man
    Participant

    Can anyone help me convert a C# .NET program to a PowerShell cmdlet?

    Has anyone used the ConvertFrom-PDF cmdlet?

    http://www.beefycode.com/post/ConvertFrom-PDF-Cmdlet.aspx

    I need to scan PDFs, and I can't seem to turn the source code in this blog post into a working cmdlet.

    Any help would be greatly appreciated.

  • #20725

    Dave Wyatt
    Moderator

    I've done some work with the iTextSharp libraries directly in PowerShell before. You can see an example at https://social.technet.microsoft.com/Forums/scriptcenter/en-US/086b9a8c-7e47-49ed-8e94-8f5f43f408fe/search-a-pdf-and-return-specific-text?forum=winserverpowershell . You will need to download a copy of iTextSharp.dll.

  • #20739

    H Man
    Participant

    Hi Dave, thanks for getting back to me.

    I tried the Get-ReferencesFromPdf cmdlet. It didn't return any data, and it didn't throw any errors either. Any suggestions for troubleshooting this?

    I do have iTextSharp.dll, and I created the same directory structure as in the post.

  • #20740

    Dave Wyatt
    Moderator

    That function was written specifically for the question posted in that thread: it looks for section numbers followed by some number of lines matching ABC-*. It isn't meant to be run as-is against your own PDFs.

    However, the code does show you how to use the PdfReader and PdfTextExtractor classes to pull text out of a PDF into a .NET String variable. From there, you can split it by line as in the example, or just work with the whole page text as one string; that's up to you.

    Here's a more trimmed-down example that just extracts all of the text from the PDF and outputs it as a single string, which you can manipulate however you want:

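    function Get-PdfText
    {
        [CmdletBinding()]
        [OutputType([string])]
        param (
            [Parameter(Mandatory = $true)]
            [string]
            $Path
        )
    
        # Requires iTextSharp.dll to be loaded first (Add-Type or Assembly::LoadFrom).
        # Resolve relative paths against the current PowerShell location, because
        # .NET constructors resolve them against the process working directory.
        $Path = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($Path)
    
        $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
    
        try
        {
            $stringBuilder = New-Object System.Text.StringBuilder
    
            # iTextSharp page numbers are 1-based.
            for ($page = 1; $page -le $reader.NumberOfPages; $page++)
            {
                $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
                $null = $stringBuilder.AppendLine($text)
            }
    
            return $stringBuilder.ToString()
        }
        finally
        {
            # Release the file handle even if text extraction fails.
            $reader.Close()
        }
    }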
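Dave's point about splitting the extracted text by line can be sketched as a small filter over Get-PdfText's output. Select-PdfTextLine is a hypothetical name introduced here, not part of iTextSharp or the original thread. It splits on both CRLF and LF line endings and keeps the lines that match a regular expression, using PowerShell's case-insensitive -match operator.

```powershell
# Hypothetical helper (not part of iTextSharp or the original post):
# split extracted PDF text into lines and keep only those matching a regex.
function Select-PdfTextLine
{
    [CmdletBinding()]
    [OutputType([string])]
    param (
        # The text returned by Get-PdfText (or any multi-line string).
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [AllowEmptyString()]
        [string]
        $Text,

        # Regular expression each returned line must match; '.' keeps all non-empty lines.
        [string]
        $Pattern = '.'
    )

    process
    {
        # Handle both Windows (CRLF) and Unix (LF) line endings.
        foreach ($line in ($Text -split '\r?\n'))
        {
            # -match is case-insensitive by default.
            if ($line -match $Pattern)
            {
                $line
            }
        }
    }
}
```

For example, with iTextSharp loaded, Get-PdfText -Path 'C:\Data\report.pdf' | Select-PdfTextLine -Pattern '^ABC-' would return only the lines that start with ABC- (the PDF path is a placeholder).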
    
  • #20742

    H Man
    Participant

    OK, I tried it, but it's still not returning a string. I'm still loading the DLL with:

    Add-Type -Path .\PdfToText\itextsharp.dll

    I feel like I'm not placing this .dll correctly.

    Any other suggestions?

    Thanks

  • #20752

    Tim Pringle
    Participant

    Try something like this:

    [System.Reflection.Assembly]::LoadFrom('C:\Data\iTextSharp.DLL')

    (after copying the file there as well, of course).

    Use fully qualified paths, by the way.

    Then test Dave's function. It worked well for me.
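Tim's advice about fully qualified paths matters because .NET methods such as [System.Reflection.Assembly]::LoadFrom resolve a relative path against the process working directory ([Environment]::CurrentDirectory), which PowerShell does not update when you run Set-Location. Below is a minimal sketch of a loader that resolves the path the PowerShell way first. Import-AssemblyFile is a hypothetical helper name, not an existing cmdlet.

```powershell
# Hypothetical helper (not from the thread): load an assembly from a path that
# may be relative to the current PowerShell location.
function Import-AssemblyFile
{
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $Path
    )

    # Convert-Path resolves the path against PowerShell's current location ($PWD)
    # and, with -ErrorAction Stop, throws if the file does not exist.
    $fullPath = Convert-Path -LiteralPath $Path -ErrorAction Stop

    # Pass the fully qualified path to .NET, so the process working directory
    # ([Environment]::CurrentDirectory) cannot change which file is loaded.
    $assembly = [System.Reflection.Assembly]::LoadFrom($fullPath)
    Write-Verbose "Loaded $($assembly.FullName) from $fullPath"
    $assembly
}
```

After running Import-AssemblyFile -Path .\PdfToText\itextsharp.dll, the expression 'iTextSharp.text.pdf.PdfReader' -as [type] returns $null if the PdfReader type still isn't available. That separates a loading problem from a problem in the text extraction step.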

  • #77280

    Jørgen Guldmann
    Participant

    For some reason, I can't load the DLL as described.
    Instead, I have to do it like this:

    $bytes = [System.IO.File]::ReadAllBytes("c:\...\itextsharp.dll")
    [System.Reflection.Assembly]::Load($bytes)
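One common reason a DLL loads from bytes but not from a path (an assumption here, since the post doesn't show the error) is that Windows has marked the downloaded file as coming from the internet. Windows PowerShell on .NET Framework 4 then typically refuses to load it from disk with "Operation is not supported. (Exception from HRESULT: 0x80131515)". A sketch of unblocking the file and loading it normally instead follows; Import-BlockedAssembly is a hypothetical helper name.

```powershell
# Assumption: the DLL was downloaded from the internet and Windows has marked it
# as blocked; the post does not confirm this.
function Import-BlockedAssembly
{
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $Path
    )

    # Fully qualified path, resolved against the current PowerShell location.
    $fullPath = Convert-Path -LiteralPath $Path -ErrorAction Stop

    # Remove the "downloaded from the internet" mark (Zone.Identifier stream).
    # Unblock-File ships with PowerShell 3.0 and later on Windows.
    Unblock-File -LiteralPath $fullPath

    # Load from disk; unlike Assembly.Load(byte[]), the loaded assembly
    # keeps a non-empty Location property.
    Add-Type -LiteralPath $fullPath
}
```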
