ConvertFrom-PDF PowerShell Cmdlet | Convert a C# .NET program to PowerShell cmd

This topic contains 5 replies, has 3 voices, and was last updated by Tim Pringle 2 years ago.

  • #20720
    H Man
    Participant

    Can anyone help me convert a C# .NET program to a PowerShell cmdlet?

    Has anyone used the ConvertFrom-PDF cmdlet?

    http://www.beefycode.com/post/ConvertFrom-PDF-Cmdlet.aspx

    I need to scan PDFs, and I can't seem to turn the source code in this blog post into a working cmdlet.

    Any help would be greatly appreciated.

  • #20725
    Dave Wyatt
    Moderator

    I've done some work with the iTextSharp libraries directly in PowerShell before. You can see an example at https://social.technet.microsoft.com/Forums/scriptcenter/en-US/086b9a8c-7e47-49ed-8e94-8f5f43f408fe/search-a-pdf-and-return-specific-text?forum=winserverpowershell . You will need to download a copy of iTextSharp.dll.

  • #20739
    H Man
    Participant

    Hi Dave, thanks for getting back to me.

    I tried the Get-ReferencesFromPdf cmdlet. It didn't return any data, and there were no errors either. Any suggestions for troubleshooting this?

    I do have iTextSharp.dll, and I created the same directory structure as in the post.

  • #20740
    Dave Wyatt
    Moderator

    That function was written specifically for the question posted in that thread, which was looking for section numbers followed by some number of lines matching ABC-*. It isn't meant for you to run directly.

    However, the code does show you how to use the PdfReader and PdfTextExtractor classes to pull text out of a PDF into a .NET String variable. From there, you can split it by line as in the example, or just work with the whole page text as one string; that's up to you.

    Here's a trimmed-down example that just extracts all of the text from the PDF and outputs it as a single string, which you can manipulate however you want:

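    function Get-PdfText
    {
        [CmdletBinding()]
        [OutputType([string])]
        param (
            [Parameter(Mandatory = $true)]
            [string]
            $Path
        )
    
        # Resolve relative paths against PowerShell's current location,
        # because .NET methods resolve them against the process working directory.
        $Path = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($Path)
    
        # Throws if the file is missing or is not a readable PDF.
        $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
    
        try
        {
            $stringBuilder = New-Object System.Text.StringBuilder
    
            # iTextSharp page numbers are 1-based.
            for ($page = 1; $page -le $reader.NumberOfPages; $page++)
            {
                $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
                $null = $stringBuilder.AppendLine($text)
            }
    
            return $stringBuilder.ToString()
        }
        finally
        {
            # Release the file handle even if text extraction fails partway through.
            $reader.Close()
        }
    }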
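Once the text comes back as one string, the "split it by line" option Dave mentions takes only a couple of operators. This is a minimal sketch: the sample string is hypothetical and stands in for real output from Get-PdfText, and the ABC-* pattern is borrowed from the original thread's question. StringBuilder.AppendLine uses the platform newline, so the split accepts both CRLF and LF.

```powershell
# Hypothetical stand-in for the string returned by Get-PdfText, e.g.
# $pdfText = Get-PdfText -Path 'C:\Data\sample.pdf'   (hypothetical path)
$pdfText = "Section 1.2`r`nABC-100 first item`r`nABC-200 second item`r`nUnrelated line"

# Split on Windows (CRLF) or Unix (LF) line endings.
$lines = $pdfText -split '\r?\n'

# Keep only the lines that begin with "ABC-" (-like is case-insensitive).
$matchingLines = @($lines | Where-Object { $_ -like 'ABC-*' })

$matchingLines
```

If you only need to know whether a pattern appears anywhere, you can skip the split and run -match against the whole string instead.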
    
  • #20742
    H Man
    Participant

    OK, I tried it, but it's still not returning a string. I'm still using:

    Add-Type -Path .\PdfToText\itextsharp.dll

    I feel like I'm not placing this .dll correctly.

    Any other suggestions?

    thx
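One way to check whether an Add-Type call like the one above actually loaded iTextSharp is to ask PowerShell whether the type name resolves. Note that a relative path such as .\PdfToText\itextsharp.dll depends on the console's current location at the moment you run it. The sketch below is a minimal example; Test-TypeAvailable is a hypothetical helper name, not part of PowerShell or iTextSharp. It relies on the -as operator, which returns $null instead of throwing when a type name cannot be resolved.

```powershell
function Test-TypeAvailable
{
    [CmdletBinding()]
    [OutputType([bool])]
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $TypeName
    )

    # -as [type] returns $null (no exception) if no loaded assembly defines the type.
    return ($null -ne ($TypeName -as [type]))
}

# Always available in .NET:
Test-TypeAvailable -TypeName 'System.Text.StringBuilder'

# Expected to be $true only after iTextSharp.dll has been loaded into the session:
Test-TypeAvailable -TypeName 'iTextSharp.text.pdf.PdfReader'
```

If the second call returns $false, the DLL was not loaded and the problem lies in the path; if it returns $true, the assembly is fine and the issue is more likely with the PDF itself.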

  • #20752
    Tim Pringle
    Participant

    Try something like this:

    [System.Reflection.Assembly]::LoadFrom('C:\Data\iTextSharp.DLL')

    (after copying the file there, of course).

    Use fully qualified paths, BTW.

    Then test Dave's function. It worked well for me.
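Fully qualified paths matter here because .NET methods such as [System.Reflection.Assembly]::LoadFrom resolve a relative path against the process working directory ([Environment]::CurrentDirectory), which PowerShell typically does not update when you change location with Set-Location or cd. The sketch below wraps Tim's approach so a relative path is first resolved the same way Get-PdfText resolves its -Path parameter. Import-AssemblyFile is a hypothetical name chosen for this example.

```powershell
function Import-AssemblyFile
{
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $Path
    )

    # Build a full path from PowerShell's current location rather than the
    # process working directory that LoadFrom() would otherwise use.
    $fullPath = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($Path)

    # Fail early with a clear message that shows the path actually checked.
    if (-not (Test-Path -LiteralPath $fullPath -PathType Leaf))
    {
        throw "Assembly file not found: $fullPath"
    }

    # Outputs the loaded System.Reflection.Assembly object.
    [System.Reflection.Assembly]::LoadFrom($fullPath)
}

# Hypothetical usage with the folder layout from the earlier post:
# Import-AssemblyFile -Path '.\PdfToText\itextsharp.dll'
```

Because the error message includes the resolved path, a wrong current location shows up immediately instead of as a silent failure later.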
