ConvertFrom-PDF PowerShell Cmdlet | convert a c# .net program to powershell cmd


Viewing 6 reply threads
  • Author
    Posts
    • #20720
      Participant

      Can anyone help me convert a C# .NET program to a PowerShell cmdlet?

      Has anyone used the ConvertFrom-PDF cmdlet?

      http://www.beefycode.com/post/ConvertFrom-PDF-Cmdlet.aspx

      I need to scan PDFs, and I can't seem to get the source code in this blog post into a working cmdlet.

      Any help would be greatly appreciated.

    • #20725
      Member

      I've done some work with the iTextSharp libraries directly in PowerShell before. You can see an example at https://social.technet.microsoft.com/Forums/scriptcenter/en-US/086b9a8c-7e47-49ed-8e94-8f5f43f408fe/search-a-pdf-and-return-specific-text?forum=winserverpowershell . You will need to download a copy of iTextSharp.dll.

    • #20739
      Participant

      Hi Dave, thanks for getting back to me.

      I tried the Get-ReferencesFromPdf cmdlet. It didn't return any data, and there were no errors either. Any suggestions for troubleshooting this?

      I do have iTextSharp.dll and created the same directory structure as in the post.

    • #20740
      Member

      That function was written specifically for the question posted on that thread, looking for section numbers followed by some number of lines matching ABC-*. It's not meant for you to be able to run it directly.

      However, the code does show you how to use the PdfReader and PdfTextExtractor classes to pull text out of a PDF into a .NET String variable. From there, you can split it by line as in the example, or just work with the whole page text as one string; that's up to you.

      Here's a more trimmed-down example that just extracts all of the text from the PDF and outputs it as a single string that you can manipulate however you want:

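      function Get-PdfText
      {
          [CmdletBinding()]
          [OutputType([string])]
          param (
              [Parameter(Mandatory = $true)]
              [string]
              $Path
          )

          # Resolve relative and provider paths to a full file system path,
          # because .NET does not track PowerShell's current location.
          $Path = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($Path)

          try
          {
              $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
          }
          catch
          {
              # Rethrow so the function stops here instead of continuing without a reader.
              throw
          }

          $stringBuilder = New-Object System.Text.StringBuilder

          try
          {
              # iTextSharp page numbers start at 1.
              for ($page = 1; $page -le $reader.NumberOfPages; $page++)
              {
                  $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
                  $null = $stringBuilder.AppendLine($text)
              }
          }
          finally
          {
              # Release the file handle even if extraction fails partway through.
              $reader.Close()
          }

          return $stringBuilder.ToString()
      }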
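
Once the text is in a single string, splitting it into lines and filtering is ordinary string handling. The sketch below uses a made-up sample string in place of real Get-PdfText output, so it runs without iTextSharp. The sample content, the variable names, and the ABC-* pattern (borrowed from the description of the other thread) are illustrative assumptions, not part of the function above.

```powershell
# Stand-in for the string Get-PdfText would return; the content is invented for illustration.
# With a real PDF this would be something like: $text = Get-PdfText -Path <full path to a PDF>
$text = "1.1 Overview`r`nABC-100 First item`r`nSome other line`r`nABC-200 Second item`r`n"

# Split on either Windows or Unix line endings and drop blank lines.
$lines = @($text -split '\r?\n' | Where-Object { $_.Trim() -ne '' })

# Keep only the lines that start with the hypothetical ABC- prefix.
$hits = @($lines | Where-Object { $_ -like 'ABC-*' })

$hits
```

Splitting on `\r?\n` rather than a fixed newline is a cautious choice, since the text extracted from a page and the line breaks added by AppendLine may not use the same line ending.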
      
    • #20742
      Participant

      OK, I tried it, and it's still not returning a string. Am I still supposed to be using

      Add-Type -Path .\PdfToText\itextsharp.dll

      I feel like I'm not placing this .dll in the right location.

      Any other suggestions?

      Thanks.

    • #20752
      Participant

      Try something like this:

      [System.Reflection.Assembly]::LoadFrom('C:\Data\iTextSharp.DLL')

      (Copy the file there as well, of course.)

      Use fully qualified paths, by the way.

      Then test Dave's function. It worked well for me.
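
The advice about fully qualified paths matters because .NET methods such as LoadFrom resolve relative paths against the process working directory, which PowerShell does not necessarily keep in sync with its own current location. Here is a minimal sketch of resolving the path first. The relative path is the one used earlier in this thread, the variable names are made up for illustration, and the LoadFrom call is left commented out because it needs the actual DLL.

```powershell
# PowerShell's location and the .NET process working directory can differ,
# for example after Set-Location in a session.
$psLocation     = (Get-Location).Path
$dotNetLocation = [System.Environment]::CurrentDirectory

# Turn a relative path into a full one based on PowerShell's current location.
# This call does not require the file to exist.
$relativePath = '.\PdfToText\itextsharp.dll'
$fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($relativePath)

"PowerShell location : $psLocation"
".NET working dir    : $dotNetLocation"
"Resolved DLL path   : $fullPath"

# With the DLL in place, load it by full path and confirm that the type resolves:
# [System.Reflection.Assembly]::LoadFrom($fullPath)
# [iTextSharp.text.pdf.PdfReader]
```

If the last line prints type information instead of an "unable to find type" error, the assembly is loaded and Get-PdfText should be able to create its reader.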

    • #77280
      Participant

      For some reason I cannot load the DLL as described.
      Instead, I have to do it like this:

      $bytes = [System.IO.File]::ReadAllBytes("c:\...\itextsharp.dll")
      [System.Reflection.Assembly]::Load($bytes)

  • The topic ‘ConvertFrom-PDF PowerShell Cmdlet | convert a c# .net program to powershell cmd’ is closed to new replies.