multi HTML > single PDF Automation wkHTMLtoPDF (using Powershell)


This topic contains 11 replies, has 3 voices, and was last updated by

 
Participant
2 months, 3 weeks ago.

  • #159104

    Participant
    Topics: 1
    Replies: 5
    Points: 6
    Rank: Member

    Hello,

     

    I have been trying to automate converting a bulk of HTML documents I have (powershell generated HTML documents) into a single .PDF document as a report.

    I have been using wkhtmltopdf to start, and it works fine for the task, but I cannot seem to automate it (at least multiple-in, single-out).

    The issue I'm having is outputting the selected files into the wkhtmltopdf cli as a "list".

    ##Syntax for wkhtmltopdf = command ran from /bin of directory wkhtmltopdf [global option] [documents/HTML] [file output full path]
    
    
    ##powershell script that takes HTML document path names and converts them into single PDF file
    
    $OutputFile = '$HOME\Documents\TempPDFReport\reporttest6.pdf'
    $wkhtmltopdfRootDir = 'C:\Program Files\wkhtmltopdf\bin'
    $GetChildItems = (Get-ChildItem -Path $HOME'\documents\TempHTMLConvert' -recurse |`
    where {$_.extension -eq ".html"} |`
    Select-Object -Property FullName).FullName -join ' '
    
    
    &'c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' $GetChildItems $OutputFile

    What I expected here was that it would generate a list of all child items, join them with a space (The files and paths have no spaces in the names so no quotations needed), and act as a "list" of all the files to be input to the wkhtmltopdf.exe CLI, separated with a space.

    I am however getting errors. When I add a Set-Clipboard pipe within the GetChildItems cmdlet and paste that into the wkhtmltopdf.exe CLI and add an output location, it works just fine. But the pass in the script doesn't seem to function: it throws the wkhtmltopdf.exe error "unknown protocol c", which means it thinks that the first C in the first file path C:\... is a protocol, but I can't figure out what in the output format is causing that. There is no space after the C:\.

    If anyone knows a good way of being able to "pipe" PowerShell objects into other CLI – I'd greatly appreciate the help. I'm not crazy good with powershell, so I'd imagine you can do an array or something, or maybe a more complex -join?

     

    Thanks in advance,

    -Mackling101

  • #159165

    Participant
    Topics: 1
    Replies: 302
    Points: 145
    Helping Hand
    Rank: Participant

    Hi,

    It looks like you need to specify the protocol when using wkhtmltopdf.exe. For files, this is usually file:/// rather than http://

    This appears to work:

    $files = Get-ChildItem E:\temp\html\ -Include *.html -Recurse | Select -ExpandProperty FullName
    
    $fileList = @()
    
    foreach ($file in $files) {
    
        $file = "file:///$file"
        $fileList += $file
    }
    
    & E:\Temp\wkhtmltox\bin\wkhtmltopdf.exe $fileList output.pdf
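For comparison, here is a minimal sketch of the same idea that builds the list in the pipeline instead of with `+=`. The function name `Get-HtmlFileUrl` and the throwaway demo folder are assumptions made for illustration, not something from the thread. The point it tries to show is that the result is an array of separate strings: PowerShell hands an array to a native command as one argument per element, whereas a single string produced by `-join ' '` is handed over as one argument, which may be why the joined string in the first post failed.

```powershell
# Hypothetical helper (not from the thread): returns one file:/// URL
# per .html file found under $Path.
function Get-HtmlFileUrl {
    param([Parameter(Mandatory)][string]$Path)

    Get-ChildItem -Path $Path -Filter *.html -Recurse -File |
        ForEach-Object { "file:///$($_.FullName)" }
}

# Demo against a throwaway folder so the sketch runs anywhere.
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) "html-demo-$(Get-Random)"
New-Item -ItemType Directory -Path $demoDir | Out-Null
'a', 'b', 'c' | ForEach-Object {
    Set-Content -Path (Join-Path $demoDir "$_.html") -Value "<p>$_</p>"
}

# @(...) guarantees an array even when only one file matches.
$fileList = @(Get-HtmlFileUrl -Path $demoDir)
$fileList.Count   # 3

# The array would then be passed as-is, one argument per element, e.g.:
# & 'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' $fileList $OutputFile
```

The wkhtmltopdf call is left commented out so the sketch does not depend on the tool being installed.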
    
  • #159203

    Participant
    Topics: 1
    Replies: 5
    Points: 6
    Rank: Member

    Hey Matt,

    Thanks for the advice – sadly, the big issue I'm having is dropping the added crap that powershell puts on the objects: @{FullName=

    That sticks in front of the filenames – and I'm not sure how to get rid of that. If I could just grab only the filename, and that's it.. nothing else, I think it would be okay. Also, the File:/// is not working either as I end up with an object looking like this:

    file:///@{FullName=C:\Users\...

    So each filename ends up with all of that stuff in front of the C:\. The join I had above got me the C:\ only, but doesn't seem to work still. Maybe at this point I just give up on this, and admit that passing filenames to wkhtmltopdf from powershell isn't possible. Maybe if I push objects to a CSV temp file, and then grab them from that they won't have all the added stuff?

     

    Thanks for the help.

     

    -Mackling101

  • #159213

    Participant
    Topics: 1
    Replies: 302
    Points: 145
    Helping Hand
    Rank: Participant

    Well it is possible. The code I posted works. Did you try it?

    You need to make sure you use -ExpandProperty to get the name as a String object.
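A small sketch of the difference, using a stand-in object rather than real files (the path `C:\Temp\report1.html` is made up for illustration). `-Property` keeps a wrapper object with a FullName property, which is where the `@{FullName=` text comes from when it is placed in a string; `-ExpandProperty` returns the bare string.

```powershell
# Stand-in for a file object; the path is hypothetical.
$item = [pscustomobject]@{ FullName = 'C:\Temp\report1.html' }

$wrapped  = $item | Select-Object -Property FullName        # object with one property
$expanded = $item | Select-Object -ExpandProperty FullName  # plain string

"file:///$wrapped"          # file:///@{FullName=C:\Temp\report1.html}
"file:///$expanded"         # file:///C:\Temp\report1.html
$expanded.GetType().Name    # String
```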

  • #159216

    Participant
    Topics: 2
    Replies: 999
    Points: 1,946
    Helping Hand
    Rank: Community Hero

    I am really not sure what you are after here, but Get-ChildItem returns file objects, not file content.

    Get-Content returns the content of the individual file it is called on.

    You are using a third-party external app, wkhtmltopdf.exe, which I've never heard of before, to create a PDF, when you could just use the PDF printer in Windows, but I digress.

    Using Set-Clipboard with Get-ChildItem means nothing really, relative to the file content. You'll only be sending the full file object to the clipboard, and that makes little sense.

    So, are you just wanting to send the full filenames to a single PDF, or the actual file content?

    If it is the latter, you have more work to do. You have to:

    • Loop to read each file
    • Get its content
    • Add that to a single file or variable
    • Then convert that to PDF

    If you are getting stuff back from just the above, then good; if not, you need to figure out why.

    This is wrong, because you are using a variable that needs to be expanded.
    Single quotes are for simple (literal) strings.

    $OutputFile = '$HOME\Documents\TempPDFReport\reporttest6.pdf'
    

    It should be this.
    Double quotes are for variable expansion.

    $OutputFile = "$HOME\Documents\TempPDFReport\reporttest6.pdf"
    $wkhtmltopdfRootDir = 'C:\Program Files\wkhtmltopdf\bin'
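A quick way to see the quoting difference for yourself. The file name `report.pdf` is only a placeholder for this sketch.

```powershell
# Single quotes: the text is taken literally, $HOME is NOT expanded.
$single = '$HOME\Documents\report.pdf'

# Double quotes: $HOME is replaced with the current user's home directory.
$double = "$HOME\Documents\report.pdf"

$single   # prints the literal text $HOME\Documents\report.pdf
$double   # prints a path that starts with your home directory
```

With the single-quoted version, an external program receives a path that literally begins with the characters `$HOME`, which is not a location that exists on disk.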
    

    This, by itself, would only send filenames to the PDF file, not content.

    $GetChildItems = (Get-ChildItem -Path $HOME'\documents\TempHTMLConvert' -recurse |`
    where {$_.extension -eq ".html"} |`
    Select-Object -Property FullName).FullName -join ' '
    

    Don't use the backtick after the pipe for line continuation. The pipe is a natural line continuation, and there are many natural line continuations in PowerShell. The backtick has its place, and I do use it, but not here. Others just malign it always.

    This is a good article on the topic, though I feel the author convolutes things to justify some of his points, and on those I disagree with him. Yet most of what's there is on the money.

    Bye Bye Backtick: Natural Line Continuations in PowerShell
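As a small illustration of the natural line continuation mentioned above: a line that ends with a pipe simply carries on to the next line, no backtick needed. The file names here are made-up sample data.

```powershell
# Sample data standing in for a directory listing.
$names = 'b.html', 'a.txt', 'c.html'

# Each line ends with a pipe, so PowerShell keeps reading the next line.
$htmlOnly = $names |
    Where-Object { $_ -like '*.html' } |
    Sort-Object

$htmlOnly -join ' '   # b.html c.html
```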

    # Get the fullname of the file and the full content of all target files, and put into a variable, 
    $HtmlContent = Get-ChildItem -Path "$HOME\documents\TempHTMLConvert\*.html" -recurse | 
    ForEach{ 
        $PSItem.FullName
        Get-Content -Path $PSItem.FullName 
    }
    

    Running external commands with PowerShell requires special attention.

    Using PowerShell and external commands and their parameters or switches.

    PowerShell: Running Executables
    https://social.technet.microsoft.com/wiki/contents/articles/7703.powershell-running-executables.aspx

    Solve Problems with External Command Lines in PowerShell

    Top 5 tips for running external commands in Powershell
    https://powershelleverydayfaq.blogspot.com/2012/04/top-5-tips-for-running-external.html

    Using Windows PowerShell to run old command line tools (and their weirdest parameters)
    https://blogs.technet.microsoft.com/josebda/2012/03/03/using-windows-powershell-to-run-old-command-line-tools-and-their-weirdest-parameters

    Execution of external commands in PowerShell done right
    Execution of external commands (native applications) in PowerShell done right – Part 2
    Execution of external commands (native applications) in PowerShell done right – Part 3

    http://edgylogic.com/blog/powershell-and-external-commands-done-right

    Quoting specifics
    https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_quoting_rules

    A Story of PowerShell Quoting Rules

    & 'c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' $HtmlContent $OutputFile
    
  • #160469

    Participant
    Topics: 1
    Replies: 5
    Points: 6
    Rank: Member

    Sorry for the late reply – I did try to use exactly what you had posted Matt, but it was still throwing me the same errors.

     

    Postanote – Thank you for your reply, I will read through this and see if I can get it working.

     

    Thank you both for the replies, I appreciate the assistance here.

     

    -Mackling101

  • #160494

    Participant
    Topics: 1
    Replies: 302
    Points: 145
    Helping Hand
    Rank: Participant

    Can you post the code you tested with after my post so I can try and re-test?
    The sample I posted worked fine for me, generating a single multi-page PDF file from three HTML files.

  • #162782

    Participant
    Topics: 1
    Replies: 5
    Points: 6
    Rank: Member
    
    $OutputFile = 'c:\users\—\Documents\TempPDFReport\reporttest8.pdf'
    $Files = Get-ChildItem C:\Users\—\Documents\TempHTMLConvert -Include *.html -Recurse | Select -ExpandProperty FullName
    
    $FilesList = @()
    
    Foreach ($File in $Files) {
    
    $File = "File:///$File"
    $FileList += $File
    
    }
    
    &'c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' $FileList $OutputFile
    
    

     

    Here is what I am using. Sorry for late reply – again. Due to functionality of this, I was a bit forced to move on. But I'm still passively working on it.

     

    I get output like this:  "Failed to load file:///C:\Users"  that is error from wkhtmltopdf.exe

     

    Thanks in advance.

     

    -Mackling101

     

     

  • #162801

    Participant
    Topics: 1
    Replies: 302
    Points: 145
    Helping Hand
    Rank: Participant

    If that's an exact copy/paste then the problem is that your array is called $Fileslist (with an 's' in the middle) but in the loop and in the argument, it's called $FileList. Instead of passing a list of files, you're passing one big string that looks like this:

    file:///filename1.htmlfile:///filename2.htmlfile:///filename3.html
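The effect is easy to reproduce in isolation. In this sketch the file names are placeholders; the variable names mirror the ones in the script above. Because `$FileList` was never initialised as an array, `+=` starts from nothing and ends up concatenating strings.

```powershell
# Make sure the misspelled name really is undefined for this demo.
Remove-Variable FileList -ErrorAction SilentlyContinue

$files = 'a.html', 'b.html'

# The array is created as $FilesList, but the loop appends to $FileList.
$FilesList = @()
foreach ($f in $files) { $FileList += "file:///$f" }

$FileList.GetType().Name   # String
$FileList                  # file:///a.htmlfile:///b.html

# With matching names, += appends to the array and the items stay separate.
$goodList = @()
foreach ($f in $files) { $goodList += "file:///$f" }
$goodList.Count            # 2
```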

  • #162933

    Participant
    Topics: 1
    Replies: 5
    Points: 6
    Rank: Member

    Oh. My. God. LUL

  • #162939

    Participant
    Topics: 1
    Replies: 5
    Points: 6
    Rank: Member

    I can't believe I missed that.. seriously. So sorry – That's pretty bad.

     

    That worked, and I now have a better understanding of arrays and passing variables. Thank you both for the help, and Matt – thank you a ton. I feel terrible that I missed that. Yikes..

     

    1000 thank you's.

    • #162951

      Participant
      Topics: 1
      Replies: 302
      Points: 145
      Helping Hand
      Rank: Participant

      You're very welcome and don't feel too bad about it, these things are easily overlooked. You may want to consider using VSCode for developing your scripts; one of its features is it tells you which variables are assigned but not used.
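A different safety net from the editor hint, offered only as a sketch: `Set-StrictMode` makes PowerShell raise an error when a script reads a variable that was never assigned, so a misspelled name fails loudly instead of quietly producing a string. The variable name `$NeverAssignedList` is invented for this example.

```powershell
Set-StrictMode -Version Latest

$caught = $null
try {
    # Appending reads the variable first; under strict mode that read fails
    # because nothing was ever assigned to this name.
    $NeverAssignedList += 'file:///a.html'
}
catch {
    $caught = $_.Exception.Message
}

$caught   # the message should point at the unassigned variable

# Switch strict mode back off so the rest of the session is unaffected.
Set-StrictMode -Off
```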
