Using wkhtmltopdf and an Xvfb daemon to render HTML to PDF

I have a lot of projects that consist of collecting information and then rendering it to PDF. Initially I used TCPDF, but its rendering is VERY finicky. A few weeks ago, while doing some additional research for a personal project, I found wkhtmltopdf.


I was scraping my Harvest time sheets to gather my hours for the previous two weeks, rendering to a PDF, and automatically creating a RightSignature document to be signed by my employers. If you’ve ever had to do this, it can be somewhat of a pain to do repeatedly, especially when you’re really busy. Plus, automating things is awesome.

For that process, I used some Python (with Requests), wkhtmltopdf, and the xvfb-run command that ships with Xvfb (X virtual framebuffer). This is fine, because I run the process manually, at will.
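A one-off run of this kind might look roughly like the sketch below. It is illustrative only: `render_once` is a hypothetical helper name, the screen size is an arbitrary choice, and it assumes `xvfb-run` and `wkhtmltopdf` are both on the PATH.

```shell
#!/bin/sh
# Sketch: render one page inside a throwaway Xvfb session.
# render_once is a hypothetical helper name, not part of either tool.
render_once() {
    input=$1    # URL or path to an HTML file
    output=$2   # path of the PDF to write
    # -a: let xvfb-run pick a free display number
    # -s: arguments passed through to the Xvfb server it starts
    xvfb-run -a -s "-screen 0 1024x768x24" wkhtmltopdf "$input" "$output"
}
```

Calling `render_once timesheet.html timesheet.pdf` (hypothetical file names) starts an X server, renders, and shuts the server down again, which is fine for manual runs.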

I have some other projects that require repeated rendering on a live site, for multiple users. Running a separate xvfb-run for each request is just a dumb idea, because every call starts and tears down its own X server. Your server is going to explode if you do this. Don’t do it.

To use a single Xvfb process, I needed to set it up as a daemon that is run by the www-data user, so that the PHP wrapper I am using (PHPWkHtmlToPDF) can access the virtual framebuffer and wkhtmltopdf can use its WebKit rendering engine (Qt WebKit, not Chrome itself) to save the page to PDF.

I could (and probably will) set up a processing queue using RabbitMQ or the like, but for now (and for this post), this is fine.

Let’s get started, shall we?

These instructions are centered around Ubuntu, so your mileage may vary depending on the setup you are using.

Step 1: Install wkhtmltopdf and xvfb

# apt-get install wkhtmltopdf xvfb

Step 2: Create and enable the init script for the xvfb daemon

Note the ":0" in XVFBARGS, which sets the display to 0, and the --chuid parameter, which runs the daemon as the www-data user. If you are not running a headless server, make sure to change all references to ":0" to ":#" (e.g. :1, :2 or :22) to avoid conflicts with your existing X session(s).
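If you are unsure which display numbers are already taken, a quick check is possible because X servers normally leave a lock file named /tmp/.X<N>-lock while they own display :N. The sketch below is a minimal illustration of that idea; `first_free_display` and its optional directory argument are hypothetical names, and it only looks at lock files, not at sockets or remote displays.

```shell
#!/bin/sh
# Sketch: print the lowest display number that has no X lock file.
# first_free_display and the lock_dir argument are hypothetical helpers.
first_free_display() {
    lock_dir=${1:-/tmp}   # where X servers keep their .X<N>-lock files
    n=0
    # Walk upwards from :0 until a number without a lock file is found
    while [ -e "$lock_dir/.X${n}-lock" ]; do
        n=$((n + 1))
    done
    echo ":$n"
}
```

On a truly headless box this should print the lowest unused number, which is the one to use in XVFBARGS.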

Put the following in “/etc/init.d/xvfb”:

#!/bin/sh
# /etc/init.d/xvfb: run a single shared Xvfb as the www-data user

XVFB=/usr/bin/Xvfb
# Display :0, one 1024x768 screen at 24-bit depth, access control disabled (-ac)
XVFBARGS=":0 -screen 0 1024x768x24 -ac +extension GLX +render -noreset"
PIDFILE=/var/run/xvfb.pid

case "$1" in
  start)
    echo -n "Starting virtual X frame buffer: Xvfb"
    # XVFBARGS is left unquoted on purpose so it splits into separate arguments
    start-stop-daemon --chuid www-data --start --quiet --pidfile "$PIDFILE" --make-pidfile --background --exec "$XVFB" -- $XVFBARGS
    echo "."
    ;;
  stop)
    echo -n "Stopping virtual X frame buffer: Xvfb"
    start-stop-daemon --chuid www-data --stop --quiet --pidfile "$PIDFILE"
    echo "."
    ;;
  restart)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: /etc/init.d/xvfb {start|stop|restart}"
    exit 1
    ;;
esac

exit 0

Make the script executable and enable it:

# chmod +x /etc/init.d/xvfb
# update-rc.d xvfb defaults 10

Run your init script:

# /etc/init.d/xvfb start
Starting virtual X frame buffer: Xvfb.

Check to confirm your Xvfb is running:

# ps auxU www-data | grep '[X]vfb'
www-data 17852  0.0  0.2  54960  7904 ?        S    04:31   0:00 /usr/bin/Xvfb ...

Great! The daemon is up and running (hopefully).
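To take some of the "hopefully" out of it, the pidfile written by the init script can be checked directly. This is a minimal sketch, assuming the same PIDFILE path as the script above; `pid_alive` is a hypothetical helper name. Note that `kill -0` only succeeds when you are allowed to signal the process, so run it as root or as www-data.

```shell
#!/bin/sh
# Sketch: succeed only if the PID recorded in a pidfile is still alive.
# pid_alive is a hypothetical helper; the default path matches PIDFILE above.
pid_alive() {
    pidfile=${1:-/var/run/xvfb.pid}
    [ -r "$pidfile" ] || return 1    # no readable pidfile: treat as not running
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process can be signalled
    kill -0 "$pid" 2>/dev/null
}
```

Something like `pid_alive && echo "Xvfb is up"` is then enough for a cron job or a monitoring check.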

Step 3: Modify the PHPWkHtmlToPdf Wrapper

We need to modify the wrapper’s call to wkhtmltopdf so it knows we are using a specific X session (see the notes about ":0" above).

Open up the WkHtmlToPdf.php class file and find the getCommand function (around line 252 at the time of writing). Add a line at the beginning of the command variable that exports the DISPLAY variable to match the display number you used. In my case it’s ":0", since I am running a headless server.

   /**
    * @param string $filename the filename of the output file
    * @return string the wkhtmltopdf command string
    */
   public function getCommand($filename)
   {
       // Added line: point wkhtmltopdf at the Xvfb daemon's display
       $command = 'export DISPLAY=":0";';
       $command .= $this->enableEscaping ? escapeshellarg($this->bin) : $this->bin;

       $command .= $this->renderOptions($this->options);

       foreach ($this->objects as $object) {
           $command .= ' ' . $object['input'];
           unset($object['input']);
           $command .= $this->renderOptions($object);
       }

       return $command . ' ' . $filename;
   }
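Before involving PHP, the same idea can be tried by hand from a shell: set DISPLAY for a single wkhtmltopdf call and see whether a PDF comes out. The sketch below is a rough equivalent, not what the wrapper literally runs; `render_on_display` and the `XVFB_DISPLAY` variable are hypothetical names, and it assumes the daemon from Step 2 is already listening.

```shell
#!/bin/sh
# Sketch: run wkhtmltopdf against an already-running Xvfb display.
# render_on_display and XVFB_DISPLAY are hypothetical names for this example.
render_on_display() {
    url=$1      # URL or HTML file to render
    output=$2   # path of the PDF to write
    # DISPLAY is set for this one command only, defaulting to :0
    DISPLAY="${XVFB_DISPLAY:-:0}" wkhtmltopdf "$url" "$output"
}
```

Running it as the www-data user (for example through `sudo -u www-data`) gives the closest match to what the web server will do.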

Save, and you’re ready to….

Step 4: RENDER!

I use Twig to fill in some HTML templates, but for the sake of an example, we’ll use PHP to grab the Google homepage and render it to a PDF.

include "classes/WkHtmlToPdf.php";

// instantiate the wrapper
$pdf = new WkHtmlToPdf;

// add the content, addPage can take HTML strings, URLs, or filenames.
$pdf->addPage("http://google.com");

// for this example, we'll just use send, which sends the PDF directly to the browser
// if (!$pdf->saveAs('/tmp/google.pdf')): // this saves to a file
if (!$pdf->send()):
    throw new Exception('Could not create PDF: ' . $pdf->getError());
endif;

Save this, run it, and you should be presented with a well-rendered PDF of the Google homepage.
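If you use the saveAs variant instead and write to a file such as /tmp/google.pdf, a quick sanity check from the shell is to look for the "%PDF-" marker that PDF files start with. This is a minimal sketch; `looks_like_pdf` is a hypothetical helper name, and it only checks the signature, not whether the page rendered well.

```shell
#!/bin/sh
# Sketch: check that a file begins with the "%PDF-" signature.
# looks_like_pdf is a hypothetical helper name.
looks_like_pdf() {
    [ -s "$1" ] || return 1               # must exist and be non-empty
    [ "$(head -c 5 "$1")" = "%PDF-" ]     # PDF files start with this marker
}
```

For example, `looks_like_pdf /tmp/google.pdf || echo "render failed"` catches the case where an error message or an empty file was written instead.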

There are a few options that make the render a bit more accurate; you can find them at https://github.com/mikehaertl/phpwkhtmltopdf.

Enjoy and happy rendering!