How to: Save a crapload of money converting from print to web

We spend lots of dough each year converting material from our dead-tree editions into web-friendlier versions for our web sites. We crank out a bunch of PDFs, and send them through the ether to somewhere where the labor is cheap and the workday long, like Vietnam, Indonesia or Canada or something. Then some poor soul slices 'n dices them into jpegs and links and such, and sends 'em on back, and posts them on our site.

So I thought to myself, "Self, you can do that without having to do something silly like use people and, worse, pay for it."

It's a work in progress, but it goes something like...

  • Export PDFs of ads from our DTI advertising system, and page PDFs from our Newsway prepress system.
  • Multiplex the PDFs through xpdf, ImageMagick and swftools, driven by some proprietary workflow software, to extract text, convert to bitmaps and convert to Flash files, respectively. For extra text-extraction points, maybe we'll OCR them with Tesseract if we can get a box with more CPU horsepower than the virtual machine it's running on now.
  • Combine the files into an XML feed.
  • Send the files to the front-end system. Probably Drupal, but possibly a Rails app, or McClatchy's own Workbench CMS.
  • Display to the user with a combination of Flash, jQuery and CSS like so:

Shazaam! $35k saved.

Not to mention, jQuery almost makes coding JavaScript fun. Almost.
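
For the curious, here's roughly what the middle of that list (fan each PDF out to the three tools, then roll the results into an XML feed) could look like in Python. This is a sketch, not our actual workflow software, which is proprietary. The function names, the 150 dpi density, the feed's element names and the one-page-per-PDF assumption are all made up for illustration; only the three command-line tools (pdftotext from xpdf, convert from ImageMagick, pdf2swf from swftools) come from the list above.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def build_commands(pdf_path, out_dir):
    """Return one command line per tool for a single PDF (hypothetical helper)."""
    pdf = Path(pdf_path)
    out = Path(out_dir)
    stem = pdf.stem
    return {
        # xpdf: pull out the text layer
        "text": ["pdftotext", str(pdf), str(out / f"{stem}.txt")],
        # ImageMagick: rasterize to a JPEG; 150 dpi is a guess, and a
        # multi-page PDF would yield numbered files rather than this one name
        "image": ["convert", "-density", "150", str(pdf), str(out / f"{stem}.jpg")],
        # swftools: convert to a Flash file
        "flash": ["pdf2swf", str(pdf), "-o", str(out / f"{stem}.swf")],
    }


def convert_pdf(pdf_path, out_dir):
    """Run all three conversions; raises CalledProcessError if a tool fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = build_commands(pdf_path, out_dir)
    for argv in commands.values():
        subprocess.run(argv, check=True)
    # The output path is the last argument of every command above.
    return {kind: argv[-1] for kind, argv in commands.items()}


def build_feed(items):
    """Combine converted outputs into one XML string (made-up schema)."""
    root = ET.Element("feed")
    for item in items:
        node = ET.SubElement(root, "item", id=item["id"])
        for kind in ("text", "image", "flash"):
            ET.SubElement(node, kind).text = item[kind]
    return ET.tostring(root, encoding="unicode")
```

Calling `convert_pdf("page1.pdf", "out")` needs all three tools on the PATH; `build_commands` and `build_feed` are pure functions, so you can poke at them without anything installed.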


Hello, can you post the code for this cool PDF viewer, or is it private?


The PDF has actually been converted to a SWF (Flash) file with swftools, an open-source toolkit.

JavaScript is used to wire up the zooming and panning functions. You can view source on this page to get the code, if you want.