How to: Save a crapload of money converting from print to web

We spend lots of dough each year converting material from our dead-tree editions into web-friendlier versions for our web sites. We crank out a bunch of PDFs and send them through the ether to somewhere the labor is cheap and the workday long, like Vietnam, Indonesia or Canada or something. Then some poor soul slices 'n dices them into JPEGs and links and such, sends 'em on back, and posts them on our site.

So I thought to myself, "Self, you can do that without having to do something silly like use people and, worse, pay for it."

It's a work in progress, but it goes something like...

  • Export PDFs of ads from our DTI advertising system, and page PDFs from our Newsway prepress system.
  • Multiplex the PDFs through Xpdf, ImageMagick and SWFTools to extract text, render bitmaps and generate Flash files, respectively, with some proprietary workflow software driving the whole thing. For extra text-extraction points, maybe we'll OCR them with Tesseract if we can get a box with more CPU horsepower than the virtual machine it's running on.
  • Combine the files into an XML feed.
  • Send the files to the front-end system. Probably Drupal, but possibly a Rails app, or McClatchy's own Workbench CMS.
  • Display to the user with a combination of Flash, jQuery and CSS.
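
For the curious, the middle steps of that list (fan each PDF out to the three converters, then describe the results in a feed) might look roughly like the Python sketch below. To be clear, this is not our actual setup: the real multiplexing is handled by the proprietary workflow software, and plain subprocess calls are swapped in here for illustration. The function names, the output directory and the little pages/page feed schema are all made up. It also assumes one page per PDF, since ImageMagick numbers its output files when a PDF has several pages.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def conversion_commands(pdf, out_dir):
    """Return the three converter command lines for one PDF, keyed by output kind."""
    pdf = Path(pdf)
    base = Path(out_dir) / pdf.stem
    return {
        # Xpdf's pdftotext: pull out the text layer, keeping the page layout
        "text": ["pdftotext", "-layout", str(pdf), f"{base}.txt"],
        # ImageMagick: rasterize at 150 dpi (PDF input relies on Ghostscript)
        "image": ["convert", "-density", "150", str(pdf), f"{base}.jpg"],
        # SWFTools' pdf2swf: wrap the page in a Flash file
        "flash": ["pdf2swf", str(pdf), "-o", f"{base}.swf"],
    }


def convert(pdf, out_dir):
    """Run all three converters; raises CalledProcessError if one fails."""
    for cmd in conversion_commands(pdf, out_dir).values():
        subprocess.run(cmd, check=True)


def build_feed(pdfs, out_dir):
    """Describe the converted files in a small XML feed (hypothetical schema)."""
    root = ET.Element("pages")
    for pdf in pdfs:
        page = ET.SubElement(root, "page", source=Path(pdf).name)
        for kind, cmd in conversion_commands(pdf, out_dir).items():
            # The output path is the last argument of every command above
            ET.SubElement(page, kind).text = Path(cmd[-1]).name
    return ET.tostring(root, encoding="unicode")
```

Calling convert("a1.pdf", "out") for each exported file and then build_feed(...) on the same list would give the front end one XML document pointing at the text, bitmap and Flash version of every page. The OCR pass and the push to the CMS are left out.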

Shazaam! $35k saved.

Not to mention, jQuery almost makes coding JavaScript fun. Almost.
