Docs, PDFs and open formats. Huh???

So here’s an interesting challenge. Where I work, I had to put together a server that converts Word .doc files to .pdf files for some of our documents. After quite a bit of searching around (doc2pdf-type programmes and scripts), I’d settled on linbox converter. Basically it’s a daemon that runs on a Windows machine that has MS Office installed, as well as Ghostscript and Python. A client sends a doc with a request for conversion into, for example, PDF format. The daemon receives it, runs Office and prints to, for example, a PDF file, which it then sends back to the client.

The problem I’ve experienced is with one document that has JPG pictures in it. Linbox-converter (and even printing to PDF from Word itself) fails utterly and completely. The result is a garbled 1.5K file full of nonsense. Here’s the magic: OpenOffice (2.01) will do the conversion just fine.

Now the question: anyone out there know how one might send cmd line args to oowriter2 to do the conversion for me? That way I can write a daemon (kinda like linbox converter I guess) that’ll just wrap around ooffice2 for the conversions. Thoughts, suggestions? Note: it has to be programmatic/automatic/system addict- yeah.
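For what it’s worth, the wrapper part of such a daemon can be sketched before the right OpenOffice invocation is known. The Python below is only a rough sketch: `convert`, `looks_like_pdf` and the `converter_argv` parameter are made-up names, and the converter command itself is a placeholder that is assumed to take the input path and the output path as its last two arguments. The one thing it does for real is reject output that doesn’t start with the `%PDF-` header, which would catch the garbled 1.5K file described above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file starts with the PDF magic bytes '%PDF-'."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"


def convert(doc_path, converter_argv, timeout=120):
    """Run a converter command on doc_path and return the PDF bytes.

    converter_argv is a hypothetical command prefix (not a real
    OpenOffice invocation); the input and output paths are appended
    to it as the last two arguments.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_path = Path(tmp) / (Path(doc_path).stem + ".pdf")
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(
            list(converter_argv) + [str(doc_path), str(out_path)],
            check=True,
            timeout=timeout,
        )
        # Refuse to hand back garbage, e.g. a short file of nonsense.
        if not out_path.exists() or not looks_like_pdf(out_path):
            raise RuntimeError("converter did not produce a valid PDF")
        return out_path.read_bytes()
```

A daemon would call `convert()` once per request and send the returned bytes back to the client; the timeout guards against an office process that hangs on a bad document.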

14 thoughts on “Docs, PDFs and open formats. Huh???”

  1. A Python script to do exactly what you need. It worked in OOo 1; not sure about OOo 2, but it should. You may have to do some work to get Python to find the libraries… In non-Gentoo OOo there is a Python runtime inside the installation.

  2. This script is basically the same as wv or something. What I need is a script that’ll take not just the plain text, but the formatting, the tables, and especially the images, and convert the lot into a PDF. Basically, something that presses the “Convert to PDF” button 🙂

  3. Spent about half an hour googling around for openoffice macro and tried to download a few things of which http://oooconv.free.fr/batchconv/batchconv_en.html actually could be used.

    And then somehow I ended up on the very front page for OpenOffice (www.openoffice.org) that has the news link “Batch Conversion of Legacy Docs to OOo” (http://www.xml.com/pub/a/2006/01/11/from-microsoft-to-openoffice.html) 🙂 Hope this will be a good starting point, I’m quite sleepy right now and can’t put two together words, so…

  4. Simon,

    Funny thing is I found exactly that same page (the xml one) from a google search this morning while I was writing this blog entry 🙂

    Ravi,

    Unfortunately I think ooo-1 is being deprecated in Gentoo, if it hasn’t disappeared already. I do like the whole odt thing, though. Something like pyopenoffice for ooo2 would be perfect as it could even be deployed as a general pdf generating server for everyone’s use.

  5. Update:

    So I ran that script from the xml.com page and I get this dialog box:

    BASIC runtime error.
    An exception occurred
    Type: com.sun.star.lang.IllegalArgumentException
    Message: URL seems to be an unsupported on..

  6. Actually, there should be some scripting support in the OpenOffice development kit, I believe in the form of a binary. Unfortunately we don’t build it in the ebuild; I guess I’ve got to fix that. You can build it manually though ;-).

    Further, I’ve been told by people who tried to do this that it is actually also against the Microsoft Office license to provide such a batch conversion service using Microsoft Office.

  7. You didn’t try very hard did you? 😉

    Google for ‘openoffice commandline pdf’

    Looking at the code I think “writer_pdf_Export” is a keyword you should look for on the internet. It seems to bring up some OOo macros when you enter that in Google.

    Hope this helps 🙂

  8. Are you sure that wv won’t do what you want? It definitely deals with more than just text – it does formatting and images at least.

  9. AbiWord command-line’s pretty cool but

    abiword --to=pdf foo.doc

    only creates a “foo.pdf” file that’s really in abw format.

    what about running doc2pdf in Wine?

  10. Seemant, I’ve been using this little gem for months now:
    http://www.skynet.ie/~caolan/Fragments/ooo-cgi.html

    It takes a bit to get working, but it works like a charm. If you are running hardened, make sure to remove shared mem protection from both the OpenOffice binary AND the python binary that comes with OpenOffice.

    Also, make sure you have all the fonts you need installed and accessible from X.

  11. I am using a Java library called JODConverter. You can find it on SourceForge or try http://www.artofsolving.com/opensource/jodconverter/guide.
    It is supposed to run as a command-line tool, web service, stand-alone app, or Java library. I am running into the same com.sun.star.lang.IllegalArgumentException (“URL seems to be an unsupported on..”) error right now. Maybe someone knows a solution…

  12. I’m also using JODConverter. I also encountered the same problem when using a remote OOo server. Try replacing the OpenOfficeDocumentConverter with StreamOpenOfficeDocumentConverter to work around file permission problems with a remote OOo connection.

    HTH
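
Pulling the hints in this thread together, here is a rough PyUNO sketch of the “press the Convert to PDF button” idea, using the `writer_pdf_Export` filter name mentioned in comment 7. It is a sketch under assumptions, not a tested recipe for OOo 2: it assumes an OpenOffice instance is already running and listening on a socket (the host and port 8100 are arbitrary choices), that the script is run with a Python that can import `uno`, and the function names `to_file_url` and `convert_to_pdf` are made up for illustration. Both paths are turned into absolute file:// URLs before they are handed to OOo; passing a plain path is one common way to trigger an IllegalArgumentException about an unsupported URL, though nothing in this thread confirms that was the cause here.

```python
import os
from pathlib import Path


def to_file_url(path):
    """Turn a local path into an absolute file:// URL."""
    return Path(os.path.abspath(path)).as_uri()


def convert_to_pdf(doc_path, pdf_path, host="localhost", port=8100):
    """Ask a running OpenOffice instance to export doc_path as a PDF."""
    # uno is only importable from a Python that knows about the OOo
    # libraries, so it is imported here rather than at module level.
    import uno
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        p = PropertyValue()
        p.Name = name
        p.Value = value
        return p

    # Connect to the listening office process (host/port are assumptions).
    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(
        "uno:socket,host=%s,port=%d;urp;StarOffice.ComponentContext"
        % (host, port))
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    # Load the document without showing a window, then export it.
    doc = desktop.loadComponentFromURL(
        to_file_url(doc_path), "_blank", 0, (prop("Hidden", True),))
    try:
        doc.storeToURL(to_file_url(pdf_path),
                       (prop("FilterName", "writer_pdf_Export"),))
    finally:
        doc.close(True)
```

Calling `convert_to_pdf("foo.doc", "foo.pdf")` would then write foo.pdf next to the script, provided the office process was started with a matching socket accept option and can read and write those paths itself.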
