From: D Herring
Subject: Re: Free software for parsing/generating MS-Word documents?
Date: 
Message-ID: <O9ednaSvyqiFpCzanZ2dnUVZ_jydnZ2d@comcast.com>
Robert Maas, see http://tinyurl.com/uh3t wrote:
> Do you know of any free software, written in Java or Common Lisp,
> which parses and/or generates MS-Word documents?

You underestimate the task.  Check out OpenOffice, KOffice, et al. to
find libraries that attempt compatibility.  It's a quirky, poorly
documented binary format.  It doesn't matter what language you use.

- Daniel
From: John Thingstad
Subject: Re: Free software for parsing/generating MS-Word documents?
Date: 
Message-ID: <op.t6ehf5oyut4oq5@pandora.alfanett.no>
On Tue, 12 Feb 2008 07:13:46 +0100, D Herring
<········@at.tentpost.dot.com> wrote:

> Robert Maas, see http://tinyurl.com/uh3t wrote:
>> Do you know of any free software, written in Java or Common Lisp,
>> which parses and/or generates MS-Word documents?
>
> You underestimate the task.  Check out OpenOffice, KOffice, et al. to
> find libraries that attempt compatibility.  It's a quirky, poorly
> documented binary format.  It doesn't matter what language you use.
>
> - Daniel

That is only partly true. The newer Word 2007 format is a zip-compressed
XML format and is well documented, and it can be read in with cxml once
unzipped. Microsoft also offers a conversion tool that lets older versions
of Word read and write this format.

So I would:
1. Get the converter.
2. Load into Word.
3. Convert the document to Word2007 format.
4. Load it using cxml (unzip first).
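
Step 4 can be sketched as follows. This is only an illustration in Python
using its standard zipfile and XML modules, not the Common Lisp zip/cxml
route itself, and it makes some assumptions: that the main body is stored
in the archive member word/document.xml (the usual location in Word 2007
files), and that pulling the text runs out of each paragraph is all you
need. The function name docx_paragraphs is made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by Word 2007 (.docx) documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a .docx file.

    `source` is a file path or a binary file-like object.
    Assumes the body lives in word/document.xml (the usual location).
    """
    # A .docx file is an ordinary zip archive, so "unzip first" is one call.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Text is stored in w:t elements nested inside each w:p paragraph.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs
```

Something like docx_paragraphs("report.docx") (file name hypothetical)
would then return a list of strings, one per paragraph. Formatting,
tables, and embedded objects are ignored in this sketch; a real parser
would have to walk the rest of the XML tree as described in the format
documentation.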

Open XML File Format:
http://msdn2.microsoft.com/en-us/library/aa338205.aspx

Conversion tool:
http://www.microsoft.com/downloads/details.aspx?FamilyId=941b3470-3ae9-4aee-8f43-c6bb74cd1466&displaylang=en

CL zip lib:
http://common-lisp.net/project/zip/

CL cxml:
http://common-lisp.net/project/cxml/

--------------
John Thingstad