From: Andrew Reilly
Subject: Re: Free software for parsing/generating MS-Word documents?
Date: 
Message-ID: <61cs2jF1uhpmmU1@mid.individual.net>
On Mon, 11 Feb 2008 21:33:03 -0800, Robert Maas wrote:

> Do you know of any free software, written in Java or Common Lisp, which
> parses and/or generates MS-Word documents?

I believe that the OpenOffice Word input/output filters are in Java.  
That's probably a reasonable place to start.

The only other MS-Word parsers that I know of are in C or C++.  Don't 
know any in lisp, I'm afraid.

On the other hand, Word in Office'07 produces XML, apparently, and there 
are plenty of XML parsers in Java and lisp.
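
For what it's worth, here is a rough sketch of that approach in Java, using only the standard library. It assumes the usual .docx layout: the file is a ZIP archive whose main text sits in an entry named word/document.xml, with paragraphs as w:p elements and text runs as w:t elements. The class name DocxText and the method extractText are made up for illustration, and this only pulls out plain text. It ignores styles, tables, headers and everything else, and nested paragraphs may show up twice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library: extracts plain text from a .docx stream.
public class DocxText {
    // Namespace used by WordprocessingML elements such as w:p and w:t.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue; // skip styles, relationships, media, etc.
                }
                // Copy the entry first so the XML parser cannot close the ZIP stream.
                byte[] xml = zip.readAllBytes();

                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document doc = factory.newDocumentBuilder()
                                      .parse(new ByteArrayInputStream(xml));

                // One output line per w:p, built by joining its w:t text runs.
                StringBuilder out = new StringBuilder();
                NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    Element paragraph = (Element) paragraphs.item(i);
                    NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        out.append(runs.item(j).getTextContent());
                    }
                    out.append('\n');
                }
                return out.toString();
            }
        }
        throw new IOException("no word/document.xml entry found");
    }
}
```

Calling DocxText.extractText(new FileInputStream("some.docx")) should return the document text, one line per paragraph. Going the other way (generating a file that Word will open) needs several more entries in the archive, so this sketch only covers reading.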

Cheers,

-- 
Andrew

From: Raymond Wiker
Subject: Re: Free software for parsing/generating MS-Word documents?
Date: 
Message-ID: <m2tzkehqvr.fsf@Macintosh-2.local>
Andrew Reilly <···············@areilly.bpc-users.org> writes:

> On Mon, 11 Feb 2008 21:33:03 -0800, Robert Maas wrote:
>
>> Do you know of any free software, written in Java or Common Lisp, which
>> parses and/or generates MS-Word documents?
>
> I believe that the OpenOffice Word input/output filters are in Java.  
> That's probably a reasonable place to start.
>
> The only other MS-Word parsers that I know of are in C or C++.  Don't 
> know any in lisp, I'm afraid.
>
> On the other hand, Word in Office'07 produces XML, apparently, and there 
> are plenty of XML parsers in Java and lisp.

	I suspect that you'll find MS' OOXML less straightforward to
use than you expect... MS' recent attempt at forcing this format
through ISO resulted in a list of 3500 issues that need clarification.

	I think the Apache project has converters written in Java that
do a reasonable job for some of the older Office formats, at least.
From: Andrew Reilly
Subject: Re: Free software for parsing/generating MS-Word documents?
Date: 
Message-ID: <61empeF1v3ut3U1@mid.individual.net>
On Tue, 12 Feb 2008 21:16:40 +0100, Raymond Wiker wrote:

> Andrew Reilly <···············@areilly.bpc-users.org> writes:
> 
>> On Mon, 11 Feb 2008 21:33:03 -0800, Robert Maas wrote:
>>
>>> Do you know of any free software, written in Java or Common Lisp,
>>> which parses and/or generates MS-Word documents?
>>
>> I believe that the OpenOffice Word input/output filters are in Java.
>> That's probably a reasonable place to start.
>>
>> The only other MS-Word parsers that I know of are in C or C++.  Don't
>> know any in lisp, I'm afraid.
>>
>> On the other hand, Word in Office'07 produces XML, apparently, and
>> there are plenty of XML parsers in Java and lisp.
> 
> 	I suspect that you'll find MS' OOXML less straightforward to
> use than you expect... MS' recent attempt at forcing this format through
> ISO resulted in a list of 3500 issues that need clarification.

Indeed.  I (very disingenuously) didn't make any comment about the XML 
format other than that it could (most likely) be parsed by existing XML 
tools.  Not that I've said so in this forum, but I've always thought that 
the fanatical excitement about how XML would liberate documents and file 
formats was ridiculous.  The encoding may be transparent, but the 
semantics can be as opaque and application-specific as the author of the 
schema desires.  Hopefully OOXML will drive that fact home.  Still, 
that's a rant that's off-topic for this group, so I'll stop here.

Cheers,

-- 
Andrew
From: D Herring
Subject: Re: Free software for parsing/generating MS-Word documents?
Date: 
Message-ID: <2ZSdneI5G_dK3S_anZ2dnUVZ_qTinZ2d@comcast.com>
...
Andrew Reilly wrote:
> Indeed.  I (very disingenuously) didn't make any comment about the XML 
> format other than that it could (most likely) be parsed by existing XML 
> tools.  Not that I've said so in this forum, but I've always thought that 
> the fanatical excitement about how XML would liberate documents and file 
> formats was ridiculous.  The encoding may be transparent, but the 
> semantics can be as opaque and application-specific as the author of the 
> schema desires.  Hopefully OOXML will drive that fact home.  Still, 
> that's a rant that's off-topic for this group, so I'll stop here.

Here's an outlet: http://xmlsucks.blogspot.com/

- Daniel