On Mon, 11 Feb 2008 21:33:03 -0800, Robert Maas wrote:
> Do you know of any free software, written in Java or Common Lisp, which
> parses and/or generates MS-Word documents?
I believe that the OpenOffice Word input/output filters are in Java.
That's probably a reasonable place to start.
The only other MS-Word parsers that I know of are in C or C++. Don't
know any in lisp, I'm afraid.
On the other hand, Word in Office'07 produces XML, apparently, and there
are plenty of XML parsers in Java and lisp.
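For what it's worth, here is a minimal sketch of what "use an XML parser" could look like in Java, using only the standard library. It assumes the usual Office 2007 packaging: a .docx file is a ZIP archive whose main body lives in the part word/document.xml, with text held in w:t elements inside w:p paragraphs. The class name DocxText and the helper sampleDocx are made up for illustration, and the sample archive is only enough to exercise the extractor, not something Word itself would open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of any existing library.
class DocxText {
    // Main WordprocessingML namespace used by Word 2007 documents.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each paragraph (w:p), joined with newlines. */
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) {
                    continue; // skip styles, relationships, media, ...
                }
                // Copy the entry first so the parser cannot close the ZIP stream.
                byte[] xml = zip.readAllBytes(); // Java 9 or later
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document doc = factory.newDocumentBuilder()
                                      .parse(new ByteArrayInputStream(xml));
                return paragraphs(doc);
            }
        }
        throw new IOException("no word/document.xml part found");
    }

    // Concatenates the w:t runs of every w:p; ignores tabs, breaks, fields, etc.
    private static String paragraphs(Document doc) {
        StringBuilder out = new StringBuilder();
        NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            if (i > 0) {
                out.append('\n');
            }
            NodeList runs = ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
        }
        return out.toString();
    }

    /** Builds a stripped-down archive for testing; text must not contain XML markup. */
    static byte[] sampleDocx(String... paragraphs) throws IOException {
        StringBuilder xml = new StringBuilder();
        xml.append("<w:document xmlns:w=\"").append(W_NS).append("\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = sampleDocx("Hello", "world");
        System.out.println(extractText(new ByteArrayInputStream(docx)));
    }
}
```

This only pulls out plain paragraph text. Anything beyond that (styles, tables, embedded objects) means interpreting the schema itself, which is where the real work would be.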
Cheers,
--
Andrew
From: Raymond Wiker
Subject: Re: Free software for parsing/generating MS-Word documents?
Date:
Message-ID: <m2tzkehqvr.fsf@Macintosh-2.local>
Andrew Reilly <···············@areilly.bpc-users.org> writes:
> On Mon, 11 Feb 2008 21:33:03 -0800, Robert Maas wrote:
>
>> Do you know of any free software, written in Java or Common Lisp, which
>> parses and/or generates MS-Word documents?
>
> I believe that the OpenOffice Word input/output filters are in Java.
> That's probably a reasonable place to start.
>
> The only other MS-Word parsers that I know of are in C or C++. Don't
> know any in lisp, I'm afraid.
>
> On the other hand, Word in Office'07 produces XML, apparently, and there
> are plenty of XML parsers in Java and lisp.
I suspect that you'll find MS' OOXML less straightforward to
use than you expect... MS' recent attempt at forcing this format
through ISO resulted in a list of 3500 issues that need clarification.
I think the Apache project has converters written in Java that
do a reasonable job for some of the older Office formats, at least.
On Tue, 12 Feb 2008 21:16:40 +0100, Raymond Wiker wrote:
> Andrew Reilly <···············@areilly.bpc-users.org> writes:
>
>> On Mon, 11 Feb 2008 21:33:03 -0800, Robert Maas wrote:
>>
>>> Do you know of any free software, written in Java or Common Lisp,
>>> which parses and/or generates MS-Word documents?
>>
>> I believe that the OpenOffice Word input/output filters are in Java.
>> That's probably a reasonable place to start.
>>
>> The only other MS-Word parsers that I know of are in C or C++. Don't
>> know any in lisp, I'm afraid.
>>
>> On the other hand, Word in Office'07 produces XML, apparently, and
>> there are plenty of XML parsers in Java and lisp.
>
> I suspect that you'll find MS' OOXML less straightforward to
> use than you expect... MS' recent attempt at forcing this format through
> ISO resulted in a list of 3500 issues that need clarification.
Indeed. I (very disingenuously) didn't make any comment about the XML
format other than that it could (most likely) be parsed by existing XML
tools. Not that I've said so in this forum, but I've always thought that
the fanatical excitement about how XML would liberate documents and file
formats was ridiculous. The encoding may be transparent, but the
semantics can be as opaque and application-specific as the author of the
schema desires. Hopefully OOXML will drive that fact home. Still,
that's a rant that's off-topic for this group, so I'll stop here.
Cheers,
--
Andrew
...
Andrew Reilly wrote:
> Indeed. I (very disingenuously) didn't make any comment about the XML
> format other than that it could (most likely) be parsed by existing XML
> tools. Not that I've said so in this forum, but I've always thought that
> the fanatical excitement about how XML would liberate documents and file
> formats was ridiculous. The encoding may be transparent, but the
> semantics can be as opaque and application-specific as the author of the
> schema desires. Hopefully OOXML will drive that fact home. Still,
> that's a rant that's off-topic for this group, so I'll stop here.
Here's an outlet: http://xmlsucks.blogspot.com/
- Daniel