From: An[z]elmus
Subject: From Word to a comma-delimited file
Date: 
Message-ID: <bfqbm45b061fdne1ei0p6rt2ldum8k6be0@4ax.com>
I have not practiced LISP for a year, but now I think I
need it, and it seems that I have forgotten even the few things I
learned.
I want to build comma-delimited files starting from a bunch of Word
documents, and then import all the data into a MySQL database.
The documents contain the inventory of all the books of a library.
There are thousands of books. I don't earn money for this job.
After saving the file(s) in text format, each entry is separated
from the next by a blank line and is organized in the following way,
line by line:

1) Classification (always present)
2) Author (sometimes missing)
3) Title (always present)
4) Subtitle or Translation (sometimes missing)
5) Publication data (always present)
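
To make the layout concrete, here is a minimal sketch of the first step: cutting the
saved text into entries at the blank lines. It is written in Python only for
illustration (the same logic carries over to Lisp), and the name split_entries is
my own, not something from an existing library.

```python
def split_entries(text):
    """Split exported text into entries; each entry is a list of its non-empty lines."""
    entries, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line:
            current.append(line)      # still inside the same entry
        elif current:
            entries.append(current)   # a blank line closes the entry
            current = []
    if current:                       # last entry may not be followed by a blank line
        entries.append(current)
    return entries


sample = (
    "400 BUSON c 3\n"
    "Buson\n"
    "TOKYO: Nihon Keizai Shimbunsha\n"
    "\n"
    "400 CHIKUDEN a 1\n"
    "SASAKI, Kozo\n"
    "Chikuden\n"
    "TOKYO: Sansaisha, 1970\n"
)

for entry in split_entries(sample):
    print(len(entry), entry[0])
```

Each entry then has between 3 and 5 lines, with the classification first and the
publication data last.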

As I said, I want to transfer everything into a CSV file with the 5
fields mentioned above. The data for columns 2 and 4 are sometimes
missing, and in that case I have to insert an empty field.
Here is a sample of the text file that needs to be processed:

------------------------------------------------------
400 BUSON c 2
Buson - Sono futatsu no tabi
[Buson - I due viaggi]
TOKYO: Asahi Shimbunsha, 2001 – p. 183

400 BUSON c 3
Buson
TOKYO: Nihon Keizai Shimbunsha

400 BUSON c 4
Yosa Buson – Kakemeguru omoi
Yosa Buson – On the wings of art
SHIGA: Miho Museum, 2008 – p. 397

400 CHIKUDEN a 1
SASAKI, Kozo
Chikuden – Toyo bijutsu sensho
[Chikuden – Collana d'Arte Estremo-orientale]
TOKYO: Sansaisha, 1970, 1977 – tav. 38 + p. 74
------------------------------------------------------

As you can see, there are probably "regularities" that can be captured
with regular expressions. For example: line 2 (Author, which is
sometimes missing) always starts with the surname in all capital
letters, while the third field (Title, always present) usually has only
the first letter capitalized. So my idea is to put an empty field (two
consecutive separators) where the Author is missing. Once that is
done, only one field out of 5 (the Subtitle) could still be
missing.
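
A rough sketch of that idea, in Python only for illustration (the same logic
carries over to Lisp). The names AUTHOR_RE, to_row and entries_to_csv are my own.
The pattern is an assumption based on the sample: an author line begins with a
surname in unaccented capitals followed by a comma, as in "SASAKI, Kozo". The
sketch also assumes the first line is always the classification and the last
line is always the publication data, so only the lines in between are ambiguous.

```python
import csv
import io
import re

# Assumption: surname in capitals (spaces, apostrophes, hyphens allowed), then a comma.
AUTHOR_RE = re.compile(r"^[A-Z][A-Z' \-]+,")


def to_row(entry):
    """Turn one entry (a list of 3 to 5 lines) into a 5-field row."""
    classification, *middle, publication = entry
    author = title = subtitle = ""
    if len(middle) == 3:                 # nothing missing
        author, title, subtitle = middle
    elif len(middle) == 2:               # author OR subtitle is missing
        if AUTHOR_RE.match(middle[0]):
            author, title = middle
        else:
            title, subtitle = middle
    elif len(middle) == 1:               # both optional fields missing
        title = middle[0]
    else:
        raise ValueError("unexpected number of lines: %r" % (entry,))
    return [classification, author, title, subtitle, publication]


def entries_to_csv(entries):
    """Write all entries as CSV text; missing fields become empty fields."""
    buf = io.StringIO()
    writer = csv.writer(buf)             # quotes fields that contain commas
    for entry in entries:
        writer.writerow(to_row(entry))
    return buf.getvalue()


entries = [
    ["400 BUSON c 3", "Buson", "TOKYO: Nihon Keizai Shimbunsha"],
    ["400 BUSON c 2", "Buson - Sono futatsu no tabi",
     "[Buson - I due viaggi]", "TOKYO: Asahi Shimbunsha, 2001 - p. 183"],
]
print(entries_to_csv(entries))
```

Titles written entirely in capitals and containing a comma would be mistaken for
authors, so entries with four lines are the ones worth checking by hand.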