RFP -- Consultant needed

From: Fletcher James
Subject: RFP -- Consultant needed
Date: Tue, 16 Dec 2003 20:34:01 +0000
Message-ID: <vtur1r50n3cfff@corp.supernews.com>

Levit & James, Inc.

Purpose:
My company, Levit & James, Inc., is seeking an individual or individuals to
provide consultation and code development for one or more components in a
new commercial product to be developed in the near term.

Broad Description of the Project:

L&J is starting a new application that requires the development of a module
that parses an arbitrarily large text string (the contents of an MS Word
document) to detect one or more substrings that meet one or more specific
rules criteria.  The functionality will be very similar to a grammar
checker.

The detection scheme must consider the following:

*    Rules will be categorized, and each category will have some number of
variants.

*    Rules for most variants and sub-variants are to be read from a source
rather than "hard-wired" into the module.

*    Some of the rules used for the detection employ the use of keywords,
while others do not. Public and private keyword lists will be employed.

*    Some of the words in the string may be abbreviated.

*    Some words/keywords will have variant spellings and some of the
detected phrases may have incomplete punctuation.

*    The detected substrings may or may not be entire sentences.

*    The detected substrings may be separated by delimiting punctuation
(e.g., a semi-colon).

Detected substrings that <may> meet the standards set by a rule must be
differentiated from those that <do> meet the rule's standards. These will be
termed "candidates". When a candidate is detected, the output string must
contain information about what part of the rule was violated (including rule
number).

The detection scheme will contain an error checker for substrings which meet
most but not all of the rules criteria. For each detection, the parser will
build an "output substring", which will contain the detected substring and
other information that categorizes the detected substring.

All output substrings will then be communicated to the main application
modules.
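
As a rough illustration of the flow described above, here is a minimal
Python sketch. It is not a specification and not L&J code: the Rule and
Detection classes, the detect function, the regex-per-part rule format,
the min_fraction threshold, and the sample rule are all assumptions made
up for this example. Rules are held as plain data (as if read from a
source), each segment is tested against each rule, and partial matches
are reported as candidates together with the rule number and the parts
that were not met.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    number: int
    category: str
    parts: dict  # hypothetical format: part name -> regex the segment must contain


@dataclass
class Detection:
    text: str
    rule_number: int
    category: str
    status: str            # "match" or "candidate"
    violated_parts: list   # names of the rule parts that were not met


def detect(text, rules, min_fraction=0.5):
    """Test every punctuation-delimited segment against every rule."""
    # Naive split: a real module would need to handle abbreviations
    # such as "Inc." that contain a period.
    segments = [s.strip() for s in re.split(r"[;.!?]", text) if s.strip()]
    detections = []
    for segment in segments:
        for rule in rules:
            violated = [name for name, pattern in rule.parts.items()
                        if not re.search(pattern, segment, re.IGNORECASE)]
            satisfied = len(rule.parts) - len(violated)
            if not violated:
                status = "match"
            elif satisfied / len(rule.parts) >= min_fraction:
                status = "candidate"   # meets most, but not all, of the rule
            else:
                continue               # rule is irrelevant to this segment
            detections.append(Detection(segment, rule.number, rule.category,
                                        status, violated))
    return detections


# Rules kept as data, as if loaded from a file or database (assumed format).
RULE_SOURCE = [
    {"number": 1, "category": "citation",
     "parts": {"keyword": r"\bsection\b", "number": r"\b\d+\b"}},
]
rules = [Rule(**r) for r in RULE_SOURCE]
results = detect("See section 12; refer to the section above. Unrelated text",
                 rules)
for d in results:
    print(d)
```

With the sample rule, the first segment satisfies both parts, the second
is reported as a candidate because the "number" part is missing, and the
third is ignored.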

Considerations:
*    The target platform will be Win32, and the main interactive modules
will be written (by in-house L&J staff) in VB/VBA.

*    The parsing routines may be written in any appropriate language for
which a royalty-free runtime is available.

*    We expect that the consultant(s) will start from an existing code base
which provides similar functions for other applications.  The consultants
must be able to provide L&J with the necessary rights to use such code.

Qualifications:
*    Experience in writing text parsing routines/modules

*    Established track record in the development of successful commercial
applications

*    References

If you are interested:
As President and Director of Development for L&J, I will be the point of
contact for this project.  Please send a resume, references, or other
information to me at ······@levitjames.com .

About Levit & James, Inc.
Levit & James is a software company which specializes in document conversion
and add-in products for Microsoft Word. Our customer base includes Fortune
500 corporations, large law firms, local, state, and federal government
organizations, and professional associations.

For more information, please see our website.


-- 
Fletcher James
President
Levit & James, Inc.
······@levitjames.com
(703) 771-1549

CrossEyes  Reveals the Codes in Word!
www.levitjames.com

From: Joe Marshall
Subject: Re: RFP -- Consultant needed
Date: Tue, 16 Dec 2003 21:56:27 +0000
Message-ID: <k74wl94k.fsf@ccs.neu.edu>

"Fletcher James" <······@levitjames.com> writes:
>
> The detection scheme must consider the following:

Oh boy!

> arbitrarily large text string

`Arbitrarily large' is pretty large.

> (the contents of an MS Word document)

But that's not a string.  Do you mean `the text of an MS Word
document'?  Is this extracted somehow or is the parser expected to
decode the Word document?

> *    Rules will be categorized, and each category will have some number of
>      variants.

Er, um, yeah.  Since `variants' and `categories' are not defined or
used elsewhere in the spec, what use are they?  

> *  Rules for most variants and sub-variants are to be read from a source
>    rather than "hard-wired" into the module.

What sort of source?

> * Some of the rules used for the detection employ the use of keywords,
>   while others do not.  Public and private keyword lists will be
>   employed.

What sort of keyword use?  Are the keywords categorized?  Would a rule
using a keyword say something like this:

   If <noun> followed by keyword `drinking' then ....

or more like this:

   If <noun> followed by <keyword drawn from private list> then ....
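
To make the distinction concrete, here is a minimal Python sketch of the
two rule styles. Everything in it is hypothetical: the tiny NOUNS set
stands in for a real part-of-speech lookup, PRIVATE_KEYWORDS stands in
for a private keyword list, and the function names are invented for the
example.

```python
NOUNS = {"water", "driver"}                 # stand-in for a part-of-speech lookup
PRIVATE_KEYWORDS = {"drinking", "eating"}   # stand-in for a private keyword list


def noun_then_literal(words, keyword="drinking"):
    """Style 1: <noun> followed by one keyword fixed in the rule itself."""
    return [(a, b) for a, b in zip(words, words[1:])
            if a in NOUNS and b == keyword]


def noun_then_listed(words, keywords=PRIVATE_KEYWORDS):
    """Style 2: <noun> followed by any keyword drawn from a list."""
    return [(a, b) for a, b in zip(words, words[1:])
            if a in NOUNS and b in keywords]


words = "the driver eating lunch saw water drinking birds".split()
literal_hits = noun_then_literal(words)
listed_hits = noun_then_listed(words)
print(literal_hits)
print(listed_hits)
```

The first style finds only the pair ending in `drinking'; the second
also finds the pair ending in `eating', because the keyword comes from
the list rather than from the rule text.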

> * Some of the words in the string may be abbreviated.

Canonically?  Arbitrarily?

> *     Some words/keywords will have variant spellings and some of the
> detected phrases may have incomplete punctuation.

Are the variants enumerated?

> * The detected substrings may or may not be entire sentences.

Any substring containing at least one word may or may not be an entire
sentence.

> *     The detected substrings may be separated by delimiting punctuation
> (e.g., a semi-colon).
>
> Detected substrings that <may> meet the standards set by a rule must be
> differentiated from those that <do> meet the rule's standards. 
> These will be termed "candidates".

This indicates at least two levels of acceptance.  Is the word `may'
to be interpreted as referring to the word `may' in the above specs?
So therefore a rule that matches an abbreviated string would be a
`candidate'?  How about a rule that matches a substring, but *is* a
complete sentence?

> When a candidate is detected, the output string must contain
> information about what part of the rule was violated (including rule
> number).

Like ``Rule 1 was violated because the keyword `inconsequential'
doesn't match the discovered text `Fava beans' at position 0.''

Presumably most rules will be `violated' at nearly all times.  How
much of a match do you need to have a candidate?  How does the engine
differentiate criteria that indicate that a rule is completely
irrelevant from those criteria that indicate that the rule is mostly
relevant? 

> The detection scheme will contain an error checker for substrings which meet
> most but not all of the rules criteria. 

Same question.

> For each detection, the parser will
> build an "output substring", which will contain the detected substring and
> other information that categorizes the detected substring.
>
> All output substrings will then be communicated to the main application
> modules.

You mean they won't simply be discarded?

--------

I'm just poking a little fun at your `official' spec.
It is nice to see job openings, especially ones where you give
latitude to use whatever language/technology appears best.

From: Fletcher James
Subject: Re: RFP -- Consultant needed
Date: Wed, 17 Dec 2003 01:27:23 +0000
Message-ID: <vtvc7rti2v90a8@corp.supernews.com>

CLARIFICATION:   This is not intended as a specification.  It is simply
intended to give the right person enough of the flavor of the project to
recognize that it's something they might be interested in.  More details
will be forthcoming under NDA.

CAVEAT:  I am an expert programmer with 30 years' experience, and have a
small staff of top-notch programmers, BUT we know almost nothing about
natural language processing.  That's why we are looking to hire an expert,
rather than start from scratch ourselves.  Please forgive me if I
use a generic English description, and use words which have a somewhat
different meaning in your technical jargon.  If we hire you for the job, you
can feel free to straighten us out on these matters on day one.
-- 
Fletcher James
President
Levit & James, Inc.
······@levitjames.com

CrossEyes  Reveals the Codes in Word!
www.levitjames.com

From: Charles Pritchard
Subject: Re: RFP -- Consultant needed
Date: Wed, 17 Dec 2003 03:27:11 +0000
Message-ID: <H3QDb.2123$6l1.124@okepread03>

This message is more useful than the first you posted.

Because I was looking for an excuse to post the link anyway --
WordNet
http://www.cogsci.princeton.edu/~wn/

I suggest you use it as a buzzword when cleaning down your list of
applicants.

From: ·······@noshpam.lbl.government
Subject: Re: RFP -- Consultant needed
Date: Wed, 17 Dec 2003 04:04:50 +0000
Message-ID: <Pine.LNX.4.44.0312170400550.15744-100000@thar.lbl.gov>

And don't forget about Morphix-NLP, the new distro based on
Morphix GNU/Linux (which is based on Knoppix, which is based on
Debian...), which comes pre-packaged with all of the latest
open-source Natural Language Processing (NLP) software:

Linguistics Meets Linux: Morphix-NLP
http://ileriseviye.org/arasayfa.php?inode=morphixnlp.html

~Tomer



On Dec 16, 2003 at 7:27pm, Charles Pritchard wrote:

chuck >Date: Tue, 16 Dec 2003 19:27:11 -0800
chuck >From: Charles Pritchard <·····@visc.us>
chuck >Newsgroups: comp.lang.forth, comp.lang.lisp, comp.lang.prolog,
chuck >    comp.lang.scheme
chuck >Subject: Re: RFP -- Consultant needed
chuck >
chuck >This message is more useful than the first you posted.
chuck >
chuck >Because I was looking for an excuse to post the link anyway --
chuck >WordNet
chuck >http://www.cogsci.princeton.edu/~wn/
chuck >
chuck >I suggest you use it as a buzzword when cleaning down your list of
chuck >applicants.

From: Albert van der Horst
Subject: Re: RFP -- Consultant needed
Date: Wed, 17 Dec 2003 12:55:02 +0000
Message-ID: <Hq1IJq.6w0.1.spenarn@spenarnc.xs4all.nl>

In article <··············@corp.supernews.com>,
Fletcher James <······@levitjames.com> wrote:
>CLARIFICATION:   This is not intended as a specification.  It is simply
>intended to give the right person enough of the flavor of the project, to
>recognize that it's something they might be interested in.  More details
>will be forthcoming under NDA.
>
>CAVEAT:  I am an expert programmer with 30 years experience, and have a
>small staff of top-notch programmers, BUT we know almost nothing about
>natural language processing.  That's why we are looking to hire an expert,

As an aside, you give more information here than in the remainder of
this post or the previous one.

"NLP on MS Word files" would have been sufficient.
I may apply, if your NDA is not overly restrictive.

(Some NDAs effectively forbid you to use the inevitable gain in
experience, and from then on you are at least formally and legally at
the mercy of the company for the rest of your life.)
Is the NDA available at your site?

>Fletcher James

Groetjes Albert
-- 
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
        One man-hour to invent,
                One man-week to implement,
                        One lawyer-year to patent.