From: Richard Fateman
Subject: connection between lisp and speech recognition?
Date: 
Message-ID: <_qfNb.9720$dx5.6269@newssvr27.news.prodigy.com>
I'm trying to build a clean way of allowing a lisp program to
listen to voice input, in particular an FFI
interface to something like Microsoft's speech SDK, Viavoice, or
Dragon NS. This has become one of those challenges that you
believe is easy once you know how to do it, but it could take
a very long time to figure out the simple solution.

Has anyone already done something like this?
thnx
RJF

From: Gorbag
Subject: Re: connection between lisp and speech recognition?
Date: 
Message-ID: <Q4zNb.482$K6.90@bos-service2.ext.raytheon.com>
When we did this, we decided we really didn't want the speech recognition
algorithms in the same process as our lisp (they suck up a lot of CPU, but
our experience was with using CMU Sphinx-II from 1997 so maybe things are
different now). Instead, we wrapped the recognizer with an "agent", stuck it
on its own CPU, and then used messages to communicate with it from lisp
processes, basically receiving asynchronous text strings when available. I
can point you to a TR if you are interested.
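The receiving side of such a setup can be quite small. Below is a hedged sketch, not the poster's actual code: it assumes the recognizer "agent" writes one recognized utterance per line over a TCP connection (the host, port, and line-per-utterance protocol are all hypothetical), and it uses the USOCKET portability library for the socket.

```lisp
;;; Sketch: receive asynchronous recognition results from an external
;;; recognizer "agent" running in its own process.
;;; Assumes: the agent sends one recognized utterance per line of text,
;;; and the USOCKET library is loaded.  Host/port are placeholders.

(defun listen-to-recognizer (host port handler)
  "Connect to the recognizer agent and call HANDLER on each
recognized utterance (a string) as it arrives."
  (usocket:with-client-socket (socket stream host port)
    (declare (ignore socket))
    (loop for line = (read-line stream nil nil)
          while line
          do (funcall handler line))))

;; Example use: print whatever the recognizer hears.
;; (listen-to-recognizer "localhost" 4005
;;   (lambda (utterance) (format t "heard: ~a~%" utterance)))
```

In practice you would run this loop in its own Lisp thread (or poll the stream) so the rest of the application stays responsive while waiting for speech.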

"Richard Fateman" <·······@cs.berkeley.edu> wrote in message
························@newssvr27.news.prodigy.com...
> I'm trying to build a clean way of allowing a lisp program to
> listen to voice input, in particular an FFI
> interface to something like Microsoft's speech SDK, Viavoice, or
> Dragon NS. This has become one of those challenges that you
> believe is easy once you know how to do it, but it could take
> a very long time to figure out the simple solution.
>
> Has anyone already done something like this?
> thnx
> RJF
>
From: Richard Fateman
Subject: Re: connection between lisp and speech recognition?
Date: 
Message-ID: <bu6rc1$2t1v$1@agate.berkeley.edu>
I think the workstations (even desktops) of today have enough power
to do speech along with an application. A pointer to a TR
would be welcome, if it is accessible!

(Bruce Weimer, in a note to this newsgroup on
11/15/2003, seems to have gotten further than I did, but
then got stuck... I think we need someone who has succeeded once.
Bruce and I would both like to use the Windows SAPI5 speech engine.)
RJF


Gorbag wrote:
> When we did this, we decided we really didn't want the speech recognition
> algorithms in the same process as our lisp (they suck up a lot of CPU, but
> our experience was with using CMU Sphinx-II from 1997 so maybe things are
> different now). Instead, we wrapped the recognizer with an "agent", stuck it
> on its own CPU, and then used messages to communicate with it from lisp
> processes, basically receiving asynchronous text strings when available. I
> can point you to a TR if you are interested.
> 
> "Richard Fateman" <·······@cs.berkeley.edu> wrote in message
> ························@newssvr27.news.prodigy.com...
> 
>>I'm trying to build a clean way of allowing a lisp program to
>>listen to voice input, in particular an FFI
>>interface to something like Microsoft's speech SDK, Viavoice, or
>>Dragon NS. This has become one of those challenges that you
>>believe is easy once you know how to do it, but it could take
>>a very long time to figure out the simple solution.
>>
>>Has anyone already done something like this?
>>thnx
>>RJF
>>
> 
> 
> 
From: Gorbag
Subject: Re: connection between lisp and speech recognition?
Date: 
Message-ID: <nRVNb.486$K6.78@bos-service2.ext.raytheon.com>
"Richard Fateman" <·······@cs.berkeley.edu> wrote in message
··················@agate.berkeley.edu...
> I think the workstations (even desktops) of today have enough power
> to do speech along with an application. A pointer to a TR
> would be welcome, if it is accessible!

ftp://ftp.cs.rochester.edu/pub/papers/ai/96.tn5.Design_and_implementation_of_TRAINS-96_system.ps.gz

>
> (Bruce Weimer, in a note to this newsgroup on
> 11/15/2003, seems to have gotten further than I did, but
> then got stuck... I think we need someone who has succeeded once.
> Bruce and I would both like to use the Windows SAPI5 speech engine.)
> RJF
>
>
> Gorbag wrote:
> > When we did this, we decided we really didn't want the speech recognition
> > algorithms in the same process as our lisp (they suck up a lot of CPU, but
> > our experience was with using CMU Sphinx-II from 1997 so maybe things are
> > different now). Instead, we wrapped the recognizer with an "agent", stuck it
> > on its own CPU, and then used messages to communicate with it from lisp
> > processes, basically receiving asynchronous text strings when available. I
> > can point you to a TR if you are interested.
> >
> > "Richard Fateman" <·······@cs.berkeley.edu> wrote in message
> > ························@newssvr27.news.prodigy.com...
> >
> >>I'm trying to build a clean way of allowing a lisp program to
> >>listen to voice input, in particular an FFI
> >>interface to something like Microsoft's speech SDK, Viavoice, or
> >>Dragon NS. This has become one of those challenges that you
> >>believe is easy once you know how to do it, but it could take
> >>a very long time to figure out the simple solution.
> >>
> >>Has anyone already done something like this?
> >>thnx
> >>RJF
> >>
> >
> >
> >
>
From: Rob Warnock
Subject: Re: connection between lisp and speech recognition?
Date: 
Message-ID: <Jrmdnea2pIOAfprdRVn-hA@speakeasy.net>
Richard Fateman  <·······@cs.berkeley.edu> wrote:
+---------------
| I'm trying to build a clean way of allowing a lisp program to
| listen to voice input...
+---------------

As much as it pains me to suggest it, look at some of the "Voice XML"
vendors, such as Nuance, VoiceGenie, SpeechWorks, TellMe, LumenVox, etc.
They basically make boxes which you can pre-load with a URL, and when a
call comes in the box makes an HTTP "GET" request to that URL (possibly
providing some query parameters, depending on the box and the application).
The HTTP server -- which could easily be a Lisp-based web server -- replies
with a script (written in either the "Voice XML" scripting language or some
proprietary scripting language) that tells the box what words (grammar) to
expect and how to proceed with the call (i.e., what state transitions to
make and the various URLs to "GET" or "POST" with the results of each
state transition).
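To make the flow concrete, here is a hedged sketch of the kind of reply a Lisp-based web server could send back to the box's initial "GET". It is plain Common Lisp building a VoiceXML string; the element names follow the VoiceXML 2.0 specification, but the inline yes/no grammar and the submit URL are illustrative placeholders, not any particular vendor's dialect.

```lisp
;;; Sketch: build the VoiceXML page a Lisp HTTP server might return.
;;; The page tells the box to prompt the caller for "yes" or "no",
;;; then POST the result to SUBMIT-URL (a placeholder).

(defun yes-no-vxml (submit-url)
  "Return a VoiceXML 2.0 document, as a string, that prompts for
yes/no and posts the recognized ANSWER field back to SUBMIT-URL."
  (format nil
          "<?xml version=\"1.0\"?>
<vxml version=\"2.0\">
  <form id=\"ask\">
    <field name=\"answer\">
      <grammar mode=\"voice\">yes | no</grammar>
      <prompt>Say yes or no.</prompt>
      <filled>
        <submit next=\"~a\" method=\"post\" namelist=\"answer\"/>
      </filled>
    </field>
  </form>
</vxml>"
          submit-url))
```

The server's job at each state transition is just to emit the next such page, so the whole dialogue state machine can live comfortably in Lisp.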

TellMe and others even allow you limited free access to a unit if you
register as a developer. You point them to a URL on your server, and
they assign you a telephone number at their site. Then when anyone[1]
calls that number, their VXML box does a "GET" from your URL across the
public Internet, and you're off and debugging...

VoiceGenie [I think] will alternatively sell you software that runs on
*your* platforms and takes raw PCM audio streams in over the 'Net (from
codecs co-located with your server or even somewhere else) and does the
voice-recognition function and then plays the same VXML game with your
HTTP server (which can also be either co-located or somewhere else).

From a programming languages point of view, VXML (and the related vendor-
proprietary languages) is (are) a horrible hack, but there are *LOTS* of
people deploying voice-based interactive applications out there these days
using VXML (and not just in simple emulation of traditional touch-tone menu
input, either)...

Anyway, my point is simply that hooking Common Lisp to a VXML box should
be really straightforward...


-Rob

[1] Normally you're the only one who calls it, at least during initial
    development, but once it's sort of running you could tell a small
    group of others whom you wanted to try out your application.

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Richard Fateman
Subject: Re: connection between lisp and speech recognition?
Date: 
Message-ID: <fleOb.12086$yc2.5250@newssvr27.news.prodigy.com>
Thanks for the suggestion.  It is different from what I
had in mind, which was to have a "multimodal" setup by which
I mean a human sitting at a workstation would have a headset/microphone,
and a stylus (and a keyboard sometimes).  (The purpose of this
would be to do handwriting and voice input of mathematics.  Yes
I know most people's first reaction is that you can't speak
mathematics-- but guess what, you can't write it either --
try to distinguish 1 | I l   0O   (C  _-  S5    |<  K   etc.
You CAN say bold capital script A.  ...)
If you have 2 modes, you should win.

Your suggestion requires that the user make a phone call for
the voice part. I think the sound quality would be too variable, and
it would be untrained recognition... probably additional handicaps
I don't want to deal with.

RJF


Rob Warnock wrote:
> Richard Fateman  <·······@cs.berkeley.edu> wrote:
> +---------------
> | I'm trying to build a clean way of allowing a lisp program to
> | listen to voice input...
> +---------------
> 
> As much as it pains me to suggest it, look at some of the "Voice XML"
> vendors, such as Nuance, VoiceGenie, SpeechWorks, TellMe, LumenVox, etc.
> They basically make boxes which you can pre-load with a URL, and when a
> call comes in the box makes an HTTP "GET" request to that URL (possibly
> providing some query parameters, depending on the box and the application).
> The HTTP server -- which could easily be a Lisp-based web server -- replies
> with a script (written in either the "Voice XML" scripting language or some
> proprietary scripting language) that tells the box what words (grammar) to
> expect and how to proceed with the call (i.e., what state transitions to
> make and the various URLs to "GET" or "POST" with the results of each
> state transition).
> 
> TellMe and others even allow you limited free access to a unit if you
> register as a developer. You point them to a URL on your server, and
> they assign you a telephone number at their site. Then when anyone[1]
> calls that number, their VXML box does a "GET" from your URL across the
> public Internet, and you're off and debugging...
> 
> VoiceGenie [I think] will alternatively sell you software that runs on
> *your* platforms and takes raw PCM audio streams in over the 'Net (from
> codecs co-located with your server or even somewhere else) and does the
> voice-recognition function and then plays the same VXML game with your
> HTTP server (which can also be either co-located or somewhere else).
> 
> From a programming languages point of view, VXML (and the related vendor-
> proprietary languages) is (are) a horrible hack, but there are *LOTS* of
> people deploying voice-based interactive applications out there these days
> using VXML (and not just in simple emulation of traditional touch-tone menu
> input, either)...
> 
> Anyway, my point is simply that hooking Common Lisp to a VXML box should
> be really straightforward...
> 
> 
> -Rob
> 
> [1] Normally you're the only one who calls it, at least during initial
>     development, but once it's sort of running you could tell a small
>     group of others whom you wanted to try out your application.
> 
> -----
> Rob Warnock			<····@rpw3.org>
> 627 26th Avenue			<URL:http://rpw3.org/>
> San Mateo, CA 94403		(650)572-2607
> 
From: Rob Warnock
Subject: Re: connection between lisp and speech recognition?
Date: 
Message-ID: <voudnTGfapFXJJTdRVn-iQ@speakeasy.net>
Richard Fateman  <········@sbcglobal.net> wrote:
+---------------
| Rob Warnock wrote:
| > As much as it pains me to suggest it, look at some of the "Voice XML"
| > vendors, such as Nuance, VoiceGenie, SpeechWorks, TellMe, LumenVox, etc.
| > They basically make boxes which you can pre-load with a URL, and when a
| > call comes in the box makes an HTTP "GET" request to that URL...
|
| Thanks for the suggestion.  It is different from what I
| had in mind, which was to have a "multimodal" setup by which
| I mean a human sitting at a workstation would have a headset/microphone,
| and a stylus (and a keyboard sometimes).
...
| Your suggestion requires that the user make a phone call for the voice part.
+---------------

Sorry, I was perhaps unclear. AFAIK, most of those boxes [or some versions
of them] *can* be used in dedicated applications. I was mainly suggesting
using the telephony-based versions for applications development, 'cuz the
initial outlay of capital equipment is a lot less...


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607