Is there any library for converting common English words to its equivalent numeric value in digits?

From: normanj
Subject: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sat, 22 Mar 2008 16:36:13 +0000
Message-ID: <2586e5e7-0360-4015-b994-37cf13ed9fe9@s19g2000prg.googlegroups.com>

I found one in Python and one in .NET, so I wonder if there is a
Lisp version.

When I started writing one, I realized it's a little more complicated
than it seems.


From: Brian
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sat, 22 Mar 2008 16:56:19 +0000
Message-ID: <03d324d8-02c8-43c6-975b-b7c3941aaed1@e39g2000hsf.googlegroups.com>

normanj wrote:
> I found one in Python, and one in .Net, then I wonder if there is a
> Lisp version.
>
> When I started, I realized it's a little complicated then it seems
> like to be.
Are you wanting a some-function such that:
(some-function "four")  => 4
(some-function "ten")  => 10
etc.?

From: normanj
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sat, 22 Mar 2008 17:56:08 +0000
Message-ID: <194f2c38-7406-473f-ba74-ffdb2b5e1170@u10g2000prn.googlegroups.com>

> Are you wanting a some-function such that:
> (some-function "four")  => 4
> (some-function "ten")  => 10

Yes, but more complicated than these.

Desired samples are:
  forty-four (or forty four)
  nine hundred and ninety-nine
  eight thousand, seven hundred and sixty-five

From: Jason
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sun, 23 Mar 2008 02:08:47 +0000
Message-ID: <23b52e37-55f0-4a65-8094-ef0a7f32035f@e10g2000prf.googlegroups.com>

On Mar 22, 9:36 am, normanj <·······@gmail.com> wrote:
> I found one in Python, and one in .Net, then I wonder if there is a
> Lisp version.
>
> When I started, I realized it's a little complicated then it seems
> like to be.

(char-code #\a)
97

From: John Thingstad
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sat, 22 Mar 2008 18:10:33 +0000
Message-ID: <op.t8fjfvfeut4oq5@pandora.alfanett.no>

On Sat, 22 Mar 2008 17:36:13 +0100, normanj <·······@gmail.com> wrote:

> I found one in Python, and one in .Net, then I wonder if there is a
> Lisp version.
>
> When I started, I realized it's a little complicated then it seems
> like to be.

yes, see format..

--------------
John Thingstad

From: normanj
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sat, 22 Mar 2008 19:30:08 +0000
Message-ID: <3042d001-2236-43fd-ae51-fd8c21d44fda@i29g2000prf.googlegroups.com>

> yes, see format..
>

With the "~R" directive, format can print numbers as English words, but
it cannot do the reverse.
Also, ordinary English number words are not as strictly formed as digits,
so I guess there will not be a fully general solution for such a task.
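
A minimal sketch of the asymmetry described here, assuming an implementation whose ~R output for small integers is the usual English spelling. Nothing beyond standard FORMAT and PARSE-INTEGER is used:

```lisp
;; Forward direction: FORMAT's ~R spells an integer out in English.
(format nil "~R" 44)     ; => "forty-four"
(format nil "~:R" 44)    ; => "forty-fourth" (ordinal form)

;; Reverse direction: nothing in standard CL reads the words back.
;; PARSE-INTEGER only accepts digits, so it finds no integer here.
(parse-integer "forty-four" :junk-allowed t)   ; => NIL, 0
```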

From: Brian
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sat, 22 Mar 2008 19:30:38 +0000
Message-ID: <a667d263-f683-409b-9e9b-c53608384e05@a23g2000hsc.googlegroups.com>

John Thingstad wrote:
> On Sat, 22 Mar 2008 17:36:13 +0100, normanj <·······@gmail.com> wrote:
>
> > I found one in Python, and one in .Net, then I wonder if there is a
> > Lisp version.
> >
> > When I started, I realized it's a little complicated then it seems
> > like to be.
>
> yes, see format..
>
> --------------
> John Thingstad
format can print 74823957 as "seventy-four million, eight hundred and
twenty-three thousand, nine hundred and fifty-seven", but it can't go
the other way.

From: Steven M. Haflich
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sun, 23 Mar 2008 01:04:15 +0000
Message-ID: <j0iFj.19307$xq2.2619@newssvr21.news.prodigy.net>

Brian wrote:
> format can print 74823957 as "seventy-four million, eight hundred and
> twenty-three thousand, nine hundred and fifty-seven", but it can't go
> the other way.

Here is an implementation of the reverse, written long ago.  Sorry about
any folded lines.  It includes a setf expander so you can write things
like
  (incf (english-number (aref my-array index)))

;;;;;;;;;; english-number.cl  ···@franz.com

(in-package :user)

(macrolet ((e (x &optional ordinalp)
	     `(let ((english (format nil (if ,ordinalp "~:r" "~r") ,x)))
		(list* (subseq english (1+ (or (position #\space english) -1)))
		       ,x
		       ,ordinalp))))
   (defparameter *number-words*
       (nconc (loop for n from 0 below 20 collect (e n) collect (e n t))
	     (loop for n from 20 below 100 by 10 collect (e n) collect (e n t))))

   (defparameter *number-multipliers*
       (loop for n in '(1000 1000000 1000000000 1000000000000)
	  collect (e n) collect (e n t))))

;; This parses numbers such as are produced by format ~r.
(defun english-number (string)
   (flet ((word (string start)
	   (nth-value 1
		      (match-re "[a-z]+" string :start start :return :index))))
     (let* ((ordinalp nil)
	   (len (length string))
	   (start 0)
	   (minusp (let ((pos (word string 0)))
		     (if (string= (subseq string (car pos) (cdr pos)) "negative")
			 (progn (setf start (cdr pos)) -1)
		       1))))
       (values
        (* minusp
	  (loop
	      while (< start len)
	      summing (* (loop with sum = 0
			     while (< start len)
			     as pos = (word string start)
			     while pos
			     as num-string = (subseq string (car pos) (cdr pos))
			     as num = (cdr (assoc num-string *number-words* :test #'string=))
			     if num
			     do (incf sum (car num))
				(setf start (cdr pos))
				(when (cdr num) (setq ordinalp t))
			     else if (string= num-string "hundred")
			     do (setf sum (* 100 sum))
				(setf start (cdr pos))
			     else if (string= num-string "hundredth")
			     do (setf sum (* 100 sum))
				(setf start (cdr pos))
				(setf ordinalp t)
			     else return sum
			     finally (return sum))
			 (loop with partial = 1
			     while (< start len)
			     as pos = (word string start)
			     while pos
			     as num-string = (subseq string (car pos) (cdr pos))
			     as num = (cdr (assoc num-string *number-multipliers*
						  :test #'string=))
			     while num
			     do (setf partial (* partial (car num)))
				(setf start (cdr pos))
				(when (cdr num) (setq ordinalp t))
			     finally (return partial)))))
        ordinalp))))

(define-setf-expander english-number (x &environment env)
   (multiple-value-bind (vars vals stores setter getter)
       (get-setf-expansion x env)
     (let ((store (gensym)))
       (values vars
	      vals
	      `(,store)
	      `(progn  (setf ,(car stores) (format nil "~r" ,store))
		 ,setter)
	      `(english-number ,getter)))))

From: Steven M. Haflich
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sun, 23 Mar 2008 01:54:07 +0000
Message-ID: <gLiFj.5121$6H.1888@newssvr22.news.prodigy.net>

I just noticed in the code a dependency on match-re.  Anyone who doesn't
want to deal with adapting to some other regexp module can substitute
this portable equivalent:

   (flet ((word (string start)
            (let ((len (length string)))
              (loop when (= start len)
                  do (return-from word nil)
                  until (lower-case-p (char string start))
                  do (incf start))
              (loop for i from (1+ start) below len
                  while (lower-case-p (char string i))
                  finally (return (cons start i))))))
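
For experimenting outside the FLET, a word scanner with the same contract can be sketched as a standalone function. This is only an illustration: the name NEXT-WORD is hypothetical, and it uses POSITION-IF and POSITION-IF-NOT in place of the explicit loops above. Like WORD, it returns a (start . end) cons for the next run of lowercase letters, or NIL when none remain.

```lisp
(defun next-word (string start)
  "Return (BEGIN . END) bounding the next run of lowercase letters in
STRING at or after index START, or NIL if there is no such run."
  (let ((begin (position-if #'lower-case-p string :start start)))
    (when begin
      (cons begin
            ;; END is the first non-lowercase character after BEGIN,
            ;; or the end of the string if the word runs to the end.
            (or (position-if-not #'lower-case-p string :start begin)
                (length string))))))

;; Walking across "forty-four" one word at a time:
(next-word "forty-four" 0)    ; => (0 . 5)
(next-word "forty-four" 5)    ; => (6 . 10)
(next-word "forty-four" 10)   ; => NIL
```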

From: Victor Kryukov
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sun, 23 Mar 2008 16:54:41 +0000
Message-ID: <m2eja19y9a.fsf@gmail.com>

"Steven M. Haflich" <···@alum.mit.edu> writes:

> Brian wrote:
>> format can print 74823957 as "seventy-four million, eight hundred and
>> twenty-three thousand, nine hundred and fifty-seven", but it can't go
>> the other way.
>
> Here is an implementation of the reverse, written long ago.  Sorry
> about any folded lines.  It includes a setf expander so you can write
> things
> like
>  (incf (english-number (aref my-array index)))

I'd like to propose my own solution. While Steven's solution is cool,
I _personally_ don't like it (no offense Steven) for the following
reasons:

1/ It uses some advanced concepts (macrolet) and contains some
advanced functionality (setf expander) not requested by the original
poster, and therefore it creates an illusion that the problem is
non-trivial, while it is not.
2/ It's not documented.
3/ There are no test cases.
4/ It contains a long (~ 30 lines) loop with two sub-loops. Such loops
are hard to understand, debug and/or extend, especially for the noobs.

I generally like solving complex problems by breaking them into
smaller ones, and I consider every function longer than a dozen lines
as a candidate for refactoring.

Enjoy,
Victor.

;;;; words-to-number.lisp by ··············@gmail.com
;;;; A reverse function for (format nil "~R" ...)

(defpackage :converter
  (:use :cl :cl-ppcre))

(in-package :converter)

#|
First of all, let's normalize our input, replacing all commas,
dashes, "and"s and double spaces with a single space, using CL-PPCRE
regular expression library.
|#

(defparameter *normalizer-re*
  (create-scanner "(-|\\sand\\s|,)" :case-insensitive-mode t)
  "Matches all commas, dashes and \"and\"s.")

(defparameter *multiple-space-re*
  (create-scanner "\\s+")
  "Matches multiple spaces.")

(defun normalize (str)
  "Normalize STR, replacing all commas, dashes, \"and\"s and double
  spaces with a space."
  (string-downcase 
   (regex-replace-all *multiple-space-re*
		      (regex-replace-all *normalizer-re*
					 (string-trim " " str) " ")
		      " ")))

#|
Next, let's write a statistical test for our yet-to-be-written function
words->number.
|#

(defun number->words (n)
  (format nil "~R" n))

(defun test-words->number (&key (num-tests 1000) (upbound (expt 10 10))
			   (converter #'words->number))
  "Tests WORDS->NUMBER on NUM-TESTS random numbers below UPBOUND."
  (loop for i below num-tests
     do (let ((n (1+ (random upbound))))
	  (assert (= n (funcall converter (number->words n)))))))

#|
Now the main algorithm: every english number has the following grammar:

english-number = english-number-below-1000 [factor [english-number]]
factor = thousand | million | billion | ...
english-number-below-1000 = one | two | ... | nine hundred ninety-nine

Therefore, it's easy to construct a recursive algorithm to parse
such numbers.
|#

(defparameter *numbers-below-1000*
  (let ((h (make-hash-table :test #'equal)))
    (loop
       for i from 1 to 999
       do (setf (gethash (normalize (number->words  i)) h)
		i))
    h)
  "A hash that maps english numbers below 1000 to their numerical values.")

(defun words->small-number (str)
  "Convert all numbers below 1000."
  (or (gethash (normalize str) *numbers-below-1000*)
      (error "Don't know how to convert ~A" str)))

(defparameter *factors-list*
  (loop for i from 20 downto 1
     collect (cons (subseq (number->words (expt 10 (* i 3))) 4)
		   (expt 10 (* i 3))))
  "Alist of factors and their numerical values")

(defun join (list)
  "Concatenate elements of list by inserting a space between each two."
  (format nil "~{~A~^ ~}" list))

(defun find-factor (str)
  "Return three values: english-number-below-1000, factor,
english-number according to the grammar above."
  (let* ((seq (split "\\s" str))
	 (factor (find-if (lambda (x)
			    (member x *factors-list* :key #'first :test #'string=))
			  seq))
	 (n (position factor seq)))
    (if factor
	(values (join (subseq seq 0 n))
		(cdr (assoc factor *factors-list* :test #'string=))
		(join (subseq seq (1+ n))))
	(values str))))

(defun words->number (str)
  "Takes an english number and returns a corresponding numerical value."
  (multiple-value-bind (small factor rest)
      (find-factor (normalize str))
    (if factor
	(+ (* (words->small-number small) factor) (words->number rest))
	(if (string= "" small)
	    0
	    (words->small-number small)))))
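
For comparison, here is a minimal sketch that handles the three sample inputs from earlier in the thread without any regexp library. It is not the recursive algorithm above. It swaps in a single left-to-right pass that accumulates a group below 1000 and folds it into a running total whenever a scale word appears. The names TOKENIZE, *SMALL-WORDS*, *SCALE-WORDS* and WORDS-TO-NUMBER are hypothetical, the scale table is hard-coded in the short scale, and no check is made that the words form a well-formed number.

```lisp
(defun tokenize (string)
  "Split STRING into lowercase alphabetic words, dropping punctuation
and the filler word \"and\"."
  (let ((words '()) (current '()))
    (flet ((flush ()
             (when current
               (let ((w (coerce (reverse current) 'string)))
                 (unless (string= w "and") (push w words)))
               (setf current '()))))
      (loop for ch across (string-downcase string)
            do (if (alpha-char-p ch) (push ch current) (flush)))
      (flush))
    (nreverse words)))

;; "zero" .. "nineteen" and "twenty" .. "ninety", generated with ~R.
(defparameter *small-words*
  (append (loop for n from 0 below 20 collect (cons (format nil "~R" n) n))
          (loop for n from 20 below 100 by 10
                collect (cons (format nil "~R" n) n))))

;; Assumption: short-scale names, limited to what the examples need.
(defparameter *scale-words*
  '(("thousand" . 1000) ("million" . 1000000) ("billion" . 1000000000)))

(defun words-to-number (string)
  "Convert an English cardinal such as \"forty-four\" to an integer."
  (let ((total 0) (group 0))
    (dolist (word (tokenize string) (+ total group))
      (let ((small (cdr (assoc word *small-words* :test #'string=)))
            (scale (cdr (assoc word *scale-words* :test #'string=))))
        (cond (small (incf group small))
              ((string= word "hundred") (setf group (* group 100)))
              (scale (incf total (* group scale)) (setf group 0))
              (t (error "Unknown number word: ~A" word)))))))

(words-to-number "forty four")                                   ; => 44
(words-to-number "nine hundred and ninety-nine")                 ; => 999
(words-to-number "eight thousand, seven hundred and sixty-five") ; => 8765
```

Because it never validates word order, this sketch also accepts nonsense such as "five sixty" (yielding 65). The hash-table lookup in WORDS->SMALL-NUMBER above rejects such input.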
            
-- 
http://macrodefinition.blogspot.com

From: Kent M Pitman
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Sun, 23 Mar 2008 20:35:59 +0000
Message-ID: <uzlsp9o0g.fsf@nhplace.com>

"Steven M. Haflich" <···@alum.mit.edu> writes:

> Here is an implementation of the reverse, written long ago.

This isn't very fault tolerant, takes kind of a long time to load,
takes more storage than you'd like, and doesn't work over the full
range of numbers, but has a bit of elegance in its simplicity...

(defun parse-english-number (string)
  (or (gethash string
        ;; Might take a while (and a bit of space) to load.
        ;; Oh, and I recommend you do this compiled, not interpreted.
        (load-time-value
          (let ((table (make-hash-table :test #'equalp)))
            ;; Doesn't work for really huge numbers
            (dotimes (i 1000000)
              (setf (gethash (format nil "~R" i) table) i)) table)
          nil))
      (error "Can't parse English number: ~A" string)))

This one is even shorter, uses less space, works over a more general
range, but runs a bit slower in the worst case (not too bad in the
best case, though):

(defun parse-english-number (string)
  (loop for i from 0 do
    (when (equal string (format nil "~R" i))
      (return i))))

[There's a trivial variant of this that would find negatives, too.]

(No, these are not serious suggestions.)
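
In the same not-serious spirit, the variant that also finds negatives might look like the sketch below. The function name and the LIMIT keyword are hypothetical additions (the limit only exists so that unparseable input terminates), and the sketch assumes the implementation's ~R accepts negative integers, which the standard does not promise.

```lisp
(defun parse-english-number-signed (string &key (limit 1000000))
  "Search 0, 1, -1, 2, -2, ... for an integer whose ~R spelling equals
STRING.  Return NIL if none is found with magnitude below LIMIT."
  (loop for i from 0 below limit
        do (cond ((equal string (format nil "~R" i)) (return i))
                 ((equal string (format nil "~R" (- i))) (return (- i))))))

;; Round trip through whatever spelling this implementation uses:
(parse-english-number-signed (format nil "~R" -7))   ; => -7
```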

From: Steven M. Haflich
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Mon, 24 Mar 2008 01:03:01 +0000
Message-ID: <w5DFj.6355$qS5.3609@nlpi069.nbdc.sbc.com>

Kent M Pitman wrote:

> (defun parse-english-number (string)
>   (loop for i from 0 do
>     (when (equal string (format nil "~R" i))
>       (return i))))

This one is sometimes faster but less reliable:

(defun parse-english-number (string)
   (loop for i from 1 by 2
       as j = (random i)
       when (equal string (format nil "~R" j))
       return j))

Seriously Kent, why did we leave the operational constraints on ~r so 
loose?  There is no implication in the ANS whether ~r handles negative 
numbers, nor the limits over which it is required to work, nor is there 
any required behavior when the argument is outside that range.  Nor does it
specify the precise names of the extreme high-order English numbers.  ACL
works for positives and negatives through

cl-user(24): (format nil "~r" (1- (expt 10 66)))
"nine hundred ninety-nine vigintillion nine hundred ninety-nine 
novemdecillion nine hundred ninety-nine octodecillion nine hundred 
ninety-nine septendecillion nine hundred ninety-nine sexdecillion nine 
hundred ninety-nine quindecillion nine hundred ninety-nine 
quattuordecillion nine hundred ninety-nine tredecillion nine hundred 
ninety-nine duodecillion nine hundred ninety-nine undecillion nine 
hundred ninety-nine decillion nine hundred ninety-nine nonillion nine 
hundred ninety-nine octillion nine hundred ninety-nine septillion nine 
hundred ninety-nine sextillion nine hundred ninety-nine quintillion nine 
hundred ninety-nine quadrillion nine hundred ninety-nine trillion nine 
hundred ninety-nine billion nine hundred ninety-nine million nine 
hundred ninety-nine thousand nine hundred ninety-nine"

but signals error on any integer of larger magnitude, or on non-integers.

By the way, the final requirement on the ANS Tilde R page is "If and 
only if no parameters are supplied, ~R binds *print-base* to 10."  Since 
there is no way user code can be executed inside the processing of ~R, 
this requirement is without effect, except, I suppose, in the 
implication that it _doesn't_ bind *print-base* in other invocations.

From: Kent M Pitman
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Mon, 24 Mar 2008 02:55:37 +0000
Message-ID: <uprtkal06.fsf@nhplace.com>

"Steven M. Haflich" <···@alum.mit.edu> writes:

> Seriously Kent, why did we leave the operational constraints on ~r so
> loose?  There is no implication in the ANS whether ~r handles negative
> numbers, nor the limits over which it is required to work, nor is
> there any required behavior when the argument is outside that range.

Legacy design.  We could have excluded it from ANSI CL, but to what end?
It was already there in most implementations and its absence would have
annoyed people.

I would have loved to see it internationalized, but doing a serious
job on that would have required more dataflow, since in many languages
you can't do these at all without a gender and sometimes also not
without other classification data as well.  And ~P would have been even
harder to get right internationally.

> Finally, the precise names of the extreme high order English number
> names. [...]

Not to mention the British "billion" thing.

> By the way, the final requirement on the ANS Tilde R page is "If and
> only if no parameters are supplied, ~R binds *print-base* to 10."
> Since there is no way user code can be executed inside the processing
> of ~R, this requirement is without effect, except, I suppose, in the
> implication that it _doesn't_ bind *print-base* in other invocations.

I admit that's a little confusing but I believe the significance of 
this is that

 (let ((*print-base* 2)) (format nil "~R" 3))
 => "three" ; not "eleven"

(Of course, people might dispute whether 10 base 2 is properly read as
"ten" in any case, but that's part of the point.  I recall reading
somewhere, years ago (not related to CL) the claim that the binaries
are read one (1), two (10), twin (11), twindred (100), twindred one (101),
twindred two (110), twindred twin (111), twosand (1000), ... but I 
have never seen that silliness since.  But that doesn't mean the question
doesn't exist, it just means there's no canonical answer.  Someone might
still expect "eleven" or "twin" from the computation above... and the
spec didn't want to leave open that option.  Anyone inclined to bicker
would be headed off by the claim that *print-base* had been bound...)
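
A small sketch of the distinction being drawn, using only standard FORMAT directives. The comments restate what the ANS pages for ~R and ~D specify:

```lisp
;; With no prefix parameters, ~R spells the number in English regardless
;; of the ambient *PRINT-BASE*.
(let ((*print-base* 2))
  (format nil "~R" 3))      ; => "three"

;; With a radix parameter, ~R prints digits in that radix instead.
(format nil "~2R" 3)        ; => "11"

;; ~D always prints in decimal, whatever *PRINT-BASE* is.
(let ((*print-base* 2))
  (format nil "~D" 3))      ; => "3"
```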

From: Steven M. Haflich
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Mon, 24 Mar 2008 05:00:16 +0000
Message-ID: <AzGFj.16438$5K1.9263@newssvr12.news.prodigy.net>

Kent M Pitman wrote:

>> Seriously Kent, why did we leave the operational constraints on ~r so
>> loose?  There is no implication in the ANS whether ~r handles negative
>> numbers, nor the limits over which it is required to work, nor is
>> there any required behavior when the argument is outside that range.
> 
> Legacy design.  We could have excluded it from ANSI CL, but to what end?
> It was already there in most implementations and its absence would have
> annoyed people.

The answer I was looking for was "Because there was more than enough 
other _important_ stuff to deal with."  But I agree.  Basically, it's a 
cool facility as well as a bit of whimsy.  But we could have done better 
at least by specifying a minimum range, including negatives, and most 
importantly specified a required fallback for a non-integer or 
out-of-range argument, as is done for ~d etc.

> I would have loved to see it internationalized, but doing a serious
> job on that would have required more dataflow, since in many languages
> you can't do these at all without a gender and sometimes also not
> without other classification data as well.  And ~P would have been even
> harder to get right internationally.

Basically, ANSI-CL is an English programming language, like nearly every 
other programming language.  It's unfair, but if we do it again let's 
base it on Esperanto.  We should leave car and cdr alone, but I'd be 
willing to use Esperanto words for first and rest.

>> By the way, the final requirement on the ANS Tilde R page is "If and
>> only if no parameters are supplied, ~R binds *print-base* to 10."
>> Since there is no way user code can be executed inside the processing
>> of ~R, this requirement is without effect, except, I suppose, in the
>> implication that it _doesn't_ bind *print-base* in other invocations.
> 
> I admit that's a little confusing but I believe the significance of 
> this is that
> 
>  (let ((*print-base* 2)) (format t "~R" 3))
>  => "three" ; not "eleven"

Well, if we accept that implication then the contrapositive implication 
would imply that this result

   cl-user(53): (let ((*print-base* 2)) (format nil "~@r" 5))
   "V"

would be incorrect because *print-base* does not get rebound.  I'll 
leave it to the better minds on this list to derive a system of Roman 
numerology in non-decimal radix.

From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Thu, 03 Apr 2008 03:08:23 +0000
Message-ID: <rem-2008apr02-005@yahoo.com>

> From: "Steven M. Haflich" <····@alum.mit.edu>
> Basically, ANSI-CL is an English programming language, like
> nearly every other programming language.  It's unfair, but if we do
> it again let's base it on Esperanto.

Why Esperanto, a totally artificial language nobody knows?
Why not Interlingua, a mishmash of various known languages, an
attempt to make an Esperanto-like language most people would partly
recognize?
<http://en.wikipedia.org/wiki/Interlingua>
   Interlingua's greatest advantage is that it is the most widely
   understood international auxiliary language by virtue of its
   naturalistic (as opposed to schematic) grammar and vocabulary,
   allowing those familiar with a Romance language, and educated speakers
   of English, to read and understand it without prior study. ...
   ... There are several active mailing lists, and
   Interlingua is also in use in certain Usenet newsgroups, particularly
   in the europa.* hierarchy. ...

> We should leave car and cdr alone, but I'd be willing to use
> Esperanto words for first and rest.

English->Interlingua <http://www.interlingua.com/an/ceid>
English->Esperanto <http://dictionaries.travlang.com/EnglishEsperanto/>
When treating a single CONS cell as part of a tree:
     * LEFT = leve (I don't like maldekstren)
     * RIGHT = dextre (dekstren is almost as good)
When treating a CDR-linked chain of CONS cells as a list:
     * FIRST = prime (the "e" is presumably pronounced as in Spanish)
         (Fuck Esperanto! You need to know the gender of the
                          elements in your list!!! unua unue)
     * NEXT = sequente (poste isn't bad at all)

IMO unua/unue is the killer. You can't do a runtime type-check on
the first element of the list until you fetch it to inspect it, but
you can't fetch it from the CONS cell until after you know its
gender, a classic Catch-22 blockage. The only way I see this
working is to have two kinds of CONS cells, those whose CAR can
store only male objects, and one whose CAR can store only female
objects, and then polymorphise higher-level code that does a
type-case on the CONS cell to determine whether it's a MALE-CONS or
FEMALE-CONS cell hence whether to call UNUE or UNUA respectively.
No way will I go along with such an absurdity.
(Nevermind how to decide, at system-design time, the gender of each
 data type to appear in the software, to avoid later
 mis-communication between different programmers on the team.)
(If you think the incompatibility between various kinds of connectors:
 - EIA 12+13=25
 - EIA 4+5=9
 - EIA 5+5+5=15
 - DIN
 - USB
 is a problem, at least you can *see* (or if blind, *feel* by
 touch) the physical connectors before you try to plug one type of
 plug into another type of socket. My Dell Latitude XPi has one of
 each of the first four kinds, but it apparently pre-dated the new
 USB type of connector!!)

> I'll leave it to the better minds on this list to derive a system
> of Roman numerology in non-decimal radix.

What's the problem? You scrap all the V stuff, using only I X C M etc.,
except in hexadecimal base you use E for eight in lieu of V for five.
I II III IIII IIIE IIE IE E EI EII EIII EIIII IIIX IIX IX X
XI XII XIII XIIII XIIIE XIIE XIE XE XEI XEII XEIII XEIIII XIIIX XIIX XIX XX
XXI XXII XXIII XXIIII XXIIIE XXIIE XXIE XXE XXEI XXEII XXEIII XXEIIII XXIIIX...
XXXI XXXII XXXIII XXXIIII XXXIIIE XXXIIE XXXIE XXXE XXXEI XXXEII XXXEIII ...
You can still use L for #x80, and D for #x800.

(format nil "~@R" 5000)
Error in function FORMAT::FORMAT-PRINT-ROMAN:
   Number too large to print in Roman numerals: 5,000
Hmm, curious ...
How did the Romans do a census in their day??
I'm sure Rome had more than five thousand citizens??
Maybe slaves don't count?
Or maybe this problem is why the Empire collapsed????

From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: Is there any library for converting common English words to its equivalent numeric value in digits?
Date: Thu, 03 Apr 2008 02:09:04 +0000
Message-ID: <rem-2008apr02-004@yahoo.com>

> From: Kent M Pitman <······@nhplace.com>
> ... people might dispute whether 10 base 2 is properly read as
> "ten" in any case, but that's part of the point.  I recall reading
> somewhere, years ago (not related to CL) the claim that the
> binaries are read one (1), two (10), twin (11), twindred (100),
> twindred one (101), twindred two (110), twindred twin (111),
> twosand (1000), ... but I have never seen that silliness since.

I never saw that proposal before, but now that you mention it, I
don't think it's silly at all, I think it's a wonderful idea!!

Follow-up question. I know that the "w" is not pronounced in "two"
but *is* pronounced in "twin". So I presume it'd be pronounced in
any form that starts with "twin", such as "twindred", but not in
any form that starts with "two", such as "twosand". Is that the
original intention, or was pronunciation not discussed in the
original proposal? I would personally prefer that the "w" be
pronounced in everything except "two" itself (and forms using that
to indicate a 2 in some position where it's directly analogous to
"two" by itself). Thus "twosand" pronounces the "w", but
"two twosand" pronounces only the second "w". "twosand" would
have vowels sound like "low-hand" in Poker, for example.
The fully pronounced consonant "tw" together with the Poker vowel
"o" would sound like a lispy version of "row row row your boat",
or maybe an Elmer Fudd version of "troll-sand". (Where do
baby twolls play? In a twoll-sand box!)

So let me guess the rest:
(expt 2 6) = twillion ?  Maybe better would be mwillion?
(expt 2 9) = bwillion ?
(expt 2 12) = ???????

Google: twin twindred twosand
    Web  Results 1 - 1 of 1 for twin twindred twosand. (0.17 seconds)
   Did you mean: twin twindred twos and
   Is there any library for converting common English words to its ...
   twindred two (110), twindred twin (111), twosand (1000), ... but I
   have never seen that silliness since. But that doesn't mean the
   question ...
   groups.google.com/group/ comp.lang.lisp/msg/488a56d728499879 - 31k -

<http://search.yahoo.com/search?ei=UTF-8&fr=yfp-t-501&cop=mss&p=twin+twindred+twosand>
       Sorry, there was a problem retrieving search results. Please try
       again.
   Re: Is there any library for converting common English words to its ...
            ... twin (11), twindred (100), twindred one (101), twindred
            two (110), twindred twin (111), twosand (1000), ... but I
            have never seen that silliness since. ...
            www.codecomments.com/Lisp/message2190444.html - 25k - Cached

<http://en.wikipedia.org/wiki/Special:Search?search=twin+twindred+twosand&fulltext=Search>
   No page text matches
<http://en.wikipedia.org/w/index.php?title=Special%3ASearch&ns0=1&ns1=1&ns2=1&ns3=1&ns4=1&ns5=1&ns6=1&ns7=1&ns8=1&ns9=1&ns10=1&ns11=1&ns12=1&ns13=1&ns14=1&ns15=1&ns100=1&ns101=1&redirs=1&search=twin+twindred+twosand&fulltext=Advanced+search>
   No page text matches

Now, Mr. expert in non-decimal numeric bases, what were the
corresponding proposals for octal and hexadecimal?
Let me guess:
(expt 8 1) = eight
(expt 8 2) = en
(expt 8 3) = eousand
(expt 8 6) = eillion (pronounced eeyillion)
(expt 8 9) = byellion (like the way Carl Sagan pronounced (expt ten 9))
(expt 8 12) = tryellion
(setq hd (expt 2 4))
(expt hd 1) = steen
(expt hd 2) = stundred
(expt hd 3) = stousand
(expt hd 6) = stillion
(expt hd 9) = st'billion
(expt hd 12) = strillion

OT: I still want to find time and energy to write up my followup
to your data-intension essay to include atomic datatypes too.
It would include both physical units (like does a distance of 6
between nearby towns mean 6 miles or 6 kilometers, or parsecs vs.
lightyears between stars) and internal data representations as bit
patterns which mean different things as signed byte or unsigned
byte or floating point or character etc.
For example, what does this bit-pattern mean?
01100110 01110101 01100011 01101011 (4 consecutive bytes in RAM)
It depends on the intention of the programmer whose code generated it!
If you're using a debugger to examine memory after a program crash,
you might not have the intention readily available, so you might
have to just guess and then try to gather more information to
verify or refute your guess. But if you're writing software to
process this block of data, you'd better know a priori what the
intention was!!

Back to the main topic: Has anybody used a neural net or other
general-learning system to "learn" the relationship between input
and output of the ~R and/or ~:R directives and thereby grok the
algorithm and consequently devise the reverse algorithm without
anybody needing to explicitly code it?