> There was a discussion sometime ago regarding a lisp version of
> Norvig's spell checking program.
> http://groups.google.com/group/comp.lang.lisp/browse_thread/thread/78...
> However, the main concern at the time was to minimize line count.
> I was curious and wrote my version that optimizes for readability.
> However, when I tried it out I noticed that it is _very_ slow. Norvig
> claimed that the first test case (270 words) took 13 seconds.
> Here's my result (P4 2.8Ghz SBCL 1.0.3 linux)
> SPELLCHECK> (time (spelltest *tests1* :verbose t))
> ...
> User time = 0:04:57.094
> System time = 0.840
> Elapsed time = 0:05:14.309
> Allocation = 1581329808 bytes
> 0 Page faults
> That is simply unacceptable. So I tried a few examples in the above
> thread and the results were either much worst or _slightly_ better
> (within 10% range).
Same results here. My program is located at http://www.profibing.de/lisp/spell/,
Lisp is ClozureCL 1.3(trunk) 64 bit, Mac OSX.
I tried to minimize consing with a macro DOEDITS.
(defun edits1 (word)
"Create a list of all words with Levenshtein distance of one."
(let (words)
(doedits (e1 word)
(pushnew (copy-seq e1) words :test #'string=))
words))
(defun edits2 (word)
"Create a list of all words with Levenshtein distance of two."
(let (words)
(doedits (e1 word)
(doedits (e2 e1)
(pushnew (copy-seq e2) words :test #'string=)))
words))
But (length (edits2 "something")) needs 630 seconds, while the Python
version does it at a blink.
The benchmark runs with comparable time but it uses a *lot* of memory.
Can we do better? How comes it, that Python is so fast, even as a
interpreted language?
--
Bernd
In article
<····································@s50g2000hsb.googlegroups.com>,
·············@googlemail.com wrote:
> > There was a discussion sometime ago regarding a lisp version of
> > Norvig's spell checking program.
> > http://groups.google.com/group/comp.lang.lisp/browse_thread/thread/78...
> > However, the main concern at the time was to minimize line count.
> > I was curious and wrote my version that optimizes for readability.
> > However, when I tried it out I noticed that it is _very_ slow. Norvig
> > claimed that the first test case (270 words) took 13 seconds.
> > Here's my result (P4 2.8Ghz SBCL 1.0.3 linux)
> > SPELLCHECK> (time (spelltest *tests1* :verbose t))
> > ...
> > User time = 0:04:57.094
> > System time = 0.840
> > Elapsed time = 0:05:14.309
> > Allocation = 1581329808 bytes
> > 0 Page faults
> > That is simply unacceptable. So I tried a few examples in the above
> > thread and the results were either much worst or _slightly_ better
> > (within 10% range).
>
> Same results here. My program is located at http://www.profibing.de/lisp/spell/,
> Lisp is ClozureCL 1.3(trunk) 64 bit, Mac OSX.
>
> I tried to minimize consing with a macro DOEDITS.
>
> (defun edits1 (word)
> "Create a list of all words with Levenshtein distance of one."
> (let (words)
> (doedits (e1 word)
> (pushnew (copy-seq e1) words :test #'string=))
> words))
>
> (defun edits2 (word)
> "Create a list of all words with Levenshtein distance of two."
> (let (words)
> (doedits (e1 word)
> (doedits (e2 e1)
> (pushnew (copy-seq e2) words :test #'string=)))
> words))
>
> But (length (edits2 "something")) needs 630 seconds, while the Python
> version does it at a blink.
The functions above look very inefficient.
You are pushing something onto a list. For every object
you push you have potentially to traverse the whole list.
The time complexity is bad - which you will see for long lists.
Then you use copy-seq. Is that needed?
A typical better implementation is to use a hash-table, put
the strings into the hash-table and later get the
contents of the hash-table as the result. Checking if
a string is in the hash-table is likely faster.
The other code in your spell.lisp file also has lots
of opportunities to be optimized. You also don't
need EVAL-WHEN around the macro. The compiler
notices that it is a macro and allows
the use of the macro in the same file.
To find out where your code spends its time, try to use a profiler.
That's a very useful tool.
> The benchmark runs with comparable time but it uses a *lot* of memory.
>
> Can we do better? How comes it, that Python is so fast, even as a
> interpreted language?
There are no 'interpreted languages'. (Most) Languages can be implemented
by a compiler or interpreter or both (-> C interpreters, Lisp compilers, ...).
>
> --
> Bernd
--
http://lispm.dyndns.org/
From: Marek Kubica
Subject: Re: Lisps memory usage (was: Efficient way to create strings? (was: How to Write a Spelling Corrector))
Date:
Message-ID: <ge5gh4$a4t$1@hoshi.visyn.net>
On Mon, 27 Oct 2008 21:33:14 +0200, Rainer Joswig wrote:
>> Can we do better? How comes it, that Python is so fast, even as a
>> interpreted language?
>
> There are no 'interpreted languages'. (Most) Languages can be
> implemented by a compiler or interpreter or both (-> C interpreters,
> Lisp compilers, ...).
Not only that - Python usually gets compiled into Python VM bytecode or
Java bytecode or .NET bytecode that gets interpreted. OTOH there are also
Lisp interpreters which include a Just-in-time compiler. The distinction
gets blurry.
regards,
Marek
On Oct 27, 12:59 pm, ·············@googlemail.com wrote:
>
> But (length (edits2 "something")) needs 630 seconds, while the Python
> version does it at a blink.
Are you sure you compiled the code? I just tried it in SBCL and it
took .167 sec.
-- Scott
From: Raffael Cavallaro
Subject: Re: Lisps memory usage (was: Efficient way to create strings? (was: How to Write a Spelling Corrector))
Date:
Message-ID: <geav5p$kir$1@aioe.org>
On 2008-10-27 15:59:34 -0400, ·············@googlemail.com said:
> But (length (edits2 "something")) needs 630 seconds, while the Python
> version does it at a blink.
No way - I'm running CCL 64-bit as well:
? (lisp-implementation-version)
"Version 1.2-r11241M (DarwinX8664)"
? (time (length (edits2 "something")))
(LENGTH (EDITS2 "something")) took 359,333 microseconds (0.359333
seconds) to run
with 2 available CPU cores.
During that period, 316,098 microseconds (0.316098 seconds) were spent
in user mode
16,398 microseconds (0.016398 seconds) were spent
in system mode
60,495 microseconds (0.060495 seconds) was spent in GC.
26,268,256 bytes of memory allocated.
274675
?
On 30 Okt., 01:29, Raffael Cavallaro <················@pas-d'espam-
s'il-vous-plait-mac.com> wrote:
> On 2008-10-27 15:59:34 -0400, ·············@googlemail.com said:
>
> > But (length (edits2 "something")) needs 630 seconds, while the Python
> > version does it at a blink.
>
> No way - I'm running CCL 64-bit as well:
>
> ? (lisp-implementation-version)
> "Version 1.2-r11241M (DarwinX8664)"
> ? (time (length (edits2 "something")))
> (LENGTH (EDITS2 "something")) took 359,333 microseconds (0.359333
> seconds) to run
> with 2 available CPU cores.
> During that period, 316,098 microseconds (0.316098 seconds) were spent
> in user mode
> 16,398 microseconds (0.016398 seconds) were spent
> in system mode
> 60,495 microseconds (0.060495 seconds) was spent in GC.
> 26,268,256 bytes of memory allocated.
> 274675
> ?
Meanwhile I changed the code. Instead of PUSHNEW just PUSH is used.
Because of duplicates this leads to the answer 274675 instead of
114324 like in Python.
CL-USER> (defvar *aaa*)
*AAA*
CL-USER> (time (progn (setf *aaa* (edits2 "something")) (values)))
(PROGN (SETF *AAA* (EDITS2 "something")) (VALUES)) took 309,687
microseconds (0.309687 seconds) to run
with 2 available CPU
cores.
During that period, 295,427 microseconds (0.295427 seconds) were spent
in user mode
14,485 microseconds (0.014485 seconds) were spent
in system mode
57,727 microseconds (0.057727 seconds) was spent in
GC.
26,268,256 bytes of memory
allocated.
; No value
CL-USER> (length *aaa*)
274675
CL-USER> (time (length (delete-duplicates *aaa* :test #'string=)))
(LENGTH (DELETE-DUPLICATES *AAA* :TEST #'STRING=)) took 1,610,182,581
microseconds (1610.182500 seconds) to run
with 2 available CPU
cores.
During that period, 1,570,058,823 microseconds (1570.058800 seconds)
were spent in user mode
3,492,789 microseconds (3.492789 seconds) were
spent in system mode
16 bytes of memory
allocated.
434 minor page faults, 434 major page faults, 0
swaps.
114324
It seems that emulating Pythons set() with PUSHNEW or DELETE-
DUPLICATES is a bad idea.
--
Bernd
On Oct 30, 12:50 pm, ·············@googlemail.com wrote:
> Meanwhile I changed the code. Instead of PUSHNEW just PUSH is used.
Oh! I didn't notice.
I changed it to use FSet and got a time of 1.2 sec:
(defun edits2 (word)
"Create a list of all words with Levenshtein distance of two."
(let ((words (set)))
(doedits (e1 word)
(doedits (e2 e1)
(adjoinf words (copy-seq e2))))
(convert 'list words)))
FSET-USER> (time (size (edits2 "something")))
;;; Real time: 1.2 sec
114324
http://common-lisp.net/project/fset/
(I'll have the FSet 1.2 release, complete with brand new tutorial, out
this weekend for sure.)
-- Scott
From: Thomas A. Russ
Subject: Re: Lisps memory usage (was: Efficient way to create strings? (was: How to Write a Spelling Corrector))
Date:
Message-ID: <ymid4hhd7ri.fsf@blackcat.isi.edu>
·············@googlemail.com writes:
> It seems that emulating Pythons set() with PUSHNEW or DELETE-
> DUPLICATES is a bad idea.
Only if the set is large.
The only implementation available for serial PUSHNEW forms is a N^2
algorithm. The typical implementations for DELETE/REMOVE-DUPLICATES are
also N^2 algorithms.
Fortunately, it isn't hard to write a faster function for
REMOVE-DUPLICATES in the case of large inputs. That is actually fairly
common.
A better approach would be to use a different implementation technique
for sets than plain lists. Certainly Java does that with their use of
HashSets. I'm not sure what Python does.
You could either create your own set implementation class or else IIRC
there is something in one of the CL libraries, but I'm not sure exactly
which one has it.
--
Thomas A. Russ, USC/Information Sciences Institute
Some experiments with hash-tables:
(defun edits1-list (word)
"Create a list of all words with Levenshtein distance of one."
(let (words)
(doedits (e1 word)
(push (copy-seq e1) words))
(delete-duplicates words :test #'string=)))
(defun edits1-hash (word)
"Create a hash-table of all words with Levenshtein distance of one."
(let ((words (make-hash-table :test #'equal :size (+ 25 (* 54
(length word))))))
(doedits (e1 word)
(setf (gethash e1 words) t))
words))
CL-USER> (length (edits1-list "something"))
494
CL-USER> (hash-table-count (edits1-hash "something"))
482
Why is there a difference?
And it is not dependend on Lisp implementation:
(hash-table-count (edits1-hash "something"))
;; AllegroCL:
482
;; ClozureCL:
254
;; SBCL:
254
All with MacOSX/PPC 32-bit
Found it. The addresses of MAKE-STRING where reused.
(defun edits1-hash (word)
"Create a hash-table of all words with Levenshtein distance of one."
(let ((words (make-hash-table :test #'equal :size (+ 25 (* 54
(length word))))))
(doedits (e1 word)
(setf (gethash (copy-seq e1) words) t))
words))
CL-USER> (hash-table-count (edits1-hash "something"))
494