From: ······@aristotle.ils.nwu.edu
Subject: Spell-Checkers and Spell-Correctors
Date: 
Message-ID: <5480@accuvax.nwu.edu>
From: Vitas Daulys <······@aristotle.ils.nwu.edu>


This is a general request going out in search of any available spell-checker
or spell-correction routines.  Either English or Spanish words would be optimum 
and LISP is the preferred language, but any variations of these are acceptable.

Thanks
From: Jeff Palmucci
Subject: Re: Spell-Checkers and Spell-Correctors
Date: 
Message-ID: <53979@bbn.COM>
Here is a spell checker that I wrote for a natural language system ~3
years ago. It allows you to build a dictionary through the
add-spelling function, and retrieve close matches through
find-similar. 

A close word it one with ONE of the following errors: added character,
deleted character, replaced character, transposed characters.

It finds these words by hashing on (<first half of the word> . length)
and (length . <second half of the word>). Since we are only looking
for words 1 error away from the possible misspelling, one of the halfs
will be correct, and should lead to the correct word. 

The code is not particularly efficient, and it stores the entire
dictionary in memory. It works great on a Symbolics, which have about
170mb swapping space standard, but may be too costly for large
dictionaries on smaller machines.

If you are using a Symbolics, dictionary disk saves can be done
quickly with sys:write-forms-to-file. Other machines may have to build
it from scratch each time you power up.

The indexing idea came from an undergraduate bachelors thesis at MIT,
forgot which one.

Jeff

---------------------
I make no guarantees or restrictions on the following code:

(defun get-search-keys (sym)
  "Returns the hash codes that the word should be listed under."
  (let* ((word (symbol-name sym))
	 (word-length (length word))
	 (up-word (string-upcase word))
	 (prefix-base  (subseq up-word 0 (1- (ceiling (/ word-length 2)))))
	 (suffix-base  (subseq up-word (- word-length -1 (ceiling (/ word-length 2))))))
 						; if the word is too short for a key, return the null
      (cond ((< word-length 3)
	     `((,word-length . "")))
	    ((evenp word-length)
	     (list (cons word-length prefix-base)
		   (cons suffix-base word-length)))
	    (t (list (cons word-length prefix-base)
		     (cons suffix-base word-length)
		     (cons word-length (subseq prefix-base 0 (1- (length prefix-base))))
		     (cons (subseq suffix-base 1) word-length))))))

(defun get-index-keys (sym)
  "Returns the hash codes that the word may be found under"
  (let* ((word (symbol-name sym))
	 (word-length (length word))
	 (up-word (string-upcase word))
	 (prefix-base  (subseq up-word 0 (1- (ceiling (/ word-length 2)))))
	 (suffix-base  (subseq up-word (- word-length -1 (ceiling (/ word-length 2))))))
						; if the word is too short for a key, return the null
    (cond ((= word-length 1)
	   '((1 . "") (2 . "")))
	  ((= word-length 2)
	   '((1 . "") (2 . "") (3 . "")))
	  ((oddp word-length)
	   (list 
	     (cons (1- word-length) (subseq prefix-base 0 (1- (length prefix-base))))
	     (cons word-length prefix-base)
	     (cons (1+ word-length) prefix-base)
	     (cons (subseq suffix-base 1) (1- word-length))
	     (cons suffix-base word-length)
	     (cons suffix-base (1+ word-length))))
	  (t 
	   (list
	     (cons (1- word-length) prefix-base)
	     (cons word-length prefix-base)
	     (cons (1+ word-length) prefix-base)
	     (cons suffix-base (1- word-length))
	     (cons suffix-base word-length)
	     (cons suffix-base (1+ word-length)))))))


(defun make-spelling-table ()
  "Creates a table that is used to store the most likely word matches in"
  (make-hash-table :test #'equal :size 256 :rehash-size 1.5 :rehash-threshold .75))

(defun add-spelling (sym table &optional &key (longword nil))
  "Adds a word to a specific spelling table"
  (mapc #'(lambda (k)
	    (push sym (gethash k table)))
	(get-search-keys sym)))


(defun find-similar (sym table)
  "Returns a list of all similarp words in table"
  (remove-duplicates
    (mapcan #'(lambda (x)
		(if (similarp (symbol-name sym)(symbol-name x))
		    (list x)))
	    (remove-duplicates
	      (mapcan #'(lambda (x)
			  (copy-list (gethash x table)))
		      (get-index-keys sym))
	      :test #'equal))))

(defun similarp (word1 word2)
  "Are the two words similar enough to be considered spelling replacements."
  (or (equal word1 word2)
      (do  ((l1 (coerce word1 'list) (cdr l1))
	    (l2 (coerce word2 'list) (cdr l2)))
	   ((neq (car l1) (car l2)) 
	    (or (equal (cdr l1) l2)
		(equal l1 (cdr l2))
		(equal (cdr l1) (cdr l2))
		(and (eq (car l1) (cadr l2))
		     (eq (cadr l1) (car l2))
		     (equal (cddr l1) (cddr l2))))))))