ASCII dictionary/lexicon needed

From: Mark Tarver
Subject: ASCII dictionary/lexicon needed
Date: Mon, 11 Jun 2007 17:24:23 +0000
Message-ID: <1181582663.468091.326500@p77g2000hsh.googlegroups.com>

Hi,

I'm looking for a downloadable ASCII English dictionary/lexicon with
the words and their lexical categories.   Anybody know where to find
one?

Mark

From: ··········@gmail.com
Subject: Re: ASCII dictionary/lexicon needed
Date: Mon, 11 Jun 2007 19:28:08 +0000
Message-ID: <1181590088.295798.229190@p47g2000hsd.googlegroups.com>

On Jun 11, 10:24 am, Mark Tarver <··········@ukonline.co.uk> wrote:
> Hi,
>
> I'm looking for a downloadable ASCII English dictionary/lexicon with
> the words and their lexical categories.   Anybody know where to find
> one?
>
> Mark

WordNet (wordnet.princeton.edu) may be a good place to start
looking.

Alex

From: Ken Tilton
Subject: Re: ASCII dictionary/lexicon needed
Date: Mon, 11 Jun 2007 20:19:46 +0000
Message-ID: <I5ibi.6$2U4.2@newsfe12.lga>

··········@gmail.com wrote:
> On Jun 11, 10:24 am, Mark Tarver <··········@ukonline.co.uk> wrote:
> 
>>Hi,
>>
>>I'm looking for a downloadable ASCII English dictionary/lexicon with
>>the words and their lexical categories.   Anybody know where to find
>>one?
>>
>>Mark
> 
> 
> WordNet (wordnet.princeton.edu) may be a good place to start
> looking.

Seconded. Also Project Gutenberg.

kt

-- 
http://www.theoryyalgebra.com/

"Algebra is the metaphysics of arithmetic." - John Ray

"As long as algebra is taught in school,
there will be prayer in school." - Cokie Roberts

"Stand firm in your refusal to remain conscious during algebra."
    - Fran Lebowitz

"I'm an algebra liar. I figure two good lies make a positive."
    - Tim Allen

From: Bob Bechtel
Subject: Re: ASCII dictionary/lexicon needed
Date: Tue, 12 Jun 2007 00:21:00 +0000
Message-ID: <%Elbi.28$Dv2.8@newsfe04.lga>

Ken Tilton wrote:
> 
> 
> ··········@gmail.com wrote:
>> On Jun 11, 10:24 am, Mark Tarver <··········@ukonline.co.uk> wrote:
>>
>>> Hi,
>>>
>>> I'm looking for a downloadable ASCII English dictionary/lexicon with
>>> the words and their lexical categories.   Anybody know where to find
>>> one?
>>>
>>> Mark
>>
>>
>> WordNet (wordnet.princeton.edu) may be a good place to start
>> looking.
> 
> Seconded. Also Project Gutenberg.
> 
> kt
> 
Specifically, take a look at Moby Part of Speech at Project Gutenberg:

http://www.gutenberg.org/etext/3203

From: Charlton Wilbur
Subject: Re: ASCII dictionary/lexicon needed
Date: Tue, 12 Jun 2007 02:08:33 +0000
Message-ID: <871wghudpa.fsf@mithril.chromatico.net>

>>>>> "MT" == Mark Tarver <··········@ukonline.co.uk> writes:

    MT> Hi, I'm looking for a downloadable ASCII English
    MT> dictionary/lexicon with the words and their lexical
    MT> categories.  Anybody know where to find one?

In "Time flies like an arrow.  Fruit flies like a banana," how do you
know what part of speech "flies" and "like" are without knowing the
context?

That said, the Brown Corpus is quite useful for part-of-speech
tagging, and contains about a million words.  Google can find it for
you.

Charlton



-- 
Charlton Wilbur
·······@chromatico.net

From: Ken Tilton
Subject: Re: ASCII dictionary/lexicon needed
Date: Wed, 13 Jun 2007 04:19:51 +0000
Message-ID: <OdKbi.57$Kc2.39@newsfe12.lga>

Charlton Wilbur wrote:
>>>>>>"MT" == Mark Tarver <··········@ukonline.co.uk> writes:
> 
> 
>     MT> Hi, I'm looking for a downloadable ASCII English
>     MT> dictionary/lexicon with the words and their lexical
>     MT> categories.  Anybody know where to find one?
> 
> In "Time flies like an arrow.  Fruit flies like a banana," how do you
> know what part of speech "flies" and "like" are without knowing the
> context?

That's a thoroughly different question. The dictionary just needs to say
"flies-n,v.i." to satisfy the question asked.

kt
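
To make the point above concrete, here is a minimal sketch of a
context-free lexicon lookup: each word simply maps to every category it
can take. The tab-separated "word<TAB>cat,cat" line format, the sample
entries, and all function names are assumptions for illustration, not the
format of WordNet, Moby, or any other real lexicon.

```lisp
;; Sketch only: line format and names are hypothetical.
(defun split-on (char string)
  "Return the substrings of STRING separated by CHAR."
  (loop with start = 0
        for pos = (position char string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun make-lexicon (lines)
  "Build an EQUAL hash table mapping each word to its list of categories."
  (let ((lexicon (make-hash-table :test #'equal)))
    (dolist (line lines lexicon)
      (let ((tab (position #\Tab line)))
        ;; lines without a tab are silently skipped
        (when tab
          (setf (gethash (subseq line 0 tab) lexicon)
                (split-on #\, (subseq line (1+ tab)))))))))

(defun categories (word lexicon)
  "Return every category recorded for WORD; context is not considered."
  (values (gethash word lexicon)))

;; Made-up sample entries, following the "flies-n,v.i." notation.
(defparameter *lexicon*
  (make-lexicon (list (format nil "flies~Cn,v.i." #\Tab)
                      (format nil "arrow~Cn" #\Tab))))

(categories "flies" *lexicon*)   ; => ("n" "v.i.")
```

Picking the right category for a given sentence would be a separate
tagging step layered on top of a table like this.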

From: ··········@hotmail.com
Subject: Re: ASCII dictionary/lexicon needed
Date: Tue, 12 Jun 2007 06:38:07 +0000
Message-ID: <1181630287.716800.222650@i13g2000prf.googlegroups.com>

On 11 Juni, 19:24, Mark Tarver <··········@ukonline.co.uk> wrote:
> Hi,
>
> I'm looking for a downloadable ASCII English dictionary/lexicon with
> the words and their lexical categories.   Anybody know where to find
> one?
>
> Mark

Wiktionary is useful. The following code translates the wiki source to
lists.

; regular expression loop-like language:
; search
; for line <page>
; then line <title>english_word</title>
; scan while </page> is not found do
;   search
;   for "{{trans-top|"
;   scan while "{{trans" is not found do
;     search
;       extract *Finnish: [[word1]] (1), [[word2]], ...
;       or
;       extract *Finnish: [[word1]] (1), [[word2]] (2), ...

(proclaim '(optimize (speed 3) (safety 0) (debug 0)))

(defun print-dict (from words)
  (format t "~a ~a~%" from words))

(defun doit ()
  (with-open-file (fh "enwiktionary-20070225-pages-articles.xml"
                      :direction :input)
    (loop for line = (read-line fh nil nil)
          while line
          ;; begin page extract
          do (when (search "<page>" line)
               ;; the title is expected on the line right after <page>
               (let ((word (read-line fh nil nil)))
                 (when (and word (search "<title>" word))
                   ;; scan this page only: stop at </page> so that a page
                   ;; without a Finnish entry cannot pick one up from a
                   ;; later page
                   (loop with in-translations = nil
                         for line = (read-line fh nil nil)
                         while (and line (not (search "</page>" line)))
                         do (cond ((search "{{trans-top" line)
                                   (setf in-translations t))
                                  ((and in-translations
                                        (search "*Finnish: " line))
                                   (print-dict word line)
                                   ;; first Finnish line wins; next page
                                   (return))))))))))

(compile 'doit)

(doit)
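
The Lisp version above prints the raw <title> and *Finnish: lines. As a
rough sketch of how they could be turned into actual lists, the helpers
below pull out the title text and the first [[link]] target. The names
extract-title, first-finnish-word and dict-entry are hypothetical, the
sample input lines are made up, and the cleanup only approximates what
the Perl print_dict does.

```lisp
;; Sketch only: hypothetical helpers, not part of the original program.
(defun extract-title (line)
  "Return the text between <title> and </title> in LINE, or NIL."
  (let* ((open-pos (search "<title>" line))
         (start (and open-pos (+ open-pos (length "<title>"))))
         (end (and start (search "</title>" line :start2 start))))
    (when end
      (subseq line start end))))

(defun first-finnish-word (line)
  "Return the target of the first [[link]] after \"Finnish: \" in LINE, or NIL."
  (let* ((label (search "Finnish: " line))
         (link-start (and label (search "[[" line :start2 label)))
         (link-end (and link-start (search "]]" line :start2 link-start))))
    (when link-end
      (let ((word (subseq line (+ link-start 2) link-end)))
        ;; drop a section anchor such as "word#Finnish"
        (subseq word 0 (position #\# word))))))

(defun dict-entry (title-line translation-line)
  "Return a list (english finnish), or NIL if either part is missing."
  (let ((from (extract-title title-line))
        (to (first-finnish-word translation-line)))
    (when (and from to)
      (list from to))))

(dict-entry "    <title>word</title>"
            "*Finnish: [[sana]] (1), [[toinen]]")   ; => ("word" "sana")
```

Calling dict-entry from print-dict and printing the result with ~s would
give output that can be read back in with READ.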



----------------------
I also have a Perl version that is (I guess) more correct:

#!/usr/bin/perl
# regular expression loop-like language:
# search
# for line <page>
# then line <title>english_word</title>
# scan while </page> is not found do
#   search
#   for "{{trans-top|"
#   scan while "{{trans" is not found do
#     search
#       extract *Finnish: [[word1]] (1), [[word2]], ...
#       or
#       extract *Finnish: [[word1]] (1), [[word2]] (2), ...

sub print_dict {

  my($from, $words) = @_;
  chomp $from;
  chomp $words;
  $from =~ s/.*<title>(.*)<\/title>.*/$1/;
  $words =~ s/.*Finnish: ([^ ]*).*/$1/;
  $words =~ s/[\[\],]/ /g;
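  $words =~ s/\xC3\xA4/\xE4/g;  # presumably UTF-8 "ä" -> Latin-1 (original bytes were garbled)
  $words =~ s/\xC3\xB6/\xF6/g;  # presumably UTF-8 "ö" -> Latin-1 (original bytes were garbled)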
  $words =~ s/^[ ]*//;
  if($words =~ / /){
    $words =~ s/([^ ]*) .*/$1/;
  }
  if($words =~ /#/){
    $words =~ s/([^#]*)#.*/$1/;
  }
#  $words =~ s/.Finnish: \[\[([^\]]*)\]\].*/$1/;
  print "(\"$from\" \"$words\")\n";
}

open my $fh, '<', "enwiktionary-20070225-pages-articles.xml"
  or die "cannot open dump: $!";
#open my $fh, '<', "t2" or die;
my ($line, $word);
while (defined($line = <$fh>)) {
  # begin page extract
  next unless $line =~ /<page>/;
  # verify title is found on the line right after <page>
  $word = <$fh>;
  next unless defined $word && $word =~ /<title>/;
  while (defined($line = <$fh>)) {
    # end of page: no Finnish translation found
    last if $line =~ /<\/page>/;
    if ($line =~ /\*Finnish: /) {
      # first Finnish line wins; move on to the next page
      print_dict($word, $line);
      last;
    }
  }
}