On Jun 11, 10:24 am, Mark Tarver <··········@ukonline.co.uk> wrote:
> Hi,
>
> I'm looking for a downloadable ASCII English dictionary/lexicon with
> the words and their lexical categories. Anybody know where to find
> one?
>
> Mark
WordNet (wordnet.princeton.edu) may be a good place to start
looking.
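For WordNet, the word-to-category information is in the database's index files (index.noun, index.verb, index.adj, index.adv). Note that WordNet only covers nouns, verbs, adjectives and adverbs, so closed-class words won't be there. Below is a minimal Common Lisp sketch. It assumes the WordNet 3.0 layout, where each data line starts with the lemma and a one-letter part-of-speech code and the license header lines start with a space. The function names are made up for illustration.

```lisp
;;; Build a word -> parts-of-speech table from WordNet index files.
;;; ASSUMPTION: data lines begin "lemma pos ..."; header lines begin with a space.
(defun first-two-fields (line)
  "Return the first two space-separated fields of LINE as two values."
  (let* ((end1 (position #\Space line))
         (start2 (and end1 (position-if-not (lambda (c) (char= c #\Space))
                                            line :start end1)))
         (end2 (and start2 (position #\Space line :start start2))))
    (values (subseq line 0 end1)
            (and start2 (subseq line start2 end2)))))

(defun add-wordnet-index (table stream)
  "Read one index file from STREAM into TABLE (word -> list of POS letters)."
  (loop for line = (read-line stream nil nil)
        while line
        unless (or (zerop (length line)) (char= (char line 0) #\Space))
          do (multiple-value-bind (lemma pos) (first-two-fields line)
               (when pos
                 ;; WordNet joins multi-word lemmas with underscores.
                 (pushnew pos (gethash (substitute #\Space #\_ lemma) table)
                          :test #'string=))))
  table)

(defun load-wordnet (dir)
  "Merge the four index files found in DIR (a directory pathname ending in /)."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (name '("index.noun" "index.verb" "index.adj" "index.adv") table)
      (with-open-file (in (merge-pathnames name dir))
        (add-wordnet-index table in)))))
```

Usage would be something like (gethash "fly" (load-wordnet #p"/path/to/WordNet/dict/")), where the path is a placeholder for wherever the dict directory lives on your machine.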
Alex
··········@gmail.com wrote:
> On Jun 11, 10:24 am, Mark Tarver <··········@ukonline.co.uk> wrote:
>
>>Hi,
>>
>>I'm looking for a downloadable ASCII English dictionary/lexicon with
>>the words and their lexical categories. Anybody know where to find
>>one?
>>
>>Mark
>
>
> WordNet (wordnet.princeton.edu) may be a good place to start
> looking.
Seconded. Also Project Gutenberg.
kt
Ken Tilton wrote:
>
>
> ··········@gmail.com wrote:
>> On Jun 11, 10:24 am, Mark Tarver <··········@ukonline.co.uk> wrote:
>>
>>> Hi,
>>>
>>> I'm looking for a downloadable ASCII English dictionary/lexicon with
>>> the words and their lexical categories. Anybody know where to find
>>> one?
>>>
>>> Mark
>>
>>
>> WordNet (wordnet.princeton.edu) may be a good place to start
>> looking.
>
> Seconded. Also Project Gutenberg.
>
> kt
>
Specifically, take a look at Moby Part of Speech at Project Gutenberg:
http://www.gutenberg.org/etext/3203
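If you go the Moby route, a parser along these lines turns each entry into a word plus a list of categories. This is only a sketch, and it rests on assumptions I haven't re-checked against the file. First, each entry is assumed to be the word, then a delimiter character with code 215 (the multiplication sign), then one letter per part of speech. Second, the letters are assumed to mean what the partial table below says. Check both against the readme that comes with the download, and open the file with a suitable single-byte :external-format for your implementation.

```lisp
;;; Parse Moby Part-of-Speech entries of the ASSUMED form  word<DELIM>codes
(defparameter *moby-delimiter* (code-char 215)) ; ASSUMPTION: char code 215

(defparameter *moby-codes*
  ;; Partial, ASSUMED code table; verify it against the Moby readme.
  '((#\N . :noun) (#\p . :plural) (#\V . :verb) (#\t . :verb-transitive)
    (#\i . :verb-intransitive) (#\A . :adjective) (#\v . :adverb)
    (#\C . :conjunction) (#\P . :preposition) (#\! . :interjection)
    (#\r . :pronoun)))

(defun parse-moby-line (line)
  "Return (word . categories) for LINE, or NIL if LINE has no delimiter.
Code letters missing from *MOBY-CODES* are silently skipped."
  (let ((pos (position *moby-delimiter* line)))
    (when pos
      (cons (subseq line 0 pos)
            (loop for c across (subseq line (1+ pos))
                  for entry = (assoc c *moby-codes*)
                  when entry collect (cdr entry))))))
```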
>>>>> "MT" == Mark Tarver <··········@ukonline.co.uk> writes:
MT> Hi, I'm looking for a downloadable ASCII English
MT> dictionary/lexicon with the words and their lexical
MT> categories. Anybody know where to find one?
In "Time flies like an arrow. Fruit flies like a banana," how do you
know what part of speech "flies" and "like" are without knowing the
context?
That said, the Brown Corpus is quite useful for part-of-speech
tagging, and contains about a million words. Google can find it for
you.
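To get a lexicon out of a tagged corpus like Brown, you can collect every tag that each word form receives. Ambiguous words such as "flies" and "like" then simply end up with several tags. Below is a minimal sketch. It assumes the tagged text uses whitespace-separated word/tag tokens, as in the copy distributed with NLTK (e.g. The/at Fulton/np-tl). The helper names are illustrative, and the tags in the test sentences are hand-assigned examples, not lines from the corpus.

```lisp
;;; Build a word -> tags lexicon from text made of "word/tag" tokens.
;;; ASSUMPTION: tokens are whitespace-separated; the tag follows the LAST slash.
(defun whitespace-char-p (c)
  (member c '(#\Space #\Tab #\Newline #\Return)))

(defun split-tokens (line)
  "Split LINE at whitespace, dropping empty tokens."
  (let ((tokens '()) (start nil))
    (dotimes (i (length line))
      (if (whitespace-char-p (char line i))
          (when start
            (push (subseq line start i) tokens)
            (setf start nil))
          (unless start (setf start i))))
    (when start (push (subseq line start) tokens))
    (nreverse tokens)))

(defun add-tagged-line (table line)
  "Record every word/tag token of LINE in TABLE (lowercased word -> tags)."
  (dolist (token (split-tokens line) table)
    (let ((slash (position #\/ token :from-end t)))
      (when (and slash (plusp slash))
        (pushnew (subseq token (1+ slash))
                 (gethash (string-downcase (subseq token 0 slash)) table)
                 :test #'string=)))))

(defun load-tagged-file (path &optional (table (make-hash-table :test #'equal)))
  "Add every line of the tagged file at PATH to TABLE and return it."
  (with-open-file (in path)
    (loop for line = (read-line in nil nil)
          while line
          do (add-tagged-line table line)))
  table)
```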
Charlton
--
Charlton Wilbur
·······@chromatico.net
Charlton Wilbur wrote:
>>>>>>"MT" == Mark Tarver <··········@ukonline.co.uk> writes:
>
>
> MT> Hi, I'm looking for a downloadable ASCII English
> MT> dictionary/lexicon with the words and their lexical
> MT> categories. Anybody know where to find one?
>
> In "Time flies like an arrow. Fruit flies like a banana," how do you
> know what part of speech "flies" and "like" are without knowing the
> context?
That's a thoroughly different question. The dictionary just needs to say
"flies-n,v.i." to satisfy the question asked.
kt
On 11 June, 19:24, Mark Tarver <··········@ukonline.co.uk> wrote:
> Hi,
>
> I'm looking for a downloadable ASCII English dictionary/lexicon with
> the words and their lexical categories. Anybody know where to find
> one?
>
> Mark
Wiktionary is useful. The following code extracts entries from the
Wiktionary XML dump into lists (here, English words with their Finnish
translations).
; regular expression loop-like language:
; search
;   for line <page>
;   then line <title>english_word</title>
;   scan while </page> is not found do
;     search
;       for "{{trans-top|"
;       scan while "{{trans" is not found do
;         search
;           extract *Finnish: [[word1]] (1), [[word2]], ...
;           or
;           extract *Finnish: [[word1]] (1), [[word2]] (2), ...

(declaim (optimize (speed 3) (safety 0) (debug 0)))

(defun title-text (line)
  "Return the text between <title> and </title> in LINE (or LINE itself)."
  (let ((start (search "<title>" line))
        (end (search "</title>" line)))
    (if (and start end)
        (subseq line (+ start (length "<title>")) end)
        line)))

(defun print-dict (from words)
  (format t "~a ~a~%" (title-text from) words))

(defun doit (&optional (path "enwiktionary-20070225-pages-articles.xml"))
  (with-open-file (fh path :direction :input)
    (loop for line = (read-line fh nil nil)
          while line
          ;; begin page extract
          when (search "<page>" line)
            do (let ((title (read-line fh nil nil)))
                 ;; verify title is found
                 (when (and title (search "<title>" title))
                   (let ((in-trans nil))
                     ;; Scan this page only: stopping at </page> keeps a page
                     ;; without a Finnish line from borrowing the next page's.
                     (loop for l = (read-line fh nil nil)
                           while (and l (not (search "</page>" l)))
                           do (cond ((search "{{trans-top" l)
                                     (setf in-trans t))
                                    ((and in-trans (search "*Finnish: " l))
                                     (print-dict title l)
                                     (return))))))))))

(compile 'doit)
(doit)
----------------------
I also have a Perl version that is (I guess) more correct:
#!/usr/bin/perl
use strict;
use warnings;

# regular expression loop-like language:
# search
#   for line <page>
#   then line <title>english_word</title>
#   scan while </page> is not found do
#     search
#       for "{{trans-top|"
#       scan while "{{trans" is not found do
#         search
#           extract *Finnish: [[word1]] (1), [[word2]], ...
#           or
#           extract *Finnish: [[word1]] (1), [[word2]] (2), ...

# Print one entry as a Lisp-readable list: ("english" "finnish")
sub print_dict {
    my ($from, $words) = @_;
    chomp $from;
    chomp $words;
    $from  =~ s/.*<title>(.*)<\/title>.*/$1/;  # keep only the title text
    $words =~ s/.*Finnish: ([^ ]*).*/$1/;      # first token after "Finnish: "
    $words =~ s/[\[\],]/ /g;                   # blank out [[ ]] and commas
    $words =~ s/\xC3\xA4/\xE4/g;               # UTF-8 ä -> Latin-1 ä
    $words =~ s/\xC3\xB6/\xF6/g;               # UTF-8 ö -> Latin-1 ö
    $words =~ s/^ +//;                         # trim leading spaces
    $words =~ s/ .*//;                         # keep only the first word
    $words =~ s/#.*//;                         # drop "#Finnish"-style anchors
    print "(\"$from\" \"$words\")\n";
}

my $dump = "enwiktionary-20070225-pages-articles.xml";
open my $fh, '<', $dump or die "Cannot open $dump: $!";

while (my $line = <$fh>) {
    # begin page extract
    next unless $line =~ /<page>/;
    # verify title is found
    my $word = <$fh>;
    next unless defined $word && $word =~ /<title>/;
    # search this page only, stopping at </page>
    while (my $body = <$fh>) {
        last if $body =~ /<\/page>/;
        if ($body =~ /\*Finnish: /) {
            print_dict($word, $body);
            last;
        }
    }
}
close $fh;