From: Peter Dudey
Subject: Free program:  GALE  (long--includes description)
Date: 
Message-ID: <C89u92.GI5@willamette.edu>
The program explained below is now available.  Send email to ·······@sfu.ca if  
you are interested.

--
! Peter Dudey, 11 kyu, Lisp SubGuru and Purveyor of Fine Cheeses >FINGER ME< !
! Tobacco is the leading cause of death in the U.S. | More mass transit!     !
! Darwin was right. | There is no valid use for handguns or nuclear weapons. !
! Please mail me plastic spaceships:  257 NE 13th Street, Salem, OR  97301   !








An Evolutionary Program to Mimic Language Acquisition:
GALE (Genetic Acquisition of a Language of Exchange).


-------------------------------------------------------------------------------
It's free, and you can do what you want with it, but:

1)  Keep my name on it, and

2)  I'm not responsible for anything you do to or with it.  Don't sue me, I  
don't have any money.

-Peter Dudey
-------------------------------------------------------------------------------


GALE was written by Peter Dudey in CLISP. CLISP was written by Bruno Haible and   
Michael Stoll, and is available via anonymous ftp from  
ma2s2.mathematik.uni-karlsruhe.de.  The source code accompanies this file.  It  
will, of course, run much faster if compiled;  since compiled code will depend  
on the platform, that is left to the user.


This was in response to a brief thread that emerged on comp.ai.genetic about
 modeling human language evolution after an idea by Paul Harrald. 


   On a vt100 monitor GALE displays a grid,  each element of the grid can
 contain either a ``resource'' or a ``Critter.'' There are three types of
 resource, they are represented by the non-alphanumeric objects of a comma (,),
 a bar (_), or a dot (.).  A ``Critter'' is one of the main actors in GALE.
 Critters are small neural networks. They are represented by alphabetic
 characters, either upper or lower case or at-signs (@). The Critters move
 randomly to adjacent
 grid points (in Manhattan fashion). If the grid point contains a resource, the
 Critter picks it up. If there is another Critter adjacent they may attempt to
 communicate. The reason for communication is that the individual resources are
 of no value in themselves, but a combination of a comma, bar and dot can be
 used to produce one `widget' (what else?). A widget endows a Critter with 50
 units of `lifespan'---it can exist for 50 more moves. The resources are
 `manna from heaven' in that they appear randomly in unoccupied grid points
 from time to time.

   When two Critters meet they attempt to communicate by uttering a sound from
 a fixed vocabulary. The sound they utter is intended to convey two things:
 what they have that is available and what they want in exchange. They listen
 to one another, try to guess the intention of the other, and then offer what
 they think the other wants---literally present the item. It
 might be complained that the Critters can simply see what the other has, and
 hence a possible trade is manifest: no need for language. We have in mind that
 carry around the resources is very arduous, and that they are hidden in a
 secret archive known only to the Critters. They then wander around, bumping
 into the others. Once they guess at the meaning of the other's utterance, and
 if they think there is a trade to be made, they go to their respective
 archives and retrieve what they think is appropriate (if they think there is
 the possibility of trade). When they reconvene, they then expectantly present
 what they have, and are either pleased or disappointed by what the other has.
 If they have successfully interpreted one another (and have something to  
trade) a trade takes place, otherwise it does not.
   As the critters move around they are trying to associate a word utterance
 with an idea. There are six basic ``ideas'' the Critters would like to convey;

   ``I will give you a dot for a comma''

   ``I will give you a dot for a bar''

   ``I will give you a comma for a dot''
 
   ``I will give you a comma for a bar''

   ``I will give you a bar for a dot''
 
   ``I will give you a bar for a comma''


There are six sounds that a Critter can utter:


       ``Fnord''	(See Robert Anton Wilson's _Illuminatus!_ trilogy)
       ``Flugblogs''	(See damn near anything by Bob French)
       ``Bar''		(The rest are standard LISP gibberish variables)
       ``Baz''
       ``Foo''
       ``Quux''


Each critter assigns 36 integer-valued weights between the six ideas and the
 six words. When the critter wishes to convey a certain idea, it utters the
 word connected to that idea with greatest weight. Similarly, when a Critter
 hears a particular word, it associates an idea on the basis of greatest
 weight.


    The Critters alter the weights as experience dictates. Periodically two
 Critters, when adjacent will mate, producing a genetic crossbreed Critter that
 is placed at a random, empty point.

GALE keeps track of many statistics to illustrate the evolution of common
 interpretations of words (or not)---later.


Representing the Critters


   The agents in GALE are comprised of two parts: a `brain' and `genes.' The
 brain and genes are stored as an array, the 36 weights between the ideas and
 words. The brain can `learn' by increasing weights in response to correct
 interpretation, and decreasing weights in response to an incorrect
 interpretation.
   Initially, and at birth, a critter's brain and it's genes are identical.
 Over time, the brain learns via interaction, but the genes (naturally) remain
 the same.


Movement and interaction.


The initial grid is randomly seeded with critters and resources, randomly
 placed in the ``world.'' Each critter moves to one of the adjacent grid
 points: if the critter is located at grid point (x,y) it will move to grid
 points , (x,y+1), (x+1,y), (x-1,y), (x,y-1), with equal probability.
 Three things can happen when a critter takes up a new position: it can
 find a resource, it can find nothing, or it can become adjacent to another
 critter. Naturally, the first two are mutually exclusive events. A critter who
 finds a resource that enables it to produce a widget does so immediately.


   When two critters become adjacent, they ``interact.'' The nature of an
 interaction depends upon the critters' current stores of resources. A critter
 with two identical resources (and nothing else) will want to trade one of them
 for either of the two resources it does not have in its possession. This
 applies to a critter with a resource of one type and two of another, in which
 case, however, it will want to acquire the third type. Since critters always
 manufacture (and consume) widgets immediately, they can have at most two
 resource types in their possession. There is no limit on the quantity of
 resources the critters can hoard. Generally, then, a critter has a particular
 quantity of one resource, and a particular quantity of some other. If either
 of these quantities is greater than two, the critter would like to trade.


Attempts to trade


   It is highly unlikely that trade will take place unless there is
 successful communication. When two critters meet, and if they want to make a
 trade, they must choose a word to represent their desires. An idea is
 represented as a dotted pair, of the form (resource1 . resource2). The
 notation (resource1 . resource2) indicates a desire to swap a unit of
 resource1 for a unit of resource2, this constitutes an `idea'. There is no
 quibbling about prices. A critter who does not wish to trade simply says
 nothing. Should either critter do this, that is the end of the interaction.
 This leaves the case in which both critters have uttered a word. Recall, the
 word that is uttered is the one with greatest link to the idea that the
 critter wishes to convey. Having exchanged this initial utterance, the
 critters then must decide on an interpretation of what the other said---again
 based on which idea has greatest weight associated with the word the other
 chose. If a critter interprets the other's utterance as the converse of its
 own (that is, it thinks they can trade), it will make the same utterance
 again. If the critter thinks (given its interpretation of the other) that the
 other does not want to trade, then a new offer is made---a new word, if the
 critter has a secondary trade it would like to make. Both critters now make
 the offers indicated by their second utterances.  If they match, there is a
 trade.
 

Double-Quadruple-coincidence


It can be seen that trades will be quite a rare occurrence. A prerequisite is
 the incidence of double-coincidence of wants in the traditional sense. With
 only three goods, this is not all that unlikely. Notice that there can be only
 one feasible trade between any two critters with two resources, but if one
 critter has more than one unit of just one resource, then there are two
 possible trades with another critter with more than one unit of the other two
 resources. In order that a trade take place the critters must then use the
 same words to indicate the two ideas to be conveyed. A further coincidence is
 required: the critters must be adjacent before anything can happen. Given
 this, we would not expect to see communication adapt very quickly. If GALE
 illustrates one thing, it is that if language is an emergent phenomenon, it
 certainly was not very rapid in its emergence. Yet, by evolutionary standards,
 we have developed an incredibly complex means of verbal communication in very
 short order. This still remains a puzzle for me.

[The critters in GALE don't get to learn directly from their parents in
anthing resembling school.  Also unlike humans, they all just happen to speak  
the same "mental" language, i.e., have the same (finite) set of ideas--it's  
just a question of correlation.  It's more of a model of learning /foreign/  
languages.  Hmmm, maybe if each critter was understood to represent a nation or  
culture . . .
Also see Richard Dawkins' _The Selfish Gene_, for the concept of memes.
-Peter Dudey]


Screen output GALE and internal operations.

In this section I show some sample screens from GALE as it is running to help
 explain what is possible when two critters bump into one another, and also
 discuss several other features of GALE. Consider the following:


             _           UJ  .           ,Q
                 N ,  ,I       , _ 
             S   .E          .               
               .         ,          ,    Y   
            .,        _       ,              
                .                            
           ,BW     Z   M          D          
                         ,   ,        A_     
               F                       C     
          _     .         O_  R_ _    _T _ K 
             G .         _         V  ,X  ,  
              _              ,               
             .  .     _H       .  .       _  
                          ,  _        P      
                 L           ,  ,         ,  

          Command? [QOTHKNDFEL]

This is the start-up screen. The capital letters represent critters, which have
 been randomly positioned. In this run, there are 26 critters represented by
 the lettters of the alphabet. Resources have been randomly positioned in free
 grid-points. The menu, `Command? [QOTHKNDFEL]' offers the user the the
 following options: Q= Quit, O= one epoch, T= ten epochs, H= one hundred
 epochs, K= 1000 epochs, D= calculation of population diversity, F= an  
approximate
 diversity calculation (F for `fast diversity'), L = a list of statistics,
 yielding the following:

                                   STATISTICS

Word:    (./,)     (./_)     (,/.)     (,/_)     (_/.)     (_/,)
Foo       0/0       0/0       0/0       0/0       0/0       0/0
Bar       0/0       0/0       0/0       0/0       0/0       0/0
Quux      0/0       0/0       0/0       0/0       0/0       0/0
Baz       0/0       0/0       0/0       0/0       0/0       0/0
Fnord     0/0       0/0       0/0       0/0       0/0       0/0
Flugblogs 0/0       0/0       0/0       0/0       0/0       0/0


Number of times used/correctly interpreted


(Hit any key to continue)


The root menu, [QOTHKNDFEL], appears after each menu choice is completed. An
 `epoch' consists of one move per critter, and of course, any associated
 interactions. If two critters meet, and neither of them wishes to trade, they
 make no attempt at speech. This happens a lot early on as resource
 acquisitions are limited. As the critters acquire resources, the potential for
 trade increases, as does the complexity of interactions. Consider two
 critters, one of which wishes to acquire a COMMA, the other a DOT.
 
GALE shows one possible interaction below:

Critter 3 (C) at (6 . 6) moving.  Lifespan: 538  Goodies: (DOT BAR BAR BAR BAR) 
C: Bar? (meaning (BAR . COMMA), interpreted as (COMMA . DOT))
S: Baz? (meaning (BAR . COMMA), interpreted as (BAR . DOT))

S: Baz. (meaning (DOT . COMMA), interpreted as (BAR . DOT))


It worthwhile decomposing these events, I think. Critter C has the hoard (DOT
 BAR BAR BAR BAR), and naturally wishes to acquire a COMMA, and would like to
 swap a BAR for a COMMA.


Footnote: If critter C thought it would be very easy to acquire a DOT in the
 future, then it might consider trading its DOT for a COMMA. This kind of
 reasoning is not yet embodied in GALE.

 
  Although not shown, critter S also wants to acquire a COMMA, and is also
 going to offer a BAR. These critters do not actually have a trade to make, but
 they are not sure of this. To convey the idea `(BAR . COMMA)' critter C has
 selected the word `BAR' which critter S has interpreted as `(COMMA . DOT).'
 Critter S does not want a DOT, and makes an offer to swap its BAR for C's
  COMMA, which of course C doesn't have. After the first attempt at speech, the
 two critters have different ideas about what has happened. Critter C thinks
 the following sequence of events has occurred: 1) I offered him my BAR for his
  COMMA. 2) His response was to offer me his BAR for my DOT. Critter C then
 waits for S to make a second offer, which may differ from the first offer. All
 that happens is that S makes the same offer again (in A's estimation). Critter
  A ends this interaction by saying nothing, meaning there is no trade to take
 place.
   Now let us see these events as critter S interpreted them. The first event
 was 1) The other critter offers me a COMMA in exchange for a DO}, then 2) I
 offer BAR in exchange for a COMMA, 3) The other critter did not make an
 attempt to trade, so I make the new offer, 4) I offer, as an alternative, a
 DOT, in exchange for a COMMA, 5) nothing doing, apparently.


It may seem odd that critter S uses the same word, Baz, to convey two different
 ideas. This, however, will happen from time to time.   It so
happens that, for critter S, the meanings (BAR . COMMA) and (DOT . COMMA)
are both expressed the same way.  Every critter (at any moment) has
exactly one word to express a given meaning, and exactly one
interpretation of a given word it hears, but the converses are not true.
A word may be used to express several different meanings, and an idea may
be the interpretation of several different (heard) words: thanks Peter!
Note that neither critter can be sure that the other has misunderstood any
 utterance in this interaction. In this regard, the interaction illustrated
 does not alter the brains of the critters.

   Consider a more fruitful encounter:


Critter 12 (L) at (18 . 12) moving.  Lifespan: 990  Goodies: (COMMA...
L: Flugblogs? (meaning (BAR . DOT), interpreted as (DOT . COMMA))
F: Foo? (meaning (COMMA . BAR), interpreted as (DOT . BAR))
L: Flugblogs. (meaning (BAR . DOT), interpreted as (DOT . COMMA))
F: Foo. (meaning (DOT . BAR), interpreted as (DOT . BAR))
L trades BAR for F's DOT

In this encounter, only the final statement of F is properly understood, and it
 happens to correspond to a feasible trade, which takes place. For both of
 these critters the weight associated between the idea (DOT . BAR) and the word
  Foo is increased, the weight between (DOT . BAR) and all other words is
 decreased, as is the weight between Foo and all other ideas. the amount of
 increase and decrease is stored in a variable in GALE. Two critters could make
 a mistake in expecting they have a compatible trade to make, only to
 subsequently discover they had misinterpreted each other. This has the obvious
 effects on the weights between ideas and words.

   The option E in the root menu allows the user to observe, and edit if
 desired, a critter's brain. For example, critter A's brain might look like
 this initially:


Critter A

Word:    (./,)     (./_)     (,/.)     (,/_)     (_/.)     (_/,)
Foo       -4        -9        -9        -2        -6        -10
Bar       -5        -8        -3        -5        -2        -5
Quux      -1        -9        0         -2        -8        -2
Baz       -1        -9        -5        -2        -1        -2
Fnord     -8        0         -9        -4        -2        -1
Flugblogs -5        -10       -3        -1        -9        -6

Option? [QCS]

}
The sub-menu here allows the user to go back to the main screen, Q, perhaps to
 quit altogether or to carry on with some more epochs, to ``clear'' this
 critter's brains and genes (reset all links to zero), or to set any one of the
 links displayed. Links (initially) range from +10 to -10. 

   The option D calculates an overall ``diversity'' calculation. For GALE
 diversity is based on actual performance, not (as yet) the size of
 links. It is an average of the `diversity' between the brains of all critters
 compared pairwise (but not with themselves). The critters can react to 6 types
 of `input:' each one of the ideas. by ``react'' I mean generate an
 interpretation. There are also the converse words used by critters to
 indicate each of the 6 ideas. In this respect, critters can either agree or
 disagree on 12 different counts, the 6 interpretations associated with the six
 words, and the six words associated with the 6 ideas. This number is
 calculated for each pair of critters, and 1 added to it to ensure safe
 division. This calculation is then averaged for all critters.  (In the saved
 statistics, the diversity is only calculated for critters with more than
 100 units of lifespan (or some other number if desired) so that children don't
 create spurious jumps in the diversity calculation. `F' makes a `fast'
 diversity population by sampling the population randomly.

  The option N creates a new world populated by a fresh new-born population.
  There are some other details. The initial population of 26 critters are given
 the names A through Z, and are indexed by 1 through 26. Children are given the
 names a through z sequentially, and after that they always appear as @'s.
 However, indexes are incremented sequentially.
 
   In some runs of GALE there are many births. When two critters meet if both
 their lifespans are ``large'' they give birth with a probability equal to a
 specifiable constant, *fecundity*. If two critters breed, they still interact
 linguistically, and perhaps make a trade.

   If the user is interested only in the development of neural networks, it is
 possible to set *fecundity* equal to zero, and ensure initial lifespans are
 large enough. However, it may be of interest to watch a genetic algorithm-like
 evolution too, in which case birth is desirable. It can be tricky to set
 lifespans and *fecundity* in a particular relationship that maintains a
 reasonable sized population, and overcrowding can be a problem. This
 simulation is not designed to mimic adaption to life-threatening situations,
 although clearly it could be adapted to do so. We have found that a method of
 `stark-fist' removal helps. Each epoch a grid-point is randomly selected. If a
 critter is unfortunate enough to be residing in that grid point then that
 critter loses a certain amount of lifespan, which may  be fatal.

   Variables and constants in Lisp are declared at the top of the code for
 gale. Some of the ones a user might want to alter are:


   *save-critter-files*: GALE can save statistics related to each critter,
 described in detail in the next section. If  *save-critter-files*T then these
 statistics are saved, if *save-critter-files*=nil they are not.


  *world-width* and *world-height* are the dimensions of the grid inhabited by
 the critters. The example here show a 35 by 15 grid.  Changing the height may  
well mess up the message-writing mechanisms, so it is best to only twiddle  
width.


  *learning-rate* is the magnitude of the  change in the links between words
 and ideas when there is any change.
 

  *initial-resources* is the number of initial resources placed in the world.
 Whatever this number is, the choice of resource type is random when a resource
 is placed in a grid-point. Each epoch a number *resources-per-epoch* are
 placed in the grid, if there is room. If there is not, GALE will hang!


 *fecundity*,  is the probability of a birth when two `fit' critters meet.


 *delay* is the amount of time, in seconds (approximately) that GALE will pause
 after it displays the details of an interaction.


 *smite* is the number of  units of lifespan removed by the stark fist. The
 stronger is the invisible hand, the less important will be the stark fist.


File output.

In its current form GALE will save some statistics related to its current run,
 all of which are reported for a particular run in the next section. Always
 stored is a file called `stats' are contains 110 variables, defined as
 follows:

The words and ideas are given as follows:
 Foo       = Word 1     
 Bar       = Word 2
 Quux      = Word 3  
 Baz       = Word 4  
 Fnord     = Word 5 
 Flugblogs = Word 6 

(dot . comma)   = Idea 1 
(dot . bar)     = Idea 2
(comma . dot)   = Idea 3 
(comma . bar)   = Idea 4  
(bar . dot)     = Idea 5 
(bar . comma)   = Idea 6 

The 110 columns in stats are:

Column       data

1            Epoch
2            population diversity
3            Number of times word 1 used as idea 1 (this epoch)
4            Number of times word 1 used as idea 2
 
 
9            Number of times word 1 used as idea 6
10           Number of times word 2 used as idea 1
 
 
38           Number of times word 1 correctly interpreted as idea 1
 
 
75           Number of critters using word 1 as idea 1
 
 
110          Number of critters using word 6 as idea 6

Optionally, files store statistics on individual critters too, in files of the  
form critter#.stats. the file critter3.stats would contain the following:

Column        data

1             Epoch
2             lifespan
3             Total number of interactions so far
4             Number of trades of idea 1 type so far
 
 
9             Number of trades of idea 6 type so far
10            (Block metric) distance from critter 1
11            Diversity with critter 1
12            (Block metric) distance from critter 2
13            Diversity with critter 2
14            (Block metric) distance from critter 4
15            Diversity with critter 4
 
 

The number of columns will depend upon the number of critters in existence;   
there will be 9 + (2 * <number of critters minus 1>), the -1 because a critter  
is not compared to itself.  The number of columns will vary with population  
size!