At the heart of Lisp and Prolog

From: Pete Abel
Subject: At the heart of Lisp and Prolog
Date: Wed, 19 Dec 2001 19:28:33 +0000
Message-ID: <b119f458.0112191128.67e04dcd@posting.google.com>

Background:
Im interested in the way knowledge can be represented and processed in
computers; from a programming point of view.

Question 1:
What is the "exact" model used to represent "lists" internally (in
memory) in Lisp? Is it, for example, a simple straightforward "tree"
data structure?

Question 2:
What is the exact model used to represent FACTS internally (in memory)
in Prolog? Is it simply a "tree" data structure?

I'd appreciate "as many" details as possible.
(Are there any detailed resources on this subject on the web?)

Your help is most appreciated. Thank you.
Pete

Re: At the heart of Lisp and Prolog Wouter Van Santvliet
- Re: At the heart of Lisp and Prolog Bart Demoen
  - Re: At the heart of Lisp and Prolog Wouter Van Santvliet
    - Re: At the heart of Lisp and Prolog Bart Demoen
  - Re: At the heart of Lisp and Prolog Jens Kilian
Re: At the heart of Lisp and Prolog Thorsten Seelend
- Re: At the heart of Lisp and Prolog Rahul Jain
  - Re: At the heart of Lisp and Prolog Kalle Olavi Niemitalo
- Re: At the heart of Lisp and Prolog Erik Naggum
- Re: At the heart of Lisp and Prolog Vadim Radionov
Re: At the heart of Lisp and Prolog Pete Abel
- Re: At the heart of Lisp and Prolog Jens Kilian

From: Wouter Van Santvliet
Subject: Re: At the heart of Lisp and Prolog
Date: Thu, 20 Dec 2001 01:28:43 +0000
Message-ID: <3c21400b$0$10325$ba620e4c@news.skynet.be>

A Lisp implementation typically uses a single linked list data type. That
means that the basic building blocks for the representation of lists are
pairs. The first element in the pair (car) holds an element of the list. The
second element in the pair (cdr) holds a reference (pointer) to the
remainder of the list. (or at least the first pair of the remainder of the
list) The end of a list is represented by nil, a special value indicating
that further dereferencing is not permitted. The car and cdr primitives
allow one to access both elements of such a pair. The cons primitive creates
such a pair, and initializes it with its arguments. The car of such a pair
can be a list itself, for representing nested lists. For a so called "dotted
pair", the cdr does not hold a list, but a non-list atom. (notation: (a .
b)) A schematic example might help here. You will find it in most Lisp
text-books.

Note that a "program" or function in Lisp is also represented as a list. An
expression like a+b would typically be represented by a list (+ a b). This
eliminated the barrier between code and the data it manipulates, and is a
common feature in AI languages.

Prolog facts are represented as terms. Implementations vary as far as memory
management is concerned. Basically, a term is stored in a consecutive series
of elements in memory (an array), where the first element of the array holds
the functor of the term, and the reaminder of the array holds the arguments
of the term, in the correct order. The term f(a,b) for example is
represented by an array |f|a|b|. An argument of a term can be (a reference
to) another term. Prolog engines allocate memory for term representations on
a stack or a heap. The memory management of prolog is too complex to
describe in a couple of lines. Text-books will often ignore the
representation of terms in Prolog.

The same remark as for Lisp applies: prolog programs (clauses and facts) are
represented as terms too, eliminating the barrier between code and data. A
clause p :- q for example is an infix notation for the term ':-'(p,q).

Prolog supports a list data type, but this is "emulated" by means of terms
with a dot (.) as a functor. The list [a,b,c] for example is represented by
the term .(a,.(b,.(c,[]))).

In fact the list and term data types as supported by Lisp and Prolog
respectively are equivalent in terms of what they can represent. The prolog
list data type shows that for any list there is an equivalent representation
using terms. The Lisp representation of terms (a+b basically is a term with
funcor +, using infix notation) illustrates that terms can be represented
using lists.

Conceptually, both nested lists and nested terms can be viewed as trees
also. (and both nested lists or terms can be used to represent tree
structures) The physical representation used by implementations of both
languages however is as explained before.

"Pete Abel" <·········@hotmail.com> wrote in message
·································@posting.google.com...
> Background:
> Im interested in the way knowledge can be represented and processed in
> computers; from a programming point of view.
>
> Question 1:
> What is the "exact" model used to represent "lists" internally (in
> memory) in Lisp? Is it, for example, a simple straightforward "tree"
> data structure?
>
> Question 2:
> What is the exact model used to represent FACTS internally (in memory)
> in Prolog? Is it simply a "tree" data structure?
>
> I'd appreciate "as many" details as possible.
> (Are there any detailed resources on this subject on the web?)
>
> Your help is most appreciated. Thank you.
> Pete

From: Bart Demoen
Subject: Re: At the heart of Lisp and Prolog
Date: Thu, 20 Dec 2001 09:57:23 +0000
Message-ID: <3C21B603.BCAD1B57@cs.kuleuven.ac.be>

I am not quite happy with what

Wouter Van Santvliet wrote:

> Prolog facts are represented as terms.

...

> The same remark as for Lisp applies: prolog programs (clauses and facts) are
> represented as terms too, eliminating the barrier between code and data. A
> clause p :- q for example is an infix notation for the term ':-'(p,q).

Prolog facts (and rules) (can) have more than one representation: the user sees them in
sources files as terms. Whether internally they have the same representation as runtime
generated terms (like in X =.. [Y,A,B])  ... most often not: static Prolog facts (and rules) are
in most systems represented as virtual machine code (this is for emulators) or as machine
code (some native code systems like SICStus Prolog on Sparc or GNU-Prolog). Dynamic
facts and rules ... it depends on the system: some like to keep them as Prolog terms (not
on the heap usually) and have a fast assert, but a slower execution of these dynamic code;
others like to compile up to a certain point on the fly (Prolog systems have a form of
Jit compilation since at least 1885) and have a slower assert but a faster execution of the
dynamic code ...

>
> Prolog supports a list data type, but this is "emulated" by means of terms
> with a dot (.) as a functor. The list [a,b,c] for example is represented by
> the term .(a,.(b,.(c,[]))).
>

Again there is a difference between source representation and internal representation: whether
you use [a,b,c] or .(a,.(b,.(c,[]))) in a file, most systems will have a representation internally that is
close to the LISP internal representation with a car and a cdr (not cudr-coding though). I know
of only one system (BinProlog) that uses internally the . as well (in some way at least).

Cheers

Bart Demoen

From: Wouter Van Santvliet
Subject: Re: At the heart of Lisp and Prolog
Date: Thu, 20 Dec 2001 20:17:03 +0000
Message-ID: <3c224887$0$10331$ba620e4c@news.skynet.be>

"Bart Demoen" <···@cs.kuleuven.ac.be> wrote in message
······················@cs.kuleuven.ac.be...
> I am not quite happy with what
>
> Wouter Van Santvliet wrote:
>
> > Prolog facts are represented as terms.
>
> ...
>
> > The same remark as for Lisp applies: prolog programs (clauses and facts)
are
> > represented as terms too, eliminating the barrier between code and data.
A
> > clause p :- q for example is an infix notation for the term ':-'(p,q).
>
> Prolog facts (and rules) (can) have more than one representation: the user
sees them in
> sources files as terms. Whether internally they have the same
representation as runtime
> generated terms (like in X =.. [Y,A,B])  ... most often not: static Prolog
facts (and rules) are
> in most systems represented as virtual machine code (this is for
emulators) or as machine
> code (some native code systems like SICStus Prolog on Sparc or
GNU-Prolog). Dynamic
> facts and rules ... it depends on the system: some like to keep them as
Prolog terms (not
> on the heap usually) and have a fast assert, but a slower execution of
these dynamic code;
> others like to compile up to a certain point on the fly (Prolog systems
have a form of
> Jit compilation since at least 1885) and have a slower assert but a faster
execution of the
> dynamic code ...
>
> >
> > Prolog supports a list data type, but this is "emulated" by means of
terms
> > with a dot (.) as a functor. The list [a,b,c] for example is represented
by
> > the term .(a,.(b,.(c,[]))).
> >
>
> Again there is a difference between source representation and internal
representation: whether
> you use [a,b,c] or .(a,.(b,.(c,[]))) in a file, most systems will have a
representation internally that is
> close to the LISP internal representation with a car and a cdr (not
cudr-coding though). I know
> of only one system (BinProlog) that uses internally the . as well (in some
way at least).

The remarks of Bart are correct, and much more specific than the generic
explanation I have provided. They refer to implementations of Lisp and
Prolog environments however, and the efforts that have been made to improve
the behaviour of such environments, in particular wrt performance.

The representations I described are "original" in the sense that they have
been the foundation on which the languages have been designed. A large
number of primitives relate directly to the list and term data structures.
Any implementation that respects the language specification will have to
provide those primitives. (I am not only thinking of the (de)composition
primitives for lists and terms, but also of dynamically added functions or
facts and clauses, introspection, and in the case of Lisp of the subtle
semantics of a substantial part of the primitives.)

For understanding the implementations of Lisp or Prolog better, the remarks
of Bart are most valuable. If one is
> ... interested in the way knowledge can be represented and processed in
> computers; from a programming point of view.
however, they seem to complicate matters more instead of adding to the
general insight.

From: Bart Demoen
Subject: Re: At the heart of Lisp and Prolog
Date: Fri, 21 Dec 2001 08:15:28 +0000
Message-ID: <3C22EFA0.B722C9C5@cs.kuleuven.ac.be>

Wouter Van Santvliet wrote:

> For understanding the implementations of Lisp or Prolog better, the remarks
> of Bart are most valuable. If one is
> > ... interested in the way knowledge can be represented and processed in
> > computers; from a programming point of view.
> however, they seem to complicate matters more instead of adding to the
> general insight.

On better reading the original question, I realize it was asking about many levels at once.
I reacted to the part "What is the exact model used to represent FACTS internally
(in memory) in Prolog?" (and the same about lists)
Anyway - is the original poster satisfied or does he want more :-)

Bart Demoen

From: Jens Kilian
Subject: Re: At the heart of Lisp and Prolog
Date: Fri, 21 Dec 2001 12:06:21 +0000
Message-ID: <sfwuzgpyki.fsf@bstde026.germany.agilent.com>

Bart Demoen <···@cs.kuleuven.ac.be> writes:
> (Prolog systems have a form of
> Jit compilation since at least 1885)

Wow.  Prolog seems to be older than I ever thought!

;-)

> Again there is a difference between source representation and
> internal representation: whether you use [a,b,c] or
> .(a,.(b,.(c,[]))) in a file, most systems will have a representation
> internally that is close to the LISP internal representation with a
> car and a cdr (not cudr-coding though). I know of only one system
> (BinProlog) that uses internally the . as well (in some way at
> least).

IIRC, BinProlog uses a generalized form of cdr-coding when it can, putting
the functor cell of the last argument of a term into the space of that
argument.  BinProlog puts type tags on data objects to make this work;
most other Prologs (that I know of) use tagged pointers.

Bye,
	Jens.
-- 
··········@acm.org                 phone:+49-7031-464-7698 (TELNET 778-7698)
  http://www.bawue.de/~jjk/          fax:+49-7031-464-7351
PGP:       06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]

From: Thorsten Seelend
Subject: Re: At the heart of Lisp and Prolog
Date: Sun, 23 Dec 2001 14:47:23 +0000
Message-ID: <a04qpr$hhn$07$1@news.t-online.com>

"Pete Abel" <·········@hotmail.com> schrieb im Newsbeitrag
·································@posting.google.com...
> Background:
> Im interested in the way knowledge can be represented and processed in
> computers; from a programming point of view.
>
> Question 1:
> What is the "exact" model used to represent "lists" internally (in
> memory) in Lisp? Is it, for example, a simple straightforward "tree"
> data structure?
>
> Question 2:
> What is the exact model used to represent FACTS internally (in memory)
> in Prolog? Is it simply a "tree" data structure?
>
> I'd appreciate "as many" details as possible.
> (Are there any detailed resources on this subject on the web?)
>
> Your help is most appreciated. Thank you.
> Pete

An additional remark.

In Lisp  the base  data structure  isn't a list  but a  so called
Structured  Expression  (SEXpression).   Every  non  atomic  data
internally is represented as a "dotted pair". For example:

(peter . 12) is represented as

 ---------
 | . | . |
----------
   .   |
   ....|..> peter
       |
       |--> 12

Historically based on the machine language of the IBM computer on
that Lisp  first was implemented,  these to parts  (pointers) are
called CAR  (Contents of Address  Register) and CDR  (Contents of
Displacement Register) respectively.

Because of the  fundemantal meaning of lists, they  got a special
representation:

(a b c)

which in fact is just an abbreviation and identical to:

(a . (b . (c . nil)))

So internally, Prolog and Lisp represents all structured data the
same way; even the "operator" is the same: The dot.

By
Thorsten

From: Rahul Jain
Subject: Re: At the heart of Lisp and Prolog
Date: Sun, 23 Dec 2001 23:33:27 +0000
Message-ID: <874rmhzf3s.fsf@photino.sid.rice.edu>

"Thorsten Seelend" <·······@gmx.de> writes:

> In Lisp  the base  data structure  isn't a list  but a  so called
> Structured  Expression  (SEXpression).   Every  non  atomic  data
> internally is represented as a "dotted pair". For example:

How is a hash-table internally represented as a cons cell?

See cmucl for an example implementation where it's not. I'm at a loss
to find a sample implementation where it is a cons cell.

-- 
-> -/-                       - Rahul Jain -                       -\- <-
-> -\- http://linux.rice.edu/~rahul -=- ·················@usa.net -/- <-
-> -/- "I never could get the hang of Thursdays." - HHGTTG by DNA -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.220020101.23.50110101.042
   (c)1996-2000, All rights reserved. Disclaimer available upon request.

From: Kalle Olavi Niemitalo
Subject: Re: At the heart of Lisp and Prolog
Date: Wed, 26 Dec 2001 21:40:13 +0000
Message-ID: <871yhh8xtu.fsf@Astalo.y2000.kon.iki.fi>

Rahul Jain <·····@sid-1129.sid.rice.edu> writes:

> "Thorsten Seelend" <·······@gmx.de> writes:
> 
> > Every non atomic data internally is represented as a "dotted pair".
> 
> How is a hash-table internally represented as a cons cell?

Hash tables are atomic.  ;-)

From: Erik Naggum
Subject: Re: At the heart of Lisp and Prolog
Date: Mon, 24 Dec 2001 06:26:32 +0000
Message-ID: <3218163989125367@naggum.net>

* "Thorsten Seelend" <·······@gmx.de>
| In Lisp the base data structure isn't a list but a so called Structured
| Expression (SEXpression).  Every non atomic data internally is
| represented as a "dotted pair".

  That was an amazing number of errors in so little space!  Are you coming
  from the Scheme camp with this "insight" into Lisp?  Since you have
  managed to believe this stuff despite a huge amount of corrections every
  time someone says something equally wrong, all textbooks on Lisp correct
  these misconceptions, and and no textbook on Lisp reinforce these strange
  ideas, I presume that they are taken out of nowhere, you believe it very
  strongly (or you would not be unafraid to post this nonsense in public
  view of thousands of Lisp experts), and that you will not listen to
  anyone who corrects you this time, either, this is for the people who
  could mistake your article for containing information:
  
  There is no _one_ basic data structure in Lisp.  First, "basic" depends
  on what you are looking at, and how you are looking at it.  _If_ you are
  looking at the symbolic (note: not "structured") expression, well, then
  the list implemented with the cons cell has some interesting properties,
  and atom/non-atom is a valid distinction.  Not otherwise.  Second, Lisp,
  like every other language, has arrays.  Lisp 1.5 had arrays.  (In LISP
  1.5 Programmer's Manual, 4.4 The Array Feature starts "Provision is made
  in LISP 1.5 for allocating blocks of storage for data.")  Even Scheme has
  arrays, so they must be among the _truly_ basic types.  Third, while it
  is true that a cons cell is not an atom, the incompetent wording "every
  non-atomic data" implies that there are more than one type of non-atomic
  data.  There is only _one_ non-atomic data type, the cons cell.  The cons
  cell _is_ a dotted pair, so the statement you made is really incompetent.
  Oh, and the adjective "base" has a different meaning than you think --
  look it up in a dictionary.

  The Symbolic Expression is abbreviated "S-expression".  The way you have
  written it, the function sexp would always return true.  <-- joke.

  An S-expression is a list of S-expressions _or_ an atom, so "atom" covers
  the whole rest of the type hierarchy.  Now, it really did look like you
  said that all _atoms_ are represented as a dotted pair, because the way
  you wrote about the cons cell is so incompetent.

| Historically based on the machine language of the IBM computer on that
| Lisp first was implemented, these to parts (pointers) are called CAR
| (Contents of Address Register) and CDR (Contents of Displacement
| Register) respectively.

  Sigh.  It was DECREMENT, OK?  What _is_ your source for all this crap?

  These operators referred to halves of a machine word because memory was
  usually less than half of what machine words could address.  A common way
  to implement a push-down list (now called a stack) was with a machine
  word that contained a pointer to the top of the stack in one half and the
  negative of the size of the allocated area in the other half.  Then you
  would add 1 to each half when you pushed something on the push-down list
  (which grew upward) and when the decrement half overflowed to (positive)
  zero, you had a stack overflow condition.  Checking for this condition
  was part of the push and pop instructions.  All pretty clever.  Using the
  two halves of a machine word for a cons cell actually required the cdr to
  be negative, which is all documented in the Lisp 1.5 Programmer's Manual,
  which you should acquire and read before the next time you wish to shoot
  your mouth off in a Lisp experts forum.

| So internally, Prolog and Lisp represents all structured data the
| same way; even the "operator" is the same: The dot.

  This is Just Plain Wrong.

  I think you need to actually _read_ the LISP 1.5 Programmer's Manual�
  before you talk any more about it, but more than that, read some of the
  stuff that has been done in Lisp over the past 40 years since the first
  article in the CACM.  Incidentally, I have not been able to acquire a
  copy of the LISP 1 Programmer's Manual.

  I _really_ hope you know more about Prolog than you know about Lisp.

///
� ISBN 0-262-13011-4
  <URL:http://www.amazon.com/exec/obidos/ASIN/0262130114>
-- 
  The past is not more important than the future, despite what your culture
  has taught you.  Your future observations, conclusions, and beliefs are
  more important to you than those in your past ever will be.  The world is
  changing so fast the balance between the past and the future has shifted.

From: Vadim Radionov
Subject: Re: At the heart of Lisp and Prolog
Date: Mon, 24 Dec 2001 18:34:32 +0000
Message-ID: <a07sfn$r94$1@sdtcom.lg.ua>

In comp.lang.lisp Thorsten Seelend <·······@gmx.de> wrote:

TS> In Lisp  the base  data structure  isn't a list  but a  so called
TS> Structured  Expression  (SEXpression). 

As I know, the abbrivation s-expressition has meaning: "Symbolic expression"


---
RVP

From: Pete Abel
Subject: Re: At the heart of Lisp and Prolog
Date: Mon, 24 Dec 2001 17:09:29 +0000
Message-ID: <b119f458.0112240909.4f51fe0@posting.google.com>

Wouter Van Santvliet wrote:
{
The term f(a,b) for example is represented by an array |f|a|b|. An
argument of a term can be (a  reference to) another term. Prolog
engines allocate memory for term representations on a stack   or a
heap. The memory management of prolog is too complex to describe in a
couple of lines.  Text-books will often IGNORE the representation of
terms in Prolog.
}

This was the "closest" answer to what I've asked about:
{
Question:
What is the exact model used to represent FACTS internally (in memory)
in Prolog? Is it simply a "tree" data structure?
}

A few "facts" first, I use Turbo Prolog 2.0 for the purpose of
experimentation. And I use C++/VC++ to write code. And my "goal"
starts by trying to simulate the functionality of Prolog's inference
engine... first by finding the right data structure to represent ALL
facts and rules, then by finding the right search algorithm based on
"that" data structure.... but the main problem was the immense
complexity of the "data structure management algorithm" that I've
deduced by "tracing" Turbo Prolog 2.0 goal-execution strategy!

For example, using "arrays" only to represent "terms" is relatively
easy, but requires more "pointers" work... on the other hand, trees
are far more complicated but require less attention to the "framework"
pointers!

/* Question */
In other words, what is the model used (e.g. in Turbo Prolog 2.0) to
represent data (MULTIPLE facts of the form: f(a,b)) in memory so that
future "goals" can be answered (found: looked for in the data
structure) easily?

I'd appreciate more details on "this specific point". Thank you. 
Pete

From: Jens Kilian
Subject: Re: At the heart of Lisp and Prolog
Date: Mon, 07 Jan 2002 13:58:03 +0000
Message-ID: <sfn0zqqn4k.fsf@bstde026.germany.agilent.com>

·········@hotmail.com (Pete Abel) writes:
> /* Question */
> In other words, what is the model used (e.g. in Turbo Prolog 2.0) to
> represent data (MULTIPLE facts of the form: f(a,b)) in memory so that
> future "goals" can be answered (found: looked for in the data
> structure) easily?

If you can get your hands on a copy of the first edition of _Prolog for
Programmers_ (F. Kluzniak & S. Szpakowicz, Academic Press), you'll find
source code for a tiny Prolog interpreter written in Pascal.
It's not very powerful, though.

Most modern Prologs are compiled, using a nice abstract machine invented
by D.H.D. Warren (the WAM).  IIRC, Hassan Ait-Kaci's book on the WAM is
available online somewhere.

HTH,
	Jens.
-- 
··········@acm.org                 phone:+49-7031-464-7698 (TELNET 778-7698)
  http://www.bawue.de/~jjk/          fax:+49-7031-464-7351
PGP:       06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]