Languages for self-modifying code

From: ········@ulkyvx.louisville.edu
Subject: Languages for self-modifying code
Date: Sat, 04 Dec 1993 11:18:51 +0000
Message-ID: <1993Dec4.071851.1@ulkyvx.louisville.edu>

	There has been an on going discussion of what languages 
are effective for creating self-modifying code in the comp.ai 
newsgroup.  I will summarize some of the langauges I have 
identified thus far.  I am distributing this note to several 
different language groups because I am interested in hearing 
of other languages that are also effective for creating self-
modifying code.

	When discussing languages for creating self-modifying 
code we must first identify why we want such languages.  If 
you have such tools then you can write a program that itself 
will write a program.  This can be considered a form of 
learning where a program learns a procedure or function that 
can be used to do a task.

	Thus the main focus is to be able to create a function or 
procedure and then run it. We also want to be able to create a 
function or procedure and then be able to modify it easily.  
For the first function created might not do the task we want it 
to do, thus we would want our program to automatically 
modify the created function or procedure until it is modified 
into a form that will execute the task effectively. 

	What is the most effective representation of a function 
that can be modified?  If we represent a function as a string it 
is not as easily modified and manipulated as when it is 
represented as a structure of symbols.  Thus to have an 
effective language for the creation of self-modifying code it is 
best to have a language that manipulates symbols and 
structures effectively.  Examples of languages that fit into this 
category are: Lisp, Prolog, Pop, Smalltalk, Icon,  and many 
others.  C, Pascal, Ada and like languages do not have as 
effective built-in tools for the creation and manipulation of 
structures and symbols.

	We would also need tools that can then execute the 
symbol-structure representations of functions that we have 
created.  Preferrably we would want a tool that could execute 
the structure on different sets of input values.  This tool would 
resemble the apply tool that exists in some implementations of 
Lisp.  Apply takes two arguments.  The first argument is a 
structure that represents a function.  The second argument is 
a list  or structure of arguments to be passed to the function.

There are a number of programming languages that allow the 
creation of such 'apply' tools fairly easily.  The languages that I 
have identified so far are: Lisp; Prolog; Pop-11; and Smalltalk.

In Common Lisp apply is a built function that operates not on a 
structure but rather on a function type.  Thus we need to 
name our 'apply' tool differently.  We name it 'apply_to' and 
define it as follows.  (Scheme and other Lisp dialects probably 
have similar tools.)

> (defun apply_to (funlist arglist)
	(apply (eval (list 'function funlist)) arglist))

We execute apply_to as follows:

> (apply_to '(lambda (a b) (+ a b)) '(4 5))
9


In Pop-11 the name apply is also already taken, thus again we 
use 'apply_to' and define it as follows:  (Developed in 
discussions with A. Sloman.):

: define apply_to(funlist, arglist);
	popval(funlist) (explode (arglist));
  enddefine;

We call it as follows:

  apply_to([procedure(A, B); A+B =>; endprocedure], [4 5]);

** 9

Prolog's syntax is significantly different from Lisp and Pop-11.  
Predicates (functions) are broken into several parts called 
clauses.  Thus we need a structure that can represent  several 
clauses.  In order to represent a predicate in a structureal 
form we will use a structure that has the name 'lambda' and 
has one argument which itself is a list of nameless clauses.  We 
will also use a lambda structure of two arguments to represent 
a single clause.  The first argument being a list of parameters 
for this nameless clause and the second argument being the 
body.  Apply is then defined as follows:

apply(lambda([Clause1|_]), Args) :- 
	Function =.. [lambda|Clause1], apply(Function, Args).
apply(lambda([_|Rest], Args) :- apply(lambda(Rest), Args).
apply(lambda(Args, Body), Args) :- Body.

Apply can then be used as follows: (Note that a third argument 
'C-Result' is needed for a return value.)

?- apply(lambda([[[A,B,C], (C is A + B)]]), [4, 5, Result]).
C=Result=9

For a more detailed discussion of apply in Prolog see my article 
in "Computer Languages" 1993, Volume 18, # 1.

	Defining apply in Smalltalk is slightly more complex.  
There are no tools for 'apply' on structures.  There are only 
tools for the evaluation of strings.  Likewise unnamed 
functions are in the form of blocks which differ from the 
structures that exist in smalltalk.  Thus there are a number of 
conversions that must be made to apply a method (function) 
in the form of a structure.  The below defined apply is limited 
because of all of the ambiguities of translations of structures 
to strings and then to blocks.  The below apply only takes numbers as 
arguments to the unnamed function being passed to apply and 
only allows three parameters at most.

apply: lambda arguments:args

	^Compiler evaluate: 
		(((lambda printString) replaceFrom: 1 to: 1 
					with: '[' startingAt:1)
		 replaceFrom: ((lambda printString) size) to:
				((lambda printString) size) 
					with: ']' startingAt:1)
		, ((args collect: [:each | ' value:', 
					   (each printString)]) 
		   inject: '' into: [:string :next | string , next])

We would use the above method as follows:

^Compiler apply: #(:a :b | a+b) arguments: #(4 5)
9



	To the best of my knowledge other languages such as 
Icon, and Forth don't have tools such as 'apply' although they 
do have tools to create and manipulate symbols and 
structures.  Many people have commented that we could 
create a program that wrote a string that represented a 
program to a file and then exectued that file.   Icon reportedly 
has the power to do this.  While this of course is possible it  
adds a number of complicating steps in the process of creating 
self-modifying code, and thus makes these languages less 
suited for this purpose.  With Icon some of the compilcations 
include: translating a structure of symbols into a string; 
writing a file; managing files for there may be an apply within 
an apply;  executing a file; and accessing the value outputed 
from the execution of the file.  With other languages such as C 
the same compilcations exist, however we have a couple of 
additional difficulties: the development of a set of tools for 
structures; and a set of tools for representing symbols that can 
act as key words or variable names in the functions that are 
created by the self-modifying code.

	I am very interested in receiving any email or news 
responses.	 In particular demonstrations of 'apply' in other 
langauges.


H. Justin Coven, Ph.D.
Computer Science Dept.
Bellarmine College
Louisville, KY 40206
········@ulkyvx.louisville.edu

Re: Languages for self-modifying code Tom Kramer
- Re: Languages for self-modifying code Fernando Mato Mira
  - Re: Languages for self-modifying code Tom Kramer
    - Re: Languages for self-modifying code Kelly Murray
- Re:languages for self-modifying code Robin Popplestone

From: Tom Kramer
Subject: Re: Languages for self-modifying code
Date: Mon, 06 Dec 1993 16:06:16 +0000
Message-ID: <23528@cosmos.cme.nist.gov>

H. Justin Coven recently inquired about ideas for self-modifying code,
particularly by using "apply" in LISP. The question was raised in the
context of systems that learn. What follows is (only) my opinion,
based on a lot of writing LISP programs that write other programs, but
only a little work on learning systems.

In the context of learning systems, "apply", I think, is handy for
simple situations, but awkward to use in complex situations.  It is
useful to think specifically about the best form in which to use the
modified code, and there are usually many options, including using the
modified code in a different process from that which generated it.
This gets us quickly to thinking generally about how to deal with
programs generated by other programs.

The best form for programs generated by other programs depends upon
the circumstances in which the generated program is to be used. The
range of circumstances is large, from compilation to transient
generation and use. Deciding what the actual circumstances are may be
harder than it appears to be at first glance; there are apt to be
more choices than you originally think you have.

One issue is whether the generated program is to be in the same
language as the generator. If the generated program is in a different
language, it is usually necessary (especially if the generated program
is source code which is to be compiled) to write it to a file. It may
be useful to generate the entire program as a structure in active
memory before printing any of it, however. I have used LISP and C to
write RS274-D programs, for example, and find it useful to proceed
this way. I sometimes use LISP to write C, which I then compile, for a
second example.

A second issue is whether the generated program is to run in the same
process as the generator. In this case the generated program must be
in the same language as the generator or in a language (tcl, for
example) for which the generating language has an interpreter.

Three criteria for making a decision on how to implement when there is
a choice are programming difficulty, timing, and program persistence.

1. Difficulty - If generating the program is difficult, implement the
easiest way first. Reconsider other choices after you've got the easiest
method working. 

2. Timing - If the generated program is to be used many times or if it
is apt to take a long time run once, it may be a good idea to compile
it to minimize the time it takes to run it. Of course LISP is great in
this regard, since you can write a program, compile it, load it, and
run it - bang, bang, bang. On the other hand, if you want a small fast
program as output, generating in a lower level language (C, for
example) may be a good idea. I recently wrote a LISP program which
read data from a file and ran a long time (many days in some runs);
call it program A. I then then wrote a LISP program which read the
data used by program A and generated C-language structures; I wrote by
hand the C-language code which used the structures. The structures and
the hand-written code compiled into a program (call it B) which did
exactly the same thing as program A. Program B (even with all the data
compiled in) required about 1/20 of the memory used by program A, and
B ran about 40 times as fast as A.

3. Persistence - If the generated program is to be kept around for
repeated use, it is usually useful to write it to a file.

Tom Kramer
······@cme.nist.gov

From: Fernando Mato Mira
Subject: Re: Languages for self-modifying code
Date: Mon, 06 Dec 1993 17:59:20 +0000
Message-ID: <2dvrto$7h9@disuns2.epfl.ch>

In article <·····@cosmos.cme.nist.gov>, ······@cme.nist.gov (Tom Kramer) writes:

> exactly the same thing as program A. Program B (even with all the data
> compiled in) required about 1/20 of the memory used by program A, and
> B ran about 40 times as fast as A.

How much time did it take to make this port? Using a tree-shaker should
save a lot of memory in a snap. Optimizing to run "only" 10 times slower 
should be easy, but getting down to 2X might be a lot trickier, but worthwile
if the program is big and sophisticated.

-- 
Fernando D. Mato Mira                           
Computer Graphics Lab                           
Swiss Federal Institute of Technology (EPFL)    Phone    : +41 (21) 693 - 5248
CH-1015 Lausanne                                FAX      : +41 (21) 693 - 5328
Switzerland                                     E-mail   : ········@epfl.ch

"

From: Tom Kramer
Subject: Re: Languages for self-modifying code
Date: Mon, 06 Dec 1993 20:12:57 +0000
Message-ID: <23533@cosmos.cme.nist.gov>

In article <··········@disuns2.epfl.ch>, ········@di.epfl.ch (Fernando Mato Mira) writes:

I posted an article which included the following: 

 exactly the same thing as program A. Program B (even with all the data
 compiled in) required about 1/20 of the memory used by program A, and
 B ran about 40 times as fast as A.

Program A is in LISP. Program B is in C.

Fernando D. Mato Mira asks:

> How much time did it take to make this port? Using a tree-shaker should
> save a lot of memory in a snap. Optimizing to run "only" 10 times slower 
> should be easy, but getting down to 2X might be a lot trickier, but worthwile
> if the program is big and sophisticated.

This was a small (less than 2 pages of code in both languages,
excluding data) but sophisticated search program. The LISP I am using
starts up with about 9 Mb of memory and grows to about 12 Mb when the
program is running and the data has been read. The C program takes
about 0.5 Mb and does not grow. The port took less than a day. I did
not try to optimize the LISP other than compiling for maximum speed
and minimum safety. I would guess that a getting the speedup Fernando
Mira suggests is feasible. I do not see an easy way to get the size of
the LISP down other than getting a size minimizer from the vendor, and
the vendor claims at most 50% size reduction by using it; that is
still 10 times the size of the C program.

From: Kelly Murray
Subject: Re: Languages for self-modifying code
Date: Tue, 07 Dec 1993 19:44:41 +0000
Message-ID: <2e2mf9$300@snoopy.cis.ufl.edu>

In article <·····@cosmos.cme.nist.gov>, ······@cme.nist.gov (Tom Kramer) writes:
|> In article <··········@disuns2.epfl.ch>, ········@di.epfl.ch (Fernando Mato Mira) writes:
|> 
|> I posted an article which included the following: 
|>  
|>  exactly the same thing as program A. Program B (even with all the data
|>  compiled in) required about 1/20 of the memory used by program A, and
|>  B ran about 40 times as fast as A.
|> 
|> Program A is in LISP. Program B is in C.
|> 
|> Fernando D. Mato Mira asks:
|> 
|> > How much time did it take to make this port? Using a tree-shaker should
|> > save a lot of memory in a snap. Optimizing to run "only" 10 times slower 
|> > should be easy, but getting down to 2X might be a lot trickier, but worthwile
|> > if the program is big and sophisticated.
|> 
|> This was a small (less than 2 pages of code in both languages,
|> excluding data) but sophisticated search program. The LISP I am using
|> starts up with about 9 Mb of memory and grows to about 12 Mb when the
|> program is running and the data has been read. The C program takes
|> about 0.5 Mb and does not grow. The port took less than a day. I did
|> not try to optimize the LISP other than compiling for maximum speed
|> and minimum safety. I would guess that a getting the speedup Fernando
|> Mira suggests is feasible. I do not see an easy way to get the size of
|> the LISP down other than getting a size minimizer from the vendor, and
|> the vendor claims at most 50% size reduction by using it; that is
|> still 10 times the size of the C program.
|> 

Clearly, any program that is only 2 pages of code is not the type of program
suited for Common Lisp if speed is very important.  
It's like driving your car to your next door neighbor's house --
you'll get there faster and cheaper if you just walk (or run if you're in a hurry).
but if you need to go to the grocery store, I'd keep your car.

-- 
-- Kelly Murray  (···@prl.ufl.edu) 
University of Florida Parallel Research Lab  :: 96-node KSR1, 64-node nCUBE
-------------------------------------------------------------------------

From: Robin Popplestone
Subject: Re:languages for self-modifying code
Date: Tue, 07 Dec 1993 18:45:33 +0000
Message-ID: <CHoHFy.21s@dcs.glasgow.ac.uk>

> A second issue is whether the generated program is to run in the same
> process as the generator. In this case the generated program must be
> in the same language as the generator or in a language (tcl, for
> example) for which the generating language has an interpreter.

Or -compiler- in the case of POPLOG which generates native code for a range of
languages.  It will also support linking, relinking and loading of external
functions into itself. A simple example is:

/*
cfn is a POP-11 procedure which takes as arguments -f-, the name of an
function in the C language, and -string-, its definition. It compiles
the C code, links it into POPLOG and defines a POP-11 procedure
with the same name which calls the external code.
*/


define cfn(f,string);
  lvars string,i,
    rep = discout('temp.c'),              ;;; Make output file
    ;
    for i from 1 to datalength(string) do ;;; Copy string to file
      rep(string(i));
    endfor;
    rep(termin);                          ;;; Close file
    sysobey('gcc -c temp.c');             ;;; Call GNU-C compiler
    lvars Exptr = "Exptr_"<>f;            ;;; Name of pointer to external code
    popval([exload  ^f ['temp.o']         ;;; Link in the compiled code
             (language C)
             ^Exptr(1):int <- ^f;         ;;; With -f- code in Exptr
            endexload
           ]);
    popval([define ^f(x); exacc ^Exptr(x); ;;; Define POP-11 procedure
            enddefine]);

enddefine;

;;; Define a C-function "twice", compile it and incrementally link it to
;;; POPLOG

cfn("twice", 'int twice(int x) {return x*2;}');

;;; twice is now a POP-11 procedure which uses the compiled C - call it and
;;; print out the answer

twice(23)=>

** 46