8051 assembler in Common Lisp

From: Greg Menke
Subject: 8051 assembler in Common Lisp
Date: Tue, 26 Jul 2005 22:38:02 +0000
Message-ID: <m3fyu1gpmd.fsf@athena.pienet>

For a while I've been occasionally working on an assembler that takes a
stream of sexp-formatted 8051 instructions, then compiles and links them
for execution.  The compiler works by setting up a temporary package
where the compile-time user symbols are interned and have their values
set to the respective linked addresses, so arithmetic and address
references as seen below work.  The registers and instruction symbols
are imported from another package and use a type hierarchy and generic
methods to select appropriate instruction codings; the coolness of
Common Lisp is really apparent here.
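
A minimal sketch of what "a type hierarchy and generic methods to select
instruction codings" could look like. The class and function names here
(OPERAND, ACCUMULATOR, REGISTER, IMMEDIATE, ENCODE-ADDC) are made up for
illustration and are not taken from the actual assembler; the opcode
bytes are the standard 8051 ones for ADDC A,#data and ADDC A,Rn.

```lisp
;; Hypothetical operand classes; the real assembler's hierarchy is not shown
;; in this thread.
(defclass operand () ())
(defclass accumulator (operand) ())
(defclass register (operand)
  ((index :initarg :index :reader register-index)))   ; 0..7 for R0..R7
(defclass immediate (operand)
  ((value :initarg :value :reader immediate-value)))  ; 8-bit constant

(defgeneric encode-addc (dst src)
  (:documentation "Return the list of bytes encoding ADDC DST, SRC."))

;; ADDC A,#data is opcode #x34 followed by the immediate byte.
(defmethod encode-addc ((dst accumulator) (src immediate))
  (list #x34 (immediate-value src)))

;; ADDC A,Rn is the single byte #x38 + n.
(defmethod encode-addc ((dst accumulator) (src register))
  (list (+ #x38 (register-index src))))
```

Method dispatch on both argument classes picks the coding, so adding an
addressing mode means adding a method rather than extending a big COND.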

I have two questions;

- use of eval.  I eval each top-level sexp, and later each instruction
  within them, my theory being I want to give the user full use of macro
  facilities and all language features to put together the instructions
  the compiler will process to generate the machine code (i.e., I also
  import :cl-user so the user can do whatever they like macro-wise).  Is
  this the kind of place where eval is desirable, or should I be pursuing
  a macroexpand and apply approach?

- use of symbols as labels.  In "tst-code" below, I have a symbol
  'supercat' which I'm using as an internal label to capture addresses
  for use by other instructions.  In the final compile pass, I intern
  each label and set its value to the linked address so references to
  the label will work like any other user-supplied symbol.  Since these
  labels are only defined within a top-level sexp and are unavailable to
  other sexps, I intern them before processing the top-level sexp and
  unintern them afterwards, again using eval so I form the calls
  properly.  This seems really clumsy, though it works.  Is there a more
  elegant method?

Thanks,

Gregm



Example code follows;


(defproject testproj
    (:text-base  #x100	
     :data-base  nil
     :text-align 16
     :data-align 8))

(defdata tst-data2 (:org #x20)
  (dw 'tst-code))

(defdata tst-data ()
  (dw 0 1 2 3 'tst-code 'supercat2)
  (db 4 5 6 7)
  (db "hello, world")
  supercat2
  (filldata 100))

(deftext tst-code ()
  (nop)
  (nop)
  supercat
  (addc  a 1)
  (anl   acc 15)
  ;;(ajmp  'supercat2)
  (acall 'supercat)
  (acall 'tst-data)
  (acall '(+ tst-code 2)))


From: Peter Seibel
Subject: Re: 8051 assembler in Common Lisp
Date: Tue, 26 Jul 2005 23:11:58 +0000
Message-ID: <m2iryxcgch.fsf@gigamonkeys.com>

Greg Menke <············@toadmail.com> writes:

> For a while I've been occasionally working on an assembler that takes a
> stream of sexp-formatted 8051 instructions which compiles and links them
> for execution.  The compiler works by setting up a temporary package
> where the compile-time user symbols are interned, and have their values
> set to the respective linked addresses, so arithmetic and address
> references as seen below work.  The registers and instruction symbols
> are imported from another package and use a type heirarchy and generic
> methods to select appropriate instruction codings- the coolness of
> Common Lisp is really apparent here.
>
> I have two questions;
>
> - use of eval.  I eval each top-level sexp, and later each instruction
>   within them, my theory being I want to give the user full use of macro
>   facilities and all language features to put together the instructions
>   the compiler will process to generate the machine code (ie; I also
>   import :cl-user so the user can do whatever they like macro-wise).  Is
>   this the kind of place where eval is desirable- or should I be pursing
>   a macroexpand and apply approach?

Without thinking too hard about your specific case, my guess is that
the normal rule to avoid EVAL still applies. You might want to look at
Chapters 30 and 31 from my book:

  <http://www.gigamonkeys.com/book/practical-an-html-generation-library-the-interpreter.html>
  <http://www.gigamonkeys.com/book/practical-an-html-generation-library-the-compiler.html>

which show how to build an interpreter and compiler for a language
complete with macros and special ops.

> - use of symbols as labels.  In "tst-code" below, I have a symbol
>   'supercat' which I'm using as an internal label to capture addresses
>   for use by other instructions.  In the final compile pass, I intern
>   each label and set its value to the linked address so references to
>   the label will work like any other user-supplied symbol.  Since these
>   labels are only defined within a top-level sexp and are unavailable to
>   other sexps', I intern them before processing the top-level sexp and
>   unintern them afterwards, again using eval so I form the calls
>   properly.  This seems really clumsy, though it works.  Is there a more
>   elegant method?

Probably you just want to maintain your own symbol table (i.e. a
hashtable with symbols as keys and whatever as values) or possibly
hang some properties off the symbols while you are compiling. You may
end up storing the symbol table in a dynamic variable for easy access
during compilation.
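
A minimal sketch of that suggestion, under stated assumptions: the names
*SYMBOL-TABLE*, RESOLVE and RESOLVE-WITH are invented for illustration,
and only +, - and * are allowed in address expressions. The table lives
in a dynamic variable and address expressions are resolved by walking
the tree rather than by EVAL.

```lisp
;; Hypothetical names; a sketch, not the assembler's actual code.
(defvar *symbol-table* (make-hash-table :test #'eq)
  "Maps label symbols to their linked addresses during compilation.")

(defun resolve (expr)
  "Resolve an address expression such as (+ tst-code 2) to an integer."
  (cond ((integerp expr) expr)
        ((symbolp expr)
         (multiple-value-bind (addr found) (gethash expr *symbol-table*)
           (if found
               addr
               (error "Undefined label: ~S" expr))))
        ((consp expr)
         (destructuring-bind (op &rest args) expr
           (unless (member op '(+ - *))
             (error "Unsupported operator in address expression: ~S" op))
           (apply op (mapcar #'resolve args))))
        (t (error "Bad address expression: ~S" expr))))

(defun resolve-with (labels expr)
  "Resolve EXPR with a fresh table built from LABELS, an alist of
(symbol . address).  The table is rebound, so nothing leaks out."
  (let ((*symbol-table* (make-hash-table :test #'eq)))
    (loop for (name . addr) in labels
          do (setf (gethash name *symbol-table*) addr))
    (resolve expr)))
```

Because the table is bound rather than assigned, labels local to one
top-level form disappear when its compilation returns.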

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Petter Gustad
Subject: Re: 8051 assembler in Common Lisp
Date: Wed, 27 Jul 2005 02:25:48 +0000
Message-ID: <87br4pt26r.fsf@parish.home.gustad.com>

Greg Menke <············@toadmail.com> writes:

> For a while I've been occasionally working on an assembler that

I did something similar for an ASIC that I've been working on. The
problem is that the instruction set of the microcode was constantly
changed during the development cycle. Earlier this was some lex/yacc
stuff which had to be updated whenever a new instruction was added,
removed, or modified.

This made me write a microcode assembler *generator*. It will generate
an assembler on the fly and run it. It's only 300 lines long and took
only a couple days to write and was a lot of fun.

The specs for the opcodes are written in Verilog (a hardware
description language) which is used to generate the hardware that will
run the microcode. It looks like this:

`define mcasm_jump_opcode_value   'h140
`define mcasm_jump_opcode_range   25:17
`define mcasm_jump_arg_addr        9:00

E.g. the jump opcode is 140 (hex) and located in bits 25 to 17. It has
one argument called addr which is located in bits 9 to 0. The assembler
code for the jump instruction will look like this:

...
(jump :addr 'down)          ; jump forward
....
(label 'down)               ; somewhere
...
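
The bit layout described above can be packed with DPB and BYTE. This is
only a sketch of the field arithmetic; ENCODE-JUMP is a made-up name and
the generated assembler itself is not shown in this thread.

```lisp
;; Hypothetical helper: opcode #x140 goes in bits 25..17 (a 9-bit field),
;; the addr argument in bits 9..0 (a 10-bit field).
(defun encode-jump (addr)
  (check-type addr (integer 0 1023))
  (dpb #x140 (byte 9 17)            ; (byte size position)
       (dpb addr (byte 10 0) 0)))
```

LDB with the same byte specifiers reads the fields back out, which is
handy for checking an encoding against the Verilog defines.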

There are only two hardcoded mnemonics in my assembler generator: org
and label. You get all the great features like macros, functions,
expressions, etc from Common Lisp for free in the assembler.

I ended up using eval as well. I keep the symbol table for the labels
and numerical constants as regular Lisp symbols.

Petter

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

From: Greg Menke
Subject: Re: 8051 assembler in Common Lisp
Date: Wed, 27 Jul 2005 03:32:50 +0000
Message-ID: <m3r7dk7wkd.fsf@athena.pienet>

Petter Gustad <·············@gustad.com> writes:

> Greg Menke <············@toadmail.com> writes:
> 
> > For a while I've been occasionally working on an assembler that
> 

<snip>

> 
> I ended up using eval as well. I keep the symbol table for the labels
> and numerical constants as regular Lisp symbols.
> 

I think the problem with eval in this application is that it runs in a
null lexical environment.  I think that isn't a problem if all the
symbols are specials, which is what I do as well, but it would be
problematic if the assembly language provided let-like constructs.
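
A small sketch of the difference, using made-up names (*LABEL-ADDR*,
LOCAL-ADDR). It assumes an implementation that signals UNBOUND-VARIABLE
when an unbound variable is read, which is what safe code is expected
to do.

```lisp
(defvar *label-addr* #x100)   ; DEFVAR makes this variable special

(defun eval-sees-lexical-p ()
  "True if EVAL can see the lexical binding of LOCAL-ADDR (it cannot)."
  (let ((local-addr #x200))
    (declare (ignorable local-addr))
    (handler-case (progn (eval 'local-addr) t)
      (unbound-variable () nil))))

(defun eval-sees-special ()
  "EVAL does see the dynamic rebinding of a special variable."
  (let ((*label-addr* #x300))
    (eval '(+ *label-addr* 2))))
```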

I'm starting to think my laborious looping over lists of sexps might be
making the problem over-difficult.

I've been toying with an idea where the compilation of a top-level sexp
is performed by accumulating the constituent sexps into a single Lisp
form, which when eval'ed as a unit, emits the intermediate object code
by side effect.  Since only the 8051 instructions have the side effect,
the output object code is trivially distinct from everything else
produced by the form's execution.  Then, eval's null lexical environment
doesn't get in the way and no symbol gymnastics are required because the
normal REPL infrastructure is handling it all.  This is still sort of
vague at the moment- for example, I'm not sure how to usefully
distinguish 8051 instructions from ambient compilation "meta-Lisp"- or
even if making such a distinction is useful.
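
A rough sketch of that idea, with invented names (*OBJECT-CODE*, EMIT,
ASSEMBLE-FORM) and only two instructions defined as ordinary functions.
The opcode bytes are the standard 8051 ones for NOP and LJMP; everything
else is an assumption made for illustration.

```lisp
(defvar *object-code*)        ; bound to a byte vector while assembling

(defun emit (&rest bytes)
  "Append BYTES to the object code being accumulated."
  (dolist (b bytes)
    (vector-push-extend b *object-code*)))

;; Instructions are plain functions whose only job is the side effect.
(defun nop () (emit #x00))
(defun ljmp (addr)            ; LJMP addr16: opcode #x02, high byte, low byte
  (emit #x02 (ldb (byte 8 8) addr) (ldb (byte 8 0) addr)))

(defun assemble-form (body)
  "Wrap BODY in a single PROGN, evaluate it once, return the emitted bytes."
  (let ((*object-code* (make-array 0 :element-type '(unsigned-byte 8)
                                     :adjustable t :fill-pointer 0)))
    (eval `(progn ,@body))
    (coerce *object-code* 'list)))
```

Since the whole body is one form, LET, DOTIMES and macros work inside it
as usual:

```lisp
(assemble-form '((nop)
                 (let ((target #x1234))
                   (dotimes (i 2) (nop))
                   (ljmp target))))
```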

The size of the compiler right now is about 1200 lines, about half are
the generic methods which implement the instruction variations & the
register/bit definitions.

Regards,

Gregm

From: Andreas Hinze
Subject: Re: 8051 assembler in Common Lisp
Date: Wed, 27 Jul 2005 10:23:51 +0000
Message-ID: <3kp5lnFv8d5mU1@uni-berlin.de>

Petter Gustad wrote:
[snip]
> 
> This made me write a microcode assembler *generator*. It will generate
> an assembler on the fly and run it. It's only 300 lines long and took
> only a couple days to write and was a lot of fun.
> 
Is the code for the generator available (just as a case study) ?

Kind regards
AHz

From: Jeff M.
Subject: Re: 8051 assembler in Common Lisp
Date: Wed, 27 Jul 2005 16:47:01 +0000
Message-ID: <1122482820.977841.103140@g47g2000cwa.googlegroups.com>

Greg,

I've done something extremely similar for the ARM7TDMI. I'd have to dig
it up, but it might be of some use to you. Simply put, the assembler
package contained a hash table of all the instruction mnemonics. A
macro 'define-instruction' created a mnemonic, handled the operands,
and returned one of two values: the opcode or the number of bytes the
opcode required. Which value was returned was based on whether or not
*pass* was set to 0 or 1.

Likewise, there was a label hash table that was reset at the start of
the 'asm' macro. The *labels* hash held the offset to the symbols from
the start of the macro. The 'asm' macro (in essence) boiled down to:

(defmacro asm (&body source)
  (let ((*labels* (make-hash-table)))
    (let ((*pass* 0))
      (gather-labels))
    (let ((*pass* 1))
      (assemble-instructions))))

Of course, my *labels* hash table is local to the asm body that it is
contained in. I think you would want a more global solution. Still,
though, I think using a hash table is better than interning the
symbols. Also, I used keywords instead for my labels, which made the
code a bit easier to read and didn't have the problem of symbols being in
different packages.
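
A runnable sketch of the two-pass scheme described above, written as a
function rather than a macro. The toy instruction set (NOP as one byte
#x00, JMP as #x01 plus a one-byte offset) and the names PROCESS and
ASSEMBLE are invented for illustration and are not the ARM7TDMI code.

```lisp
(defvar *pass*)     ; 0 = gather labels, 1 = emit bytes
(defvar *labels*)   ; keyword label -> byte offset from the start

(defun process (insn)
  "Return the size of INSN in pass 0, or its list of bytes in pass 1."
  (destructuring-bind (op &rest args) insn
    (ecase op
      (nop (if (zerop *pass*) 1 (list #x00)))
      (jmp (if (zerop *pass*)
               2
               (list #x01
                     (or (gethash (first args) *labels*)
                         (error "Undefined label: ~S" (first args)))))))))

(defun assemble (source)
  "Assemble SOURCE, a list of instructions and keyword labels."
  (let ((*labels* (make-hash-table)))
    ;; Pass 0: record the offset of every label.
    (let ((*pass* 0) (offset 0))
      (dolist (item source)
        (if (keywordp item)
            (setf (gethash item *labels*) offset)
            (incf offset (process item)))))
    ;; Pass 1: emit bytes, resolving labels from the table.
    (let ((*pass* 1))
      (loop for item in source
            unless (keywordp item)
              append (process item)))))
```

For example, (assemble '((nop) (jmp :end) (nop) :end (nop))) places :end
at offset 4, so the JMP assembles to the bytes 1 and 4.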

Jeff M.

From: Greg Menke
Subject: Re: 8051 assembler in Common Lisp
Date: Wed, 27 Jul 2005 18:13:26 +0000
Message-ID: <m3u0igqfqx.fsf@athena.pienet>

"Jeff M." <·······@gmail.com> writes:

> Greg,
> 
> I've done something extremely similar for the ARM7TDMI. I'd have to dig
> it up, but it might be of some use to you. Simply put, the assmebler
> package contained a hash table of all the instruction mnemonics. A
> macro 'define-instruction' created a mnemonic, handled the operands,
> and returned one of two values: the opcode or the number of bytes the
> opcode required. Which value was returned was based on whether or not
> *pass* was set to 0 or 1.
> 
> Likewise, there was a label hash table that was reset at the start of
> the 'asm' macro. The *labels* hash held the offset to the symbols from
> the start of the macro. The 'asm' macro (in essence) boiled down to:
> 
> (defmacro asm (&body source)
>   (let ((*labels* (make-hash-table)))
>     (let ((*pass* 0))
>       (gather-labels))
>     (let ((*pass* 1))
>       (assemble-instructions))))
> 
> Of course, my *labels* hash table is local to the asm body that it is
> contained in. I think you would want a more global solution. Still,
> though, I think using a hash table is better than interning the
> symbols. Also, I used keywords instead for my labels, which made the
> code a bit easier to read and did have the problem of symbols being in
> different packages.
> 
> Jeff M.

Could you give an example of the assembly syntax showing how labels are
expressed & referenced- and particularly what sort of manipulation of
them is possible?

The nice thing about interning the labels & setting their values to
their respective addresses is I can let eval handle all label (meaning
symbol) related arithmetic, rather than recursing down into the assembly
tree and matching symbols to a hash table.  OTOH the sucky thing about
interning the labels is doing the intern/unintern'ing...

Gregm

From: Jeff M.
Subject: Re: 8051 assembler in Common Lisp
Date: Wed, 27 Jul 2005 20:01:06 +0000
Message-ID: <1122491699.744430.28070@g49g2000cwa.googlegroups.com>

Here's a sample assembly example (taken from the GameBoy Advance):

(define-gba-function set-graphics-mode (mode)
  "Note: the argument does nothing except act as a comment."
  (asm
    (mov r2 4)
    (cmp r0 3)
    (blt :tiled-mode)
    (lsl r1 r2 8)
    (add r0 r1)
  :tiled-mode
    (lsl r2 24)
    (strh r0 r2)
    (bx lr)))

As you can see, I was using labels as just jump points. I never used
them as addresses to actual data. I would handle that another way; the
assembled data was consistently being written to an internal bank
(read: array of bytes or output stream).

Given the above example, 'set-graphics-mode' would have been interned
with the address where it appeared in the bank file. This could then be
used like any other symbol in the 'gba' package. I had other, similar
functions to define other values:

define-gba-pointer
define-gba-variable
define-gba-array

Each of these would be used in different ways internally, but
[generally speaking] all just interned an integer address into the
bank. Hope this helps.

Jeff M.

From: Greg Menke
Subject: Re: 8051 assembler in Common Lisp
Date: Wed, 27 Jul 2005 22:16:56 +0000
Message-ID: <m3ack7c2sn.fsf@athena.pienet>

"Jeff M." <·······@gmail.com> writes:

> Here's a sample assembly example (taken from the GameBoy Advance):
> 
> (define-gba-function set-graphics-mode (mode)
>   "Note: the argument does nothing except act as a comment."
>   (asm
>     (mov r2 4)
>     (cmp r0 3)
>     (blt :tiled-mode)
>     (lsl r1 r2 8)
>     (add r0 r1)
>   :tiled-mode
>     (lsl r2 24)
>     (strh r0 r2)
>     (bx lr)))
> 
> As you can see, I was using labels as just jump points. I never used
> them as addresses to actual data. I would handle that another way; the
> assembled data was consistently being written to an internal bank
> (read: array of bytes or output stream).
> 
> Given the above example, 'set-graphics-mode' would have been interned
> with the address where it appeared in the bank file. This could then be
> used like any other symbol in the 'gba' package. I had other, similar
> functions to define other values:
> 
> define-gba-pointer
> define-gba-variable
> define-gba-array
> 
> Each of these would be used in different ways internally, but
> [generally speaking] all just interned an integer address into the
> bank. Hope this helps.
> 
> Jeff M.


Yes it does, thanks!  It looks like syntax is converging towards yours.
By interning the labels, I can do arithmetic on them;

(blt '(* :tiled-mode 2))

for example.
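
One caveat with keyword labels: a keyword always evaluates to itself and
cannot be given another value, so an expression like the one above has
to have its labels replaced by addresses before it is evaluated. A small
sketch, where EVAL-LABEL-EXPR and the address 20 are made up for
illustration:

```lisp
(defun eval-label-expr (expr label-addresses)
  "Substitute addresses for labels in EXPR, then evaluate the result.
LABEL-ADDRESSES is an alist of (label . address)."
  (eval (sublis label-addresses expr)))
```

With :tiled-mode at address 20, (eval-label-expr '(* :tiled-mode 2)
'((:tiled-mode . 20))) evaluates (* 20 2).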

Gregm

From: Peter Seibel
Subject: Re: 8051 assembler in Common Lisp
Date: Wed, 27 Jul 2005 22:42:05 +0000
Message-ID: <m23bpzc1mp.fsf@gigamonkeys.com>

Greg Menke <············@toadmail.com> writes:

> "Jeff M." <·······@gmail.com> writes:
>
>> Here's a sample assembly example (taken from the GameBoy Advance):
>> 
>> (define-gba-function set-graphics-mode (mode)
>>   "Note: the argument does nothing except act as a comment."
>>   (asm
>>     (mov r2 4)
>>     (cmp r0 3)
>>     (blt :tiled-mode)
>>     (lsl r1 r2 8)
>>     (add r0 r1)
>>   :tiled-mode
>>     (lsl r2 24)
>>     (strh r0 r2)
>>     (bx lr)))
>> 
>> As you can see, I was using labels as just jump points. I never used
>> them as addresses to actual data. I would handle that another way; the
>> assembled data was consistently being written to an internal bank
>> (read: array of bytes or output stream).
>> 
>> Given the above example, 'set-graphics-mode' would have been interned
>> with the address where it appeared in the bank file. This could then be
>> used like any other symbol in the 'gba' package. I had other, similar
>> functions to define other values:
>> 
>> define-gba-pointer
>> define-gba-variable
>> define-gba-array
>> 
>> Each of these would be used in different ways internally, but
>> [generally speaking] all just interned an integer address into the
>> bank. Hope this helps.
>> 
>> Jeff M.
>
>
> Yes it does, thanks!  It looks like syntax is converging towards yours.
> By interning the labels, I can do arithmetic on them;
>
> (blt '(* :tiled-mode 2))
>
> for example.

So that has nothing to do with interning the symbol. Interning is a
READ-TIME function. Assuming you've got the symbol you can use (setf
(symbol-value sym) whatever) and (eval sym) if you must. I'm still not
convinced that's a great idea but interning has (or should have)
nothing to do with it. To understand that, it might help to make sure
you understand how the following works:

  CL-USER> (defparameter *some-code* (list '#:foo '#:foo))
  *SOME-CODE*
  CL-USER> *some-code*
  (#:FOO #:FOO)
  CL-USER> (setf (symbol-value (first *some-code*)) 1)
  1
  CL-USER> (setf (symbol-value (second *some-code*)) 10)
  10
  CL-USER> (eval (list* '+ *some-code*))
  11

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Greg Menke
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 00:52:21 +0000
Message-ID: <m3ll3r7nwa.fsf@athena.pienet>

Peter Seibel <·····@gigamonkeys.com> writes:

> Greg Menke <············@toadmail.com> writes:
> 
> >
> > Yes it does, thanks!  It looks like syntax is converging towards yours.
> > By interning the labels, I can do arithmetic on them;
> >
> > (blt '(* :tiled-mode 2))
> >
> > for example.
> 
> So that has nothing to do with interning the symbol. Interning is a
> READ-TIME function. Assuming you've got the symbol you can use (setf
> (symbol-value sym) whatever) and (eval sym) if you must. I'm still not
> convinced that's a great idea but interning has (or should have)
> nothing to do with it. To understand that, it might help to make sure
> you understand how the following works:

If you say so.  

But I intern top-level symbols at read time but do the labels in the
final link pass once all addresses are known.  I add an item to each
top-level symbol's property list.  The symbol-value is set to the linked
address for all symbols so the address arithmetic will work.

Gregm

From: Peter Seibel
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 01:07:08 +0000
Message-ID: <m2u0ifagcj.fsf@gigamonkeys.com>

Greg Menke <············@toadmail.com> writes:

> Peter Seibel <·····@gigamonkeys.com> writes:
>
>> Greg Menke <············@toadmail.com> writes:
>> 
>> >
>> > Yes it does, thanks!  It looks like syntax is converging towards yours.
>> > By interning the labels, I can do arithmetic on them;
>> >
>> > (blt '(* :tiled-mode 2))
>> >
>> > for example.
>> 
>> So that has nothing to do with interning the symbol. Interning is a
>> READ-TIME function. Assuming you've got the symbol you can use (setf
>> (symbol-value sym) whatever) and (eval sym) if you must. I'm still not
>> convinced that's a great idea but interning has (or should have)
>> nothing to do with it. To understand that, it might help to make sure
>> you understand how the following works:
>
> If you say so.  
>
> But I intern top-level symbols at read time but do the labels in the
> final link pass once all addresses are known.  I add an item to each
> top-level symbol's property list.  The symbol-value is set to the linked
> address for all symbols so the address arithmetic will work.

Can you show us the code that "intern[s] top-level symbols at read
time"? Unless you wrote your own reader, the reader is already
interning the symbols, which makes me wonder what you're really
doing. Also, a symbol doesn't need to be interned to have its
symbol-value set. So, basically, what you're saying doesn't make any
sense. That may be because you're talking about something sensible in
a slightly incorrect way or because what you're doing actually doesn't
make complete sense.

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Greg Menke
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 02:00:14 +0000
Message-ID: <m3fytz7kr5.fsf@athena.pienet>

Peter Seibel <·····@gigamonkeys.com> writes:


> Greg Menke <············@toadmail.com> writes:
> 
> > But I intern top-level symbols at read time but do the labels in the
> > final link pass once all addresses are known.  I add an item to each
> > top-level symbol's property list.  The symbol-value is set to the linked
> > address for all symbols so the address arithmetic will work.
> 
> Can you show us the code that "intern[s] top-level symbols at read
> time"? Unless you wrote your own reader, the reader is already
> interning the symbols which makes my wonder what you're really
> doing. Also, a symbol doesn't need to be interned to have it's
> symbol-value set. So, basically, what you're saying doesn't make any
> sense. That may be because you're talking about something sensical in
> a slightly incorrect way or because what you're doing actually doesn't
> make complete sense.

Sure.  Once the assembler is working "industrially" enough to compile
some projects so I can test it on real hardware, I'll post the full
code.  Right now it's changing quickly and I still need to generate .lis
files and symbol cross-references.

Regards,

Gregm


Some terminology;

sym      is always bound to a top-level symbol

syminst  is the compiler state data struct recorded in each top-level
         symbol's property list


(sym-labels) accessor that gets/sets the list of labels contained in a
              top-level symbol

(sym-data)    accessor that gets/sets the non-evaled list of sexps forming the code
              in a top-level symbol

(sym-compiled)  accessor that gets/sets the list of eval'ed sexps of a
                top-level symbol.

(sym-linked)    accessor for the output of the compiler; a sequence of
                bytes comprising the compiled and linked code



These macros handle creating and interning the top-level symbols at read time;


(defmacro with-sym-setup ((name opts symtype rest) &body body)
  `(let ((sym      (intern (string ',name)))
         (syminst  (make-instance ,symtype )))

     (proclaim '(special ,name))

     ;; symbol-value is the symbol's linked address
     (setf (symbol-value sym)     nil)

     ;; add syminst to the symbol's prop list
     (setf (get sym     'syminst)  syminst)

     ;; and init the syminst fields
     (setf (sym-name     syminst)  (symbol-name sym))
     (setf (sym-address  syminst)  nil)

     (setf (sym-org      syminst)  (getf ',opts :org  nil))
     (setf (sym-align    syminst)  (getf ',opts :align  nil))

     (setf (sym-data     syminst)  ',rest)

     (setf asm51::*cursymbol* syminst)

     ,@body ))


(defmacro deftext (name opts &rest rest)
  `(asm51::with-sym-setup (,name ,opts '_textsym ,rest)
     (push sym asm51::*textsyms*)))

(defmacro defdata (name opts &rest rest)
  `(asm51::with-sym-setup (,name ,opts '_datasym ,rest)
     (push sym asm51::*datasyms*)))

(defmacro defbss (name opts &rest rest)
  `(asm51::with-sym-setup (,name ,opts '_bsssym ,rest)
     (push sym asm51::*bsssyms*)))


via this loop in pass #1 (read pass)

            (loop for e = (read str nil nil)
                  while e
                  do
                  (eval e))


In pass #2 (top-level symbol sexp-by-sexp code eval), I identify and
record each label symbol but do not intern it;

                  (loop for se in (sym-data syminst)
                        for e = nil
                        with rv = nil
                        do
                        ;;
                        ;; evaluate the sexp, accumulate non-nil results in rv
                        ;;
                        (cond ((symbolp se)
                               ;; se is symbol, make a label out of it
                               (setf e (make-label se))
                               ;; save it in the labels list
                               (push e (sym-labels syminst)) )

                              (t 
                               ;; not a symbol, eval it
                               (setf e (eval se)) ) )








Now later on in pass #5 (link pass), I intern & set the value of the
label symbols within each top-level symbol.  I do it here so the same
labels can be used in different top-level symbols, but are only defined
within the top-level symbol, sort of a "local" label.


                  ;; define all the labels in this symbol so the code-gen eval
                  ;; can be done
                  (loop for e in (sym-labels syminst)
                        for name = (lab-name e)
                        do
                        (eval `(progn 
                                 (intern (string ',name))
                                 (proclaim '(special ,name))
                                 (setf ,name (lab-address ,e)))) )


and after linking all the code in the top-level symbol, I un-intern the labels;


                  ;; symbol is compiled, release all its labels
                  ;;
                  (loop for e in (sym-labels syminst)
                        for name = (lab-name e)
                        do
                        (eval `(unintern ',name)) ) )

From: Peter Seibel
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 05:15:52 +0000
Message-ID: <m2iryva4u1.fsf@gigamonkeys.com>

Greg Menke <············@toadmail.com> writes:

> Peter Seibel <·····@gigamonkeys.com> writes:
>
>
>> Greg Menke <············@toadmail.com> writes:
>> 
>> > But I intern top-level symbols at read time but do the labels in the
>> > final link pass once all addresses are known.  I add an item to each
>> > top-level symbol's property list.  The symbol-value is set to the linked
>> > address for all symbols so the address arithmetic will work.
>> 
>> Can you show us the code that "intern[s] top-level symbols at read
>> time"? Unless you wrote your own reader, the reader is already
>> interning the symbols which makes my wonder what you're really
>> doing. Also, a symbol doesn't need to be interned to have it's
>> symbol-value set. So, basically, what you're saying doesn't make
>> any sense. That may be because you're talking about something
>> sensical in a slightly incorrect way or because what you're doing
>> actually doesn't make complete sense.
>
> Sure.  Once the assember is working "industrially" enough to compile
> some projects so I can test it on real hardware, I'll post the full
> code.  Right now its changing quickly and I still need to generate
> .lis files and symbol crossreferences.

Okay. Meanwhile, if you're interested I have some comments on what you
show here. See below.

> Some terminology;
>
> sym      is always bound to a top-level symbol
>
> syminst is the compiler state data struct recorded in each top-level
> symbol's property list
>
> (sym-labels) accessor that gets/sets the list of labels contained in
> a top-level symbol
>
> (sym-data) accessor that gets/sets the non-evaled list of sexps
> forming the code in a top-level symbol
>
> (sym-compiled) accessor that gets/sets the list of eval'ed sexps of
> a top-level symbol.
>
> (sym-linked) accessor for the output of the compiler; a sequence of
> bytes comprising the compiled and linked code
>
>
>
> These macros handle creating and interning the top-level symbols at
> read time;

This sentence doesn't really make sense. Macros operate *after* read
time. So it doesn't really make sense to talk about macros doing stuff
at read time.

> (defmacro with-sym-setup ((name opts symtype rest) &body body)
>   `(let ((sym      (intern (string ',name)))

The previous line is likely not having any effect at all. Note that:

  (eql (intern (string symbol)) symbol) ==> T

as long as (eql (symbol-package symbol) *package*). In other words, if
*package* hasn't changed value between the time the symbol whose value
is in NAME was read and the time the macro-expansion of WITH-SYM-SETUP
happens, (intern (string ',name)) is just going to return the value of
NAME. You might as well have said:

  (let ((sym name)) ...)

or for that matter:

  (defmacro with-sym-setup ((sym opts symtype rest) &body body) ...)
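
A small sketch that checks this identity; INTERN-RETURNS-SAME-SYMBOL-P
is a made-up helper name. It binds *package* to the symbol's home
package, which is the condition stated above.

```lisp
(defun intern-returns-same-symbol-p (symbol)
  "True if interning SYMBOL's name in its home package returns SYMBOL."
  (let ((home (symbol-package symbol)))
    (and home                        ; uninterned symbols have no home
         (let ((*package* home))
           (eq (intern (string symbol)) symbol)))))
```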

>          (syminst  (make-instance ,symtype )))
>
>      (proclaim '(special ,name))
>
>      ;; symbol-value is the symbol's linked address
>      (setf (symbol-value sym)     nil)

The previous two lines seem hinky to me. For one thing the
proclamation has a global effect so is not very friendly. And it's
hard to imagine that it's actually necessary. (Though it may be
necessary given the way the rest of the current code works.)
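
One possible alternative, sketched under assumptions: PROGV binds
symbols chosen at run time for the duration of its body, and a local
SPECIAL declaration inside the evaluated form avoids the global
proclamation. EVAL-WITH-LABELS is a made-up name, not part of the code
under review.

```lisp
(defun eval-with-labels (expr labels)
  "Evaluate EXPR with each label in LABELS, an alist of (symbol . address),
dynamically bound to its address.  The bindings vanish on return."
  (let ((names (mapcar #'car labels))
        (addrs (mapcar #'cdr labels)))
    (progv names addrs
      ;; The local declaration tells EVAL these free variables are
      ;; special without proclaiming anything globally.
      (eval `(locally (declare (special ,@names)) ,expr)))))
```

With this, (eval-with-labels '(+ supercat 2) '((supercat . #x102)))
evaluates the address arithmetic, and SUPERCAT has no value afterwards.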

>      ;; add syminst to the symbol's prop list
>      (setf (get sym     'syminst)  syminst)

This seems fine. It's appropriate for a compiler to hang information
about names off the name's plist.

>      ;; and init the syminst fields
>      (setf (sym-name     syminst)  (symbol-name sym))
>      (setf (sym-address  syminst)  nil)
>
>      (setf (sym-org      syminst)  (getf ',opts :org  nil))
>      (setf (sym-align    syminst)  (getf ',opts :align  nil))
>
>      (setf (sym-data     syminst)  ',rest)

This is all fine though it might be more obvious what's going on if
you'd define :initargs for these slots in the classes you may be
instantiating. Then you could write this:

  `(let ((syminst (make-instance ,symtype
				 :name (symbol-name sym)
				 :address nil
				 :org (getf ',opts :org nil)
				 :align (getf ',opts :align nil)
				 :data ',rest)))
     ..)

and get rid of the SETFs.
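
For that to work the class needs matching :initarg options. A sketch
with a made-up class name (TEXT-SYMBOL-SKETCH) and constructor
(MAKE-TEXT-SYMBOL); the accessor names follow the ones used in the
quoted code, but the real class definitions are not shown in this
thread.

```lisp
;; Hypothetical stand-in for the _textsym class.
(defclass text-symbol-sketch ()
  ((name    :initarg :name    :accessor sym-name)
   (address :initarg :address :accessor sym-address :initform nil)
   (org     :initarg :org     :accessor sym-org     :initform nil)
   (align   :initarg :align   :accessor sym-align   :initform nil)
   (data    :initarg :data    :accessor sym-data    :initform nil)))

(defun make-text-symbol (sym opts data)
  "Build the compiler state for SYM from the OPTS plist and body DATA."
  (make-instance 'text-symbol-sketch
                 :name  (symbol-name sym)
                 :org   (getf opts :org)
                 :align (getf opts :align)
                 :data  data))
```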

>      (setf asm51::*cursymbol* syminst)

I'm wondering why you're setting this rather than binding it.

>      ,@body ))
>
> (defmacro deftext (name opts &rest rest)
>   `(asm51::with-sym-setup (,name ,opts '_textsym ,rest)
>      (push sym asm51::*textsyms*)))

I wonder why you're package-qualifying the symbols with-sym-setup and
*textsyms*. Aren't these macros defined in a file with an (in-package
:asm51) form? (I point this out because it may be a symptom of
another flavor of confusion about the relation between read time and
macroexpand time and how they both relate to packages.)

> (defmacro defdata (name opts &rest rest)
>   `(asm51::with-sym-setup (,name ,opts '_datasym ,rest)
>      (push sym asm51::*datasyms*)))
>
> (defmacro defbss (name opts &rest rest)
>   `(asm51::with-sym-setup (,name ,opts '_bsssym ,rest)
>      (push sym asm51::*bsssyms*)))
>
>
> via this loop in pass #1 (read pass)
>
>             (loop for e = (read str nil nil)
>                   while e
>                   do
>                   (eval e))

I'm confused about the relationship between this loop and the macros
shown above. Are you using this loop to read a series of forms from
a file where the top-level-forms read are likely to be DEFTEXT,
DEFDATA, and DEFBSS forms? If so, why don't you just use LOAD to load
the file? Then you could also use COMPILE-FILE to compile the files
down to something that already has the macros expanded, etc. and will
LOAD faster. (BTW, this use of EVAL, if I've understood things
correctly, is fine since it's the level of evaluation the LOAD would
normally do for you.)

> In pass #2 (top-level symbol sexp-by-sexp code eval), I identify and
> record each label symbol but do not intern it;

So the interning is still a red-herring. What do you think interning
does?

>                   (loop for se in (sym-data syminst)
>                         for e = nil
>                         with rv = nil
>                         do
>                         ;;
>                         ;; evaluate the sexp, accumulate non-nil results in rv
>                         ;;
>                         (cond ((symbolp se)
>                                ;; se is symbol, make a label out of it
>                                (setf e (make-label se))
>                                ;; save it in the labels list
>                                (push e (sym-labels syminst)) )
>
>                               (t 
>                                ;; not a symbol, eval it
>                                (setf e (eval se)) ) )

This seems, if you'll pardon me being blunt, like pretty much a
canonical example of misuse of EVAL in a macro. The value of se was
some code that was in the body of, say, a DEFTEXT form. If you want
that code to be evaluated, you should arrange for it to be put into
the expansion of DEFTEXT so it will be evaluated. Of course in your
case that's not a simple change because you've, as far as I can tell,
turned the whole macro-expansion machinery inside out.
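
A rough sketch of what "put it into the expansion" can look like. The
macro name OPS-AND-LABELS is made up for illustration and is not part
of the assembler. Bare symbols in the body are treated as labels and
quoted; every other form is spliced into the expansion, so the normal
evaluator runs it, with access to the lexical environment that a
run-time call to EVAL cannot see.

```lisp
;; Hypothetical macro: labels (bare symbols) are kept as data, all other
;; body forms become ordinary code in the expansion.
(defmacro ops-and-labels (&body body)
  `(list ,@(loop for form in body
                 collect (if (symbolp form)
                             `(quote ,form)
                             form))))

;; The middle form refers to the lexical variable X. That works here
;; because the form is part of the expansion; (eval '(+ x 1)) would not
;; see this binding.
(defun ops-and-labels-demo ()
  (let ((x 10))
    (ops-and-labels start (+ x 1) end)))
```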

> Now later on in pass #5 (link pass), I intern & set the value of the
> label symbols within each top-level symbol.

Again, what do you think you're accomplishing by interning these
symbols. They're already interned. They've been interned ever since
they were read. And even if they weren't interned, you could still do
everything with them that you're doing anyway.

> I do it here so the same labels can be used in different top-level
> symbols, but are only defined within the top-level symbol- sort of a
> "local" label.
>
>
>                   ;; define all the labels in this symbol so the code-gen eval
>                   ;; can be done
>                   (loop for e in (sym-labels syminst)
>                         for name = (lab-name e)
>                         do
>                         (eval `(progn 
>                                  (intern (string ',name))
>                                  (proclaim '(special ,name))
>                                  (setf ,name (lab-address ,e)))) )
>
>
> and after linking all the code in the top-level symbol, I un-intern the labels;
>
>
>                   ;; symbol is compiled, release all its labels
>                   ;;
>                   (loop for e in (sym-labels syminst)
>                         for name = (lab-name e)
>                         do
>                         (eval `(unintern ',name)) ) )

This last EVAL is quite odd--why not just write (unintern name)? But
better yet, don't bother. This whole interning/uninterning business is
not doing what you think it's doing. (Actually, thanks to the UNINTERN
it's sort of doing something like what you want--once you've
uninterned the symbol, you've gotten rid of the actual symbol that was
read by the reader so that when you compile the data linked to
subsequent top-level names, the interning will create new symbols
which will be independent of the old symbols of the same name. But
that also means that over the course of your compiler running you have
many different symbols with the same name. Which seems like a recipe
for madness.) I'm still not sure how all the pieces of your system fit
together but I'm pretty convinced that there's a much simpler way to
skin this cat.
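
The effect described in the parenthetical can be seen in isolation
with a small sketch. The scratch package name and the function are
assumptions made for the demonstration.

```lisp
;; Returns two values: the symbol first interned under NAME, and the
;; symbol INTERN hands back after the first one has been uninterned.
(defun fresh-symbol-after-unintern (name)
  (let* ((pkg (or (find-package "UNINTERN-DEMO")
                  (make-package "UNINTERN-DEMO" :use '())))
         (old (intern name pkg)))
    ;; UNINTERN removes OLD from the package, so the next INTERN of the
    ;; same name has to create a brand new symbol.
    (unintern old pkg)
    (values old (intern name pkg))))
```

The two symbols print with the same name but are not EQ, so a value or
property attached to one is invisible through the other.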

To provide some food for thought, here's a sketch of part of an
assembler that works somewhat like I think yours ought to. Basically I
define a macro DEFTEXT which can be used to define an association
between a name and a snippet of machine code. The idea (which I'm sort
of guessing at from the code you've shown us) is that such a snippet
is then combined with other snippets into a final assembly. This
sketch doesn't include the ability to include arbitrary Common Lisp
code within a DEFTEXT body but only because I'm not sure how you're
envisioning that being used. At any rate, note how there's no interning
or uninterning of symbols and no calls to EVAL. After a DEFTEXT form
is evaluated you can use the function GETTEXT to get at the machine
code. For instance you can EVAL the following form (or load a file
containing the form or whatever):

  (deftext foo
    (nop)
    (goto label2)
    (nop)
    label1 (nop)
    (push 128)
    (push 255)
    (goto label1)
    label2 (nop))

then:

  ASM51> (gettext 'foo)
  #(0 1 11 0 0 2 128 2 255 1 4 0)

Hope this gives you some ideas.

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Peter Seibel
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 05:26:50 +0000
Message-ID: <m2ack7a4bp.fsf@gigamonkeys.com>

Peter Seibel <·····@gigamonkeys.com> writes:

> To provide some food for thought, here's a sketch of part of an
> assembler ...

Whoops. *Here's* that sketch:

(defpackage :asm51 (:use :cl))

(in-package :asm51)

(defmacro deftext (name &body body)
  "Compile a body of ops and labels into a form that can be included in a larger assembly."
  `(eval-when (:compile-toplevel :load-toplevel :execute)
     (setf (get ',name 'compiled-text) ,(compile-body body))))

(defun gettext (name)
  (get name 'compiled-text))

(defun compile-body (body)
  (pass3 (pass2 (pass1 body))))

(defun pass1 (body)
  "Macroexpand non labels. Output of this stage is a list of
labels and ops which can contain references to the labels."
  (loop for expression in body
       when (symbolp expression) nconc (list expression)
       else nconc (asm-macroexpand expression)))

(defun asm-macroexpand (expression)
  "This is a no-op at the moment. But it would be straightforward
  to allow folks to define assembler macros that expand a form
  into a list of ops and labels, similar to the way the HTML
  macros in Chapter 31 of Practical Common Lisp work."
  (list expression))

(defun pass2 (body)
  "Compute the addresses of all the labels."
  (loop with labels = ()
     with offset = 0
     for thing in body
     when (symbolp thing) do (setf labels (acons thing offset labels))
     else do (incf offset (size-of thing)) and collect thing into ops
     finally (return (list labels ops))))

(defun pass3 (labels-and-ops)
  "Generate actual code"
  (destructuring-bind (labels ops) labels-and-ops
    (loop with code = (make-array 100 :element-type '(unsigned-byte 8) :adjustable t :fill-pointer 0)
       for op in ops do (generate-code op labels code)
       finally (return code))))

(defun size-of (thing)
  (destructuring-bind (op &rest operands) thing
      (size-of-op op operands)))

(defgeneric size-of-op (op operands))

(defmethod size-of-op ((op (eql 'goto)) operands)
  (declare (ignore operands))
  2)

(defmethod size-of-op ((op (eql 'nop)) operands)
  (declare (ignore operands))
  1)

(defmethod size-of-op ((op (eql 'push)) operands)
  (declare (ignore operands))
  2)

(defun generate-code (op labels buffer)
  (destructuring-bind (op &rest operands) op
    (generate-code-for-op op (resolve-labels operands labels) buffer)))

(defun resolve-labels (operands labels)
  (loop for x in operands
     when (symbolp x) collect (cdr (assoc x labels))
     else collect x))

(defgeneric generate-code-for-op (op operands buffer)
  (:documentation "Generate the actual machine code for a given
  OP with the given OPERANDS into BUFFER. All labels have already
  been translated into absolute addresses. The fill-pointer can
  be used to obtain the current address if the output is supposed
  to be a relative address."))

(defmethod generate-code-for-op ((op (eql 'nop)) operands buffer)
  (declare (ignore operands))
  (vector-push-extend 0 buffer))

(defmethod generate-code-for-op ((op (eql 'goto)) operands buffer)
  (vector-push-extend 1 buffer)
  (vector-push-extend (first operands) buffer))

(defmethod generate-code-for-op ((op (eql 'push)) operands buffer)
  (vector-push-extend 2 buffer)
  (vector-push-extend (first operands) buffer))
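
One possible way to fill in the ASM-MACROEXPAND placeholder is
sketched below. It is meant as a replacement for the no-op version,
not an addition to it. The table *ASM-MACROS*, the macro
DEFINE-ASM-MACRO and the example PUSH2 are all hypothetical names. An
assembler macro here is just a function that returns a list of ops and
labels, and the result is expanded again in case it contains further
macro calls.

```lisp
;; Hypothetical table mapping an op name to its expander function.
(defvar *asm-macros* (make-hash-table :test #'eq))

(defmacro define-asm-macro (name lambda-list &body body)
  "Register an expander that returns a list of ops and labels."
  `(setf (gethash ',name *asm-macros*)
         (lambda ,lambda-list ,@body)))

(defun asm-macroexpand (expression)
  "Expand EXPRESSION into a fresh list of ops and labels."
  (let ((expander (gethash (first expression) *asm-macros*)))
    (if expander
        ;; Labels pass through untouched; ops are expanded recursively.
        (loop for form in (apply expander (rest expression))
              if (symbolp form) collect form
              else nconc (asm-macroexpand form))
        (list expression))))

;; Example: one pseudo-op that expands into two PUSH ops.
(define-asm-macro push2 (a b)
  (list `(push ,a) `(push ,b)))
```

With this in place PASS1 needs no change, since it already splices in
whatever list ASM-MACROEXPAND returns.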

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Greg Menke
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 12:18:44 +0000
Message-ID: <m3ack786or.fsf@athena.pienet>

Peter Seibel <·····@gigamonkeys.com> writes:

> Peter Seibel <·····@gigamonkeys.com> writes:
> 
> > To provide some food for thought, here's a sketch of part of an
> > assembler ...
> 
> Whoops. *Here's* that sketch:

Thanks, Peter.  I'm going to have a close look at this.

Gregm

From: Greg Menke
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 12:12:47 +0000
Message-ID: <m3fytz86yo.fsf@athena.pienet>

Peter Seibel <·····@gigamonkeys.com> writes:

> Greg Menke <············@toadmail.com> writes:
> 
> > Peter Seibel <·····@gigamonkeys.com> writes:
> >
> >
> Okay. Meanwhile, if you're interested I have some comments on what you
> show here. See below.

I am- thanks!  My responses are interspersed.

 
> > Some terminology;
> >
> > sym      is always bound to a top-level symbol
> >
> > syminst is the compiler state data struct recorded in each top-level
> > symbol's property list
> >
> > (sym-labels) accessor that gets/sets the list of labels contained in
> > a top-level symbol
> >
> > (sym-data) accessor that gets/sets the non-evaled list of sexps
> > forming the code in a top-level symbol
> >
> > (sym-compiled) accessor that gets/sets the list of eval'ed sexps of
> > a top-level symbol.
> >
> > (sym-linked) accessor for the output of the compiler; a sequence of
> > bytes comprising the compiled and linked code
> >
> >
> >
> > These macros handle creating and interning the top-level symbols at
> > read time;
> 
> This sentence doesn't really make sense. Macros operate *after* read
> time. So it doesn't really make sense to talk about macros doing stuff
> at read time.

OK, so in pass 1 I read each top-level sexp and eval it, which creates,
interns and initializes all top-level symbols.

 
> > (defmacro with-sym-setup ((name opts symtype rest) &body body)
> >   `(let ((sym      (intern (string ',name)))
> 
> The previous line is likely not having any effect at all. Note that:
> 
>   (eql (intern (string symbol)) symbol) ==> T
> 
> as long as (eql (symbol-package symbol) *package*). In other words, if
> *package* hasn't changed value between the time the symbol held in
> NAME was read and the macro-expansion of WITH-SYM-SETUP happens,
> (intern (string ',name)) is just going to return the value of
> NAME. You might as well have said:
> 
>   (let ((sym name)) ...)
> 
> or for that matter:
> 
>   (defmacro with-sym-setup ((sym opts symtype rest) &body body) ...)


Right- removing intern doesn't change behavior, thanks!  Makes it
simpler.


> 
> >          (syminst  (make-instance ,symtype )))
> >
> >      (proclaim '(special ,name))
> >
> >      ;; symbol-value is the symbol's linked address
> >      (setf (symbol-value sym)     nil)
> 
> The previous two lines seem hinky to me. For one thing the
> proclamation has a global effect so is not very friendly. And it's
> hard to imagine that it's actually necessary. (Though it may be
> necessary given the way the rest of the current code works.)

Lispworks grumbles about special variables if the proclaim is missing.
The symbol-value setf is superfluous.  The global proclaim is
intentional as this is a symbol that I want available globally to all
the other assembly code.


 
> >      ;; and init the syminst fields
> >      (setf (sym-name     syminst)  (symbol-name sym))
> >      (setf (sym-address  syminst)  nil)
> >
> >      (setf (sym-org      syminst)  (getf ',opts :org  nil))
> >      (setf (sym-align    syminst)  (getf ',opts :align  nil))
> >
> >      (setf (sym-data     syminst)  ',rest)
> 
> This is all fine though it might be more obvious what's going on if
> you'd define :initargs for these slots in the classes you may be
> instantiating. Then you could write this:
> 
>   `(let ((syminst (make-instance ,symtype
> 				 :name (symbol-name sym)
> 				 :address nil
> 				 :org (getf ',opts :org nil)
> 				 :align (getf ',opts :align nil)
> 				 :data ',rest)))
>      ..)
>
> and get rid of the SETFs.


Noted, however the setfs have come and gone and I've not gotten around
to formalizing their initargs.


> >      (setf asm51::*cursymbol* syminst)
> 
> I'm wondering why you're setting this rather than binding it.

Because I want to set it while processing the symbol so I can print
diagnostics later if something asserts during the pass 1 top-level
evals.



> >      ,@body ))
> >
> > (defmacro deftext (name opts &rest rest)
> >   `(asm51::with-sym-setup (,name ,opts '_textsym ,rest)
> >      (push sym asm51::*textsyms*)))
> 
> I wonder why you're package-qualifying the symbols with-sym-setup and
> *textsyms*. Aren't these macros defined in a file with an (in-package
> :asm51) form? (I point this out because it may be a symptom of
> another flavor of confusion about the relation between read time and
> macroexpand time and how they both relate to packages.)


*textsyms* is not imported into the compilation package, it's intended to
be a symbol internal to the asm51 package, accumulating top-level
symbols.  Macroexpand is not in question here, it would be the same if
everything was defun.


> > (defmacro defdata (name opts &rest rest)
> >   `(asm51::with-sym-setup (,name ,opts '_datasym ,rest)
> >      (push sym asm51::*datasyms*)))
> >
> > (defmacro defbss (name opts &rest rest)
> >   `(asm51::with-sym-setup (,name ,opts '_bsssym ,rest)
> >      (push sym asm51::*bsssyms*)))
> >
> >
> > via this loop in pass #1 (read pass)
> >
> >             (loop for e = (read str nil nil)
> >                   while e
> >                   do
> >                   (eval e))
> 
> I'm confused about the relationship between this loop and the macros
> shown above. Are you using this loop to read a series of forms from
> a file where the top-level-forms read are likely to be DEFTEXT,
> DEFDATA, and DEFBSS forms? If so, why don't you just use LOAD to load
> the file? Then you could also use COMPILE-FILE to compile the files
> down to something that already has the macros expanded, etc. and will
> LOAD faster. (BTW, this use of EVAL, if I've understood things
> correctly, is fine since it's the level of evaluation the LOAD would
> normally do for you.)

The stream may not be a file.
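
For what it's worth, the standard LOAD accepts an open stream as well
as a pathname, so a non-file source does not by itself rule it out. A
minimal sketch, with a hypothetical helper name and a property on a
keyword used only to make the effect observable:

```lisp
;; Hypothetical helper: LOAD reads and evaluates every form from the
;; string stream, just as it would from a file.
(defun load-forms-from-string (text)
  (with-input-from-string (stream text)
    (load stream)))

;; The loaded form stores a value on the plist of the keyword :LOAD-DEMO.
(load-forms-from-string "(setf (get :load-demo :value) (+ 40 2))")
```

What this does not give you is the COMPILE-FILE step, which does need
a file.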


> 
> > In pass #2 (top-level symbol sexp-by-sexp code eval), I identify and
> > record each label symbol but do not intern it;
> 
> So the interning is still a red-herring. What do you think interning
> does?

Top-level intern is removed as per above.

> 
> >                   (loop for se in (sym-data syminst)
> >                         for e = nil
> >                         with rv = nil
> >                         do
> >                         ;;
> >                         ;; evaluate the sexp, accumulate non-nil results in rv
> >                         ;;
> >                         (cond ((symbolp se)
> >                                ;; se is symbol, make a label out of it
> >                                (setf e (make-label se))
> >                                ;; save it in the labels list
> >                                (push e (sym-labels syminst)) )
> >
> >                               (t 
> >                                ;; not a symbol, eval it
> >                                (setf e (eval se)) ) )
> 
> This seems, if you'll pardon me being blunt, like pretty much a
> canonical example of misuse of EVAL in a macro. The value of se was
> some code that was in the body of, say, a DEFTEXT form. If you want
> that code to be evaluated, you should arrange for it to be put into
> the expansion of DEFTEXT so it will be evaluated. Of course in your
> case that's not a simple change because you've, as far as I can tell,
> turned the whole macro-expansion machinery inside out.

Could you explain the last sentence a little more?  At this stage in the
code I don't care about macros- all I'm doing is identifying and
recording label symbols and eval'ing each sexp element of the top-level
symbol, producing an intermediate object for each instruction.  I might
be able to change this eval to apply since I'm assuming the only
elements in the sexp stream are label symbols or assembly instructions-
eval lets me cheat.


> 
> > Now later on in pass #5 (link pass), I intern & set the value of the
> > label symbols within each top-level symbol.
> 
> Again, what do you think you're accomplishing by interning these
> symbols. They're already interned. They've been interned ever since
> they were read. And even if they weren't interned, you could still do
> everything with them that you're doing anyway.

This intern (and its unintern complement) are also removed now.  Thanks!

> 
> To provide some food for thought, here's a sketch of part of an
> assembler that works somewhat like I think yours ought to. Basically I
> define a macro DEFTEXT which can be used to define an association
> between a name and a snippet of machine code. The idea (which I'm sort
> of guessing at from the code you've shown us) is that such a snippet
> is then combined with other snippets into a final assembly. This
> sketch doesn't include the ability to include arbitrary Common Lisp
> code within a DEFTEXT body but only because I'm not sure how you're
> envisioning that being used. At any rate, note how there's no interning
> or uninterning of symbols and no calls to EVAL. After a DEFTEXT form
> is evaluated you can use the function GETTEXT to get at the machine
> code. For instance you can EVAL the following form (or load a file
> containing the form or whatever):
> 
>   (deftext foo
>     (nop)
>     (goto label2)
>     (nop)
>     label1 (nop)
>     (push 128)
>     (push 255)
>     (goto label1)
>     label2 (nop))
> 
> then:
> 
>   ASM51> (gettext 'foo)
>   #(0 1 11 0 0 2 128 2 255 1 4 0)
> 
> Hope this gives you some ideas.

Below is the level of syntax I want to support.  Your gettext example
cannot work because all symbols referenced by the assembly within it
must be defined before binary code can be produced.  The gettext above
has to emit intermediate code that captures the symbols, which is
finally executed once all symbols have addresses.


(defproject testproj
        (:text-base  #x100
         :data-base  nil
	 :text-align 16
	 :data-align 8))

(defmacro xyzpdq (parm)
  `(add acc ,parm))

(defparameter +RESERVE-DATA-LEN+   100)


(defdata tst-data2 (:org #x20)
  (dw 'tst-code))

(defbss tst-bss ()
  (reserve +RESERVE-DATA-LEN+))
                                        
(defdata tst-data ()
  (dw 0 1 2 3 'tst-code 'tst-bss)
  (db 4 5 6 7 '(lobyte tst-data2))
  (db \"hello, world\")
  supercat2
  (filldata +RESERVE-DATA-LEN+)
  (filldata 10 #\\x)
  (filldata 2 '(let ((x 1) (y 2) (z 3))
		(list x y z))) )

(deftext tst-code ()
  (nop)
  (xyzpdq 10)
  (nop)
  supercat
  (addc  a 1)
  (anl   acc 15)
  (inc   dptr)
  (cjne  acc 3 'tst-data)
  (clr   a)
  ;;(ajmp  'supercat2)
  (acall 'supercat)
  (acall 'tst-data)
  (acall '(+ tst-code 2)))



Gregm

From: Peter Seibel
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 17:52:46 +0000
Message-ID: <m264uuakcx.fsf@gigamonkeys.com>

Greg Menke <············@toadmail.com> writes:

> Peter Seibel <·····@gigamonkeys.com> writes:

>> To provide some food for thought, here's a sketch of part of an
>> assembler that works somewhat like I think yours ought to. Basically I
>> define a macro DEFTEXT which can be used to define an association
>> between a name and a snippet of machine code. The idea (which I'm sort
>> of guessing at from the code you've shown us) is that such a snippet
>> is then combined with other snippets into a final assembly. This
>> sketch doesn't include the ability to include arbitrary Common Lisp
>> code within a DEFTEXT body but only because I'm not sure how you're
>> envisioning that being used. At any rate, note how there's no interning
>> or uninterning of symbols and no calls to EVAL. After a DEFTEXT form
>> is evaluated you can use the function GETTEXT to get at the machine
>> code. For instance you can EVAL the following form (or load a file
>> containing the form or whatever):
>> 
>>   (deftext foo
>>     (nop)
>>     (goto label2)
>>     (nop)
>>     label1 (nop)
>>     (push 128)
>>     (push 255)
>>     (goto label1)
>>     label2 (nop))
>> 
>> then:
>> 
>>   ASM51> (gettext 'foo)
>>   #(0 1 11 0 0 2 128 2 255 1 4 0)
>> 
>> Hope this gives you some ideas.
>
> Below is the level of syntax I want to support.  Your gettext example
> cannot work because all symbols referenced by the assembly within it
> must be defined before binary code can be produced.  The gettext above
> has to emit intermediate code that captures the symbols, which is
> finally executed once all symbols have addresses.

So if you look at my other post where I included the implementation of
DEFTEXT and GETTEXT, you'll see that DEFTEXT works in several
passes. It may be that you want to store the result of one of the earlier
passes, before symbols have been resolved. Then you can stitch
together a bunch of bits of code defined with different DEFTEXT's and
DEFBSS's and so forth and then do the final resolution of symbols to
addresses.

Anyway, if you can annotate this test code with a brief comment about
what each form does, that'd help me understand what's going on.

>
> (defproject testproj
>         (:text-base  #x100
>          :data-base  nil
> 	 :text-align 16
> 	 :data-align 8))
>
> (defmacro xyzpdq (parm)
>   `(add acc ,parm))
>
> (defparameter +RESERVE-DATA-LEN+   100)
>
>
> (defdata tst-data2 (:org #x20)
>   (dw 'tst-code))
>
> (defbss tst-bss ()
>   (reserve +RESERVE-DATA-LEN+))
>                                         
> (defdata tst-data ()
>   (dw 0 1 2 3 'tst-code 'tst-bss)
>   (db 4 5 6 7 '(lobyte tst-data2))
>   (db \"hello, world\")
>   supercat2
>   (filldata +RESERVE-DATA-LEN+)
>   (filldata 10 #\\x)
>   (filldata 2 '(let ((x 1) (y 2) (z 3))
> 		(list x y z))) )
>
> (deftext tst-code ()
>   (nop)
>   (xyzpdq 10)
>   (nop)
>   supercat
>   (addc  a 1)
>   (anl   acc 15)
>   (inc   dptr)
>   (cjne  acc 3 'tst-data)
>   (clr   a)
>   ;;(ajmp  'supercat2)
>   (acall 'supercat)
>   (acall 'tst-data)
>   (acall '(+ tst-code 2)))

Finally, on a terminology note, you talk about "symbols" in a fairly
confusing way. If I've understood correctly, a form like a DEFTEXT
form creates something (exactly what isn't quite clear to me yet:
intermediate code or machine code or something) and then gives it a
name. The name is a symbol but the data is not the symbol. In other
words it's analogous to the way DEFUN creates something (a function)
and associates it with a name. For example given:

  (defun hello () (print "hello"))

we don't say the symbol HELLO *is* a function; we say it names the
function.

Similarly in your system, it would probably be better to talk about
"the address of the text segment named FOO" than "the address of the
symbol FOO".

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Greg Menke
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 22:05:37 +0000
Message-ID: <m3iryufuxa.fsf@athena.pienet>

Peter Seibel <·····@gigamonkeys.com> writes:
> Greg Menke <············@toadmail.com> writes:

> So if you look at my other post where I included the implementation of
> DEFTEXT and GETTEXT, you'll see that DEFTEXT works in several
> passes. It may be that you want to store the result of one of the earlier
> passes, before symbols have been resolved. Then you can stitch
> together a bunch of bits of code defined with different DEFTEXT's and
> DEFBSS's and so forth and then do the final resolution of symbols to
> addresses.

That's pretty much what I'm doing.  Or it's pretty much what I think I'm
doing.  Or something.

> 
> Anyway, if you can annotate this test code with a brief comment about
> what each form does, that'd help me understand what's going on.
> 
> >
> > (defproject testproj
> >         (:text-base  #x100
> >          :data-base  nil
> > 	 :text-align 16
> > 	 :data-align 8))

Set overall code & data origin and linked symbol alignment policy.  As
given, data symbols will be located by default after all the code
symbols since there is no :data-base defined.  Each text symbol starts
on a 16 byte boundary, etc.  (I've since added :bss-base and
:bss-align).

> >
> > (defmacro xyzpdq (parm)
> >   `(add acc ,parm))

Assembly macro, substitutes its contents for itself wherever it occurs
in the code elsewhere.

> > (defparameter +RESERVE-DATA-LEN+   100)

Constant...

> >
> > (defdata tst-data2 (:org #x20)
> >   (dw 'tst-code))

Assemble a vector to tst-code at address #x0020, overriding origin
policy for this segment only (:align can also be used).  defdata,
deftext & defbss can all do this.

> > (defbss tst-bss ()
> >   (reserve +RESERVE-DATA-LEN+))

Reserve a defined # of bytes as a bss symbol- does not emit code, just
reserves a region of memory.

> > (defdata tst-data ()
> >   (dw 0 1 2 3 'tst-code 'tst-bss)
> >   (db 4 5 6 7 '(lobyte tst-data2))
> >   (db \"hello, world\")
> >   supercat2
> >   (filldata +RESERVE-DATA-LEN+)
> >   (filldata 10 #\\x)
> >   (filldata 2 '(let ((x 1) (y 2) (z 3))
> > 		(list x y z))) )
> >

Emit various sequences of bytes which together comprise a data segment.
dw items accept 16 bit values, emitting them as little endian byte
pairs.  db items are emitted as is- integers are limited to 8 bits.
(lobyte) takes an integer, returning the least significant 8 bits.
(hibyte) is also available.  'supercat2' is a label which is set to the
address of the byte just after the 'd' in the preceding db sequence.
filldata builds and inserts sequences of bytes, length given by the 1st
parm, the 2nd parm can take a variety of forms and provides an
initialization value or produces a sequence of chars/ints.  The sequence
is trimmed or padded with 0 bytes as necessary to match the given
length.  The sexp parms are supposed to be evaluated at link-time to
produce the data- the length of each section must be known beforehand so
memory regions can be worked out.
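
The byte-level behaviour described above can be sketched roughly as
follows. LOBYTE and HIBYTE are named in the text; DW-BYTES and
FIT-TO-LENGTH are hypothetical helper names, and the actual dw and
filldata items presumably do more than this.

```lisp
(defun lobyte (n)
  "Least significant 8 bits of N."
  (ldb (byte 8 0) n))

(defun hibyte (n)
  "Bits 8 through 15 of N."
  (ldb (byte 8 8) n))

(defun dw-bytes (&rest words)
  "Emit each 16-bit word as a little-endian pair: low byte first."
  (loop for w in words
        collect (lobyte w)
        collect (hibyte w)))

(defun fit-to-length (bytes n)
  "Trim BYTES to N elements, or pad with 0 bytes up to N."
  (let ((bytes (coerce bytes 'list)))
    (if (>= (length bytes) n)
        (subseq bytes 0 n)
        (append bytes
                (make-list (- n (length bytes)) :initial-element 0)))))
```

Because FIT-TO-LENGTH always returns exactly N bytes, the size of a
filldata item is known before its contents are computed, which is what
lets the memory regions be laid out ahead of link time.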

> > (deftext tst-code ()
> >   (nop)
> >   (xyzpdq 10)
> >   (nop)
> >   supercat
> >   (addc  a 1)
> >   (anl   acc 15)
> >   (inc   dptr)
> >   (cjne  acc 3 'tst-data)
> >   (clr   a)
> >   ;;(ajmp  'supercat2)
> >   (acall 'supercat)
> >   (acall 'tst-data)
> >   (acall '(+ tst-code 2)))
> 

Emit code. xyzpdq is a macro substitution.  'supercat' is set to the
address just after the preceding nop.  If not commented out, the ajmp
produces a compile-time error because supercat2 is not defined within
tst-code.  The operand symbol references are supposed to produce their
link-time addresses for coding into the instruction- but expressions
are appropriate too.

> Finally, on a terminology note, you talk about "symbols" in a fairly
> confusing way. If I've understood correctly, a form like a DEFTEXT
> form creates something (exactly what isn't quite clear to me yet:
> intermediate code or machine code or something) and then gives it a
> name. The name is a symbol but the data is not the symbol. In other
> words it's analogous to the way DEFUN creates something (a function)
> and associates it with a name. For example given:
> 
>   (defun hello () (print "hello"))
> 
> we don't say the symbol HELLO *is* a function; we say it names the
> function.
> 
> Similarly in your system, it would probably be better to talk about
> "the address of the text segment named FOO" than "the address of the
> symbol FOO".
> 
> -Peter

Gotcha.  As usual the problem exists between keyboard and
chair... Thanks.

Each assembly instruction is a generic method, operand types selecting
the variant.  The 805x instruction set revels in coding variations based
upon differences in operand types & registers.  When an instruction's
method is called it creates an instance of a class that specifies the
output length in bytes of the instruction and a macro which closes over
the operands (hope I used the right words there..), which when evaluated
later, generates the instruction bytes.  By deferring this step till
last, all sexps, arithmetic, etc.. are computed in one shot and the
instruction can be coded.  There is no fundamental difference between
text, data or bss coding- code instructions can be put in a data segment
and vice versa.  The compiler specifically allows only (reserve ..)
items in bss segments.

deftext et al name a data structure which holds the compile state for
the contents of the associated code/data/bss segment.  Each pass in the
compile process evolves the compile state in the structure.

Regards,

Gregm

From: Cameron MacKinnon
Subject: OT: Re: 8051 assembler
Date: Thu, 28 Jul 2005 02:06:45 +0000
Message-ID: <nfqdnZLxnq0qoHXfRVn-iw@rogers.com>

Jeff M. wrote:
> Here's a sample assembly example (taken from the GameBoy Advance):
> 
> (define-gba-function set-graphics-mode (mode)
>   "Note: the argument does nothing except act as a comment."
>   (asm
...
>     (lsl r1 r2 8)

What's with the three argument left shift? What does that do? I queried 
an ARM instruction set quick reference card, but it didn't solve the 
mystery. Just curious.

-- 
Cameron MacKinnon
Toronto, Canada

From: Jeff M.
Subject: Re: OT: Re: 8051 assembler
Date: Thu, 28 Jul 2005 14:11:08 +0000
Message-ID: <1122559868.290896.11200@g43g2000cwa.googlegroups.com>

r1 = r2 << 8

This is in THUMB mode. Of course, in ARM mode the instruction would
become:

(mov r1 r2 (lsl 8))   ; mov r1, r2, lsl #8 

Cheers,
Jeff M.

From: Julian Squires
Subject: Re: 8051 assembler in Common Lisp
Date: Fri, 29 Jul 2005 00:22:28 +0000
Message-ID: <slrndeiti4.e94.tek@localhost.localdomain>

On 2005-07-27, Jeff M. <·······@gmail.com> wrote:
> I've done something extremely similar for the ARM7TDMI. I'd have to dig
> it up, but it might be of some use to you. Simply put, the assembler
> package contained a hash table of all the instruction mnemonics. A
> macro 'define-instruction' created a mnemonic, handled the operands,
> and returned one of two values: the opcode or the number of bytes the
> opcode required. Which value was returned was based on whether or not
> *pass* was set to 0 or 1.

It's interesting to see a bunch of assemblers in CL.  I wrote one
recently for the m68k, with an accompanying linker for producing Atari
ST binaries.  I was going to retarget it for the ARM sometime, but your
design sounds more interesting (I was going for an assembler with more
conventional syntax that can break out into Lisp for macros).
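
The design quoted above (a hash table of mnemonics, a `define-instruction` macro, and a `*pass*` switch between "size" and "opcode") might look roughly like the sketch below. This is a guess at the shape, not Jeff's actual code: the `:size`/`:encode` keywords and `assemble-instruction` are assumptions, and the two example instructions use 8051 encodings (NOP = #x00, MOV A,#imm = #x74) rather than ARM ones.

```lisp
(defvar *pass* 0
  "0 = sizing pass, 1 = encoding pass, as in the quoted description.")

(defvar *mnemonics* (make-hash-table :test #'equal)
  "Maps mnemonic name strings to functions of the operands.")

;; Assumed interface: LAMBDA-LIST holds only required operand names;
;; SIZE and ENCODE are forms evaluated with the operands bound.
(defmacro define-instruction (name lambda-list &key size encode)
  `(setf (gethash ,(string name) *mnemonics*)
         (lambda ,lambda-list
           (declare (ignorable ,@lambda-list))
           (if (= *pass* 0) ,size ,encode))))

(define-instruction nop ()
  :size 1
  :encode (list #x00))

(define-instruction mov-a-imm (imm)
  :size 2
  :encode (list #x74 (ldb (byte 8 0) imm)))

;; Hypothetical driver: look the mnemonic up by name and apply it.
(defun assemble-instruction (form)
  (destructuring-bind (name &rest operands) form
    (let ((fn (or (gethash (string name) *mnemonics*)
                  (error "Unknown mnemonic: ~S" name))))
      (apply fn operands))))

(let ((*pass* 0)) (assemble-instruction '(mov-a-imm #x42)))  ; => 2
(let ((*pass* 1)) (assemble-instruction '(mov-a-imm #x42)))  ; => (#x74 #x42)
```

Running the same forms once with `*pass*` bound to 0 gives the sizes needed to fix label addresses; running them again with `*pass*` bound to 1 produces the bytes.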

I'm hoping to release mine soon, once I get the time to clean up a few
things (better errors and warnings).  I'd definitely like to see the
others, as well.

Cheers.

-- 
Julian Squires

From: Tim Wilson
Subject: Re: 8051 assembler in Common Lisp
Date: Wed, 27 Jul 2005 23:22:52 +0000
Message-ID: <pan.2005.07.28.00.14.38.215634@please.com>

On Tue, 26 Jul 2005 18:38:02 -0400, Greg Menke wrote:

> - use of eval.  I eval each top-level sexp, and later each instruction
>   within them, my theory being I want to give the user full use of macro
>   facilities and all language features to put together the instructions
>   the compiler will process to generate the machine code (ie; I also
>   import :cl-user so the user can do whatever they like macro-wise).  Is
>   this the kind of place where eval is desirable- or should I be pursuing
>   a macroexpand and apply approach?
> 
> - use of symbols as labels.  In "tst-code" below, I have a symbol
>   'supercat' which I'm using as an internal label to capture addresses
>   for use by other instructions.  In the final compile pass, I intern
>   each label and set its value to the linked address so references to
>   the label will work like any other user-supplied symbol.  Since these
>   labels are only defined within a top-level sexp and are unavailable to
>   other sexps, I intern them before processing the top-level sexp and
>   unintern them afterwards, again using eval so I form the calls
>   properly.  This seems really clumsy, though it works.  Is there a more
>   elegant method?

I wrote an x86 assembler (mind you, in Scheme, not CL) and was
considering the EVAL approach, but I realized multithreading was
crucially important.  Plus I didn't really know what context the assembler
would later be used in, so I tried to make it as generic as possible.  Here's
what I ended up with:

- interpreted s-expr asm syntax
- simple tries to store mnemonics (x86 contains hundreds of mnemonics,
most just a handful of characters long and many just a character or two
difference between them.. tries seemed to make sense)
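
For readers unfamiliar with the structure, a minimal character trie could be sketched as follows. It is written in Common Lisp to match the rest of the thread (the assembler described here was in Scheme), all names are hypothetical, and the stored values are placeholder keywords rather than real encoders.

```lisp
;; Each node holds an optional value and a table of child nodes keyed
;; by character.
(defstruct trie-node
  (value nil)
  (children (make-hash-table :test #'eql)))

(defun trie-insert (root key value)
  "Walk KEY character by character, creating nodes as needed."
  (let ((node root))
    (loop for ch across key
          do (setf node
                   (or (gethash ch (trie-node-children node))
                       (setf (gethash ch (trie-node-children node))
                             (make-trie-node)))))
    (setf (trie-node-value node) value)))

(defun trie-lookup (root key)
  "Return the value stored under KEY, or NIL if there is none."
  (let ((node root))
    (loop for ch across key
          do (setf node (gethash ch (trie-node-children node)))
             (unless node (return-from trie-lookup nil)))
    (trie-node-value node)))

(defparameter *mnemonic-trie* (make-trie-node))

(dolist (m '("mov" "movsb" "movzx" "mul"))
  (trie-insert *mnemonic-trie* m (intern (string-upcase m) :keyword)))

(trie-lookup *mnemonic-trie* "movzx")  ; => :MOVZX
(trie-lookup *mnemonic-trie* "movz")   ; => NIL, a prefix but not a mnemonic
```

Mnemonics that share a prefix ("mov", "movsb", "movzx") share the nodes for that prefix, which is the property that makes tries attractive for a large, similar-looking mnemonic set.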

Just having s-expr syntax gets you many things even if you don't use EVAL.
You get READ/LOAD and you still get the quasiquote facilities.  The only
difficulty and pain is when you have to implement address addition, etc.
And, of course, grabbing the value of labels becomes impossible w/
quasiquote.  You have to resort to pseudo-mnemonics for things of that
nature.  It doesn't seem to be that big of a deal in practice, though,
because quasiquote alone handles most things you will probably need
(inserting variables, generating table data, etc.).
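
A small sketch of the two quasiquote uses mentioned above, in Common Lisp rather than Scheme. The mnemonics, the `db` data directive, and both function names are made up for illustration; the point is only that the program is ordinary list data built before the assembler ever sees it.

```lisp
;; Inserting a variable: COUNT is spliced into the instruction list.
(defun make-delay-loop (count)
  `((mov r0 ,count)
    again
    (djnz r0 again)))

;; Generating table data: one hypothetical (db N) form per entry.
(defun make-square-table (n)
  `(squares
    ,@(loop for i below n collect `(db ,(* i i)))))

(make-delay-loop 10)
;; => ((MOV R0 10) AGAIN (DJNZ R0 AGAIN))

(make-square-table 4)
;; => (SQUARES (DB 0) (DB 1) (DB 4) (DB 9))
```

Note that the label `again` stays a plain symbol in the output. Its address does not exist yet when the backquote form is evaluated, which is why label arithmetic has to be deferred to the assembler through some pseudo-mnemonic.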

I also don't know if I could have gone the EVAL approach and still used
the s-expr x86 address notation I used (ModR/M, etc. is a hairy
monster).  Or the backpatching to fix forward labels.

You might also want to check out Movitz.  It's CL and x86.  Not quite sure
how they approach it.

Tim

From: ·······@gmail.com
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 02:33:00 +0000
Message-ID: <1122517980.447843.226110@f14g2000cwb.googlegroups.com>

Tim Wilson wrote :
> I also don't know if I could have gone the EVAL approach and still used
> the s-expr x86 address notation I used (ModR/M, etc. is a hairy
> monster).  Or the backpatching to fix forward labels.
(Apologies in advance if google messes up the threading. My uni doesn't
seem to provide any news server)

The addresses & variable length jumps problem looks to me like a
constraint solving problem that wouldn't be that hard to express. I'd
take a look at screamer (or screamer+, which has basic support for
working on aggregate objects like objects or lists). I think it would
still work with address arithmetic, too. Of course, this is a pretty
big hammer for your problem, but it's already written, so why reinvent
a (simpler) wheel?

Paul Khuong

From: Tim Wilson
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 04:33:32 +0000
Message-ID: <pan.2005.07.28.05.36.14.586982@please.com>

On Wed, 27 Jul 2005 19:33:00 -0700, pkhuong wrote:

> Tim Wilson wrote :
>> I also don't know if I could have gone the EVAL approach and still used
>> the s-expr x86 address notation I used (ModR/M, etc. is a hairy
>> monster).  Or the backpatching to fix forward labels.
> 
> The addresses & variable length jumps problem looks to me like a
> constraint solving problem that wouldn't be that hard to express. I'd
> take a look at screamer (or screamer+, which has basic support for
> working on aggregate objects like objects or lists). I think it would
> still work with address arithmetic, too. Of course, this is a pretty
> big hammer for your problem, but it's already written, so why reinvent
> a (simpler) wheel?

I definitely would have, and I did know of the AMB operator.  Hindsight is
20/20 and all that, though.  Knowing what would work well and knowing how
to make it work as such are two different beasts, I'm afraid ;-)

It's simple enough that one day I may go back and change it.  Right now
it's a 1.x pass assembler (where "x" is how many
lines/instructions contain an undefined label). Each line that references
an undefined label is tossed into a queue which is then replayed by the
same assembler engine. On the first pass each line is partially evaluated
and a guess is made at the final instruction size, so it does have its
problems (i.e. you must declare a byte label for small forward jumps, else
you get the default word size jump, which would be 16 or 32-bit).  Other
assemblers generate optimal (smallest) code by adding more passes until
no instructions change. I find this on the ugly side, though, and would
definitely try a constraint system prior to doing that.
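
The "1.x pass" idea can be sketched roughly as below. This is a simplified Common Lisp illustration, not the Scheme assembler described above: instead of replaying whole lines through the engine it reserves a fixed-size jump and patches the address bytes afterwards, and the input format (`(:byte N)`, `(:jump LABEL)`) and the 3-byte jump with opcode #x02 (borrowed from the 8051 LJMP) are assumptions made for the example.

```lisp
(defun assemble-with-replay (program)
  "PROGRAM is a list of label symbols and instruction forms.
(:byte N) emits one byte; (:jump LABEL) emits a 3-byte absolute jump.
Returns the assembled bytes as an adjustable vector."
  (let ((labels (make-hash-table :test #'eq))
        (out (make-array 0 :element-type '(unsigned-byte 8)
                           :adjustable t :fill-pointer t))
        (pending '()))                  ; queue of (offset . label)
    (flet ((emit (byte) (vector-push-extend byte out))
           (patch (offset target)
             ;; Fill in the 16-bit address, high byte first.
             (setf (aref out (+ offset 1)) (ldb (byte 8 8) target)
                   (aref out (+ offset 2)) (ldb (byte 8 0) target))))
      ;; Main pass: emit code, queueing jumps to labels not yet seen.
      (dolist (item program)
        (cond ((symbolp item)
               (setf (gethash item labels) (fill-pointer out)))
              ((eq (first item) :byte)
               (emit (second item)))
              ((eq (first item) :jump)
               (let ((offset (fill-pointer out))
                     (target (gethash (second item) labels)))
                 (emit #x02) (emit 0) (emit 0)
                 (if target
                     (patch offset target)
                     (push (cons offset (second item)) pending))))
              (t (error "Unknown item: ~S" item))))
      ;; The ".x" part: revisit only the queued forward references.
      (dolist (entry (nreverse pending))
        (let ((target (gethash (cdr entry) labels)))
          (unless target
            (error "Undefined label: ~S" (cdr entry)))
          (patch (car entry) target))))
    out))

(assemble-with-replay '((:jump end) (:byte #xAA) end (:byte #xBB)))
;; => #(2 0 4 170 187)
```

Because the jump size is fixed up front, the sketch shows the same limitation described above: a forward jump always gets the default size unless the programmer says otherwise.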

Tim

From: Frode Vatvedt Fjeld
Subject: Re: 8051 assembler in Common Lisp
Date: Thu, 28 Jul 2005 20:01:36 +0000
Message-ID: <2hy87q1yzj.fsf@vserver.cs.uit.no>

Greg Menke <············@toadmail.com> writes:

> [..] I have two questions;
>
> - use of eval. [..]

My approach is to use a simple functional style, something like

  (defun assemble (program &key (text-base #x000) ..)
     ...)

where program is a list of instructions and labels etc., and I often
use it something like

  (assemble `((:mov :a :b)
              (:call ,some-address)
              ...))

and of course any other conceivable means of constructing
lists/programs. I ended up recognizing instruction-names,
register-names etc. by string=, and use keywords in programs by
convention. Furthermore, I found it useful to have the assemble
function understand a special :funcall "macro" whose arguments can be
assembly-labels, and which is called/expanded when each argument is
assigned a value.

> - use of symbols as labels.

I use symbols only for their identity, and store the values separately
(hash-table or assoc or somesuch). Actually I find it useful to have
the assemble function return the symbol-table as an assoc as its
secondary value.
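
A minimal sketch of that idea, with hypothetical names: labels are looked up in a hash table that lives only for the duration of the call, and the table comes back as an alist in the secondary value. The function below only assigns addresses; the `size-of` parameter is an assumption standing in for a real instruction encoder.

```lisp
(defun collect-labels (program &key (text-base #x000)
                                    (size-of (constantly 1)))
  "Assign an address to every instruction form and label in PROGRAM.
Returns two values: a list of (address . instruction) pairs, and the
symbol table as an alist of (label . address) sorted by address."
  (let ((table (make-hash-table :test #'eq))
        (address text-base)
        (instructions '()))
    (dolist (item program)
      (if (symbolp item)
          ;; A bare symbol is a label: record it, emit nothing.
          (setf (gethash item table) address)
          (progn
            (push (cons address item) instructions)
            (incf address (funcall size-of item)))))
    (values (nreverse instructions)
            (sort (loop for label being the hash-keys of table
                          using (hash-value addr)
                        collect (cons label addr))
                  #'< :key #'cdr))))

(collect-labels '((:nop) (:nop) supercat (:addc :a 1))
                :text-base #x100)
;; => ((256 :NOP) (257 :NOP) (258 :ADDC :A 1))
;;    ((SUPERCAT . 258))
```

Since the label's value cell is never set, nothing has to be interned before the call or uninterned after it, and two programs can reuse the same label name without interfering.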

> (deftext tst-code ()
>   (nop)
>   (nop)
>   supercat
>   (addc  a 1)
>   (anl   acc 15)
>   ;;(ajmp  'supercat2)
>   (acall 'supercat)
>   (acall 'tst-data)
>   (acall '(+ tst-code 2)))
>

In the few cases where I've got top-level things like this (mostly my
code gets generated in small pieces here and there in a largish lisp
program.. well, a compiler) I do it like this:

(defun make-testproj (&key (increment 1))
  `((:nop)
    (:nop)
    supercat
    (:addc :a ,increment)
    ...))

and do (assemble (make-testproj)).

-- 
Frode Vatvedt Fjeld