memory usage in cmucl

From: goose
Subject: memory usage in cmucl
Date: Thu, 22 Jun 2006 23:35:33 +0000
Message-ID: <1151019333.457447.244390@r2g2000cwb.googlegroups.com>

Hello all

The code below implements a simple protocol (receive). The protocol
expects frames to be preceded by the length + newline.

When running under cmucl, I get the following messages (repeatedly
with the memory size constantly increasing).

; [GC threshold exceeded with 12,009,624 bytes in use.  Commencing GC.]
; [GC completed with 182,952 bytes retained and 11,826,672 bytes
freed.]
; [GC will next occur when at least 12,182,952 bytes are in use.]
; [GC threshold exceeded with 12,194,024 bytes in use.  Commencing GC.]
; [GC completed with 193,936 bytes retained and 12,000,088 bytes
freed.]
; [GC will next occur when at least 12,193,936 bytes are in use.]
; [GC threshold exceeded with 12,202,704 bytes in use.  Commencing GC.]
; [GC completed with 239,040 bytes retained and 11,963,664 bytes
freed.]
; [GC will next occur when at least 12,239,040 bytes are in use.]

What am I doing wrong here? I've tried the same code with clisp
and I've gotten no messages of this sort. Is this a memory leak
that I should be worried about or will the memory usage eventually
stabilise?

Thanks


--------problem code--------------
;;; function to check if a character is a digit
(defun is-digit (char)
  (digit-char-p char))


;;; macro to set the state
(defmacro set-state (statename)
  `(setf state ,statename))

;;; The control character
(defparameter driver-c nil)
(defparameter incoming nil)
(defparameter size nil)

;;;; macro to define a state
(defmacro defstate (statename &body statebody)
  `(defconstant ,statename
     '(progn
        ; (format t "state=~a~%" ',statename)
        ,@statebody)))

;;; All the states
(defstate +start+
  (set-state +read-digit+)
  (setf incoming "0")
  (setf size -1))

(defstate +read-digit+
  (setf driver-c (read-char-no-hang))
   (cond
    ((eq driver-c nil) (progn
                         (format t ".")
                         (set-state +read-digit+)))
    ((is-digit driver-c) (set-state +store-digit+))
    (t (set-state +set-size+))))

(defstate +store-digit+
  (setf incoming (concatenate 'string incoming (string driver-c)))
  (set-state +read-digit+))

(defstate +set-size+
  (setf size (read-from-string incoming))
  (setf incoming "")
  (set-state +read-char+))

(defstate +read-char+
  (setf driver-c (read-char-no-hang ))
  (when driver-c
    (set-state +store-char+)))

(defstate +store-char+
  (setf incoming (concatenate 'string incoming (string driver-c)))
  (setf size (- size 1))
  (if (<= size 0)
      (set-state +end+)
    (set-state +read-char+)))

(defstate +end+
  nil)

(defparameter state +start+)

(defun run-machine ()
  (eval state)
  (if (eq state +end+)
      nil
    t))




(defun test ()
  (do ((machine (run-machine) (run-machine)))
      ((eq machine nil)
       (format t "state machine completed~%~a~%" incoming))
      (when (eq state +read-char+)
        (format t "reading...~%"))))

Re: memory usage in cmucl ··············@hotmail.com
- Re: memory usage in cmucl Rob Warnock
  - Re: memory usage in cmucl goose
    - Re: memory usage in cmucl Rob Warnock
- Re: memory usage in cmucl goose
  - Re: memory usage in cmucl ··············@hotmail.com
    - Re: memory usage in cmucl goose
      - Re: memory usage in cmucl ··············@hotmail.com
        Re: memory usage in cmucl goose
Re: memory usage in cmucl Wolfram Fenske
- Re: memory usage in cmucl Rob Warnock

From: ··············@hotmail.com
Subject: Re: memory usage in cmucl
Date: Fri, 23 Jun 2006 03:05:28 +0000
Message-ID: <1151031928.670326.144810@p79g2000cwp.googlegroups.com>

goose wrote:
> Hello all
>
> The code below implements a simple protocol (receive). The protocol
> expects frames to be preceded by the length + newline.
>
> When running under cmucl, I get the following messages (repeatedly
> with the memory size constantly increasing).

> What am I doing wrong here? I've tried the same code with clisp
> and I've gotten no messages of this sort. Is this a memory leak
> that I should be worried about or will the memory usage eventually
> stabilise?

This is incredibly ugly and un-Lispy code. I'll try to point out a few
major issues.

1) The absolute worst flaw in this code is the use of EVAL in the basic
inner loop. If you want to execute code which is known in advance, make
that code into a function and call that function. In your case, you
should make your "state transition functions" into functions, instead
of literal lists passed to eval.

2) Some cosmetic things:

> --------problem code--------------
> ;;; function to check if a character is a digit
> (defun is-digit (char)
>   (digit-char-p char))

This is completely superflous. Get rid of it. You are programming in
Common Lisp, feel free to use Common Lisp functions.

>
> ;;; macro to set the state
> (defmacro set-state (statename)
>   `(setf state ,statename))

Likewise. This macro is just taking up space. Setting the next state is
an abstraction that should be hidden in a more elegant state machine
notation, not by a simple wrapper macro.

> ;;; The control character
> (defparameter driver-c nil)
> (defparameter incoming nil)
> (defparameter size nil)
>
> ;;;; macro to define a state
> (defmacro defstate (statename &body statebody)
>   `(defconstant ,statename
>      '(progn
>         ; (format t "state=~a~%" ',statename)
>         ,@statebody)))

This is the first real sin; the progn is coding a function. Store it as
one.

(defmacro defstate (statename &body body)
  `(defconstant ,statename
    (lambda () ,@statebody)))

This macro does only slightly more than nothing.

> ;;; All the states
> (defstate +start+
>   (set-state +read-digit+)
>   (setf incoming "0")
>   (setf size -1))

Look up DECF

> (defstate +read-digit+
>   (setf driver-c (read-char-no-hang))
>    (cond
>     ((eq driver-c nil) (progn
>                          (format t ".")
>                          (set-state +read-digit+)))
>     ((is-digit driver-c) (set-state +store-digit+))
>     (t (set-state +set-size+))))

You don't need the progn. COND consequences can contain multiple forms.

I'm not sure why you use read-char-no-hang; you have nothing apparently
useful to do until a character arrives.

> (defstate +store-digit+
>   (setf incoming (concatenate 'string incoming (string driver-c)))
>   (set-state +read-digit+))
>
> (defstate +set-size+
>   (setf size (read-from-string incoming))
>   (setf incoming "")
>   (set-state +read-char+))

READ-FROM-STRING is too heavyweight. Use PARSE-INTEGER

> (defstate +read-char+
>   (setf driver-c (read-char-no-hang ))
>   (when driver-c
>     (set-state +store-char+)))
>
> (defstate +store-char+
>   (setf incoming (concatenate 'string incoming (string driver-c)))
>   (setf size (- size 1))
>   (if (<= size 0)
>       (set-state +end+)
>     (set-state +read-char+)))

Again, use DECF. Also, you can do this

(setf state (if (<= size 0) +end+ +read-char+))

I.e. IF can return a value just fine.

> (defstate +end+
>   nil)
>
> (defparameter state +start+)
>
> (defun run-machine ()
>   (eval state)
>   (if (eq state +end+)
>       nil
>     t))

Here is the most profound sin. STATE contains code that was known in
advance. That code can be put into a function as above. Then, you would
call it for each step.

(defun run-machine ()
  (funcall state)
  (not (eq state +end+)))

I would have called this "step-machine" But that is a nitpick.

> (defun test ()
>   (do ((machine (run-machine) (run-machine)))
>       ((eq machine nil)
>        (format t "state machine completed~%~a~%" incoming))
>       (when (eq state +read-char+)
>         (format t "reading...~%"))))

Note: you are not initializing the state machine. I would add a (setf
state +start+) before the loop.

Perhaps you are trying to define state machines as an exercise in state
machines. I think this protocol would be more compactly and clearly
implemented directly in Lisp.

From: Rob Warnock
Subject: Re: memory usage in cmucl
Date: Fri, 23 Jun 2006 04:43:11 +0000
Message-ID: <TOydnQok0evC7AbZnZ2dnUVZ_rOdnZ2d@speakeasy.net>

··············@hotmail.com <············@gmail.com> wrote:
+---------------
| > When running under cmucl, I get the following messages (repeatedly
| > with the memory size constantly increasing).
| > What am I doing wrong here? ...
| 
| This is incredibly ugly and un-Lispy code. I'll try to point out a few
| major issues.  [...[trimmed]...
+---------------

One additional issue is that the CMUCL interpreter tends to generate
quite a lot of garbage. You'll do a *lot* better if you compile your
code before you try to worry about the garbage-generation rate, e.g.:

    cmu> (defun delay (n) (dotimes (i n)))

    DELAY
    cmu> (time (delay 1000000))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 
    ; [GC threshold exceeded with 12,013,880 bytes in use.  Commencing GC.]
    ; [GC completed with 1,079,768 bytes retained and 10,934,112 bytes freed.]
    ; [GC will next occur when at least 13,079,768 bytes are in use.]
    ; [GC threshold exceeded with 13,090,712 bytes in use.  Commencing GC.]
    ; [GC completed with 1,079,776 bytes retained and 12,010,936 bytes freed.]
    ; [GC will next occur when at least 13,079,776 bytes are in use.]
    ; [GC threshold exceeded with 13,089,240 bytes in use.  Commencing GC.]
    ; [GC completed with 1,089,832 bytes retained and 11,999,408 bytes freed.]
    ; [GC will next occur when at least 13,089,832 bytes are in use.]
    ; [GC threshold exceeded with 13,099,296 bytes in use.  Commencing GC.]
    ; [GC completed with 1,089,832 bytes retained and 12,009,464 bytes freed.]
    ; [GC will next occur when at least 13,089,832 bytes are in use.]

    ; Evaluation took:
    ;   4.03f0 seconds of real time
    ;   4.0f0 seconds of user run time
    ;   0.03f0 seconds of system run time
    ;   7,266,606,333 CPU cycles
    ;   [Run times include 0.1f0 seconds GC run time]
    ;   133 page faults and
    ;   48,010,536 bytes consed.
    ; 
    NIL
    cmu> (compile 'delay)
    ; Compiling LAMBDA (N): 
    ; Compiling Top-Level Form: 

    DELAY
    NIL
    NIL
    cmu> (time (delay 1000000))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.01f0 seconds of real time
    ;   0.01f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   26,082,042 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

Compare the number of "bytes consed" (total size of all objects
allocated by the GC during the execution of "time") in the two
of these: ~48 MB versus *zero*!

Also consider the runtime: ~4 seconds vs. 0.01.


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: goose
Subject: Re: memory usage in cmucl
Date: Fri, 23 Jun 2006 11:45:12 +0000
Message-ID: <1151063112.877557.305850@y41g2000cwy.googlegroups.com>

Rob Warnock wrote:
> ··············@hotmail.com <············@gmail.com> wrote:
> +---------------
> | > When running under cmucl, I get the following messages (repeatedly
> | > with the memory size constantly increasing).
> | > What am I doing wrong here? ...
> |
> | This is incredibly ugly and un-Lispy code. I'll try to point out a few
> | major issues.  [...[trimmed]...
> +---------------
>
> One additional issue is that the CMUCL interpreter tends to generate
> quite a lot of garbage. You'll do a *lot* better if you compile your
> code before you try to worry about the garbage-generation rate, e.g.:

Thanks for the tip, although IIRC I did initially compile the
code[1]. If this happens in any future (compiled) cmucl code
where should I start looking? Are there any obvious (or non-
obvious) ways of leaking memory?

[1] I don't remember for sure right now, because I spent
a significant amount of time trying to find out how
one could be leaking memory in cmucl, but I will attempt
this again with compiled code and see if there is a difference.

<snipped>

thanks
goose

From: Rob Warnock
Subject: Re: memory usage in cmucl
Date: Sat, 24 Jun 2006 03:10:32 +0000
Message-ID: <noudnaGTkY21MAHZnZ2dnUVZ_sqdnZ2d@speakeasy.net>

goose <····@webmail.co.za> wrote:
+---------------
| If this happens in any future (compiled) cmucl code where
| should I start looking? Are there any obvious (or non-obvious)
| ways of leaking memory? 
+---------------

1. Don't forget the values of the REPL convenience variables:

      +    *    /   -
      ++   **   //
      +++  ***  ///

   Recently-evaluated forms/values can get "stuck" in there for up
   to four REPL entries [that is, remain visible to the GC], until
   they get pushed off the end of the "shift registers".

2. CMUCL allocates "large" objects separately, and does not always
   release them if you force a GC [with "(GC)"] before the "will
   next occur" limit is hit automatically. Performing a (GC :FULL T)
   should release such things.[1]

Combining these two [watch the "GC completed with" lines!]:

      > (setf *print-length* 25)

      25
      > (gc)
      ; [GC threshold exceeded with 1,044,712 bytes in use.  Commencing GC.]
      ; [GC completed with 1,046,576 bytes retained and -1,864 bytes freed.]
      ; [GC will next occur when at least 13,046,576 bytes are in use.]

      NIL
      > (make-array 1000000)

      #(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...)
      cmu> (gc :full t)
      ; [GC threshold exceeded with 5,052,792 bytes in use.  Commencing GC.]
      ; [GC completed with 5,042,984 bytes retained and 9,808 bytes freed.]
      ; [GC will next occur when at least 17,042,984 bytes are in use.]

      NIL
      > (gc :full t)
      ; [GC threshold exceeded with 5,046,952 bytes in use.  Commencing GC.]
      ; [GC completed with 5,040,768 bytes retained and 6,184 bytes freed.]
      ; [GC will next occur when at least 17,040,768 bytes are in use.]

      NIL
      > (gc :full t)
      ; [GC threshold exceeded with 5,044,736 bytes in use.  Commencing GC.]
      ; [GC completed with 5,040,784 bytes retained and 3,952 bytes freed.]
      ; [GC will next occur when at least 17,040,784 bytes are in use.]

      NIL
      > (gc :full t)
      ; [GC threshold exceeded with 5,044,752 bytes in use.  Commencing GC.]
      ; [GC completed with 1,040,784 bytes retained and 4,003,968 bytes freed.]
      ; [GC will next occur when at least 13,040,784 bytes are in use.]

      NIL
      > 

By the fourth (GC :FULL T), the array had finally shifted out of
the "***" & "///" variables, and the collection finally succeeded.

[You could have also typed NIL three times and then a single (GC :FULL T),
with the same result.]


-Rob

[1] Actually, in cases like the above, a (GC :GEN 2) will usually do it,
but as long as you're doing it manually, why not use (GC :FULL T)?

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: goose
Subject: Re: memory usage in cmucl
Date: Fri, 23 Jun 2006 12:37:16 +0000
Message-ID: <1151066235.962612.255140@c74g2000cwc.googlegroups.com>

··············@hotmail.com wrote:

<snipped>

>
> This is incredibly ugly and un-Lispy code. I'll try to point out a few
> major issues.

Thanks. I'll heed the good advice and try to explain
myself over why I did some things "wrong". Sadly,
I can only change this once I get home (writing this
response on my lunch break at work).

<snipped>

> 2) Some cosmetic things:
>
> > --------problem code--------------
> > ;;; function to check if a character is a digit
> > (defun is-digit (char)
> >   (digit-char-p char))
>
> This is completely superflous. Get rid of it. You are programming in
> Common Lisp, feel free to use Common Lisp functions.
>

Ok. The reason that that is there is because I only found
out that (digit-char-p ...) exists after I had written
the function body to actually check the char against
each character and was too lazy to go back and change
the reference to is-digit.

> >
> > ;;; macro to set the state
> > (defmacro set-state (statename)
> >   `(setf state ,statename))
>
> Likewise. This macro is just taking up space. Setting the next state is
> an abstraction that should be hidden in a more elegant state machine
> notation, not by a simple wrapper macro.

I don't understand why "a more elegant state machine notation"
actually is better than a simple macro. Maybe you have an example
you'd like to share (I am, after all, still very new to this :-)?

The reason I've used a macro (set-state...) as above
is because there might be something I want to do
out on every state transition and having a macro
let me (for example) put in a diagnostic message
on every state transition.

>
> > ;;; The control character
> > (defparameter driver-c nil)
> > (defparameter incoming nil)
> > (defparameter size nil)
> >
> > ;;;; macro to define a state
> > (defmacro defstate (statename &body statebody)
> >   `(defconstant ,statename
> >      '(progn
> >         ; (format t "state=~a~%" ',statename)
> >         ,@statebody)))
>
> This is the first real sin; the progn is coding a function. Store it as
> one.

True, I shall make this change.

>
> (defmacro defstate (statename &body body)
>   `(defconstant ,statename
>     (lambda () ,@statebody)))
>
> This macro does only slightly more than nothing.

Like (set-state...), it lets me add (if I wanted to later)
a prologue and/or epilogue to every state I define without
having to modify each state definition seperately.

For example, if I wanted each state to have an associated
timeout, I can put the reset-timeout code in the defstate
macro and every state would then have that code in it.

>
> > ;;; All the states
> > (defstate +start+
> >   (set-state +read-digit+)
> >   (setf incoming "0")
> >   (setf size -1))
>
> Look up DECF

Done, thanks :-)

>
> > (defstate +read-digit+
> >   (setf driver-c (read-char-no-hang))
> >    (cond
> >     ((eq driver-c nil) (progn
> >                          (format t ".")
> >                          (set-state +read-digit+)))
> >     ((is-digit driver-c) (set-state +store-digit+))
> >     (t (set-state +set-size+))))
>
> You don't need the progn. COND consequences can contain multiple forms.

Thanks, I didn't know that either.

>
> I'm not sure why you use read-char-no-hang; you have nothing apparently
> useful to do until a character arrives.
>

Because there are so many good reasons to not wait for input
forever:
1. I might want to decrement a counter so that I can timeout
   if need be.
2. The user might want to abort the state-machine if the other
   side is hanging.
3. I might want to update the display with a counter displaying
   how long we have been waiting.
4. I might want to read from another stream if I can (or
   else the program on the other side of my hypothetical
   other stream might think that I'm hanging).

Below (where you suggest "step-machine") you'll see that
I *could've* written the (run-machine..) function as a single
(do...) loop, but I wrote it so that it can be stepped through
a single state at a time. This allows the caller
of (run-machine...) to do other things (including all of
the above 4 reasons) on each step.

I can't actually think of a reason where I would *want*
any particular state to wait on a stream while blocking
execution of the program - it looks (to the user) as
if the program has hung when the program cannot even display
a spinning paddle to indicate its busy processing.

> > (defstate +store-digit+
> >   (setf incoming (concatenate 'string incoming (string driver-c)))
> >   (set-state +read-digit+))
> >
> > (defstate +set-size+
> >   (setf size (read-from-string incoming))
> >   (setf incoming "")
> >   (set-state +read-char+))
>
> READ-FROM-STRING is too heavyweight. Use PARSE-INTEGER

Another good idea that I didn't know about; thanks.

>
> > (defstate +read-char+
> >   (setf driver-c (read-char-no-hang ))
> >   (when driver-c
> >     (set-state +store-char+)))
> >
> > (defstate +store-char+
> >   (setf incoming (concatenate 'string incoming (string driver-c)))
> >   (setf size (- size 1))
> >   (if (<= size 0)
> >       (set-state +end+)
> >     (set-state +read-char+)))
>
> Again, use DECF. Also, you can do this
>
> (setf state (if (<= size 0) +end+ +read-char+))
>
> I.e. IF can return a value just fine.
>

Ok; I knew about this one but it seems I'm already
too steeped in imperative-type programming. I'll
(eventually) shake the habit off I suppose.

> > (defstate +end+
> >   nil)
> >
> > (defparameter state +start+)
> >
> > (defun run-machine ()
> >   (eval state)
> >   (if (eq state +end+)
> >       nil
> >     t))
>
> Here is the most profound sin. STATE contains code that was known in
> advance. That code can be put into a function as above. Then, you would
> call it for each step.
>
> (defun run-machine ()
>   (funcall state)
>   (not (eq state +end+)))
>

Yes. Thanks again for this fix. It certainly seems to look cleaner.

> I would have called this "step-machine" But that is a nitpick.

true - thats more descriptive, and will be changed as well.

>
> > (defun test ()
> >   (do ((machine (run-machine) (run-machine)))
> >       ((eq machine nil)
> >        (format t "state machine completed~%~a~%" incoming))
> >       (when (eq state +read-char+)
> >         (format t "reading...~%"))))
>
> Note: you are not initializing the state machine. I would add a (setf
> state +start+) before the loop.
>
> Perhaps you are trying to define state machines as an exercise in state

Yes.

> machines. I think this protocol would be more compactly and clearly
> implemented directly in Lisp.

Probably, but sooner or later I *will* have a need for a state-machine
and I may as well attempt it sooner rather than later.

Thanks for your help, feel free to add more :-)
goose

From: ··············@hotmail.com
Subject: Re: memory usage in cmucl
Date: Fri, 23 Jun 2006 16:43:55 +0000
Message-ID: <1151081035.184249.166830@c74g2000cwc.googlegroups.com>

goose wrote:
> ··············@hotmail.com wrote:
>

> > > ;;; macro to set the state
> > > (defmacro set-state (statename)
> > >   `(setf state ,statename))
> >
> > Likewise. This macro is just taking up space. Setting the next state is
> > an abstraction that should be hidden in a more elegant state machine
> > notation, not by a simple wrapper macro.
>
> I don't understand why "a more elegant state machine notation"
> actually is better than a simple macro. Maybe you have an example
> you'd like to share (I am, after all, still very new to this :-)?
>
> The reason I've used a macro (set-state...) as above
> is because there might be something I want to do
> out on every state transition and having a macro
> let me (for example) put in a diagnostic message
> on every state transition.

The basic concept I had was that hiding "I am updating a global state
variable" is not really that much benefit. Your macro has all of the
pitfalls, and it is not clear how you could refactor it such that
"set-state" as a primitive would be helpful.

The more "Lispy" approach from my point of view would be to take an
s-expression based "language for state machines" and using macros to
transform that language into Lisp code to implement it. This
transformation would typically be either an interpreter or a compiler,
and the power of Lisp makes the compiler achievable.

The general way a state machine gets expressed *might* be something
like

(def-state-machine my-input-parser
   :state-variables (input-char (line-char-count 0) result-line)
   :initial-state start-state
   :termination-state end-state
   :return-value result-line
    (start-state :action (setf input-char (read-char))
      :transition
      (cond
        ((digit-char-p input-char) store-digit)
        (t start-digit)))
    (store-digit :action
      (setf line-char-count (+ (* 10 line-char-count) (digit-char-p
input-char))
      :transition
      ...)))

I.e. a state machine consists of

1) state variables
2) state nodes, each of which has
  -  an action function which gets called when the state is "entered"
  -  a "transition function" which is called and returns the next node
in the state machine to be executed.
3) an initial state, which are values for the state variables and an
initial node
4) a final state (or states?) which, when reached, indicate the state
machine has finished executing. (It might make sense to have a
convention that the node represented by NIL is always a final node...)
5) a "return value" which, when the final state is reached, represents
the summary of the state variables which is of interest

The macro "def-state-machine" would create a data structure containing
those nodes, where the nodes probably contain *compiled* Lisp
representations of the action and transition functions, encapsulating
the state variables, and Lisp function called to create the final
return value, and define a corresponding class.

  There would almost be

(setf *my-parser* (make-instance 'my-input-parser))

Then, there would be a method

(run-state-machine *my-parser*)
;; runs, gathers input 5abcde
--> "abcde"

i.e., it executes the state machine; in your example, reading
characters from the input stream, and processing them to get a final
string, which gets returned as the value of the run-state-machine
expression.

I was hesitant to discuss this in such detail, because I was hoping
some more experienced Lisper would point you to a state machine package
which has a fully designed solution for this problem space, instead of
some 15-minute design that I whipped up for Usenet.

From: goose
Subject: Re: memory usage in cmucl
Date: Fri, 23 Jun 2006 23:55:58 +0000
Message-ID: <1151106958.631056.53620@p79g2000cwp.googlegroups.com>

··············@hotmail.com wrote:
> goose wrote:

<snipped>

> The more "Lispy" approach from my point of view would be to take an
> s-expression based "language for state machines" and using macros to
> transform that language into Lisp code to implement it. This
> transformation would typically be either an interpreter or a compiler,
> and the power of Lisp makes the compiler achievable.
>

Yup; that is the intended aim in the end (can't have spurious
defconstants scattered throughout the program).

My first few attempts failed. I then made this attempt (the ugly one)
hoping that I'd be able to wrap it up later with a let* or similar.

> The general way a state machine gets expressed *might* be something
> like
>

*might* ? How do you normally do it?

> (def-state-machine my-input-parser
>    :state-variables (input-char (line-char-count 0) result-line)
>    :initial-state start-state
>    :termination-state end-state
>    :return-value result-line
>     (start-state :action (setf input-char (read-char))
>       :transition
>       (cond
>         ((digit-char-p input-char) store-digit)
>         (t start-digit)))
>     (store-digit :action
>       (setf line-char-count (+ (* 10 line-char-count) (digit-char-p
> input-char))
>       :transition
>       ...)))
>
> I.e. a state machine consists of
>
> 1) state variables

Sure

> 2) state nodes, each of which has
>   -  an action function which gets called when the state is "entered"

Not really needed; the most I'd use it for is diagnostics (which is why
I
wrote a macro to define the state).

>   -  a "transition function" which is called and returns the next node
> in the state machine to be executed.

Really, the logic of a single state (or body). "Transition" implies
a state change; very often you may remain in the same state.

> 3) an initial state, which are values for the state variables and an
> initial node

Yup

> 4) a final state (or states?)

Only one. In a state machine having a number of different
end states makes as much sense as having a number of
different start states.

> which, when reached, indicate the state
> machine has finished executing. (It might make sense to have a
> convention that the node represented by NIL is always a final node...)

I did that, hence only a single end state.

> 5) a "return value" which, when the final state is reached, represents
> the summary of the state variables which is of interest

sure.

>
> The macro "def-state-machine" would create a data structure containing
> those nodes, where the nodes probably contain *compiled* Lisp
> representations of the action and transition functions, encapsulating
> the state variables, and Lisp function called to create the final
> return value, and define a corresponding class.

I will have a bash at something like that, and repost.

>
>   There would almost be
>
> (setf *my-parser* (make-instance 'my-input-parser))
>
> Then, there would be a method
>
> (run-state-machine *my-parser*)
> ;; runs, gathers input 5abcde
> --> "abcde"
>

I would change this slightly to allow the caller to step
the machine one state at a time (for all the reasons I mentioned
in my previous post), but other than that, I initially did something
similar using a class to hold the entire machine; I always got
lost when trying to reference states that were not yet declared
so I decided to use a defconstant for each state.

I will maybe make another attempt and repost this weekend
if I get the time.

<snipped>

> I was hesitant to discuss this in such detail, because I was hoping
> some more experienced Lisper would point you to a state machine package
> which has a fully designed solution for this problem space, instead of
> some 15-minute design that I whipped up for Usenet.

Unfortunately using a prepackaged solution may solve the immediate
problem (implementing my protocol), it defeats the purpose entirely
(which is to explore and experiment this new programming language
I found:-)

anyway, thanks for your comments and tips, I'll make an attempt
to keep all of it in mind (saved to a disk, actually) for my next
iteration
of this.

goose
(btw: I still do not know why cmucl is giving me memory leaks on
the original code. Any ideas in that direction about how lisp code
can actually leak memory would be helpful)

From: ··············@hotmail.com
Subject: Re: memory usage in cmucl
Date: Sat, 24 Jun 2006 01:04:04 +0000
Message-ID: <1151111043.968522.94040@g10g2000cwb.googlegroups.com>

goose wrote:
> ··············@hotmail.com wrote:

> > The general way a state machine gets expressed *might* be something
> > like
> >
>
> *might* ? How do you normally do it?

"might" is my disclaimer, roughly meaning, "I spent about 15 minutes on
this, didn't code an implementation, and have no idea whether this
actually is a good syntax."

For the kind of parser originally posted, I would probably code it
directly in Lisp.
For more complex lexing, I would probably find a Common Lisp library
designed for that purpose; hopefully one that would be nicely designed
and carefully coded to produce fast execution. Yet another method is to
use a TAGBODY/GO structure like one might use in Fortran. Finally, if I
had a problem that really needed elaborate state machine
implementation, I would either find a library or spend more time
polishing my design.

> > (def-state-machine my-input-parser
> >    :state-variables (input-char (line-char-count 0) result-line)
> >    :initial-state start-state
> >    :termination-state end-state
> >    :return-value result-line
> >     (start-state :action (setf input-char (read-char))
> >       :transition
> >       (cond
> >         ((digit-char-p input-char) store-digit)
> >         (t start-digit)))
> >     (store-digit :action
> >       (setf line-char-count (+ (* 10 line-char-count) (digit-char-p
> > input-char))
> >       :transition
> >       ...)))
> >
> > I.e. a state machine consists of
> >
> > 1) state variables
>
> Sure
>
> > 2) state nodes, each of which has
> >   -  an action function which gets called when the state is "entered"
>
> Not really needed; the most I'd use it for is diagnostics (which is why
> I wrote a macro to define the state).

My design intent was to separate the "work getting done" from the logic
determining the "next state".

> >   -  a "transition function" which is called and returns the next node
> > in the state machine to be executed.
>
> Really, the logic of a single state (or body). "Transition" implies
> a state change; very often you may remain in the same state.

Of course, in which case your transition function could return the
identical state. Common notation for that is an arc starting and ending
on the same node.

>
> > 4) a final state (or states?)
>
> Only one. In a state machine having a number of different
> end states makes as much sense as having a number of
> different start states.

Some might wish to indicate errors or nonsensical conditions or a
variety of conditions (e.g. a lexer recognizing distinct productions:
"integer" is an end state, "symbol" is another, etc.) with a
distinguished state. Equivalently, one could make a state variable hold
an "error indicator" or "end condition" to be checked at termination.
Or simply return a value of the appropriate Lisp type.

> >
> > The macro "def-state-machine" would create a data structure containing
> > those nodes, where the nodes probably contain *compiled* Lisp
> > representations of the action and transition functions, encapsulating
> > the state variables, and Lisp function called to create the final
> > return value, and define a corresponding class.
>
> I will have a bash at something like that, and repost.
>
> >
> >   There would almost be
Oops. "also"

> >
> > (setf *my-parser* (make-instance 'my-input-parser))
> >
> > Then, there would be a method
> >
> > (run-state-machine *my-parser*)
> > ;; runs, gathers input 5abcde
> > --> "abcde"
> >
>
> I would change this slightly to allow the caller to step
> the machine one state at a time (for all the reasons I mentioned
> in my previous post),

Yes, there could be a "single step" method, and an interface exposing
the "current state" in some useful way. (E.g., return the corresponding
symbol.)

> but other than that, I initially did something
> similar using a class to hold the entire machine; I always got
> lost when trying to reference states that were not yet declared
> so I decided to use a defconstant for each state.

One could use a hash table or tables mapping symbols to the
corresponding nodes.

> I will maybe make another attempt and repost this weekend
> if I get the time.

Enjoy.

> (btw: I still do not know why cmucl is giving me memory leaks on
> the original code. Any ideas in that direction about how lisp code
> can actually leak memory would be helpful)

I don't have great insight into CMUCL's internals, but I suspect the
use of EVAL could invoke the compiler; that might generate quite a bit
of garbage compared to CLISP's. Or the use of CONCATENATE on strings
might cause different GC behavior on the two implementations.

It probably isn't a true leak; one could try invoking a full GC to see
how much memory is actually being retained.

From: goose
Subject: Re: memory usage in cmucl
Date: Sun, 25 Jun 2006 21:43:26 +0000
Message-ID: <1151271806.607797.17750@r2g2000cwb.googlegroups.com>

··············@hotmail.com wrote:
> goose wrote:
> > ··············@hotmail.com wrote:
>

Man, this thread just gets more and more interesting :-)

> > > The general way a state machine gets expressed *might* be something
> > > like
> > >
> >
> > *might* ? How do you normally do it?
>
> "might" is my disclaimer, roughly meaning, "I spent about 15 minutes on
> this, didn't code an implementation, and have no idea whether this
> actually is a good syntax."
>

Fair enough; I kinda assumed that you had an idiomatic way
of doing this[1].

> For the kind of parser originally posted, I would probably code it
> directly in Lisp.
> For more complex lexing, I would probably find a Common Lisp library
> designed for that purpose; hopefully one that would be nicely designed
> and carefully coded to produce fast execution. Yet another method is to
> use a TAGBODY/GO structure like one might use in Fortran. Finally, if I
> had a problem that really needed elaborate state machine
> implementation, I would either find a library or spend more time
> polishing my design.

I'm really just "learning via experience"; I thought it would be a good
exercise (I did see a recent thread on cll about state-machines but
decided to see if what I could come up with was any good at all).

<snipped>

> > > I.e. a state machine consists of
> > >
> > > 1) state variables
> >
> > Sure
> >
> > > 2) state nodes, each of which has
> > >   -  an action function which gets called when the state is "entered"
> >
> > Not really needed; the most I'd use it for is diagnostics (which is why
> > I wrote a macro to define the state).
>
> My design intent was to separate the "work getting done" from the logic
> determining the "next state".

Well, I dunno ... the state-machine is, in and of itself, the logic one
tries to express. Seperating "work getting done" from logic should
be a combination of using a high-level-language and helper-functions.

<snipped>

> > > 4) a final state (or states?)
> >
> > Only one. In a state machine having a number of different
> > end states makes as much sense as having a number of
> > different start states.
>
> Some might wish to indicate errors or nonsensical conditions or a
> variety of conditions (e.g. a lexer recognizing distinct productions:
> "integer" is an end state, "symbol" is another, etc.) with a
> distinguished state. Equivalently, one could make a state variable hold
> an "error indicator" or "end condition" to be checked at termination.
> Or simply return a value of the appropriate Lisp type.
>

My normal method (in C) is to have an error state which gets
entered anytime any error is found with the error itself being
recorded in a state variable. The error state will then do
whatever processing I require and set the next state to be
end.

Have only a single definitive end-state; errors (and other
states which need to exit early) go to error-state (which on
the diagram will *always* transition to end-state).

My state diagrams are easier for me to verify if I can grab
a pencil and trace the path through the machine and
*always* end up in the end-state (my main reason for
using a state machine is *because* I can easily verify that
a complicated process *always* ends, even if it ends due to
errors).

<snipped>

>
> > but other than that, I initially did something
> > similar using a class to hold the entire machine; I always got
> > lost when trying to reference states that were not yet declared
> > so I decided to use a defconstant for each state.
>
> One could use a hash table or tables mapping symbols to the
> corresponding nodes.

I'd like to do something that enables the simple:
(defparameter my-state
   (make-state-machine
      (make-state start (...))
      (make-state state1 (...))
      ....
      (make-state stateN (...))
      (make-state end (...))))

The function that steps through each state (run-step my-state)
will return the symbol of each state (which we can test against
"end") and the function/macro (make-state-machine) will
set the state initially to start.

(Although that means that you must name the start state "start"
and the end state "end". I see no harm in this as failure to do
so will cause an error immediately, and not after the system
has been deployed).

>
> > I will maybe make another attempt and repost this weekend
> > if I get the time.
>
> Enjoy.

I intended to, but on my only fee day I took the wife to the zoo
(where humans outside the cages make funny noises, faces
and antics at the animals sitting calmly, solemnly and civilised
*inside* the cages; the irony is sadly wasted on many visitors
to the zoo :-).

>
> > (btw: I still do not know why cmucl is giving me memory leaks on
> > the original code. Any ideas in that direction about how lisp code
> > can actually leak memory would be helpful)
>
> I don't have great insight into CMUCL's internals, but I suspect the
> use of EVAL could invoke the compiler; that might generate quite a bit
> of garbage compared to CLISP's. Or the use of CONCATENATE on strings
> might cause different GC behavior on the two implementations.
>
> It probably isn't a true leak; one could try invoking a full GC to see
> how much memory is actually being retained.

My only worry is that this is not a cmucl problem, but a problem
with my code; I suspected that I was doing something not kosher
that was causing the GC to never reclaim memory.

cheers
goose

From: Wolfram Fenske
Subject: Re: memory usage in cmucl
Date: Sat, 24 Jun 2006 00:11:57 +0000
Message-ID: <1151107917.427122.21380@b68g2000cwa.googlegroups.com>

"goose" <····@webmail.co.za> writes:

> Hello all
>
> The code below implements a simple protocol (receive). The protocol
> expects frames to be preceded by the length + newline.
>
> When running under cmucl, I get the following messages (repeatedly
> with the memory size constantly increasing).
>
> ; [GC threshold exceeded with 12,009,624 bytes in use.  Commencing GC.]
> ; [GC completed with 182,952 bytes retained and 11,826,672 bytes
> freed.]
> ; [GC will next occur when at least 12,182,952 bytes are in use.]
> ; [GC threshold exceeded with 12,194,024 bytes in use.  Commencing GC.]
> ; [GC completed with 193,936 bytes retained and 12,000,088 bytes
> freed.]
> ; [GC will next occur when at least 12,193,936 bytes are in use.]
> ; [GC threshold exceeded with 12,202,704 bytes in use.  Commencing GC.]
> ; [GC completed with 239,040 bytes retained and 11,963,664 bytes
> freed.]
> ; [GC will next occur when at least 12,239,040 bytes are in use.]
>
> What am I doing wrong here? I've tried the same code with clisp
> and I've gotten no messages of this sort.

I'm quite sure you're not doing anything wrong.  What you see here is
just CMUCL's garbage collector being a little verbose.

> Is this a memory leak that I should be worried about or will the
> memory usage eventually stabilise?

The memory use will stabilize at some point.  My guess is that CMUCL
uses a generational garbage collector.  Some definitions from [1]:

    *generational hypothesis* (also known as infant mortality)

        Infant mortality or the generational hypothesis is the
        observation that, in most cases, young objects are much more
        likely to die than old objects.

    *generational garbage collection* (also known as generation
    scavenging)

        Generational garbage collection is tracing garbage collection
        that makes use of the generational hypothesis.  Objects are
        gathered together in generations.  New objects are allocated
        in the /youngest/ or /nursery/ generation, and promoted to
        /older/ generations if they survive.  Objects in older
        generations are condemned less frequently, saving CPU time.

This means that during a so-called /minor collection,/ a generational
garbage collector only collects garbage in the youngest generation,
since it's much more likely to find unused objects there than in older
generations.  To avoid memory leaks, the objects in the older
generations have to be scanned as well, of course.  This is done
during the so-called /major collections./ But because they take
longer, these are executed less frequently than the minor
collections.

What I mean to say is: memory usage will stop increasing when CMUCL's
garbage collector finally decides to execute a major collection.  It's
up to the garbage collector to decide when exactly it's time to do
that.  E. g. it could execute one for every ten minor collections or
when so-and-so-many objects have been allocated or when the CPU is
idle or when you're running out of system memory etc. [2]

> [...]

Wolfram

Footnotes:
[1]  <http://www.memorymanagement.org/glossary/g.html>

[2]  I once wrote a JAVA program that normally should have needed very
little memory but it created a lot of objects and discarded them
shortly after creation, i. e. it generated a lot of garbage.  When I
ran it one a machine with little RAM, the program used up more and
more memory and eventually died with an out-of-memory-error because
the JVM failed to collect the garbage in time.  IMO, that was a bug in
JAVA's garbage collector.

From: Rob Warnock
Subject: Re: memory usage in cmucl
Date: Sat, 24 Jun 2006 03:32:33 +0000
Message-ID: <R6qdnX5BwuzMLwHZnZ2dnUVZ_v2dnZ2d@speakeasy.net>

Wolfram Fenske <·····@gmx.net> wrote:
+---------------
| "goose" <····@webmail.co.za> writes:
| > When running under cmucl, I get the following messages (repeatedly
| > with the memory size constantly increasing).
| > ; [GC threshold exceeded with 12,009,624 bytes in use.  Commencing GC.]
| > ; [GC completed with 182,952 bytes retained and 11,826,672 bytes freed.]
| > ; [GC will next occur when at least 12,182,952 bytes are in use.]
...
| > What am I doing wrong here? I've tried the same code with clisp
| > and I've gotten no messages of this sort.
| 
| I'm quite sure you're not doing anything wrong.  What you see here is
| just CMUCL's garbage collector being a little verbose.
+---------------

Just (SETF *GC-VERBOSE* NIL) if you don't like it. Often things one
does will cons a *lot* of intermediate results that quickly becomes
garbage. A minor GC in CMUCL is quite cheap [see below], so it's common
to simply turn off those messages while you're doing exploratory coding.
Later, when you've compiled everything and added enough type declarations
that CMUCL isn't boxing all those intermediate results, you might turn
*GC-VERBOSE* back on to investigate your remaining "opportunities" for
improvement.

+---------------
| > Is this a memory leak that I should be worried about or will the
| > memory usage eventually stabilise?
| 
| The memory use will stabilize at some point.
+---------------

*Very* probably! I support a persistent application server [that sits
behind Apache] written in CMUCL, and it's been up for >75 days [since
the system last rebooted for a kernel upgrade], and the "bytes retained"
has stabilized fairly quickly [less than a dozen GCs] at ~3.5 MB, with
the "threshold exceeded" being ~15.5 MB, and now it just ticks along
doing a GC every 2-3 hours when idle[1], more often when busy. So in
any reasonable application it *will* stabilize at some point, as Wolfram
notes.

+---------------
| My guess is that CMUCL uses a generational garbage collector.
+---------------

On some platforms, yes, specifically on x86.

+---------------
| This means that during a so-called /minor collection,/ a generational
| garbage collector only collects garbage in the youngest generation,
| since it's much more likely to find unused objects there than in older
| generations.  To avoid memory leaks, the objects in the older
| generations have to be scanned as well, of course.  This is done
| during the so-called /major collections./ But because they take
| longer, these are executed less frequently than the minor
| collections.
+---------------

Exactly.

+---------------
| What I mean to say is: memory usage will stop increasing when CMUCL's
| garbage collector finally decides to execute a major collection.  It's
| up to the garbage collector to decide when exactly it's time to do that.
+---------------

Or you can force one with (GC :FULL T), per my other parallel reply.


-Rob

[1] I'm using CMUCL threads, with the scheduling interval set
    fairly low, and the MP scheduler seems to generate a low but
    continual background rate of garbage even when the application
    server is "idle". I should look into at that someday...  ;-}

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607