I've been working my way through Practical Common Lisp (PCL) and would
like some advice on how to make this function better.
Currently I'm re-writing Python scripts for my Discourse Analysis
research on chat and email within a system. The log files are character
delimited, one message per line, but the last field on the line is
"dirty" with unescaped delimiters. So a typical line might look
something like this (not an actual line):
userID^username^gender^site^messageDate^world^messagetext
2706^user^m^center^2004-03-01 09:21^chatWorld^Dirty text with ^caret.
What I want is a generalizable function that allows me to do something
like this:
;;----
;;;the field list
(defparameter *field-list* '("userID"
                             "username"
                             "gender"
                             "site"
                             "messageDate"
                             "world"
                             "messageText")
  "the field list for splitting and identifying fields")
(setf record (split-line #\^ line *field-list*))
(get-field "userID" record)
(get-field "gender" record)
;;---------
Because I'm dealing with different log file formats, I really want to be
able to reference field values by name rather than remember that the
message text is (nth 4 line) in one format, and (nth 6 line) in another
format.
;;---------
;;This function splits the line into count number of fields
;;tacking on the last field as a possibly "dirty" remainder.
(defun pythonic-split (split-char line count)
  "split the line using split-char to produce a maximum of count fields"
  (multiple-value-bind (result-list place)
      ;;subtract one from count so that you can pass the total number
      ;;of desired fields
      (split-sequence:split-sequence split-char line :count (- count 1))
    (append result-list (list (subseq line place)))))

(defun split-line (split-char line labels)
  "split a line into an alist of length count with labels"
  ;;a solution for matching fields to labels, use the length of the
  ;;labels list to get the count.
  (pairlis labels (pythonic-split split-char line (length labels))))

(defun get-field (field record)
  "get a single field from the record"
  ;;look up the requested field, not a hard-coded "userID"
  (cdr (assoc field record :test #'string=)))
;;-------
My choice of an association list here is based on the relatively naive
impression that alists are slightly better than hash tables for short
and short-lived data structures. Any advice would be very much
appreciated.
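
For readers who do not have split-sequence loaded, here is a minimal sketch of the same split-with-a-dirty-remainder idea using only standard functions. The names split-with-remainder and label-fields are hypothetical, not part of any library, and short lines simply yield fewer fields, which may differ from what pythonic-split does in that case. The labels are paired with mapcar rather than pairlis because pairlis does not guarantee the order of the resulting alist.

```lisp
;; Hypothetical helper: split LINE on SPLIT-CHAR into at most COUNT
;; fields. The last field keeps any delimiters that remain.
(defun split-with-remainder (split-char line count)
  (let ((fields '())
        (start 0))
    ;; Cut off COUNT - 1 clean fields from the front of the line.
    (loop while (< (length fields) (1- count))
          do (let ((pos (position split-char line :start start)))
               (if pos
                   (progn (push (subseq line start pos) fields)
                          (setf start (1+ pos)))
                   (return))))
    ;; Whatever is left becomes the final, possibly "dirty", field.
    (push (subseq line start) fields)
    (nreverse fields)))

;; Hypothetical helper: pair field names with values, in order.
(defun label-fields (names values)
  (mapcar #'cons names values))

(label-fields '("userID" "username" "messageText")
              (split-with-remainder #\^ "2706^user^Dirty text with ^caret." 3))
;; => (("userID" . "2706") ("username" . "user")
;;     ("messageText" . "Dirty text with ^caret."))
```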
--
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust." --Scary Go Round
Kirk Job Sluder <····@jobsluder.net> writes:
> I've been working my way through Practical Common Lisp (PCL) and would
> like some advice on how to make this function better.
>
> Currently I'm re-writing python scripts for my Discourse Analysis
> research on chat and email within a system. The log files are character
> delimited, one message per line, but the last field on the line is
> "dirty" with unescaped delimiters. So a typical line might look
> something like this (not an actual line):
>
> userID^username^gender^site^messageDate^world^messagetext
> 2706^user^m^center^2004-03-01 09:21^chatWorld^Dirty text with ^caret.
>
> What I want is a generalizable function that allows me to do something
> like this:
>
> ;;----
> ;;;the field list
> (defparameter *field-list* '("userID"
> "username"
> "gender"
> "site"
> "messageDate"
> "world"
> "messageText")
> "the field list for splitting and identifying fields")
>
> (setf record (split-line #\^ line *field-list*))
> (get-field "userID" record)
> (get-field "gender" record)
>
> ;;---------
I would use keywords instead of strings for field names, because it's
more efficient to EQ two symbols than to STRING= two strings.
(defparameter *field-list* '(:userid
                             :username
                             :gender
                             :site
                             :messagedate
                             :world
                             :messagetext)
  "the field list for splitting and identifying fields")

(defun get-field (field record)
  "get a single field from the record"
  (cdr (assoc field record)))
;; or: :test (function eq), but the default (function eql) is good enough.
(get-field :userid record)
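
A small sketch of the difference this makes for alist lookups. The names *example-names*, *example-record* and make-record are hypothetical and exist only for this illustration.

```lisp
;; Hypothetical names, for illustration only.
(defparameter *example-names* '(:userid :username :gender))

(defun make-record (names values)
  "Pair each field name with its value, in order."
  (mapcar #'cons names values))

(defparameter *example-record*
  (make-record *example-names* '("2706" "user" "m")))

;; Keyword labels: the default EQL test of ASSOC is enough.
(cdr (assoc :gender *example-record*))             ; => "m"

;; String labels need an explicit test, because EQL compares
;; strings by identity rather than by contents.
(cdr (assoc "gender" '(("userID" . "2706") ("gender" . "m"))
            :test #'string=))                      ; => "m"
```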
> Because I'm dealing with different log file formats, I really want to be
> able to reference field values by name rather than remember that the
> message text is (nth 4 line) in one format, and (nth 6 line) in another
> format.
Then I would even do without the assoc and use defstruct, which
defines accessors automatically:
(defmacro defstruct+split (name-and-options &rest fields)
  (let ((name (if (consp name-and-options)
                  (car name-and-options)
                  name-and-options))
        (split-char (second (find :split-char name-and-options
                                  :key (lambda (item) (if (consp item)
                                                          (car item)
                                                          item)))))
        ;; DEFSTRUCT does not know :split-char, so strip it out
        (n&o (remove :split-char name-and-options
                     :key (lambda (item) (if (consp item)
                                             (car item)
                                             item)))))
    `(progn (defstruct ,n&o ,@fields)
            (defun ,(intern (format nil "PARSE-~A" name)) (line)
              (split-line ',split-char line ,(length fields))))))

(defun split-line (split-char line count)
  "split a line into a list of count elements"
  (pythonic-split split-char line count))

(defstruct+split (log1 (:type list) (:split-char #\^))
  userid username gender site messagedate world messagetext)

(defstruct+split (passwd (:type list) (:split-char #\:))
  login password uid gid gecos home shell)
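
To see why this works without any conversion step, here is a minimal sketch of what a plain defstruct with (:type list) gives you, independent of the macro. ENTRY is a hypothetical record type used only for this illustration.

```lisp
;; ENTRY is a hypothetical record type, for illustration only.
(defstruct (entry (:type list))
  userid username gender)

;; The constructor returns an ordinary three-element list...
(make-entry :userid "2706" :username "user" :gender "m")
;; => ("2706" "user" "m")

;; ...and the accessors work on any list of the right shape,
;; such as the output of a split function.
(entry-username '("2706" "user" "m"))   ; => "user"
(entry-gender '("2706" "user" "m"))     ; => "m"
```

Since the structure is unnamed, the accessors are positional: nothing checks that the list really came from the matching parse function.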
[251]> (log1-username l)
"Pascal Bourguignon"
[252]> (setf p (parse-passwd "pjb:x:1000:1000:Pascal Bourguignon:/home/pjb:/bin/bash"))
("pjb" "x" "1000" "1000" "Pascal Bourguignon" "/home/pjb" "/bin/bash")
[253]> (passwd-login p)
"pjb"
> ;;---------
> ;;This function splits the line into count number of fields
> ;;tacking on the last field as a possibly "dirty" remainder.
> (defun pythonic-split (split-char line count)
> "split the line using split-char to produce a maximum of count fields"
> (multiple-value-bind (result-list place)
> ;;subtract one from count so that you can pass the total number
> ;;of desired fields
> (split-sequence:split-sequence split-char line :count (- count 1))
> (append result-list `(,(subseq line place)))))
>
> (defun split-line (split-char line labels)
> "split a line into an alist of length count with labels"
> ;;a solution for matching fields to labels, use the length of the
> ;;labels list to get the count.
> (pairlis labels (pythonic-split split-char line (length labels))))
>
> (defun get-field (field record)
> "get a single field from the record"
> (cdr (assoc field record :test #'string=)))
>
> ;;-------
>
> My choice of an associated list here is based on the relatively naive
> impression that they are slightly better than hashes for short and
> short-lived data structures. Any advice would be very much
> appreciated.
You're right, small lists are quite efficient.
Now we have the ease of use with these structure accessors. If you
need the speed too, you could use vectors instead of lists (the parse
function then has to coerce the list it gets from split-line into a
vector):
(defstruct+split (passwd (:type vector) (:split-char #\:))
  login password uid gid gecos home shell)
[261]> (setf p (parse-passwd "pjb:x:1000:1000:Pascal Bourguignon:/home/pjb:/bin/bash"))
#("pjb" "x" "1000" "1000" "Pascal Bourguignon" "/home/pjb" "/bin/bash")
[262]> (passwd-login p)
"pjb"
--
__Pascal Bourguignon__ http://www.informatimago.com/
I need a new toy.
Tail of black dog keeps good time.
Pounce! Good dog! Good dog!
Pascal Bourguignon <···@informatimago.com> writes:
> Kirk Job Sluder <····@jobsluder.net> writes:
Ahh, thank you very very much,
> I would use keywords instead of strings for field names, because it's
> more efficient to EQ two symbols than to STRING= two strings.
I was wondering about this, but I didn't know that you could make a
list of keywords outside the context of a parameter list.
> Then I would even do without the assoc and use defstruct, which
> defines accessors automatically:
I was thinking along these lines, and your example code helps quite a
bit. One of my concerns with this is how would I access the structure
slots dynamically? For example, something like:
(defun do-something (fieldname)
...
(do-foo (struct-$fieldname$ baz))
...)
I suppose what I want is a function that given "foo" will produce slot
accessor function STRUCT-FOO.
--
Kirk Job-Sluder
Kirk Job Sluder <····@jobsluder.net> writes:
> Pascal Bourguignon <···@informatimago.com> writes:
>
>> Kirk Job Sluder <····@jobsluder.net> writes:
>
> Ahh, thank you very very much,
>
>> I would use keywords instead of strings for field names, because it's
>> more efficient to EQ two symbols than to STRING= two strings.
>
> I was wondering about this, but I didn't know that you could make a
> list of keywords outside the context of a parameter list.
Keywords are symbols like other symbols (only they're in the KEYWORD
package and automatically self-evaluating).
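
A few forms you can try at the REPL to check this for yourself; all of them use only standard functions.

```lisp
(symbolp :userid)                        ; => T
(keywordp :userid)                       ; => T
(package-name (symbol-package :userid))  ; => "KEYWORD"

;; Self-evaluating: the value of a keyword is the keyword itself.
(eq :userid (symbol-value :userid))      ; => T

;; So keywords can be collected into a list with no quoting at all.
(list :userid :gender)                   ; => (:USERID :GENDER)
```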
>> Then I would even do without the assoc and use defstruct, which
>> defines accessors automatically:
>
> I was thinking along these lines, and your example code helps quite a
> bit. One of my concerns with this is how would I access the structure
> slots dynamically? For example, something like:
>
> (defun do-something (fieldname)
> ...
> (do-foo (struct-$fieldname$ baz))
> ...)
>
> I suppose what I want is a function that given "foo" will produce slot
> accessor function STRUCT-FOO.
Well, with normal structures, this is not possible (portably).
But here we don't have real structures, we have lists or vectors.
So you can apply the normal sequence functions on them:
(let ((record (parse-log1 "100^Pascal Bourguignon^M^La Manga del Mar Menor^2005-10-12^Mine^Tralal lalere.")))
  (dotimes (i (length record))
    (print (elt record i))))
And, you can extend the macro at will:
(defmacro defstruct+split (name-and-options &rest fields)
  (flet ((optkey (item) (if (consp item) (car item) item)))
    (let ((name (if (consp name-and-options)
                    (car name-and-options)
                    name-and-options))
          (split-char (second (find :split-char name-and-options
                                    :key (function optkey))))
          ;; DEFSTRUCT does not know :split-char, so strip it out
          (n&o (remove :split-char name-and-options
                       :key (function optkey)))
          (type (second (find :type name-and-options
                              :key (function optkey)))))
      `(progn (defstruct ,n&o ,@fields)
              (defun ,(intern (format nil "PARSE-~A" name)) (line)
                ,(case type
                   ((list)
                    `(split-line ',split-char line ,(length fields)))
                   ((vector)
                    `(coerce (split-line ',split-char line
                                         ,(length fields)) 'vector))
                   ;; a missing :type option (NIL) also ends up here
                   (otherwise
                    (error "You must use (:type list) or (:type vector)"))))
              ;; look a field up by name: (log1-field record 'username)
              (defun ,(intern (format nil "~A-FIELD" name)) (record field)
                (case field
                  ,@(let ((index -1))
                      (mapcar (lambda (field)
                                `((,field) (elt record ,(incf index))))
                              fields))
                  (otherwise (error "~A is not a field of ~A"
                                    field ',name))))
              ;; one index constant per field: +log1-username+ etc.
              ,@(let ((index -1))
                  (mapcar
                   (lambda (field)
                     `(defconstant ,(intern (format nil "+~A-~A+" name field))
                        ,(incf index)))
                   fields))
              ',name))))
[303]> (setf l (parse-log1 "100^Pascal Bourguignon^M^La Manga del Mar Menor^2005-10-12^Mine^Tralal lalere."))
("100" "Pascal Bourguignon" "M" "La Manga del Mar Menor" "2005-10-12" "Mine"
"Tralal lalere.")
[304]> (log1-field l 'username)
"Pascal Bourguignon"
[305]> (elt l +log1-username+)
"Pascal Bourguignon"
[306]> (setf p (parse-passwd "pjb:x:1000:1000:Pascal Bourguignon:/home/pjb:/bin/bash"))
#("pjb" "x" "1000" "1000" "Pascal Bourguignon" "/home/pjb" "/bin/bash")
[307]> (list (passwd-login p) (passwd-field p 'login) (elt p +passwd-login+))
("pjb" "pjb" "pjb")
[308]>
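
On the earlier question of going from a field name given as a string to an accessor: the generated LOG1-FIELD function above is one answer. Another possible route, offered only as a sketch, is to build the accessor's symbol name and look up its function. The helper field-accessor and the record type ENTRY are hypothetical, and the sketch assumes the accessor symbol is interned in the current package and was read with the default upcasing readtable.

```lisp
;; ENTRY is a hypothetical record type, for illustration only.
(defstruct (entry (:type list))
  userid username gender)

;; Hypothetical helper: return the function named
;; <STRUCT-NAME>-<FIELD-NAME>. Assumes the symbol lives in *PACKAGE*
;; and that names were upcased by the reader.
(defun field-accessor (struct-name field-name)
  (let ((symbol (find-symbol
                 (string-upcase
                  (format nil "~A-~A" struct-name field-name)))))
    (if (and symbol (fboundp symbol))
        (fdefinition symbol)
        (error "No accessor ~A-~A" struct-name field-name))))

(funcall (field-accessor "entry" "username") '("100" "Pascal" "M"))
;; => "Pascal"
```

The lookup happens at run time on every call, so for a hot loop it is probably better to fetch the accessor once and reuse it.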
--
__Pascal Bourguignon__ http://www.informatimago.com/
Thanks for the suggestion. I don't know if I need to go there yet, but
the example code is nice.
The same script using a struct runs in a third of the time of the same
code using an alist, so it was worthwhile for building future scripts.
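
Numbers like this depend heavily on the implementation and on the rest of the script. A rough way to compare the two lookups on your own system is sketched below; ENTRY, bench-alist and bench-struct are hypothetical names, and no particular ratio is implied.

```lisp
;; ENTRY is a hypothetical vector-based record, for illustration only.
(defstruct (entry (:type vector))
  userid username gender)

;; Look up the last field of a three-field alist N times.
(defun bench-alist (n)
  (let ((record (list (cons :userid "2706")
                      (cons :username "user")
                      (cons :gender "m")))
        (hits 0))
    (dotimes (i n hits)
      (when (cdr (assoc :gender record))
        (incf hits)))))

;; Do the same lookup through the generated struct accessor.
(defun bench-struct (n)
  (let ((record (make-entry :userid "2706" :username "user" :gender "m"))
        (hits 0))
    (dotimes (i n hits)
      (when (entry-gender record)
        (incf hits)))))

;; TIME prints implementation-specific timing information.
(time (bench-alist 1000000))
(time (bench-struct 1000000))
```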
--
Kirk Job-Sluder