From: Vassil Nikolov
Subject: picking up a name for a data structure
Date: 
Message-ID: <l03130300b3b74babe6f3@195.138.129.93>
Hello,

I need to implement some conversions of character encodings, and
for that I'll need to implement a representation of sets of integers
as combinations of enumerations and ranges, e.g. {2, 5, 8-11} stands
for {2, 5, 8, 9, 10, 11} (2 and 5 are enumerated, and the rest are
specified as a range).

Not being a native speaker of English, I would appreciate a suggestion
for the most appropriate English word to name such a data structure.

Thanks,
Vassil.

Posted via Deja's mail-to-news gateway.


Vassil Nikolov
Permanent forwarding e-mail: ········@poboxes.com
For more: http://www.poboxes.com/vnikolov
  Abaci lignei --- programmatici ferrei.

From: Kent M Pitman
Subject: Re: picking up a name for a data structure
Date: 
Message-ID: <sfwemi5g70w.fsf@world.std.com>
Vassil Nikolov <········@poboxes.com> writes:

> Hello,
> I need to implement some conversions of character encodings, and
> for that I'll need to implement a representation of sets of integers
> as combinations of enumerations and ranges, e.g. {2, 5, 8-11} stands
> for {2, 5, 8, 9, 10, 11} (2 and 5 are enumerated, and the rest are
> specified as a range).
> 
> Not being a native speaker of English, I would appreciate a suggestion
> for the most appropriate English word to name such a data structure.

I'm not sure you're as terminologically impoverished as you think.
Probably I would pick something with a word like "range" or "enumeration"
in it.  Probably I'd call the set a range-list.  (The other base word that 
you might use is "interval".)   The notation "[3,4]" or "(3,4)" or
"[3,4)" etc. are called, as surely you already know but didn't mention 
here, closed and open and half-open (or half-closed) intervals.

You might be best off representing it as ((closed-low . open-high) ...)
so that (2 . 3) represents the lone number 2, just for simplicity and speed
of searching.  That is, so that
 ((2 . 3) (5 . 6) (8 . 12))
is the range-list you cite in your example.  Having to do typep checks
in the middle to accomodate
 (2 5 (8 . 11))
might be more trouble than it is worth.
You may also prefer to just use
 (2 3 5 6 8 12)
just depending on personal preference.  Uses the same number of conses.


Depending on how many of these ranges you do, though, you might want
to consider something faster than searching.  Just depends on how much
memory you have to trade for how much speed.  And it's worth
benchmarking because array access times, etc., might affect which is
better.  But, for example, if you're trying to represent alphabetics
and a finite number of other ranges known at compile time, and if the
total space is not excessively large, you might want to set yourself
up with a constant table and do a direct access, as in:

 (defmacro define-code-predicate (predicate &rest ranges)
   (let* ((hi (reduce #'max ranges :key #'cdr))
	  (a (make-array hi :element-type 'bit :initial-element 0)))
     (dolist (x ranges)
       (loop for i from (car x) to (cdr x)
              do (setf (aref a i) 1)))
     `(let ((table ,a))
        (defun ,predicate (code)
          (cond ((< code ,hi) (= (aref table code) 1))
		(t nil))))))

 (define-code-predicate alphabetic-code?
   (65 . 90) (97 . 122))

 (define-code-predicate alphanumeric-code?
   (48 . 57) (65 . 90) (97 . 122))

Code only very lightly tested, and then I did some cosmetic edits after
without retesting.  Hope it still works...
From: Vassil Nikolov
Subject: Re: picking up a name for a data structure
Date: 
Message-ID: <l03130305b3b853760adc@195.138.129.107>
Kent M Pitman wrote:                [1999-07-18 18:14 +0000]

  > Vassil Nikolov <········@poboxes.com> writes:
  > 
  > > Hello,
  > > I need to implement some conversions of character encodings, and
  > > for that I'll need to implement a representation of sets of integers
  > > as combinations of enumerations and ranges, e.g. {2, 5, 8-11} stands
  > > for {2, 5, 8, 9, 10, 11} (2 and 5 are enumerated, and the rest are
  > > specified as a range).
  > > 
  > > Not being a native speaker of English, I would appreciate a suggestion
  > > for the most appropriate English word to name such a data structure.
  > 
  > I'm not sure you're as terminologically impoverished as you think.

I hope so; but in this case the best I could invent was `er-set' (short
for `Enumeration & Range set') and it didn't seem the best that could
be.  Moreover, I don't expect to be the first who needs this so there
might be a name in use already.  (I also expect there are some programs
out there that do it but in this case I feel like it's less effort to do
it myself than to search for and then adapt existing stuff.)

  > Probably I would pick something with a word like "range" or "enumeration"
  > in it.  Probably I'd call the set a range-list.  (The other base word that 
  > you might use is "interval".)   The notation "[3,4]" or "(3,4)" or
  > "[3,4)" etc. are called, as surely you already know but didn't mention 
  > here, closed and open and half-open (or half-closed) intervals.

I deliberately didn't mention any such possibilities to avoid setting
people's minds in one or another direction.

  > You might be best off representing it as ((closed-low . open-high) ...)
  [e.g.]
  >  ((2 . 3) (5 . 6) (8 . 12))
  [...]

In fact, I'll stick with (2 5 (8 . 11)) because I have the requirement
that it is easy to specify and check the set manually.  In my approach,
such a set is a sort of a designator and I want to be able to enter it,
and/or check it is correct, with as little mental effort as possible.
For example, the set of the 32 `basic' Cyrillic letters on the Macintosh
(uppercase first) would be ((#x80 . #x9F) (#xE0 . #xFE) #xDF) which
is immediately checkable against e.g. a printed code table by the
human eye.  (And it is trivial to convert it internally into
((closed-low . open-high)...) by MAPCAR.)

In fact, with some code tables it's mostly ranges (as with the above
example), while in other cases the set would contain mostly
enumerations (e.g. KOI-8 where the Cyrillic letters are not ordered
alphabetically).

  > Depending on how many of these ranges you do, though, you might want
  > to consider something faster than searching.  Just depends on how much
  > memory you have to trade for how much speed.  And it's worth
  > benchmarking because array access times, etc., might affect which is
  > better.

Indeed!  Those designator sets are for humans, and internally they'll
be `compiled' into something more efficient, e.g. (AREF XLAT-TABLE CODE)
for character translation.

  [instructive code examples]

Thank you for your comments and suggestions.  In fact, it is very useful
to see a wider spectrum of approaches even if I don't actually use all
of them.

Have a nice day or night,
Vassil.

Posted via Deja's mail-to-news gateway.


Vassil Nikolov
Permanent forwarding e-mail: ········@poboxes.com
For more: http://www.poboxes.com/vnikolov
  Abaci lignei --- programmatici ferrei.
From: Kenny Tilton
Subject: Re: picking up a name for a data structure
Date: 
Message-ID: <37947708.B31ACCB8@liii.com>
>   > > Not being a native speaker of English, I would appreciate a suggestion
>   > > for the most appropriate English word to name such a data structure.

Ah, a question after my heart! I have always believed one should spend
more time on picking names than on the algorithm (having spent a lot of
time working on OPC (Other People's Code).

I would call them interval-specifications or interval-definitions, the
idea being to find a supercategory subsuming both an enumeration and a
range.

But in this case I might just call them ranges, since the option of
supplying an enumeration looks like just a programmer convenience which
need not be documented in the structure name.

ken