gethash substring

From: Frode Vatvedt Fjeld
Subject: gethash substring
Date: Wed, 21 Aug 2002 13:39:01 +0000
Message-ID: <2hbs7we3gq.fsf@vserver.cs.uit.no>

I often find that I need to look up a sub-string in a hash-table. By
"sub-string" I mean a string that starts and ends at given indexes of
another string. It goes something like this:

  (gethash (subseq <original-string> <start> <end>)
           <hash-table>)

The problem is that I can't afford to cons up a new string (as subseq
does) just for performing such look-ups.

The only solution I see to this problem, is to implement my own
hash-table. Is there a better way?

-- 
Frode Vatvedt Fjeld

Re: gethash substring Kaz Kylheku
Re: gethash substring Barry Margolin
Re: gethash substring Johannes Grødem
- Re: gethash substring Frode Vatvedt Fjeld
  - Re: gethash substring Pierpaolo BERNARDI
    - Re: gethash substring Ed L Cashin
      - Re: gethash substring Erik Naggum
        Re: gethash substring Stephen J. Bevan
        I/O in pure functional languages (Re: gethash substring) Kalle Olavi Niemitalo
        Re: I/O in pure functional languages (Re: gethash substring) Stephen J. Bevan
        Re: I/O in pure functional languages (Re: gethash substring) Kaz Kylheku
        Re: I/O in pure functional languages (Re: gethash substring) Stephen J. Bevan
        Re: I/O in pure functional languages (Re: gethash substring) Kaz Kylheku
        Re: I/O in pure functional languages (Re: gethash substring) Kent M Pitman
Re: gethash substring Pierpaolo BERNARDI
- Re: gethash substring Marco Antoniotti
- Re: gethash substring Espen Vestre

From: Kaz Kylheku
Subject: Re: gethash substring
Date: Wed, 21 Aug 2002 22:10:59 +0000
Message-ID: <cf333042.0208211410.662ce9e8@posting.google.com>

Frode Vatvedt Fjeld <······@acm.org> wrote in message news:<··············@vserver.cs.uit.no>...
> I often find that I need to look up a sub-string in a hash-table. By
> "sub-string" I mean a string that starts and ends at given indexes of
> another string. It goes something like this:
> 
>   (gethash (subseq <original-string> <start> <end>)
>            <hash-table>)
> 
> The problem is that I can't afford to cons up a new string (as subseq
> does) just for performing such look-ups.

You can create a string object which shares the actual character
data with the original string, though some space is still allocated
obviously to keep track of where the substring beings and ends.
Look at the :DISPLACED-TO keyword parameter of MAKE-ARRAY.
Can you spare that much space?

  (defparameter *string1* "abcdefg")
  (defparameter *string2* (make-array '(3)
                                      :element-type 'character
                                      :displaced-to *string1*
                                      :displaced-index-offset 1))
  *string2* ==> "bcd"

From: Barry Margolin
Subject: Re: gethash substring
Date: Wed, 21 Aug 2002 15:25:48 +0000
Message-ID: <0iO89.6$xl4.1035@paloalto-snr1.gtei.net>

In article <··············@vserver.cs.uit.no>,
Frode Vatvedt Fjeld  <······@acm.org> wrote:
>The problem is that I can't afford to cons up a new string (as subseq
>does) just for performing such look-ups.

Why not?  If the Lisp implementation has an ephemeral GC (as many do), the
cost of short-lived objects like this is pretty small.

But the other responder's suggestion to reuse the same string is a good
one, too.  You might also search around for "resources", a way to maintain
a collection of reusable objects.

-- 
Barry Margolin, ······@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

From: Johannes Grødem
Subject: Re: gethash substring
Date: Wed, 21 Aug 2002 13:47:10 +0000
Message-ID: <lzwuqknx29.fsf@unity.copyleft.no>

* Frode Vatvedt Fjeld <······@acm.org>:

> The only solution I see to this problem, is to implement my own
> hash-table. Is there a better way?

Couldn't you just allocate an array with a fill-pointer and copy into
this and use it as the key-parameter to gethash?  Or am I missing
something here?

-- 
johs

From: Frode Vatvedt Fjeld
Subject: Re: gethash substring
Date: Wed, 21 Aug 2002 15:40:06 +0000
Message-ID: <2hwuqkcjah.fsf@vserver.cs.uit.no>

"Johannes Gr�dem" <··@kopkillah.com> writes:

> Couldn't you just allocate an array with a fill-pointer and copy
> into this and use it as the key-parameter to gethash?

Yes, I could.

> Or am I missing something here?

This function wouldn't be reentrant, which might or might not be a
problem, I suppose. I'm not a functional (programming) fanatic, but
this sort of thing feels slightly non-Common-Lispian to a Patriot such
as myself.

-- 
Frode Vatvedt Fjeld

From: Pierpaolo BERNARDI
Subject: Re: gethash substring
Date: Wed, 21 Aug 2002 16:01:40 +0000
Message-ID: <EPO89.10204$xi1.517283@news2.tin.it>

"Frode Vatvedt Fjeld" <······@acm.org> ha scritto nel messaggio ···················@vserver.cs.uit.no...

> > Or am I missing something here?
>
> This function wouldn't be reentrant, which might or might not be a
> problem,

Yes, in a MT CL you'll have to put the copy or displacing and
table access in a critical section, unless mutual exclusion is
guaranted by some other protocol.

> I suppose. I'm not a functional (programming) fanatic, but
> this sort of thing feels slightly non-Common-Lispian to a Patriot such
> as myself.

If you are using hash-tables you are not doing functional
programming anyway.


P.

From: Ed L Cashin
Subject: Re: gethash substring
Date: Sun, 25 Aug 2002 02:49:18 +0000
Message-ID: <8765xz64b5.fsf@cs.uga.edu>

"Pierpaolo BERNARDI" <··················@hotmail.com> writes:

> "Frode Vatvedt Fjeld" <······@acm.org> ha scritto nel messaggio ···················@vserver.cs.uit.no...
> 
> > > Or am I missing something here?
> >
> > This function wouldn't be reentrant, which might or might not be a
> > problem,
> 
> Yes, in a MT CL you'll have to put the copy or displacing and
> table access in a critical section, unless mutual exclusion is
> guaranted by some other protocol.
> 
> > I suppose. I'm not a functional (programming) fanatic, but
> > this sort of thing feels slightly non-Common-Lispian to a Patriot such
> > as myself.
> 
> If you are using hash-tables you are not doing functional
> programming anyway.

Why not?  Could you elaborate?  If functions can return numbers then
why not hash tables?

-- 
--Ed L Cashin            |   PGP public key:
  ·······@uga.edu        |   http://noserose.net/e/pgp/

From: Erik Naggum
Subject: Re: gethash substring
Date: Sun, 25 Aug 2002 03:04:12 +0000
Message-ID: <3239233452681995@naggum.no>

* Pierpaolo BERNARDI
| If you are using hash-tables you are not doing functional programming
| anyway.

* Ed L Cashin
| Why not?  Could you elaborate?  If functions can return numbers then
| why not hash tables?

  The hash table operations do not return a new hashtable with the specified
  changes made and retain the old value.  Quite contrary, the whole point is
  to modify the state of an existing object.  This is a side effect and is not
  thus sufficiently functional.  (When functional programming languages
  support I/O, I see little point in arguing about such things, as they have
  already proved that at least the side effect of changing the state of some
  I/O stream is allowable, but for some reason, the functional programming
  paradigm has not been shot down as fundamentally silly in its excesses.)

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

From: Stephen J. Bevan
Subject: Re: gethash substring
Date: Sun, 25 Aug 2002 05:35:17 +0000
Message-ID: <m31y8nijso.fsf@dino.dnsalias.com>

Erik Naggum <····@naggum.no> writes:
>   thus sufficiently functional.  (When functional programming languages
>   support I/O, I see little point in arguing about such things, as they have
>   already proved that at least the side effect of changing the state of some
>   I/O stream is allowable, but for some reason, the functional programming
>   paradigm has not been shot down as fundamentally silly in its excesses.)

Leaving aside whether it is fundamentally silly, Haskell and Clean
have I/O but they do not work by allowing a change of state of an I/O
stream.  Instead the type system ensures that the stream is only used
in a single threaded manner (either via monads in Haskell or unique
types in Clean) so that the "side-effect" isn't a side-effect at all,
it is a direct effect and is clear from the type of any I/O function.

From: Kalle Olavi Niemitalo
Subject: I/O in pure functional languages (Re: gethash substring)
Date: Sun, 25 Aug 2002 09:32:58 +0000
Message-ID: <87hehjgu5x.fsf_-_@Astalo.y2000.kon.iki.fi>

·······@dino.dnsalias.com (Stephen J. Bevan) writes:

> Instead the type system ensures that the stream is only used
> in a single threaded manner (either via monads in Haskell or unique
> types in Clean) so that the "side-effect" isn't a side-effect at all,
> it is a direct effect and is clear from the type of any I/O function.

Does this mean any function that writes to a log file must take
the original file as a parameter and return the modified file?
And the same with anything that calls it.

Do they have an equivalent to the Unix98 pread function, which
does not change the file offset?

From: Stephen J. Bevan
Subject: Re: I/O in pure functional languages (Re: gethash substring)
Date: Sun, 25 Aug 2002 15:29:35 +0000
Message-ID: <m3vg5zgdpr.fsf@dino.dnsalias.com>

Kalle Olavi Niemitalo <···@iki.fi> writes:
> Does this mean any function that writes to a log file must take
> the original file as a parameter and return the modified file?
> And the same with anything that calls it.

We might have different ideas about what it means to "take the
original file as a parameter" so I'll answer by showing the types of
the functions in Haskell that would be used to open a file and write a
String to the file :-

  openFile  :: FilePath -> IOMode -> IO Handle
  hPutStrLn :: Handle -> String -> IO ()

Here the IO type is the important one since this ensures that I/O is
single threaded and represented in the types so that it is not
possible to hide I/O inside a function which doesn't have IO in its
type signature (note I'm not arguing whether this is good or bad, just
noting that I/O is not a side-effect).  In Clean instead of using the
(monadic) IO type, unique types are used to ensure that functions
which take the "world" as an argument use that argument in a linear
manner thereby ensuring that I/O is not a side effect but an a direct
effect.

> Do they have an equivalent to the Unix98 pread function, which
> does not change the file offset?

Haskell does not.  You can look at the I/O library yourself at
http://www.haskell.org/onlinelibrary/io.html.  The Clean approach
using unique types is briefly described in
http://www.cs.kun.nl/~clean/CleanExtra/report20/chapter2/s23.html#s231b
An overview of the I/O system (including some screen shots of games produced
using it) are available at :-
http://www.cs.kun.nl/~clean/About_Clean/page_modified/page_modified.html
and the full gory details of the Clean I/O library are available at 
ftp://ftp.cs.kun.nl/pub/Clean/supported/ObjectIO.1.2/doc/tutorial.ps.gz

From: Kaz Kylheku
Subject: Re: I/O in pure functional languages (Re: gethash substring)
Date: Sun, 25 Aug 2002 17:44:32 +0000
Message-ID: <akb51u$f6j$1@luna.vcn.bc.ca>

In article <··············@dino.dnsalias.com>, Stephen J. Bevan wrote:
> We might have different ideas about what it means to "take the
> original file as a parameter" so I'll answer by showing the types of
> the functions in Haskell that would be used to open a file and write a
> String to the file :-
> 
>   openFile  :: FilePath -> IOMode -> IO Handle
>   hPutStrLn :: Handle -> String -> IO ()
> 
> Here the IO type is the important one since this ensures that I/O is
> single threaded and represented in the types so that it is not
> possible to hide I/O inside a function which doesn't have IO in its
> type signature (note I'm not arguing whether this is good or bad, just
> noting that I/O is not a side-effect).

If output to a file is not a side-effect, then it must necessarily work by
consing up a fresh computer that is a copy of the previous one, except that
some bits are changed on the hard drive. That way any parts of the system
that have a reference to the old computer don't notice anything. :)

From: Stephen J. Bevan
Subject: Re: I/O in pure functional languages (Re: gethash substring)
Date: Mon, 26 Aug 2002 03:25:47 +0000
Message-ID: <m3elcmgv4k.fsf@dino.dnsalias.com>

Kaz Kylheku <···@ashi.footprints.net> writes:
> If output to a file is not a side-effect, then it must necessarily work by
> consing up a fresh computer that is a copy of the previous one, except that
> some bits are changed on the hard drive. That way any parts of the system
> that have a reference to the old computer don't notice anything. :)

You are thinking too small, the Clean argument that ensures
single-threaded behaviour isn't typically called "world" for nothing :-)

From: Kaz Kylheku
Subject: Re: I/O in pure functional languages (Re: gethash substring)
Date: Tue, 27 Aug 2002 07:02:30 +0000
Message-ID: <akf865$b2m$3@luna.vcn.bc.ca>

In article <··············@dino.dnsalias.com>, Stephen J. Bevan wrote:
> Kaz Kylheku <···@ashi.footprints.net> writes:
>> If output to a file is not a side-effect, then it must necessarily work by
>> consing up a fresh computer that is a copy of the previous one, except that
>> some bits are changed on the hard drive. That way any parts of the system
>> that have a reference to the old computer don't notice anything. :)
> 
> You are thinking too small, the Clean argument that ensures
> single-threaded behaviour isn't typically called "world" for nothing :-)

Maybe quantum computing will rescue these pure programmers, assuming that the
parallel-worlds interpretation is valid.

When you change the state of the computer at all, you cons up a new reality
that proceeds in parallel with the old one.

Garbage collection can be done by collapsing wave functions.

From: Kent M Pitman
Subject: Re: I/O in pure functional languages (Re: gethash substring)
Date: Mon, 26 Aug 2002 16:14:48 +0000
Message-ID: <sfwfzx1wq9z.fsf@shell01.TheWorld.com>

Kalle Olavi Niemitalo <···@iki.fi> writes:

> ·······@dino.dnsalias.com (Stephen J. Bevan) writes:
> 
> > Instead the type system ensures that the stream is only used
> > in a single threaded manner (either via monads in Haskell or unique
> > types in Clean) so that the "side-effect" isn't a side-effect at all,
> > it is a direct effect and is clear from the type of any I/O function.
> 
> Does this mean any function that writes to a log file must take
> the original file as a parameter and return the modified file?
> And the same with anything that calls it.
> 
> Do they have an equivalent to the Unix98 pread function, which
> does not change the file offset?

No, you get back a whole new file system that has the output done.

This should have been obvious.  Think about delete-file.

;)

From: Pierpaolo BERNARDI
Subject: Re: gethash substring
Date: Wed, 21 Aug 2002 15:53:50 +0000
Message-ID: <iIO89.10168$xi1.514824@news2.tin.it>

"Frode Vatvedt Fjeld" <······@acm.org> ha scritto nel messaggio ···················@vserver.cs.uit.no...
> I often find that I need to look up a sub-string in a hash-table. By
> "sub-string" I mean a string that starts and ends at given indexes of
> another string. It goes something like this:
>
>   (gethash (subseq <original-string> <start> <end>)
>            <hash-table>)
>
> The problem is that I can't afford to cons up a new string (as subseq
> does) just for performing such look-ups.
>
> The only solution I see to this problem, is to implement my own
> hash-table. Is there a better way?

You can create _one_ displaced string and reuse
it multiple times with ADJUST-ARRAY.

(let ((substring (make-array 0 :element-type 'character
                 :displaced-to "" :displaced-index-offset 0)))
  ...

  (gethash
    (adjust-array substring (- <end> <start>)
            :displaced-to <original-string>
            :displaced-index-offset <start>)
    <hash-table>))


P.

From: Marco Antoniotti
Subject: Re: gethash substring
Date: Wed, 21 Aug 2002 18:25:27 +0000
Message-ID: <y6csn18f4rs.fsf@octagon.mrl.nyu.edu>

"Pierpaolo BERNARDI" <··················@hotmail.com> writes:

> "Frode Vatvedt Fjeld" <······@acm.org> ha scritto nel messaggio ···················@vserver.cs.uit.no...
> > I often find that I need to look up a sub-string in a hash-table. By
> > "sub-string" I mean a string that starts and ends at given indexes of
> > another string. It goes something like this:
> >
> >   (gethash (subseq <original-string> <start> <end>)
> >            <hash-table>)
> >
> > The problem is that I can't afford to cons up a new string (as subseq
> > does) just for performing such look-ups.
> >
> > The only solution I see to this problem, is to implement my own
> > hash-table. Is there a better way?
> 
> You can create _one_ displaced string and reuse
> it multiple times with ADJUST-ARRAY.
> 
> (let ((substring (make-array 0 :element-type 'character
>                  :displaced-to "" :displaced-index-offset 0)))
>   ...
> 
>   (gethash
>     (adjust-array substring (- <end> <start>)
>             :displaced-to <original-string>
>             :displaced-index-offset <start>)
>     <hash-table>))

This is a very good suggestion.  It should go in the CL Cookbook.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
715 Broadway 10th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.

From: Espen Vestre
Subject: Re: gethash substring
Date: Thu, 22 Aug 2002 07:46:08 +0000
Message-ID: <kw7kijnxof.fsf@merced.netfonds.no>

"Pierpaolo BERNARDI" <··················@hotmail.com> writes:

> You can create _one_ displaced string and reuse
> it multiple times with ADJUST-ARRAY.

Good idea! 

Even if you don't reuse it, using a displaced array for the substring
_sometimes_ pays off (If both the whole string and the substring
are fairly large, you may save some memory. You really need to do
tests to see how much, if anything, you gain with your implementation
of choice).
-- 
  (espen)