soft cache

From: Alex Mizrahi
Subject: soft cache
Date: Fri, 11 Jan 2008 15:45:55 +0000
Message-ID: <47878f35$0$90266$14726298@news.sunsite.dk>

it's tempting to use hash-tables for caching various stuff, but if number of 
objects we are dealing with is great, we have a chance to run out of memory.

on of solutions to this problem is weak hash tables -- they do not prevent 
entries of hash tables from being garbage collect, if they are not 
referenced in other places. we can't run out of memory with such table, but 
it gives no guarantee how long entries stay in cache. if our application 
processes data on requests (web server, for example), and we do not preserve 
state in memory between requests, they are high chances that generational GC 
will delete entries cached for current request as soon as request completes, 
thus in some cases there will be no effect from cache at all!

so, for some caching purposes weak hash tables are not good at all.

Java programming language has so-called "soft references" that can be used 
for implementation of memory-sensitive -- as i understand, garbage collector 
will clear them in low-memory situation, but not just because it can. 
unfortunately, i've seen nothing like this for Common Lisp.

so i'm looking for a user-mode library for managing cache. the only 
requirement i have is low overhead -- something comparable with hash-table 
access time. i don't need some optimized strategies and automatic 
size-tuning at time, but that would be nice of course.
(if there's something implementation-dependant for SBCL that will work too).

i wasn't able to find one in internets, so it seems i'll have to write my 
own. i have two ideas:

1. ht + LRU list.

HT just points to entry in LRU singly linked list, which contains data 
itself. on hit entry is moved to the top of the list.
on miss, if destination size is reached, bottom in list is evicted from 
cache (pointer in HT deleted) and new one is added on top.

seems to be simple. probably LRU can be replaced by something more advanced 
like ARC.

2. 2 hts

we have two hash tables -- one is primary and another secondary. new entries 
are added only to primary one, but in case of miss they are searched in 
secondary. in case of hit in secondary, entry is added to primary ht and 
erase in secondary one.

when primary fills to some level, it becomes new secondary, old secondary is 
abandoned, and fresh primary is created.

(more hts can be added for advanced strategies)

it seems design with LRU list is more mainstream, so i'd better stick to it, 
it seems.

additionally, if i was able to get random access to hash table keys --  
ability to pick one random key -- i could implement soft cache in easy way: 
when target size is reached, random entries are removed when new ones are 
added to table. while it seems weird, it has some advantages -- it has 
nearly zero overhead (as long as getting random elements works fast), and 
it's less prone to pathalogic cases than LRU ones. but unfortunately CL 
standard doesn't allow random access to keys -- either i'll have to use 
implementation-dependant things, or "user-level" HT implementation (that 
could be less optimized than implementation one).

also i see a problem that cache HAS to be configured manually, it's not 
possible to get really soft behaviour without intimately interacting with GC 
:(

From: Barry Margolin
Subject: Re: soft cache
Date: Sat, 12 Jan 2008 03:30:01 +0000
Message-ID: <barmar-AE721B.22300111012008@comcast.dca.giganews.com>

In article <·························@news.sunsite.dk>,
 "Alex Mizrahi" <········@users.sourceforge.net> wrote:

> so i'm looking for a user-mode library for managing cache. the only 
> requirement i have is low overhead -- something comparable with hash-table 
> access time. i don't need some optimized strategies and automatic 
> size-tuning at time, but that would be nice of course.
> (if there's something implementation-dependant for SBCL that will work too).

Search for DEFRESOURCE.  This implements a pool of objects, and probably 
allows you to specify how big the pool should be, so that extra objects 
are removed from the pool and allowed to be GCed.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***