Programming challenge: wildcard exclusion in cartesian products

From: ·········@cox.net
Subject: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 11:14:23 +0000
Message-ID: <1142507663.796075.212430@v46g2000cwv.googlegroups.com>

The python code below generates a cartesian product subject to any
logical combination of wildcard exclusions. For example, suppose I want
to generate a cartesian product S^n, n>=3, of [a,b,c,d] that excludes
'*a*b*' and '*c*d*a*'. See below for details.

CHALLENGE: generate an equivalent in ruby, lisp, haskell, ocaml, or in
a CAS like maple or mathematica.

#-------------------------------------------------------------------------------
# Short algorithm description
# using function _genAll the program generates
# cartesian product without sets, which match
# some wildcarts
# Sets generation uses recursion ->
# first of all sets will be generated with dimension 1 and than
filtered through wildcarts
# then sets will be generated with dimension 2 and filtered again
# until the required set dimension is reached
# Program avoids explicit generation of some part of CP sets
# if the end of whildcart is asterics (*) and if the first part of
whildcart (without astrics)
# matches current set => then this set will be filtered out and won't
be used in
# higher dimension set generation
# example *,1,*,2,* [1,2] dim = 10
# by dimension 2 only arrays [1,1],[2,1],[2,2] are will be generated
# => array [1,2] won't be used in next recursion levels
#-------------------------------------------------------------------------------
# To obtaine result use function
# CPWithoutWC first parameter is a list of any elements
(char,int,string,class exemplar ,.... any type)
# secont param is CP dimension
# other parameters are wildcarts => lists with any values then may
include
# special value ALL - asterics equivalent
#Example of usage: command line
# >>> import cartesianProduct as cp
# >>> for i in cp.CPWithoutWC([1,2],3,[1,cp.ALL,2]):
#         print i
# [1, 1, 1]
# [1, 2, 1]
# [2, 1, 1]
# [2, 1, 2]
# [2, 2, 1]
# [2, 2, 2]
# >>> for i in
cp.CPWithoutWC(['a','b'],3,['a',cp.ALL,'b'],['b',cp.ALL,'a']):
#         print i
# ['a', 'a', 'a']
# ['a', 'b', 'a']
# ['b', 'a', 'b']
# ['b', 'b', 'b']
# >>> for i in cp.CPWithoutWC([1,2],3,[1,cp.ALL,2],[2,cp.ALL,1]):
#        print i
# [1, 1, 1]
# [1, 2, 1]
# [2, 1, 2]
# [2, 2, 2]
# >>>
# >>> for i in cp.CPWithoutWC([1,2],121212,[1,cp.ALL],[2,cp.ALL,1]):
#        print i
##  execute immediately
# >>>
# if You don't want to print cp. before ALL and CPWithoutWC use import
like this:
# from cartesianProduct import ALL,CPWithoutWC
# CPWithoutWC is a python generator. Which means that it returns values

# immediately and generate next in next cycle.
# Program example
#
## from cartesianProduct import ALL,CPWithoutWC
## def main():
##     for i in
cp.CPWithoutWC(['a','b'],3,['a',cp.ALL,'b'],['b',cp.ALL,'a']):
##         ## do what You want with current value
##         .........
##         ## go back to for statement and generate new
## if __name__ == "__main__":
##     main()
#
"""
 Using logical combinations of WC:
 1) It's possible to pass on to the function CPWithoutWC
   any number of wildcarts after first two parameters, for example:
   CPWithoutWC([1,2],121212,[1,cp.ALL],[2,cp.ALL,1],...)
   where ... - is any other wildcart's additional function parameters.
   Number of additional WC is not limited.
   Function will filter out all combinations, which match any passed on
WC.
   It's equal to WC1 | WC2 | .... , where | is python analog of OR
logical operations.
 2) To use more complex WC combinations follow these steps
   a) First of all create all needed WC
   b) Then use operators |, & and braces () to create combinations
required and then pass it on to function
   CPWithoutWCEx as the third parameter. Don't use "or" and "and"
python statement, otherwise program will
   work improper. First two parameters of this function are the same as
of CPWithoutWC function - set of
   elements and CP dimension. An example of what was described above in
command line:
   >>> from cartesianProduct import ALL,CPWithoutWC,CPWithoutWCEx,WC
   >>> a = WC([ALL,1,ALL])
   >>> b = WC([ALL,2,ALL])
   >>> c = a & b #filter out all sets which match a and b
   >>> for i in CPWithoutWCEx([1,2],3,c) : print i
   [1, 1, 1]
   [2, 2, 2]
   >>> # all sets where both 1 and 2 are present will be filtered out
   >>> d = a | b
   >>> for i in CPWithoutWCEx([1,2],3,d) : print i
   >>> # returns nothing
   >>> for i in CPWithoutWCEx([1,2,3],3,d) : print i
   [3, 3, 3]
   >>> a = WC([2,1,ALL])
   >>> b = WC([1,2,ALL])
   >>> c = WC([ALL,2])
   >>> d = ( a | b ) & c
   >>> for i in CPWithoutWCEx([1,2],3,d) : print i
   [1, 1, 1]
   [1, 1, 2]
   [1, 2, 1]
   [2, 1, 1]
   [2, 2, 1]
   [2, 2, 2]
   >>> # filters out all combinations which start with [1,2] or [2,1]
and end with 2

   Number of WC, which are used to form logical combinations is not
limited.
"""
"""
13.02.2006
    a)Two new function - CPWithoutWCEx_L and CPWithoutWC_L are added.
    Their interface is the same as of CPWithoutWCEx and CPWithoutWC
    accordingly, except that the third parameter is WC list and
    they accept strictly three parameters.

    As You can see these functions are very simple because
    python is quite flexible =>
    >>> def s(x,y): return x * y
    >>> d = [3,2]
    >>> s(*d) ## == s(3,2)
    6

    b)Now WC can take string as parameter, and You can use string
    as parameters of functions CPWithoutWC and CPWithoutWC_L
    instead of WC lists.
      Strings transform into WC according to these rules
      1)If first symbol in the string is
     alphanumeric (a-z or A-Z or 0-9) or '*'
      character the every character of the string will be recognized as
    a distinct set element. Examples:
       "ad*d*" == ['a','d',cp.ALL,'d',cp.ALL]
       "*A*b3*%^('" == [cp.ALL,'A',cp.ALL.'b','3',cp.ALL,'%','(',"'"]
      2)If first character is not (alphanumeric or '*')
      it will be treated as a delimitator. Examples:
       ":a:A:1:*" == ['a','A','1',cp.ALL]
       ":aA1:*"   == ['aA1',cp.ALL]
       it's not necessary to write delimitators around the asterics
      ":aA1*"     == ['aA1',cp.ALL]
      "%aA%1*"    == ['aA','1',cp.ALL]
      3)If all non delimit and non asterics character in elements
     are digits => they will be treated as numbers.Examples:
       "123*"     == [1,2,3,cp.ALL]
       ":12:3*"   == [12,3,cp.ALL]
       but
       ":12:a:3*" == ['12','a','3',cp.ALL]
      Examples of use:
>>> for i in cp.CPWithoutWC(['a','b'],3,'a*b','b*a'):
        print i
['a', 'a', 'a']
['a', 'b', 'a']
['b', 'a', 'b']
['b', 'b', 'b']
>>> for i in cp.CPWithoutWC_L(['a','b'],3,['a*b','b*a']):
        print i
['a', 'a', 'a']
['a', 'b', 'a']
['b', 'a', 'b']
['b', 'b', 'b']
#You can mixe strings and lists for wildcarts
>>> for i in cp.CPWithoutWC_L(['a','b'],3,['a*b',['b',cp.ALL,'a']]):
        print i
['a', 'a', 'a']
['a', 'b', 'a']
['b', 'a', 'b']
['b', 'b', 'b']
>>> for i in cp.CPWithoutWC_L(['abc','xyz'],3,[':abc*xyz']):
      print i
['abc', 'abc', 'abc']
['abc', 'xyz', 'abc']
['xyz', 'abc', 'abc']
['xyz', 'abc', 'xyz']
['xyz', 'xyz', 'abc']
['xyz', 'xyz', 'xyz']
"""
#-------------------------------------------------------------------------------
class ALL(object):pass
#-------------------------------------------------------------------------------
class NO_ONE(object):pass
#-------------------------------------------------------------------------------
class BFunctor(object):
  def __init__(self,func):
    self.func = func
  def __call__(self,*dt,**mp):
    return self.func(*dt,**mp)
  @classmethod
  def OR(cls,x,y):
    return cls(lambda *dt,**mp : x(*dt,**mp) | y(*dt,**mp))
  @classmethod
  def AND(cls,x,y):
    return cls(lambda *dt,**mp : x(*dt,**mp) & y(*dt,**mp))

#-----------------------------------------------------------------------------
  def __or__(self,x):
    return BFunctor.OR(self,x)

#-----------------------------------------------------------------------------
  def __and__(self,x):
    return BFunctor.AND(self,x)
#-------------------------------------------------------------------------------
def _genAll(head,n,WCF,curr):
  if len(curr) != 0 and n != 0:
    for i in curr:
      nhead = head + [i]
      if n != 1 :
        # needed dimension are not reached
        # -> we mast tell WC that some other values
        # may concatenate in the end of nhead in next recursion levels
        # but if WC is ended with asterics (ALL), than dosn't matter
        # so i use special walue NO_ONE to resolve this problem
        # if WC with final asterics like [1,2,3,ALL] are matched nhead
=>
        # they matched nhead + [NO_ONE] to
        # but if WC is like [1,ALL,2,3] => they dont match
[1,2,3,NO_ONE] =>
        # they don't prevent to generate [1,2,3,4] on next recursion
level
        x = WCF(nhead + [NO_ONE],curr)
      else :      x = WCF(nhead,curr)
      if False == x:
        if n == 1 : yield nhead
        else:
          for i in _genAll(nhead,n - 1,WCF,curr):
            yield i
  elif n == 0 :
    yield head
#-------------------------------------------------------------------------------
class WC(object):
  def __init__(self,wc):
    self.wc = wc
    self.transformWC()
    self.num_els = 0
    self.compress()
    self.comphdr = None
    self.findMaxHeader()
    self.ln = len(self.wc)

#-----------------------------------------------------------------------------
  def transformWC(self):
    if self.wc.__class__ not in (str,unicode) : return
    if len(self.wc) == 0 : return
    if self.wc[0].isalnum() or self.wc[0] == "*":
      wc = self.wc
    else:
      wc = self.wc[1:].split(self.wc[0])
    nwc = []
    for i in wc:
      if   i == '*' : nwc.append(ALL)
      elif '*' in i :
        for j in i.split('*'):
          if j : nwc.append(j)
          nwc.append(ALL)
        del nwc[-1]
      else : nwc.append(i)
    #check if all elements are numbers or *
    allnum = True
    for i in nwc:
      if i is ALL : continue
      try : int(i)
      except :
        allnum = False
        break
    if allnum:
      for i,j in enumerate(nwc):
        if j is not ALL:
          nwc[i] = int(j)
    self.wc = nwc

#-----------------------------------------------------------------------------
  def findMaxHeader(self):
    return

#-----------------------------------------------------------------------------
  def compress(self):
    "delete dublicated * values"
    if len(self.wc) == 0 : return
    wc_ = self.wc[:1]
    for i in self.wc[1:]:
      if i == ALL and i == wc_[-1] : continue
      wc_.append(i)
    self.wc = wc_

#-----------------------------------------------------------------------------
  def matchExact(self,hd,pos = 0):
    if pos == len(self.wc) : return len(hd) == 0
    if self.wc[pos] == ALL :
      if pos + 1 == len(self.wc) : return True
      vl = self.wc[pos + 1]
      cpos = -1
      while True:
        try    : cpos = hd.index(vl,cpos + 1)
        except : return False
        if self.matchExact(hd[cpos + 1:],pos + 2) : return True
    else:
      if len(hd) == 0 : return False
      if hd[0] != self.wc[pos] : return False
      return self.matchExact(hd[1:],pos + 1)

#-----------------------------------------------------------------------------
  def __or__(self,x):
    return BFunctor.OR(self,x)

#-----------------------------------------------------------------------------
  def __and__(self,x):
    return BFunctor.AND(self,x)

#-----------------------------------------------------------------------------
  def __call__(self,hd,st):
    return self.matchExact(hd)
#-------------------------------------------------------------------------------
def CPWithoutWCEx(set,n,wc):
  for i in _genAll([],n,wc,set) :
    yield i
#-------------------------------------------------------------------------------
def CPWithoutWC(set,n,*dt):
  if len(dt) == 0 :
    wc = lambda hd,st : True
  else:
    wc = WC(dt[0])
    #print wc.wc
    for i in dt[1:]:
      wc = wc | WC(i)
  for i in _genAll([],n,wc,set) :
    yield i
#-------------------------------------------------------------------------------
def CPWithoutWC_L(set,n,WCs):
  for i in CPWithoutWC(set,n,*WCs):
    yield i
#-------------------------------------------------------------------------------
def CPWithoutWCEx_L(set,n,WCs):
  for i in CPWithoutWCEx(set,n,*WCs):
    yield i
#-------------------------------------------------------------------------------
def main():
  for i in CPWithoutWC_L(['abc','xyz'],3,[':abc*xyz']):
    print i
#-------------------------------------------------------------------------------
if __name__ == "__main__" : main()
#-------------------------------------------------------------------------------

Re: Programming challenge: wildcard exclusion in cartesian products Tomasz Zielonka
- Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
- Re: Programming challenge: wildcard exclusion in cartesian products Tomasz Zielonka
- Re: Programming challenge: wildcard exclusion in cartesian products Tomasz Zielonka
  - Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
    - Re: Programming challenge: wildcard exclusion in cartesian products Dr.Ruud
    - Re: Programming challenge: wildcard exclusion in cartesian products Kaz Kylheku
      - Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
Re: Programming challenge: wildcard exclusion in cartesian products Wade Humeniuk
- Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
  - Re: Programming challenge: wildcard exclusion in cartesian products Wade Humeniuk
    - Re: Programming challenge: wildcard exclusion in cartesian products Wade Humeniuk
    - Re: Programming challenge: wildcard exclusion in cartesian products Geoffrey Summerhayes
  - Re: Programming challenge: wildcard exclusion in cartesian products Alan Crowe
    - Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
- Re: Programming challenge: wildcard exclusion in cartesian products funkyj
  - Re: Programming challenge: wildcard exclusion in cartesian products funkyj
    - Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
      - Re: Programming challenge: wildcard exclusion in cartesian products Dr.Ruud
      - Re: Programming challenge: wildcard exclusion in cartesian products Aaron Denney
  - Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
Re: Programming challenge: wildcard exclusion in cartesian products sa
- Re: Programming challenge: wildcard exclusion in cartesian products Azolex
- Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
  - Re: Programming challenge: wildcard exclusion in cartesian products Joachim Durchholz
Re: Programming challenge: wildcard exclusion in cartesian products Marcin 'Qrczak' Kowalczyk
- Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
  - Re: Programming challenge: wildcard exclusion in cartesian products Dan Piponi
    - Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
      - Re: Programming challenge: wildcard exclusion in cartesian products Dan Piponi
      - Re: Programming challenge: wildcard exclusion in cartesian products Tomasz Zielonka
    - Re: Programming challenge: wildcard exclusion in cartesian products Tomasz Zielonka
      - Re: Programming challenge: wildcard exclusion in cartesian products Dan Piponi
      - Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
        Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
- Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
- Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
  - Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
    - Re: Programming challenge: wildcard exclusion in cartesian products Geoffrey Summerhayes
      - Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
        Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
        Re: Programming challenge: wildcard exclusion in cartesian products Geoffrey Summerhayes
  - Re: Programming challenge: wildcard exclusion in cartesian products funkyj
    - Re: Programming challenge: wildcard exclusion in cartesian products Doug Quale
      - Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
        Re: Programming challenge: wildcard exclusion in cartesian products funkyj
        Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
  - Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
    - Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
Re: Programming challenge: wildcard exclusion in cartesian products Mark Tarver
Re: Programming challenge: wildcard exclusion in cartesian products Mark Carter
- Re: Programming challenge: wildcard exclusion in cartesian products Mark Tarver
  - Re: Programming challenge: wildcard exclusion in cartesian products Mark Carter
    - Re: Programming challenge: wildcard exclusion in cartesian products ·······@gmail.com
    - Re: Programming challenge: wildcard exclusion in cartesian products Mark Carter
      - Re: Programming challenge: wildcard exclusion in cartesian products QCD Apprentice
        Re: Programming challenge: wildcard exclusion in cartesian products Alexander Schmolck
        Re: Programming challenge: wildcard exclusion in cartesian products Lars Brinkhoff
      - Re: Programming challenge: wildcard exclusion in cartesian products Lars Brinkhoff
      - Re: Programming challenge: wildcard exclusion in cartesian products Alexander Schmolck
- Re: Programming challenge: wildcard exclusion in cartesian products ······@corporate-world.lisp.de
  - Re: Programming challenge: wildcard exclusion in cartesian products Alexander Schmolck
- TEA (was: Re: Programming challenge: wildcard exclusion in cartesian products) Marcus Breiing
  - Re: TEA Pascal Bourguignon
    - Re: TEA Marcus Breiing
      - Re: TEA Pascal Bourguignon
        Re: TEA Marcus Breiing
        Re: TEA Rob Warnock
        Re: TEA Pascal Bourguignon
        Re: TEA Marcus Breiing
- Re: Programming challenge: wildcard exclusion in cartesian products Pascal Bourguignon
- Re: Programming challenge: wildcard exclusion in cartesian products Christophe Rhodes
Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
- Re: Programming challenge: wildcard exclusion in cartesian products Dirk Thierbach
  - Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
    - Re: Programming challenge: wildcard exclusion in cartesian products Dirk Thierbach
  - Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
    - Re: Programming challenge: wildcard exclusion in cartesian products Dirk Thierbach
      - Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
        Re: Programming challenge: wildcard exclusion in cartesian products ·········@cox.net
        Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
        Re: Programming challenge: wildcard exclusion in cartesian products Dirk Thierbach
        Re: Programming challenge: wildcard exclusion in cartesian products Dirk Thierbach
        Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
        Re: Programming challenge: wildcard exclusion in cartesian products Dirk Thierbach
        Re: Programming challenge: wildcard exclusion in cartesian products Dinko Tenev
        Re: Programming challenge: wildcard exclusion in cartesian products Dirk Thierbach
        Re: Programming challenge: wildcard exclusion in cartesian products funkyj
        Re: Programming challenge: wildcard exclusion in cartesian products Chris F Clark
        Re: Programming challenge: wildcard exclusion in cartesian products funkyj

From: Tomasz Zielonka
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 14:37:22 +0000
Message-ID: <slrne1iu0i.604.tomasz.zielonka@localhost.localdomain>

·········@cox.net wrote:
> The python code below generates a cartesian product subject to any
> logical combination of wildcard exclusions. For example, suppose I want
> to generate a cartesian product S^n, n>=3, of [a,b,c,d] that excludes
> '*a*b*' and '*c*d*a*'. See below for details.
>
> CHALLENGE: generate an equivalent in ruby, lisp, haskell, ocaml, or in
> a CAS like maple or mathematica.

What is your goal? You want to learn or to cause a flamewar? ;-)

Anyway, I found the problem entertaining, so here you go, here is my
Haskell code. It could be shorter if I didn't care about performance and
wrote in specification style. It's not very efficient either, because it
will generate all lists matching the given patterns.

In GHCi you can test it by:

    $ ghci
    :l WildCartesian.hs
    test

I apologise for the lack of comments.

----8<--------8<--------8<--------8<--------8<--------8<--------8<--------8<----
module WildCartesian where

import Data.Set (Set)
import qualified Data.Set as Set
import Control.Monad
import Control.Exception (assert)
import Maybe
import List

data Pat a = All | Lit a deriving Show

generateMatching :: (Ord a) => Int -> Set a -> [Pat a] -> [[a]]
generateMatching 0   _        []    = [[]]
generateMatching 0   _        (_:_) = []
generateMatching len alphabet (Lit x : ps)
    | x `Set.member` alphabet =
        [ (x : xs) | xs <- generateMatching (len - 1) alphabet ps ]
    | otherwise =
        [ ]
generateMatching len alphabet (All : ps) =
    [ (x : xs)
    | x <- Set.toList alphabet
    , xs <- unionSorted
                (generateMatching (len - 1) alphabet ps)
                (generateMatching (len - 1) alphabet (All : ps)) ]
    `unionSorted`
    generateMatching len alphabet ps
generateMatching _   _        [] = []

generateNotMatching :: (Ord a) => [a] -> Int -> [[Pat a]] -> [[a]]
generateNotMatching alphabet len patterns =
    generateMatching len alphaSet [All]
    `subtractSorted`
    foldr unionSorted []
        (map (generateMatching len alphaSet .  simplifyPat) patterns)
  where
    alphaSet = Set.fromList alphabet

simplifyPat (All : All : ps) = simplifyPat (All : ps)
simplifyPat (p : ps) = p : simplifyPat ps
simplifyPat [] = []

joinSorted :: Ord a => [a] -> [a] -> [(Maybe a, Maybe a)]
joinSorted (x1:x2:_) _ | assert (x1 < x2) False = undefined
joinSorted _ (y1:y2:_) | assert (y1 < y2) False = undefined
joinSorted (x:xs) (y:ys) =
    case x `compare` y of
        LT -> (Just x, Nothing) : joinSorted xs (y:ys)
        EQ -> (Just x, Just y)  : joinSorted xs ys
        GT -> (Nothing, Just y) : joinSorted (x:xs) ys
joinSorted (x:xs) [] = (Just x, Nothing) : joinSorted xs []
joinSorted [] (y:ys) = (Nothing, Just y) : joinSorted [] ys
joinSorted [] []     = []

unionSorted :: Ord a => [a] -> [a] -> [a]
unionSorted xs ys = catMaybes (map (uncurry mplus) (joinSorted xs ys))

subtractSorted :: Ord a => [a] -> [a] -> [a]
subtractSorted xs ys = catMaybes (map f (joinSorted xs ys))
  where
    f (Just x, Nothing) = Just x
    f _ = Nothing

test = do
    t [1,2] 3 [[Lit 1, All, Lit 2]]
    t ['a','b'] 3 [[Lit 'a', All, Lit 'b'], [Lit 'b', All, Lit 'a']]
    t [1,2] 3 [[Lit 1, All, Lit 2], [Lit 2, All, Lit 1]]
  where
    t a b c = do
        putStrLn (concat (intersperse " " ["generateMatching", show a, show b, show c]))
        mapM_ (putStrLn . ("  "++) . show) (generateNotMatching a b c)
----8<--------8<--------8<--------8<--------8<--------8<--------8<--------8<----

Best regards
Tomasz

-- 
I am searching for programmers who are good at least in
(Haskell || ML) && (Linux || FreeBSD || math)
for work in Warsaw, Poland

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 17:35:19 +0000
Message-ID: <1142530519.761020.24960@i39g2000cwa.googlegroups.com>

The point is to submit elegant code that showcases the features of each
language. And the problem is, just to clarify, given a set WC of
wildcards in any logical combination, and if WC(S^n) is the set all s
in S^n that matches the wildcards, then efficiently generate the
complement S^n\WC(S^n). You are free to restate the problem in any
equivalent way.

From: Tomasz Zielonka
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 14:38:50 +0000
Message-ID: <slrne1iu3a.604.tomasz.zielonka@localhost.localdomain>

Tomasz Zielonka wrote:
>         putStrLn (concat (intersperse " " ["generateMatching", show a, show b, show c]))

Minor correction: it should be "generateNotMatching".

Best regards
Tomasz

-- 
I am searching for programmers who are good at least in
(Haskell || ML) && (Linux || FreeBSD || math)
for work in Warsaw, Poland

From: Tomasz Zielonka
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 14:57:31 +0000
Message-ID: <slrne1iv6b.604.tomasz.zielonka@localhost.localdomain>

Major correction (missing case):

Tomasz Zielonka wrote:
> generateMatching :: (Ord a) => Int -> Set a -> [Pat a] -> [[a]]
> generateMatching 0   _        []    = [[]]
  generateMatching 0   alphabet (All:ps) = generateMatching 0 alphabet ps
> generateMatching 0   _        (_:_) = []

Best regards
Tomasz

-- 
I am searching for programmers who are good at least in
(Haskell || ML) && (Linux || FreeBSD || math)
for work in Warsaw, Poland

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 17:23:46 +0000
Message-ID: <1142529826.007508.286420@i39g2000cwa.googlegroups.com>

Flame war? Absolutely not. My reason is to learn. There are many sites
dedicated to reasonably objective comparisons between languages. Here
are two examples:

http://www.smallscript.org/Language%20Comparison%20Chart.asp
http://www.jvoegele.com/software/langcomp.html

The wildcard exclusion problem is interesting enough to have many
distinct, elegant solutions in as many languages. It would be
interesting to see if they converge to roughly the same solution or if
there are essential differences. And your code is a crash course in
Haskell! Tossing aside the 'flame war' inquiry your code response is my
only goal. I hope many others find the problem and responses as
fascinating as I do.

From: Dr.Ruud
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 17:40:13 +0000
Message-ID: <dvcbt9.130.1@news.isolution.nl>

·········@cox.net schreef:

> There are many sites
> dedicated to reasonably objective comparisons between languages. Here
> are two examples:
> 
> http://www.smallscript.org/Language%20Comparison%20Chart.asp
> http://www.jvoegele.com/software/langcomp.html

  http://shootout.alioth.debian.org/ 

-- 
Affijn, Ruud

"Gewoon is een tijger."
echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'

From: Kaz Kylheku
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 18 Mar 2006 01:56:02 +0000
Message-ID: <1142646962.657648.308550@g10g2000cwb.googlegroups.com>

·········@cox.net wrote:
> The wildcard exclusion problem is interesting enough to have many
> distinct, elegant solutions in as many languages.

In that case, you should have crossposted to comp.lang.python also.

Your program looks like a dog's breakfast.

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 18 Mar 2006 08:58:32 +0000
Message-ID: <1142672312.674255.165940@i39g2000cwa.googlegroups.com>

Yes, the program is quite a jumble: but it works. And I didn't post to
python newsgroup since I was limited to just 5 newsgroups and didn't
feel like doing multiple postings to multiple newsgroups.

From: Wade Humeniuk
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 18:55:02 +0000
Message-ID: <awiSf.29928$M52.21488@edtnps89>

Without much testing.  Common Lisp

Pattern exclusions are made lispy.


(defun all-lists (list length)
   (unless (zerop length)
     (if (= length 1) (mapcar #'list list)
       (loop for elt in list
             nconc
             (mapcar (lambda (rest)
                       (cons elt rest))
                     (loop for rest on list
                           nconc (all-lists rest (1- length))))))))

(defun cp-without-wc (source-list &rest patterns)
   (let* ((length (length (first patterns)))
          (all-lists (all-lists source-list length)))
     (dolist (pattern patterns)
       (setf all-lists
             (set-difference all-lists
                             (mapcar (lambda (insertion)
                                       (let ((cp (copy-list pattern)))
                                         (loop for place on cp
                                               when (eql :any (car place)) do
                                               (setf (car place) (pop insertion)))
                                         cp))
                                     (all-lists source-list (count :any pattern)))
                             :test #'equal)))
     (remove-duplicates all-lists :test #'equal)))

CL-USER 22 > (cp-without-wc '(a b) '(a :any b) '(b :any a))
((A A A) (A B A) (B A B) (B B B))

CL-USER 23 > (cp-without-wc '(abc xyz) '(abc :any xyz))
((XYZ XYZ XYZ) (XYZ XYZ ABC) (XYZ ABC XYZ) (XYZ ABC ABC) (ABC XYZ ABC) (ABC ABC ABC))

CL-USER 24 > (cp-without-wc '(a b) '(a :any :any))
((B B B) (B B A) (B A B) (B A A))

CL-USER 25 > (cp-without-wc '(a b) '(a :any :any) '(b :any :any))
NIL

CL-USER 26 > (cp-without-wc '(a b) '(:any :any b))
((B B A) (B A A) (A B A) (A A A))

CL-USER 27 >

Wade

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 19:20:42 +0000
Message-ID: <1142536842.634000.326560@j52g2000cwj.googlegroups.com>

What I have in mind is the efficient, <enumerated> generation of the
complement S^n/WC(S^n). A good program should initialize, generate, and
terminate.

T=cartprodex(S,n,WC); //initialize
for all i in T do
  what you want with i
  test to see if any more
  terminate if not

and it should do this without explicitly generating WC and then
complementing. For example, if the cardinality of S is m, and the WC is
just '*a*b*', with a != b, then EX(S^n):=S^n\WC(S^n) has cardinality
(m-1)^(n-1)*(m+n-1). Specifically, if m=5 and n=10, then |EX|=3670016
while |S^10|=9765625, so that |EX|/|S^10| is about 0.3758. In general
the program should directly generate EX from arbitrary WC. Of course,
in practice the WC should themselves occur in a logically consistent
manner, but let's just assume they're a given.

From: Wade Humeniuk
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 02:01:17 +0000
Message-ID: <NLoSf.2808$Zf3.941@clgrps12>

·········@cox.net wrote:
> What I have in mind is the efficient, <enumerated> generation of the
> complement S^n/WC(S^n). A good program should initialize, generate, and
> terminate.
> 
> T=cartprodex(S,n,WC); //initialize
> for all i in T do
>   what you want with i
>   test to see if any more
>   terminate if not
> 
> and it should do this without explicitly generating WC and then
> complementing. For example, if the cardinality of S is m, and the WC is
> just '*a*b*', with a != b, then EX(S^n):=S^n\WC(S^n) has cardinality
> (m-1)^(n-1)*(m+n-1). Specifically, if m=5 and n=10, then |EX|=3670016
> while |S^10|=9765625, so that |EX|/|S^10| is about 0.3758. In general
> the program should directly generate EX from arbitrary WC. Of course,
> in practice the WC should themselves occur in a logically consistent
> manner, but let's just assume they're a given.
> 

Another attempt.  I have made no special attempt to create an
exclusion language, just used an anonymous lambda predicate.


;; Wade Humeniuk

(defclass odometer ()
   ((base :initform 0 :accessor base)
    (meter :initform nil :accessor meter)
    (n-digits :initarg :n-digits :accessor n-digits)
    (digit-set :initarg :digit-set :accessor digit-set)))

(defmethod initialize-instance :after ((obj odometer) &rest initargs)
   (setf (base obj) (length (digit-set obj))
         (meter obj) (make-array (n-digits obj) :initial-element 0)
         (digit-set obj) (coerce (digit-set obj) 'vector)))

(defun inc-odometer (odometer)
   (loop with carry = 1
         for i from (1- (n-digits odometer)) downto 0
         for digit = (incf (aref (meter odometer) i) carry)
         if (= digit (base odometer)) do
           (setf (aref (meter odometer) i) 0)
           (setf carry 1)
         else do
           (setf carry 0)
         while (not (zerop carry))))

(defun zero-meter-p (odometer)
   (every #'zerop (meter odometer)))

(defmethod next-set ((obj odometer))
   (prog1 (map 'list (lambda (digit)
                       (aref (digit-set obj) digit))
               (meter obj))
     (inc-odometer obj)))

(defclass cs-with-wc (odometer)
   ((exclusion :initarg :exclusion :accessor exclusion)
    (at-end :initform nil :accessor at-end)))

(defmethod next-set ((obj odometer))
   (tagbody
    :next
    (unless (at-end obj)
      (let ((set (call-next-method)))
        (when (zero-meter-p obj) (setf (at-end obj) t))
        (if (not (funcall (exclusion obj) set))
            (return-from next-set set)
          (go :next))))))

(defun print-all-cs (set length exclusion)
   (let ((cs-with-wc (make-instance 'cs-with-wc :n-digits length :digit-set set
                                    :exclusion exclusion)))
     (loop for set = (next-set cs-with-wc)
           while set do (print set))))

CL-USER 134 > (cs-with-wc '(a b) 3 (lambda (set)
                                      (destructuring-bind (x y z)
                                          set
                                        (or (and (eql x 'a) (eql z 'b))
                                            (and (eql x 'b) (eql z 'a))))))

(A A A)
(A B A)
(B A B)
(B B B)
NIL

CL-USER 135 > (cs-with-wc '(a b) 3 (lambda (set)
                                      (eql (second set) 'a)))

(A B A)
(A B B)
(B B A)
(B B B)
NIL

CL-USER 136 > (cs-with-wc '(abc xyz) 3 (lambda (set)
                                          (and (eql (first set) 'abc)
                                               (eql (third set) 'xyz))))

(ABC ABC ABC)
(ABC XYZ ABC)
(XYZ ABC ABC)
(XYZ ABC XYZ)
(XYZ XYZ ABC)
(XYZ XYZ XYZ)
NIL

From: Wade Humeniuk
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 02:05:59 +0000
Message-ID: <bQoSf.25526$Cp4.8167@edtnps90>

Oops, problems cutting an pasting, should be,

;; Wade Humeniuk

(defclass odometer ()
   ((base :initform 0 :accessor base)
    (meter :initform nil :accessor meter)
    (n-digits :initarg :n-digits :accessor n-digits)
    (digit-set :initarg :digit-set :accessor digit-set)))

(defmethod initialize-instance :after ((obj odometer) &rest initargs)
   (setf (base obj) (length (digit-set obj))
         (meter obj) (make-array (n-digits obj) :initial-element 0)
         (digit-set obj) (coerce (digit-set obj) 'vector)))

(defun inc-odometer (odometer)
   (loop with carry = 1
         for i from (1- (n-digits odometer)) downto 0
         for digit = (incf (aref (meter odometer) i) carry)
         if (= digit (base odometer)) do
           (setf (aref (meter odometer) i) 0)
           (setf carry 1)
         else do
           (setf carry 0)
         while (not (zerop carry))))

(defun zero-meter-p (odometer)
   (every #'zerop (meter odometer)))

(defmethod next-set ((obj odometer))
   (prog1 (map 'list (lambda (digit)
                       (aref (digit-set obj) digit))
               (meter obj))
     (inc-odometer obj)))

(defclass cs-with-wc (odometer)
   ((exclusion :initarg :exclusion :accessor exclusion)
    (at-end :initform nil :accessor at-end)))

(defmethod next-set ((obj cs-with-wc))
   (tagbody
    :next
    (unless (at-end obj)
      (let ((set (call-next-method)))
        (when (zero-meter-p obj) (setf (at-end obj) t))
        (if (not (funcall (exclusion obj) set))
            (return-from next-set set)
          (go :next))))))

(defun print-all-cs (set length exclusion)
   (let ((cs-with-wc (make-instance 'cs-with-wc :n-digits length :digit-set set
                                    :exclusion exclusion)))
     (loop for set = (next-set cs-with-wc)
           while set do (print set))))

CL-USER 7 > (print-all-cs '(a b) 3 (lambda (set)
                                      (destructuring-bind (x y z)
                                          set
                                        (or (and (eql x 'a) (eql z 'b))
                                            (and (eql x 'b) (eql z 'a))))))

(A A A)
(A B A)
(B A B)
(B B B)
NIL

CL-USER 8 > (print-all-cs '(abc xyz) 3 (lambda (set)
                                          (and (eql (first set) 'abc)
                                               (eql (third set) 'xyz))))

(ABC ABC ABC)
(ABC XYZ ABC)
(XYZ ABC ABC)
(XYZ ABC XYZ)
(XYZ XYZ ABC)
(XYZ XYZ XYZ)
NIL

CL-USER 9 >

From: Geoffrey Summerhayes
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 17:51:38 +0000
Message-ID: <1142617898.901635.133020@u72g2000cwu.googlegroups.com>

Wade Humeniuk wrote:
> ·········@cox.net wrote:
> > What I have in mind is the efficient, <enumerated> generation of the
> > complement S^n/WC(S^n). A good program should initialize, generate, and
> > terminate.
> >
> > T=cartprodex(S,n,WC); //initialize
> > for all i in T do
> >   what you want with i
> >   test to see if any more
> >   terminate if not
> >
> > and it should do this without explicitly generating WC and then
> > complementing. For example, if the cardinality of S is m, and the WC is
> > just '*a*b*', with a != b, then EX(S^n):=S^n\WC(S^n) has cardinality
> > (m-1)^(n-1)*(m+n-1). Specifically, if m=5 and n=10, then |EX|=3670016
> > while |S^10|=9765625, so that |EX|/|S^10| is about 0.3758. In general
> > the program should directly generate EX from arbitrary WC. Of course,
> > in practice the WC should themselves occur in a logically consistent
> > manner, but let's just assume they're a given.
> >
>
> Another attempt.  I have made no special attempt to create an
> exclusion language, just used an anonymous lambda predicate.
>

FWIW, here's my Q-and-D pattern matcher (only partially tested).

(defun match(list pattern &optional (test #'eql))
  "Match a list of atoms against a pattern list
using :all as a 0-to-many wildcard, :single as a
1-to-1 wildcard, a list of elements or a single
element to match a specific place. Optional
argument test for comparing elements (default eql).

Returns: T if match is made, NIL otherwise.

Examples: (match '(0 1 2 3 4 5) '(:all (2 3) 3 :single 5 :all)) => T
          (match '(0 1 2 3 4 5) '(:all (2 3) 3 :single 5 :single)) =>
NIL"
  (let ((current (first pattern))
        (next-pattern (rest pattern))
        (candidate (first list)))
    (cond ((and (null pattern) (null list))
           t)
          ((and (eq :single current) candidate)
           (match (rest list) next-pattern test))
          ((eq :all current)
           (loop for new-list on list
                 when (match new-list next-pattern test)
                 do (return-from match t))
           (null next-pattern)) ; last case null remainder
          ((if(atom current)
               (funcall test candidate current)
             (member candidate current :test test))
           (match (rest list) next-pattern test)))))

--
Geoff

From: Alan Crowe
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 18:10:27 +0000
Message-ID: <86veudhrho.fsf@cawtech.freeserve.co.uk>

··········@cox.net" <·········@cox.net> writes:

> What I have in mind is the efficient, <enumerated> generation of the
> complement S^n/WC(S^n). A good program should initialize, generate, and
> terminate.
> 
> T=cartprodex(S,n,WC); //initialize
> for all i in T do
>   what you want with i
>   test to see if any more
>   terminate if not
> 
> and it should do this without explicitly generating WC and then
> complementing. For example, if the cardinality of S is m, and the WC is
> just '*a*b*', with a != b, then EX(S^n):=S^n\WC(S^n) has cardinality
> (m-1)^(n-1)*(m+n-1). Specifically, if m=5 and n=10, then |EX|=3670016
> while |S^10|=9765625, so that |EX|/|S^10| is about 0.3758. In general
> the program should directly generate EX from arbitrary WC. Of course,
> in practice the WC should themselves occur in a logically consistent
> manner, but let's just assume they're a given.

The following code doesn't build a data structure. It
recurses n deep and those n stack frames contain state that
tracks progress through the product. But it still generates
and tests, taking time proportional to S^n. Is this what you
had in mind?

;; A macro to let you loop over the S^n possibilities
;; without building a big data structure
;; The macro is structured as syntactic sugar on
;; a higher order function
;;
(defmacro cartesian-power ((variable set exponent) &body code)
  (let ((body-function (gensym)))
    `(flet ((,body-function(,variable),@code))
      (cartesian-power-hof ,set ,exponent '() (function ,body-function)))))

;; The standard idea of using recursion to implement a nest
;; of loops of indefinite depth
;;
(defun cartesian-power-hof (set exponent prefix f)
  (if (zerop exponent)
      (funcall f prefix)
      (dolist (item set)
        (cartesian-power-hof set
                             (- exponent 1)
                             (cons item prefix)
                             f))))

;; A simple recursive pattern match
;; I haven't thought through the implications
;; I guess that it is exponentially slow on some long
;; patterns
;;
(defun wild-match (pattern data)
  (cond ((endp pattern) (endp data))
        ((endp data) (or (endp pattern)
                         (equal pattern '(:wild))))
        ((eql (car pattern) :wild)
         (or (null (cdr pattern))
             (wild-match (cdr pattern)
                         data)
             (wild-match pattern
                         (cdr data))))
        ('literal-pattern
         (and (eql (car pattern)
                   (car data))
              (wild-match (cdr pattern)
                          (cdr data))))))

;; close over a data item to get a function 
;; suitable for checking several patterns
(defun match-data (data)
  (lambda(pattern)
    (wild-match pattern data)))

;; Use the macro and the utilities to count how many are not excluded
(defun count-remainder (set exponent &rest exclusions)
  (let ((count 0))
    (cartesian-power (item set exponent)
      (when (notany (match-data item) exclusions)
        (incf count)))
    count))

CL-USER> (loop for i from 3 to 10 
               do (format t "~&~4D~10D" i
                          (count-remainder '(a b c d e) i '(:wild a :wild b :wild))))
   3       112
   4       512
   5      2304
   6     10240
   7     45056
   8    196608
   9    851968
  10   3670016

I can see how a pattern such as (a b :wild) would knock out
an element from each of the first two sets so reducing the
task from m^n to (m-1)^2 * m^(n-2). 

Also (:wild a :wild) knocks it down from m^n to (m-1)^n

However I can only see the exploitation of (:wild a :wild b
:wild) as a hairy special case. Pick one of n places for the
first a. Pick earlier elements from the set excluding a,
pick later elements from the set excluding b. Add in all the
items with a missing altogether (so b can be used freely).
I cannot see what algorithm exploits the constraints more
generally. Is there a standard technique, page nn of Knuth
or the like? Is that what you are actually wanting to see
coded?

Alan Crowe
Edinburgh
Scotland

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 18 Mar 2006 17:53:11 +0000
Message-ID: <1142704391.498414.256250@i40g2000cwc.googlegroups.com>

The cardinality of excluding '*a*b*' from S^n should be
(m-1)^(n-1)*(m+n-1), where m=|S|. For m=5 this becomes 4^(n-1)*(n+4),
and your table fits this formula. As far as generating and testing, an
'ideal' solution would be to 'start from the ground up', as in
excluding length 2 wc, and then length 3, etc, until all wc's have been
excluded. The 'ideal' solution would intrinsically exclude wc's and not
test against a background generation of all of S^n. Does that make
sense?

From: funkyj
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 21:26:13 +0000
Message-ID: <1142544373.164023.50140@z34g2000cwc.googlegroups.com>

here is my version of the same.

REPL output:

CL-USER> (tests)


set      = (1 2)
n        = 3
patterns = ((1 ANY 2))
-----------------------
(1 1 1)
(1 2 1)
(2 1 1)
(2 1 2)
(2 2 1)
(2 2 2)


set      = (A B)
n        = 3
patterns = ((A ANY B) (B ANY A))
-----------------------
(A A A)
(A B A)
(B A B)
(B B B)


set      = (1 2)
n        = 3
patterns = ((1 ANY 2) (2 ANY 1))
-----------------------
(1 1 1)
(1 2 1)
(2 1 2)
(2 2 2)
NIL
CL-USER>

source:

;;;; cartesian products minus wildcard patterns per:
;;;;
;;;; >Newsgroups: comp.lang.lisp, etc...
;;;; >Subject: Programming challenge: wildcard exclusion in cartesian
products
;;;; >Date: 16 Mar 2006 03:14:23 -0800
;;;;
;;;;

(defun show-me (x) (format t "~A~%" x))

(defun set^n (fn set n &optional acc)
  "call `fn' on each permutation of `set' raised to the `n' power"
  (if (<= n 0)
      (funcall fn (reverse acc))
      (dolist (e set)
        (set^n fn set (- n 1) (cons e acc)))))

;; test set^n by printing and visually inspecting the result
(defun pr-set^n (set n)   (set^n #'show-me set n))

;; curry `set^n' so that `fn' is the only parameter
(defun set^n-gen (set n)
  (lambda (fn) (set^n fn set n)))

(defun mk-matchl-p (pat-list)
  "return a function that tests a value against the patterns in
`pat-list'"
  (labels ((matchp (pat val)
             (cond ((null pat) t)
                   ((or (eq (car pat) (car val))
                        (eq (car pat) :any))
                    (matchp (cdr pat) (cdr val))))))
    (lambda (val)
      "predicate: return true if val matches any pattern in `pat-list'"
      (dolist (p pat-list)
        (if (matchp p val)
            (return t))))))

(defun not-fp (f-pred)
  "return the complement of predicate `f-pred'"
  (lambda (x) (not (funcall f-pred x))))

;; f-gen is a generator of the form returned by set^n-gen
(defun accumulate-if (f-gen f-pred)
  "accumulate values generated by f-gen that satisfy f-pred"
  (let (acc)
    (funcall f-gen (lambda (x) (if (funcall f-pred x) (push x acc))))
    (nreverse acc)))

;; `pr-set^n-withoutWC' is the lisp equivalent (more or less) of
;; python code:
;;   >>> for i in cp.CPWithoutWC(x,y,z): print i
(defun pr-set^n-withoutWC (set n pat-list)
  (format t "~%~%set      = ~A~%n        = ~A~%patterns = ~A~%~A~%"
          set n pat-list "-----------------------")
  (dolist (e (accumulate-if (set^n-gen set n)
                            (not-fp (mk-matchl-p pat-list))))
    (format t "~A~%" e)))

(defun tests ()
  "generate test output per the original problem examples"
  (pr-set^n-withoutWC '(1 2) 3 '((1 :any 2)))
  (pr-set^n-withoutWC '(a b) 3 '((a :any b) (b :any a)))
  (pr-set^n-withoutWC '(1 2) 3 '((1 :any 2) (2 :any 1))))

From: funkyj
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 21:40:15 +0000
Message-ID: <1142545215.632905.284520@j33g2000cwa.googlegroups.com>

NOTE: I am a lisp newbie.  I'm sure our resident lisp experts can
create much better (both faster, shorter and clearer) solutions than
the one above.

Even I could have created something shorter but I thought it would be
fun to apply the "utility function" approach in decomposing the
problem.

  --jfc

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 23 Mar 2006 10:32:10 +0000
Message-ID: <1143109930.768224.160910@j33g2000cwa.googlegroups.com>

Hello,

The solution that would have the most utility would be one where the
elements are generated one-by-one, loop-like, so that they can be used
in the body of a loop, and to avoid the fact that even with exclusion
the cardinality of the target set EX^n could be in the millions even
with a full list of wc's, that is, a list containing at least one wc of
every length in 2..(n-1). I don't know enough Lisp, Haskell or
Qi/Prolog to know if the solutions so far can be modified to do this.
The Python program is too slow for large sets.

Walter Kehowski

From: Dr.Ruud
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 23 Mar 2006 13:14:17 +0000
Message-ID: <dvuash.1e4.1@news.isolution.nl>

·········@cox.net schreef:

> The solution that would have the most utility would be one where the
> elements are generated one-by-one, loop-like, so that they can be used
> in the body of a loop, and to avoid the fact that even with exclusion
> the cardinality of the target set EX^n could be in the millions even
> with a full list of wc's, that is, a list containing at least one wc
> of every length in 2..(n-1). I don't know enough Lisp, Haskell or
> Qi/Prolog to know if the solutions so far can be modified to do this.
> The Python program is too slow for large sets.

Use a bitmapping, see also
  ··················@agate.berkeley.edu

Detect the exclusions with a bitwise AND.

-- 
Affijn, Ruud

"Gewoon is een tijger."
echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'

From: Aaron Denney
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 23 Mar 2006 17:28:56 +0000
Message-ID: <slrne25mob.kgd.wnoise@ofb.net>

On 2006-03-23, ·········@cox.net <·········@cox.net> wrote:
> The solution that would have the most utility would be one where the
> elements are generated one-by-one, loop-like, so that they can be used
> in the body of a loop, and to avoid the fact that even with exclusion
> the cardinality of the target set EX^n could be in the millions even
> with a full list of wc's, that is, a list containing at least one wc of
> every length in 2..(n-1). I don't know enough Lisp, Haskell or
> Qi/Prolog to know if the solutions so far can be modified to do this.
> The Python program is too slow for large sets.

In Haskell you can get this essentially for free, due to its laziness.

-- 
Aaron Denney
-><-

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 18 Mar 2006 18:10:10 +0000
Message-ID: <1142705410.116668.33430@i39g2000cwa.googlegroups.com>

Nice! How to put it in a loop? I'm totally a newbie to Lisp myself,
just gettng into Graham and Touretzky. Let's create a problem. Suppose
after excluding I want to know if the digits sum to 12, say, like maybe
they're part of a partition.  S={0,..6}, S^5, excluding "*1*5*" and
"1*2*3*", say. How would I do that?

From: sa
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 22:41:14 +0000
Message-ID: <eQlSf.13353$S25.10554@newsread1.news.atl.earthlink.net>

in k:

cp:{[c;n;p]+(n#c)_vs(!_ c^n)_dvl,/{2_sv+(,/,/:\:)/(),··@[x;&x=-1;:[;!c]]}'p}

examples:

  cp[2;3;,0 -1 1]
(0 0 0
 0 1 0
 1 0 0
 1 0 1
 1 1 0
 1 1 1)

  cp[2;3;(0 -1 1;1 -1 0)]
(0 0 0
 0 1 0
 1 0 1
 1 1 1)

  cp[2;3;(0 -1 1;1 -1 1)]
(0 0 0
 0 1 0
 1 0 0
 1 1 0)

arguments of cp:

c = cardinality of the input set
n = power
p = list of patterns (-1 = wildcard)

the algorithm directly computes the target set.  in other words,
it does not generate the set, then filter the matches from the
target.

modifying cp to accept s instead of the cardinality of s,
patterns expressed in terms of elements of s, &c. adds nothing
of interest to the problem.

<·········@cox.net> wrote in message ·····························@v46g2000cwv.googlegroups.com...
> The python code below generates a cartesian product subject to any
> logical combination of wildcard exclusions. For example, suppose I want
> to generate a cartesian product S^n, n>=3, of [a,b,c,d] that excludes
> '*a*b*' and '*c*d*a*'. See below for details.
>
> CHALLENGE: generate an equivalent in ruby, lisp, haskell, ocaml, or in
> a CAS like maple or mathematica.
>
> #-------------------------------------------------------------------------------
> # Short algorithm description
> # using function _genAll the program generates
> # cartesian product without sets, which match
> # some wildcarts
> # Sets generation uses recursion ->
> # first of all sets will be generated with dimension 1 and than
> filtered through wildcarts
> # then sets will be generated with dimension 2 and filtered again
> # until the required set dimension is reached
> # Program avoids explicit generation of some part of CP sets
> # if the end of whildcart is asterics (*) and if the first part of
> whildcart (without astrics)
> # matches current set => then this set will be filtered out and won't
> be used in
> # higher dimension set generation
> # example *,1,*,2,* [1,2] dim = 10
> # by dimension 2 only arrays [1,1],[2,1],[2,2] are will be generated
> # => array [1,2] won't be used in next recursion levels
> #-------------------------------------------------------------------------------
> # To obtaine result use function
> # CPWithoutWC first parameter is a list of any elements
> (char,int,string,class exemplar ,.... any type)
> # secont param is CP dimension
> # other parameters are wildcarts => lists with any values then may
> include
> # special value ALL - asterics equivalent
> #Example of usage: command line
> # >>> import cartesianProduct as cp
> # >>> for i in cp.CPWithoutWC([1,2],3,[1,cp.ALL,2]):
> #         print i
> # [1, 1, 1]
> # [1, 2, 1]
> # [2, 1, 1]
> # [2, 1, 2]
> # [2, 2, 1]
> # [2, 2, 2]
> # >>> for i in
> cp.CPWithoutWC(['a','b'],3,['a',cp.ALL,'b'],['b',cp.ALL,'a']):
> #         print i
> # ['a', 'a', 'a']
> # ['a', 'b', 'a']
> # ['b', 'a', 'b']
> # ['b', 'b', 'b']
> # >>> for i in cp.CPWithoutWC([1,2],3,[1,cp.ALL,2],[2,cp.ALL,1]):
> #        print i
> # [1, 1, 1]
> # [1, 2, 1]
> # [2, 1, 2]
> # [2, 2, 2]
> # >>>
> # >>> for i in cp.CPWithoutWC([1,2],121212,[1,cp.ALL],[2,cp.ALL,1]):
> #        print i
> ##  execute immediately
> # >>>
> # if You don't want to print cp. before ALL and CPWithoutWC use import
> like this:
> # from cartesianProduct import ALL,CPWithoutWC
> # CPWithoutWC is a python generator. Which means that it returns values
>
> # immediately and generate next in next cycle.
> # Program example
> #
> ## from cartesianProduct import ALL,CPWithoutWC
> ## def main():
> ##     for i in
> cp.CPWithoutWC(['a','b'],3,['a',cp.ALL,'b'],['b',cp.ALL,'a']):
> ##         ## do what You want with current value
> ##         .........
> ##         ## go back to for statement and generate new
> ## if __name__ == "__main__":
> ##     main()
> #
> """
>  Using logical combinations of WC:
>  1) It's possible to pass on to the function CPWithoutWC
>    any number of wildcarts after first two parameters, for example:
>    CPWithoutWC([1,2],121212,[1,cp.ALL],[2,cp.ALL,1],...)
>    where ... - is any other wildcart's additional function parameters.
>    Number of additional WC is not limited.
>    Function will filter out all combinations, which match any passed on
> WC.
>    It's equal to WC1 | WC2 | .... , where | is python analog of OR
> logical operations.
>  2) To use more complex WC combinations follow these steps
>    a) First of all create all needed WC
>    b) Then use operators |, & and braces () to create combinations
> required and then pass it on to function
>    CPWithoutWCEx as the third parameter. Don't use "or" and "and"
> python statement, otherwise program will
>    work improper. First two parameters of this function are the same as
> of CPWithoutWC function - set of
>    elements and CP dimension. An example of what was described above in
> command line:
>    >>> from cartesianProduct import ALL,CPWithoutWC,CPWithoutWCEx,WC
>    >>> a = WC([ALL,1,ALL])
>    >>> b = WC([ALL,2,ALL])
>    >>> c = a & b #filter out all sets which match a and b
>    >>> for i in CPWithoutWCEx([1,2],3,c) : print i
>    [1, 1, 1]
>    [2, 2, 2]
>    >>> # all sets where both 1 and 2 are present will be filtered out
>    >>> d = a | b
>    >>> for i in CPWithoutWCEx([1,2],3,d) : print i
>    >>> # returns nothing
>    >>> for i in CPWithoutWCEx([1,2,3],3,d) : print i
>    [3, 3, 3]
>    >>> a = WC([2,1,ALL])
>    >>> b = WC([1,2,ALL])
>    >>> c = WC([ALL,2])
>    >>> d = ( a | b ) & c
>    >>> for i in CPWithoutWCEx([1,2],3,d) : print i
>    [1, 1, 1]
>    [1, 1, 2]
>    [1, 2, 1]
>    [2, 1, 1]
>    [2, 2, 1]
>    [2, 2, 2]
>    >>> # filters out all combinations which start with [1,2] or [2,1]
> and end with 2
>
>    Number of WC, which are used to form logical combinations is not
> limited.
> """
> """
> 13.02.2006
>     a)Two new function - CPWithoutWCEx_L and CPWithoutWC_L are added.
>     Their interface is the same as of CPWithoutWCEx and CPWithoutWC
>     accordingly, except that the third parameter is WC list and
>     they accept strictly three parameters.
>
>     As You can see these functions are very simple because
>     python is quite flexible =>
>     >>> def s(x,y): return x * y
>     >>> d = [3,2]
>     >>> s(*d) ## == s(3,2)
>     6
>
>     b)Now WC can take string as parameter, and You can use string
>     as parameters of functions CPWithoutWC and CPWithoutWC_L
>     instead of WC lists.
>       Strings transform into WC according to these rules
>       1)If first symbol in the string is
>      alphanumeric (a-z or A-Z or 0-9) or '*'
>       character the every character of the string will be recognized as
>     a distinct set element. Examples:
>        "ad*d*" == ['a','d',cp.ALL,'d',cp.ALL]
>        "*A*b3*%^('" == [cp.ALL,'A',cp.ALL.'b','3',cp.ALL,'%','(',"'"]
>       2)If first character is not (alphanumeric or '*')
>       it will be treated as a delimitator. Examples:
>        ":a:A:1:*" == ['a','A','1',cp.ALL]
>        ":aA1:*"   == ['aA1',cp.ALL]
>        it's not necessary to write delimitators around the asterics
>       ":aA1*"     == ['aA1',cp.ALL]
>       "%aA%1*"    == ['aA','1',cp.ALL]
>       3)If all non delimit and non asterics character in elements
>      are digits => they will be treated as numbers.Examples:
>        "123*"     == [1,2,3,cp.ALL]
>        ":12:3*"   == [12,3,cp.ALL]
>        but
>        ":12:a:3*" == ['12','a','3',cp.ALL]
>       Examples of use:
> >>> for i in cp.CPWithoutWC(['a','b'],3,'a*b','b*a'):
>         print i
> ['a', 'a', 'a']
> ['a', 'b', 'a']
> ['b', 'a', 'b']
> ['b', 'b', 'b']
> >>> for i in cp.CPWithoutWC_L(['a','b'],3,['a*b','b*a']):
>         print i
> ['a', 'a', 'a']
> ['a', 'b', 'a']
> ['b', 'a', 'b']
> ['b', 'b', 'b']
> #You can mixe strings and lists for wildcarts
> >>> for i in cp.CPWithoutWC_L(['a','b'],3,['a*b',['b',cp.ALL,'a']]):
>         print i
> ['a', 'a', 'a']
> ['a', 'b', 'a']
> ['b', 'a', 'b']
> ['b', 'b', 'b']
> >>> for i in cp.CPWithoutWC_L(['abc','xyz'],3,[':abc*xyz']):
>       print i
> ['abc', 'abc', 'abc']
> ['abc', 'xyz', 'abc']
> ['xyz', 'abc', 'abc']
> ['xyz', 'abc', 'xyz']
> ['xyz', 'xyz', 'abc']
> ['xyz', 'xyz', 'xyz']
> """
> #-------------------------------------------------------------------------------
> class ALL(object):pass
> #-------------------------------------------------------------------------------
> class NO_ONE(object):pass
> #-------------------------------------------------------------------------------
> class BFunctor(object):
>   def __init__(self,func):
>     self.func = func
>   def __call__(self,*dt,**mp):
>     return self.func(*dt,**mp)
>   @classmethod
>   def OR(cls,x,y):
>     return cls(lambda *dt,**mp : x(*dt,**mp) | y(*dt,**mp))
>   @classmethod
>   def AND(cls,x,y):
>     return cls(lambda *dt,**mp : x(*dt,**mp) & y(*dt,**mp))
>
> #-----------------------------------------------------------------------------
>   def __or__(self,x):
>     return BFunctor.OR(self,x)
>
> #-----------------------------------------------------------------------------
>   def __and__(self,x):
>     return BFunctor.AND(self,x)
> #-------------------------------------------------------------------------------
> def _genAll(head,n,WCF,curr):
>   if len(curr) != 0 and n != 0:
>     for i in curr:
>       nhead = head + [i]
>       if n != 1 :
>         # needed dimension are not reached
>         # -> we mast tell WC that some other values
>         # may concatenate in the end of nhead in next recursion levels
>         # but if WC is ended with asterics (ALL), than dosn't matter
>         # so i use special walue NO_ONE to resolve this problem
>         # if WC with final asterics like [1,2,3,ALL] are matched nhead
> =>
>         # they matched nhead + [NO_ONE] to
>         # but if WC is like [1,ALL,2,3] => they dont match
> [1,2,3,NO_ONE] =>
>         # they don't prevent to generate [1,2,3,4] on next recursion
> level
>         x = WCF(nhead + [NO_ONE],curr)
>       else :      x = WCF(nhead,curr)
>       if False == x:
>         if n == 1 : yield nhead
>         else:
>           for i in _genAll(nhead,n - 1,WCF,curr):
>             yield i
>   elif n == 0 :
>     yield head
> #-------------------------------------------------------------------------------
> class WC(object):
>   def __init__(self,wc):
>     self.wc = wc
>     self.transformWC()
>     self.num_els = 0
>     self.compress()
>     self.comphdr = None
>     self.findMaxHeader()
>     self.ln = len(self.wc)
>
> #-----------------------------------------------------------------------------
>   def transformWC(self):
>     if self.wc.__class__ not in (str,unicode) : return
>     if len(self.wc) == 0 : return
>     if self.wc[0].isalnum() or self.wc[0] == "*":
>       wc = self.wc
>     else:
>       wc = self.wc[1:].split(self.wc[0])
>     nwc = []
>     for i in wc:
>       if   i == '*' : nwc.append(ALL)
>       elif '*' in i :
>         for j in i.split('*'):
>           if j : nwc.append(j)
>           nwc.append(ALL)
>         del nwc[-1]
>       else : nwc.append(i)
>     #check if all elements are numbers or *
>     allnum = True
>     for i in nwc:
>       if i is ALL : continue
>       try : int(i)
>       except :
>         allnum = False
>         break
>     if allnum:
>       for i,j in enumerate(nwc):
>         if j is not ALL:
>           nwc[i] = int(j)
>     self.wc = nwc
>
> #-----------------------------------------------------------------------------
>   def findMaxHeader(self):
>     return
>
> #-----------------------------------------------------------------------------
>   def compress(self):
>     "delete dublicated * values"
>     if len(self.wc) == 0 : return
>     wc_ = self.wc[:1]
>     for i in self.wc[1:]:
>       if i == ALL and i == wc_[-1] : continue
>       wc_.append(i)
>     self.wc = wc_
>
> #-----------------------------------------------------------------------------
>   def matchExact(self,hd,pos = 0):
>     if pos == len(self.wc) : return len(hd) == 0
>     if self.wc[pos] == ALL :
>       if pos + 1 == len(self.wc) : return True
>       vl = self.wc[pos + 1]
>       cpos = -1
>       while True:
>         try    : cpos = hd.index(vl,cpos + 1)
>         except : return False
>         if self.matchExact(hd[cpos + 1:],pos + 2) : return True
>     else:
>       if len(hd) == 0 : return False
>       if hd[0] != self.wc[pos] : return False
>       return self.matchExact(hd[1:],pos + 1)
>
> #-----------------------------------------------------------------------------
>   def __or__(self,x):
>     return BFunctor.OR(self,x)
>
> #-----------------------------------------------------------------------------
>   def __and__(self,x):
>     return BFunctor.AND(self,x)
>
> #-----------------------------------------------------------------------------
>   def __call__(self,hd,st):
>     return self.matchExact(hd)
> #-------------------------------------------------------------------------------
> def CPWithoutWCEx(set,n,wc):
>   for i in _genAll([],n,wc,set) :
>     yield i
> #-------------------------------------------------------------------------------
> def CPWithoutWC(set,n,*dt):
>   if len(dt) == 0 :
>     wc = lambda hd,st : True
>   else:
>     wc = WC(dt[0])
>     #print wc.wc
>     for i in dt[1:]:
>       wc = wc | WC(i)
>   for i in _genAll([],n,wc,set) :
>     yield i
> #-------------------------------------------------------------------------------
> def CPWithoutWC_L(set,n,WCs):
>   for i in CPWithoutWC(set,n,*WCs):
>     yield i
> #-------------------------------------------------------------------------------
> def CPWithoutWCEx_L(set,n,WCs):
>   for i in CPWithoutWCEx(set,n,*WCs):
>     yield i
> #-------------------------------------------------------------------------------
> def main():
>   for i in CPWithoutWC_L(['abc','xyz'],3,[':abc*xyz']):
>     print i
> #-------------------------------------------------------------------------------
> if __name__ == "__main__" : main()
> #-------------------------------------------------------------------------------
>

From: Azolex
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 16:53:10 +0000
Message-ID: <441ae97e$1_1@news.bluewin.ch>

sa wrote:
> in k:
> 
> cp:{[c;n;p]+(n#c)_vs(!_ c^n)_dvl,/{2_sv+(,/,/:\:)/(),··@[x;&x=-1;:[;!c]]}'p}

That one goes a long way as a proof of eg evolution theory, you know, 
monkeys reproducing shakespeare with a typewriter k-board and all that :)

> 
> examples:
> 
>   cp[2;3;,0 -1 1]
> (0 0 0
>  0 1 0
>  1 0 0
>  1 0 1
>  1 1 0
>  1 1 1)
> 
>   cp[2;3;(0 -1 1;1 -1 0)]
> (0 0 0
>  0 1 0
>  1 0 1
>  1 1 1)
> 
>   cp[2;3;(0 -1 1;1 -1 1)]
> (0 0 0
>  0 1 0
>  1 0 0
>  1 1 0)
> 
> arguments of cp:
> 
> c = cardinality of the input set
> n = power
> p = list of patterns (-1 = wildcard)
> 
> the algorithm directly computes the target set.  in other words,
> it does not generate the set, then filter the matches from the
> target.
> 
> modifying cp to accept s instead of the cardinality of s,
> patterns expressed in terms of elements of s, &c. adds nothing
> of interest to the problem.

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 18 Mar 2006 15:21:01 +0000
Message-ID: <1142695261.018806.58220@u72g2000cwu.googlegroups.com>

Here is an nice intro to K:

http://www.kuro5hin.org/?op=displaystory;sid=2002/11/14/22741/791

  "This is where K starts to set itself from apart from most of the
common programming languages in use today. You rarely write loops in K
(KDB is 100% loop-free), instead you use adverbs. An adverb modifies a
function, returning another function, changing the ways it operates
over its arguments and what it does with it's return values."

How about an interactive loop-like version? Generating the target set
is good for baby test cases but not if the cardinality of the target is
large. Does that make the problem more intersesting?

From: Joachim Durchholz
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 18 Mar 2006 19:23:55 +0000
Message-ID: <dvhmoa$h15$1@online.de>

·········@cox.net schrieb:
>   "This is where K starts to set itself from apart from most of the
> common programming languages in use today. You rarely write loops in K
> (KDB is 100% loop-free), instead you use adverbs. An adverb modifies a
> function, returning another function, changing the ways it operates
> over its arguments and what it does with it's return values."

Doesn't sound too different from what closures do. Or lazy parameter 
passing.
<rant> I'm not sure whether the K designer actually fits that 
description, but there are too many language designers around 
reinventing the wheel, arguing whether it should have seven, eight or 
thirteen sides... </rant>

Regards,
Jo

From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 16 Mar 2006 23:21:25 +0000
Message-ID: <8764medlhm.fsf@qrnik.zagroda>

··········@cox.net" <·········@cox.net> writes:

> The python code below generates a cartesian product subject to any
> logical combination of wildcard exclusions. For example, suppose I want
> to generate a cartesian product S^n, n>=3, of [a,b,c,d] that excludes
> '*a*b*' and '*c*d*a*'. See below for details.

I'm afraid that different programs in this thread has understood the
asterisk differently: that it matches any single element, or that it
matches any sequence of elements.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 00:45:58 +0000
Message-ID: <1142556358.098924.59180@e56g2000cwe.googlegroups.com>

The asterisk '*' matches any sequence of elements, not just one
element. The wildcard '*1*2*' would then correspond to a tuple with a 1
preceding a 2 in any positions. The wc '1*2' would correspond to a
starting 1 and an ending 2 with anything in between. The wc *12* would
correspond to a 1 adjacent to a 2 with the pair in any position.
Possibilities like '*a*a*b*'  and '*a*a*a*' of any length are also
allowed. If n is the dimension, then any n-tuple wc is just a point. My
apologies for the confusion.

From: Dan Piponi
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 01:46:20 +0000
Message-ID: <1142559980.525680.282250@u72g2000cwu.googlegroups.com>

Is this Haskell implementation what you want? It does the wildcard
matching through a state machine and it essentially threads the
state machine through the cartesian product, switching to the
ordinary cartesian product when possible as an optimisation.
The execution of the state machine is shared by strings with the
same prefix making it reasonably efficient even though the state
machine itself isn't optimised.

If it doesn't work, I'm sure it's only a few typos away...

-- generate strings of length n from alphabet l such that
-- the state machine, with transition function t, is not on
-- a final state (determined by function f) at the
-- end of the string.
-- If the state is ever 'unmatchable' (as determined by u)
-- we just return the cartesian product as no rejection
-- can take place.
generate f u t s 0 l = if f s then [] else [[]]
generate f u t s n l | u s = sequence (replicate n l)
                     | otherwise =
                     [a:b | a <- l, let s' = t s a,
                        b <- generate f u t s' (n-1) l]

-- The states are lists of regular expressions
-- where [a,b,..] means match a or b or...

-- This is the transition function for our machine.
transition pat a = pat >>= d a where
    -- Brzozowski derivative
    d a [] = []
    d a ·@('*':pat) = p:d a pat
    d a (p:pat) | a==p = [pat]
                | otherwise = []

-- A terminal state is one that matches the null string
terminal p = or $ map terminal' p where
    terminal' "" = True
    terminal' ('*':p) = terminal' p
    terminal' _ = False

run n alphabet pat =
        generate terminal null transition [pat] n alphabet

test = run 3 "abc" "aa*a"

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 02:18:10 +0000
Message-ID: <1142561890.772349.17260@z34g2000cwc.googlegroups.com>

-- The states are lists of regular expressions
-- where [a,b,..] means match a or b or...

I haven't run or studied your program yet myself but what I had in mind
was that the list of wc's are *all* to be excluded, so the list
[wc1..wcn] is to correspond generating all tuples matching not(wc1 and
.. and wcn).  Maybe you're already doing that. The wc's themselves
could be logical statements among the 'primitive' wc's.  That's why I
named it the 'wildcard exclusion problem'. It's a lot easier to specify
a list of simpler wc's than create a long logical expression.

Thanks to all who are making this an interesting thread.

From: Dan Piponi
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 15:34:56 +0000
Message-ID: <1142609696.447347.226510@v46g2000cwv.googlegroups.com>

·········@cox.net said:
> I haven't run or studied your program yet myself but what I had in mind
> was that the list of wc's are *all* to be excluded

I think it already does what you want. You might want to change

run n alphabet pat =
        generate terminal null transition [pat] n alphabet to

to

run n alphabet pat =
        generate terminal null transition pat n alphabet

to allow lists of patterns where the patterns in a list are ORed
together.

From: Tomasz Zielonka
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 10:14:42 +0000
Message-ID: <slrne1l301.5on.tomasz.zielonka@localhost.localdomain>

·········@cox.net wrote:
> -- The states are lists of regular expressions
> -- where [a,b,..] means match a or b or...
>
> I haven't run or studied your program yet myself but what I had in mind
> was that the list of wc's are *all* to be excluded, so the list
> [wc1..wcn] is to correspond generating all tuples matching not(wc1 and
> .. and wcn).  Maybe you're already doing that. The wc's themselves
> could be logical statements among the 'primitive' wc's.  That's why I
> named it the 'wildcard exclusion problem'. It's a lot easier to specify
> a list of simpler wc's than create a long logical expression.

I missed "any logical combination" :-(

It would be quite easy to fix my first program, but I don't have the
time to do it right now.

Best regards
Tomasz

-- 
I am searching for programmers who are good at least in
(Haskell || ML) && (Linux || FreeBSD || math)
for work in Warsaw, Poland

From: Tomasz Zielonka
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 10:09:31 +0000
Message-ID: <slrne1l2m9.5on.tomasz.zielonka@localhost.localdomain>

Dan Piponi wrote:
> Is this Haskell implementation what you want? It does the wildcard
> matching through a state machine and it essentially threads the
> state machine through the cartesian product, switching to the
> ordinary cartesian product when possible as an optimisation.
> The execution of the state machine is shared by strings with the
> same prefix making it reasonably efficient even though the state
> machine itself isn't optimised.

I've implemented the same concept yesterday evening:

----8<--------8<--------8<--------8<--------8<--------8<--------8<--------8<----
module WildCartesian where

import List

data Pat a = All | Lit a deriving (Show, Eq)

advancePattern :: Eq a => a -> [Pat a] -> [[Pat a]]
advancePattern y (Lit x : ps)
    | x == y    = [ps]
    | otherwise = []
advancePattern y (All : ps) = [All : ps] ++ [ps] ++ advancePattern y ps
advancePattern _ [] = []

generateNotMatching :: Eq a => [a] -> Int -> [[Pat a]] -> [[a]]
generateNotMatching alphabet = gen []
  where
    gen _   n pats
        | any (\ps -> all (== All) ps && (not (null ps) || n == 0)) pats = []
    gen acc 0 _
        = [reverse acc]
    gen acc n pats
        = [ w | x <- alphabet
              , let pats' = [ p' | p <- pats, p' <- advancePattern x p ]
              , w <- gen (x : acc) (n - 1) pats' ]

test :: IO ()
test = do
    t [1,2] 3 [[Lit 1, All, Lit 2]]
    t ['a','b'] 3 [[Lit 'a', All, Lit 'b'], [Lit 'b', All, Lit 'a']]
    t [1,2] 3 [[Lit 1, All, Lit 2], [Lit 2, All, Lit 1]]
  where
    t a b c = do
        putStrLn (concat (intersperse " " ["generateNotMatching", show a, show b, show c]))
        mapM_ (putStrLn . ("  "++) . show) (generateNotMatching a b c)
----8<--------8<--------8<--------8<--------8<--------8<--------8<--------8<----

Best regards
Tomasz

-- 
I am searching for programmers who are good at least in
(Haskell || ML) && (Linux || FreeBSD || math)
for work in Warsaw, Poland

From: Dan Piponi
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 17:13:35 +0000
Message-ID: <1142615615.488491.210180@z34g2000cwc.googlegroups.com>

Tomasz Zielonka said:

> I've implemented the same concept yesterday evening...

It's uncanny reading such similar code coming from another person!

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 18 Mar 2006 16:08:06 +0000
Message-ID: <1142698086.042262.161710@z34g2000cwc.googlegroups.com>

When I run this I get through ghc I get

C:\Documents and Settings\User\My Documents\wildcard>ghc
"./wc-zielonka.hs"
compilation IS NOT required
C:/Languages/ghc/ghc-6.4.1/libHSrts.a(Main.o)(.text+0x1d):Main.c:
undefined refe
rence to `__stginit_ZCMain'
C:/Languages/ghc/ghc-6.4.1/libHSrts.a(Main.o)(.text+0x43):Main.c:
undefined refe
rence to `ZCMain_main_closure'
collect2: ld returned 1 exit status

Unless there's a command line option I'm missing?

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sun, 19 Mar 2006 05:22:42 +0000
Message-ID: <1142745762.520932.275840@u72g2000cwu.googlegroups.com>

OK, a bad case of RTFM. I saved your file as WildCartesian.hs and then

1) command line: ghci WildCartesian.hs
2) Get some loading messages
3) command line: test

and it works! But how do I compile it to get a program with command
line arguments? I'm looking through Daume's tutorial right now.

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 10:05:45 +0000
Message-ID: <1142589945.078364.32870@i39g2000cwa.googlegroups.com>

Heh, here's a Prolog version:

==========================================================

gen( _, 0, [] ) :- !.
gen( S, N, [H | T] ) :- member( H, S ), M is N - 1, gen( S, M, T ).

==========================================================

Yep, that's it :)))

Here's how to test it:

==========================================================

1 ?- gen([a, b, c], 3, X), print(X), nl, fail.
[a, a, a]
[a, a, b]
[a, a, c]
[a, b, a]
[a, b, b]
[a, b, c]
[a, c, a]
[a, c, b]
[a, c, c]
[b, a, a]
[b, a, b]
[b, a, c]
[b, b, a]
[b, b, b]
[b, b, c]
[b, c, a]
[b, c, b]
[b, c, c]
[c, a, a]
[c, a, b]
[c, a, c]
[c, b, a]
[c, b, b]
[c, b, c]
[c, c, a]
[c, c, b]
[c, c, c]

No
2 ?- gen([a, b, c], 3, X), not(member(X, [[a, _, _], [_, b, _], [_, _,
c]])), print(X), nl, fail.
[b, a, a]
[b, a, b]
[b, c, a]
[b, c, b]
[c, a, a]
[c, a, b]
[c, c, a]
[c, c, b]

No

==========================================================

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 11:28:49 +0000
Message-ID: <1142594929.172231.133060@p10g2000cwp.googlegroups.com>

You may be interested in, or already know about

http://www.lambdassociates.org/
http://www.lambdassociates.org/aboutqi.htm
http://www.lambdassociates.org/webbook/contents.htm
http://www.lambdassociates.org/prolog.htm

Let me know what you think.

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 12:48:05 +0000
Message-ID: <1142599685.186209.177630@j33g2000cwa.googlegroups.com>

It would seem that your program is just filtering the full cartesian
product, right? The solution I'm looking for generates the elements
one-by-one so that it could be used in a loop.

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 17 Mar 2006 16:16:10 +0000
Message-ID: <1142612170.404220.23410@e56g2000cwe.googlegroups.com>

·········@cox.net wrote:
> It would seem that your program is just filtering the full cartesian
> product, right? The solution I'm looking for generates the elements
> one-by-one so that it could be used in a loop.

Oops...missed that part.

It took me a while to study the exchange on this topic more thoroughly,
and I now fully appreciate the fact that the problem calls for a much
more sophisticated approach.

Sorry for the hasty shot, I'll give it another shortly.

Cheers,

Dinko

From: Geoffrey Summerhayes
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 19:32:34 +0000
Message-ID: <ZrDTf.2479$qX6.56423@news20.bellglobal.com>

"Dinko Tenev" <···········@gmail.com> wrote in message 
····························@e56g2000cwe.googlegroups.com...
> ·········@cox.net wrote:
>> It would seem that your program is just filtering the full cartesian
>> product, right? The solution I'm looking for generates the elements
>> one-by-one so that it could be used in a loop.
>
> Oops...missed that part.
>
> It took me a while to study the exchange on this topic more thoroughly,
> and I now fully appreciate the fact that the problem calls for a much
> more sophisticated approach.
>
> Sorry for the hasty shot, I'll give it another shortly.

I wouldn't worry about it, Prolog generated the elements one-by-one.
The loop was the print,nl,fail line. Just beefing it up a bit, I
didn't take the time to clean it up though. :-)

gen(_,0,[]).
gen(S,N,[H|T]):- N > 0, N1 is N - 1, member(H,S), gen(S,N1,T).

filter([],[]).
filter([X|T],[X|T1]):- filter(T,T1).
filter([*|T],L):- filter(T,L).
filter([*|T],[_|T1]):- filter([*|T],T1).

filter_list(L,[[and|T]|_]):- filter_and(L,T), !.
filter_list(L,[[or|T]|_]):- filter_list(L,T), !.
filter_list(L,[H|_]):- H \= [and|_], H \= [or|_], filter(H,L),!.
filter_list(L,[H|T]):- H \= [and|_], H \= [or|_], filter_list(L,T).

filter_and(_,[]) :- !.
filter_and(L,[H|T]):- filter_list(L,[H]), filter_and(L,T).

generate_member(X,S,N,[]):-gen(S,N,X).
generate_member(X,S,N,[H|T]):-gen(S,N,X),\+ filter_list(X,[H|T]).

1 ?- generate_member(X,[a,b],3,[[a,*,b],[b,*,a]]).

X = [a, a, a] ;

X = [a, b, a] ;

X = [b, a, b] ;

X = [b, b, b] ;

No
2 ?- generate_member(X,[1,2],3,[[and, [*,2], [or, [2,1,*], [1,2,*]]]]).

X = [1, 1, 1] ;

X = [1, 1, 2] ;

X = [1, 2, 1] ;

X = [2, 1, 1] ;

X = [2, 2, 1] ;

X = [2, 2, 2] ;

No

---
Geoff

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Tue, 21 Mar 2006 07:14:19 +0000
Message-ID: <1142925259.468795.104500@t31g2000cwb.googlegroups.com>

After the basic fact of generating the exclusion - a considerable
achievement - the program should be interactive. What if the target set
has thousands or millions of elements? There should be a  loop-like way
('do' in Haskell, for example) to peel off the elements one-by-one and
then terminate.

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Tue, 21 Mar 2006 09:51:17 +0000
Message-ID: <1142934677.567529.272740@i40g2000cwc.googlegroups.com>

·········@cox.net wrote:
> After the basic fact of generating the exclusion - a considerable
> achievement - the program should be interactive. What if the target set
> has thousands or millions of elements? There should be a  loop-like way
> ('do' in Haskell, for example) to peel off the elements one-by-one and
> then terminate.

Um..."interactivity" is a bit tricky in Prolog ;)

As Geoffrey pointed out in his posting, the solutions are generated
effectively one at a time.  Following is a typical example of using the
generator:

    generate_member( X, ... ), do_something_with( X ), fail.

The underlying semantics is, roughly, 1) bind X, 2) do_something_with(
X ), 3) fail, meaning reject this binding of X and backtrack.  In this
particular case, backtracking is tantamount to going back to 1).

You can regard ( generate_member( X, ... ) ... fail ) as the equivalent
of a loop construct, and do_something_with( X ) as the loop body.  At
any given time that the goal is being evaluated, there is only one
binding for X in effect.

On a side note, Haskell's "do" notation isn't really about loops.  If
you're referring to Tomasz's code, it's rather mapM_ that can sort of
be thought of as looping through the list of values returned by
generateNotMatching :)

Cheers,

Dinko

From: Geoffrey Summerhayes
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Tue, 21 Mar 2006 19:02:43 +0000
Message-ID: <E5YTf.4833$ji6.274516@news20.bellglobal.com>

<·········@cox.net> wrote in message 
·····························@t31g2000cwb.googlegroups.com...
> After the basic fact of generating the exclusion - a considerable
> achievement - the program should be interactive. What if the target set
> has thousands or millions of elements? There should be a  loop-like way
> ('do' in Haskell, for example) to peel off the elements one-by-one and
> then terminate.

There is...(Q&D)

1 ?- generate_member(X,[1,2],3,[[and, [*,2], [or, [2,1,*], [1,2,*]]]]),
     write(X),nl,write('Is this the term you were looking for? (y/n):'),
     get(Y), ((Y is 121) -> true; fail). % 121 = 'y'

[1, 1, 1]
Is this the term you were looking for? (y/n):n
[1, 1, 2]
Is this the term you were looking for? (y/n):|: n
[1, 2, 1]
Is this the term you were looking for? (y/n):|: y

Yes
2 ?-

---
Geoff

From: funkyj
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 18 Mar 2006 02:12:57 +0000
Message-ID: <1142647977.696295.124650@i40g2000cwc.googlegroups.com>

·········@cox.net wrote:
> It would seem that your program is just filtering the full cartesian
> product, right? The solution I'm looking for generates the elements
> one-by-one so that it could be used in a loop.

One advantage of a generator over filtering the full product is that I,
as the user of the generator, am not obligated to iterate over the
entire solution space.

Are there other _practical_ advantages of generators over mapping &
filtering complete sets?

From: Doug Quale
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 18 Mar 2006 07:05:00 +0000
Message-ID: <87u09w9qsj.fsf@shiva.mad.wi.charter.com>

"funkyj" <······@gmail.com> writes:

> One advantage of a generator over filtering the full product is that I,
> as the user of the generator, am not obligated to iterate over the
> entire solution space.
> 
> Are there other _practical_ advantages of generators over mapping &
> filtering complete sets?

Storage.  You can iterate over problem spaces much too large to fit in
memory.  Also generate + iterate can be faster because of reduced
memory pressure.

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 09:38:18 +0000
Message-ID: <1142847498.862462.57340@i40g2000cwc.googlegroups.com>

Doug Quale wrote:
> "funkyj" <······@gmail.com> writes:
>
> > One advantage of a generator over filtering the full product is that I,
> > as the user of the generator, am not obligated to iterate over the
> > entire solution space.
> >
> > Are there other _practical_ advantages of generators over mapping &
> > filtering complete sets?
>
> Storage.  You can iterate over problem spaces much too large to fit in
> memory.  Also generate + iterate can be faster because of reduced
> memory pressure.

Hmmm...storage is not an issue in the Prolog version.  It generates a
candidate solution, then checks membership in the wildcard set, then
backtracks (backtracking is caused by "fail" in the test goal.)  On
backtracking, it effectively "forgets" the last solution, so the memory
is freed up (or at least free to be reclaimed through GC.)

Cheers,

Dinko

From: funkyj
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Tue, 21 Mar 2006 23:51:34 +0000
Message-ID: <1142985094.180466.306460@i40g2000cwc.googlegroups.com>

Dinko Tenev wrote:
> Doug Quale wrote:

> Hmmm...storage is not an issue in the Prolog version.  It generates a
> candidate solution, then checks membership in the wildcard set, then
> backtracks (backtracking is caused by "fail" in the test goal.)  On
> backtracking, it effectively "forgets" the last solution, so the memory
> is freed up (or at least free to be reclaimed through GC.)

How about the other iterator characteristics?

when there is a huge solution space can I ask the prolog version to
give me the first 1000 solutions?  The next 1000 solutions (i.e. 1000 -
1999)?

If you can do that then it would appear that generators have no
advantage over your prolog solution.

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 10:24:12 +0000
Message-ID: <1143023052.390444.32380@u72g2000cwu.googlegroups.com>

funkyj wrote:
> How about the other iterator characteristics?
>
> when there is a huge solution space can I ask the prolog version to
> give me the first 1000 solutions?

Geoffrey's post above offers one way to do this from within a REPL.

Within a program, as soon as you accept a solution, you're potentially
out of the "loop" (it is possible to use cut (!) to make this certain.)

>  The next 1000 solutions (i.e. 1000 - 1999)?

I am not quite sure how exactly you propose to do this in e.g. Python,
but if you mean to enumerate and skip over the first 1000 and then
continue with the rest, yes, this is also possible.

> If you can do that then it would appear that generators have no
> advantage over your prolog solution.

To be fair, the Python implementation has one significant advantage --
it offers reduced runtime complexity for quite a number of practical
cases (of course, this improvement is achievable in Prolog too,) though
I'm still inclined to think that it can be made to run in exponential
time.

Also, my Prolog version is non-compliant in that it treats a wildcard
as matching a single position, rather than matching any sequence.
Geoffrey's implementation does the right thing though.

Cheers,

Dinko

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 10:46:05 +0000
Message-ID: <1142851565.678070.11970@v46g2000cwv.googlegroups.com>

·········@cox.net wrote:
> It would seem that your program is just filtering the full cartesian
> product, right? The solution I'm looking for generates the elements
> one-by-one so that it could be used in a loop.

OK, having read some of the comments so far, I have the feeling that I
may be missing the point in more than one way, so let's set this
straight:

If I understand correctly, for an alphabet S, and a subset W of S*
specified by the wildcards, you expect the enumeration of sequences of
length n to run in Theta( n*|S^n - W| ) instead of Theta( n*|S^n| ).

First, this doesn't seem to hold for your Python program.  Try, for
example, S = { a, b, c }, W = { *a*b*, *b*c*, *c*a*, *b*a*, *c*b*,
*a*c* }, with some large values of n.  Theta( n*|S^n - W| ) predicts
that the enumeration time should grow linearly with n, as |S^n - W| =
3, but if you take some measurements, you'd notice that it grows faster
than that.

Second, my current bet is that such an improvement in asymptotic
complexity is not possible, if we consider *both* pre-processing of the
wildcard set and subsequent enumeration.  Speculation: the time for
building-up a smart structure to speed-up enumeration, together with
the time for enumerating the set using that structure, should sum up to
roughly Theta( n*|S^n| ), even with a really smart algorithm.

Even if you're willing to pay up-front for tighter loop execution
later, and you build a suitable structure for this purpose, you would
have to consider the structure's size, so here's another speculation:
such structure would likely take up Theta( |S^n| ) space in memory, in
the worst case.

I would really appreciate it if you could pour some light into what
you're trying to do exactly, and possibly point out anything that I
might have missed so far.

Cheers,

Dinko

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Tue, 21 Mar 2006 10:18:22 +0000
Message-ID: <1142936302.624529.59330@i40g2000cwc.googlegroups.com>

Dinko Tenev wrote:
> Speculation: the time for
> building-up a smart structure to speed-up enumeration, together with
> the time for enumerating the set using that structure, should sum up to
> roughly Theta( n*|S^n| ), even with a really smart algorithm.

OK, maybe not.

This might be the worst case, looking at S^n - W only, but it's not
quite clear what "worst case" means in the context of concrete
implementations.  Surely, one can clog the program with zillions of
wildcards to test, so we can produce an arbitrarily "bad" case :) --
but such a case is obviously of little or no practical importance.  It
appears that, to make any sensible statements about performance in the
relevant cases, we have to take into account the size of the pattern
set used to specify W as well.

> [...] here's another speculation:
> such structure would likely take up Theta( |S^n| ) space in memory, in
> the worst case.

...and similarly for "worst case" here.

Cheers,

Dinko

From: Mark Tarver
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 16:47:49 +0000
Message-ID: <1142873269.395315.293590@v46g2000cwv.googlegroups.com>

Hi,

You wrote into the Qilang News group with your problem.
This is a solution in 17 lines of Qi for any n-product >= 2.
It falls short of your complete requirement since it uses
generate and then test, rather than interleaving the
two.

(define challenge
   Patterns N X -> (filter (/. Y (member Y Patterns)) (n-product N X)))

(define n-product
    2 X -> (cartesian-product l X X)
    N X -> (cartesian-product c X (n-product (- N 1) X)))

(define cartesian-product
   _ [ ] _ -> [ ]
   c [X | Y] Z -> (append (map (/. W [X | W]) Z) (cartesian-product c Y
Z))
   l  [X | Y] Z -> (append (map (/. W [X W]) Z) (cartesian-product l Y
Z)))

(define filter
    _ [] -> []
    F [X | Y] -> (filter F Y)	where (F X)
    F [X | Y] -> [X | (filter F Y)])

(define member
    _ [] -> false
    X [Pattern | _]  -> true	where (query-prolog [[= Pattern X]])
    X [_ | Patterns] -> (member X Patterns))

Notes:

Pattern filtering is done by a unification test within the member
function.  You
can do this most easily by calling Qi Prolog using query-prolog.
Here's a test.

(42 -) (n-product 3 [a b c])
[[a a a] [a a b] [a a c] [a b a] [a b b] [a b c] [a c a] [a c b] [a c
c]
 [b a a] [b a b] [b a c] [b b a] [b b b] [b b c] [b c a] [b c b] [b c
c]
 [c a a] [c a b] [c a c] [c b a] [c b b] [c b c] [c c a] [c c b] [c c
c]]

OK, remove all lists beginning [a a ....].

(43-) (challenge [[a a | X]] 3 [a b c])
[[a b a] [a b b] [a b c] [a c a] [a c b] [a c c] [b a a] [b a b] [b a
c]
 [b b a] [b b b] [b b c] [b c a] [b c b] [b c c] [c a a] [c a b] [c a
c]
 [c b a] [c b b] [c b c] [c c a] [c c b] [c c c]]

Remove all lists beginning with a or b.

(51-) (challenge [[a | X] [b | X]] 3 [a b c])
[[c a a] [c a b] [c a c] [c b a] [c b b] [c b c] [c c a] [c c b] [c c
c]]

Mark

From: Mark Carter
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 18:25:37 +0000
Message-ID: <441ef3a2$0$15789$14726298@news.sunsite.dk>

I'd like to propose a coding challenge of my own. The challenge is to 
reproduce the TEA (Tiny Encryption Algorith):
http://www.simonshepherd.supanet.com/tea.htm
in your language of choice.

Here's the code, just two simple functions:

void encipher(unsigned long *const v,unsigned long *const w,
    const unsigned long *const k)
{
    register unsigned long       y=v[0],z=v[1],sum=0,delta=0x9E3779B9,
				a=k[0],b=k[1],c=k[2],d=k[3],n=32;

    while(n-->0)
       {
       sum += delta;
       y += (z << 4)+a ^ z+sum ^ (z >> 5)+b;
       z += (y << 4)+c ^ y+sum ^ (y >> 5)+d;
       }

    w[0]=y; w[1]=z;
}

void decipher(unsigned long *const v,unsigned long *const w,
    const unsigned long *const k)
{
    register unsigned long       y=v[0],z=v[1],sum=0xC6EF3720,
				delta=0x9E3779B9,a=k[0],b=k[1],
				c=k[2],d=k[3],n=32;

    /* sum = delta<<5, in general sum = delta * n */

    while(n-->0)
       {
       z -= (y << 4)+c ^ y+sum ^ (y >> 5)+d;
       y -= (z << 4)+a ^ z+sum ^ (z >> 5)+b;
       sum -= delta;
       }

    w[0]=y; w[1]=z;
}

I had a crack at it in Lisp. My version doesn't work - but of greater 
concern to me is that it doesn't appear nearly as compact as the C 
version. Anyway, here's my Lisp code (no prizes for guessing that I'm a 
noob to Lisp):

(defconstant delta 2654435769 ) ; delta= 0x9E3779B9

(defun floorn (n) (nth-value 0 (floor n)))

(defun >> (val num-bytes)
   "Right-shift positive integer val by num-bytes"
   (let* (t1 t2)
     (setf t1 (expt 2 num-bytes))
     (setf t2 (/ val t1))
     (floor t2)))

(defun << (val num-bytes)
   "Left-shift positive integer v by num-bytes"
   (* val (expt 2 num-bytes)))

(defun <<4 (i) (<< i 4))

(defun byte-n (v n)
   "Return the nth byte of a value v"
   (let* ((bits-to-shift (* 8 (1- n)))
	 (shifted-value (>> v bits-to-shift)))
     (logand shifted-value 256)))
	

(defun transform (v1 v2 v3 v4)
   (let (t1 t2 t3)
     (setf t1 (<<4 v1))
     (setf t2 (expt v2 v1))
     (setf t3 (expt v3 (>> v2 5)))
     (+ t1 t2 t3 v4)))

(defun pack64 (b1 b2) (+ (<< b1 32) b2))

(defun encipher (v k)
   (let ((sum 0)
	(a (byte-n k 3))  ; a=k[0]
	(b (byte-n k 2))  ; b=k[1]	
	(c (byte-n k 1))  ; c=k[2]	
	(d (byte-n k 0))  ; d=k[3]	
	(y (byte-n v 1))  ; y=v[4]	
	(z (byte-n v 0))) ; z=v[1]


     (loop for n from 0 to 31 do  ;n=32, while(n-->0)
	  (incf sum delta)    ;sum += delta;
	  (incf y (transform z a sum b)) ; y += (z << 4)+a ^ z+sum ^ (z >> 5)+b
	  (incf z (transform y c sum d)) ;z += (y << 4)+c ^ y+sum ^ (y >> 5)+d;
	  )

     (pack64 y z) ; w[0]=y; w[1]=z;
     ))


(defun decipher (v k)
   (let ((sum 3337565984)  ; 0xC6EF3720
	(a (byte-n k 3))  ; a=k[0]
	(b (byte-n k 2))  ; b=k[1]	
	(c (byte-n k 1))  ; c=k[2]	
	(d (byte-n k 0))  ; d=k[3]	
	(y (byte-n v 1))  ; y=v[4]	
	(z (byte-n v 0))) ; z=v[1]

     (loop for n from 0 to 31 do  ;n=32, while(n-->0)
	  (decf z (transform y c sum d)) ;z -= (y << 4)+c ^ y+sum ^ (y >> 5)+d;
	  (decf y (transform z a sum b)) ;y -= (z << 4)+a ^ z+sum ^ (z >> 5)+b;
	  (decf sum delta)    ;sum -= delta;
	  )

     (pack64 y z) ; w[0]=y; w[1]=z;
     ))

From: Mark Tarver
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 18:32:41 +0000
Message-ID: <1142879561.174895.242180@g10g2000cwb.googlegroups.com>

Interesting.  But you probably need to post this as a new
message, since it is a distinctly different
problem from the one driving this thread.

Mark

From: Mark Carter
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 19:26:58 +0000
Message-ID: <441f020b$0$15783$14726298@news.sunsite.dk>

Mark Tarver wrote:
> Interesting.

At the risk of being labelled a troll, one thought that is occuring to 
me is that in Lisp it seems that sometimes it is difficult to achieve a 
simple thing in a simple way.  To clarify ... recently, I had been 
trying to obtain md5 hashes of the files we had on our server (a 
different exercise than the one I mentioned in my OP, just in case you 
thought that I didn't understand the difference between encryption and 
hashing). There is an md5 package for Lisp available on the web, which I 
used with CLISP. I had a file that contained a non-standard character, 
causing CLISP to throw an error when it tried to print it.

Well, I suppose I could have tried to figure out a way to cajole CLISP 
into printing something it didn't want to print, but I was keen to give 
Corman Lisp 2.5 a try-out anyway, so I tried the package on it. EXCEPT, 
for some reason when you try to read a file with an :element-type  of 
(unsigned-byte 8) (or something similar), Corman didn't like it.

In the end, I hacked together an md5 DLL from some sources I found on 
the internet. You can get the package here, together with Corman Lisp 
bindings:
http://www.markcarter.me.uk/computing/freeware/md5mc/md5mc.htm

In the past, I had also employed a similar technique in order to get 
access to some console functions that I was interested in.

My worry is that it seems to be a recurring theme with me ... get 
stumped in Lisp, realise that it is probably just plain easier in C, and 
then link the whole thing together in Lisp. Which is kinda less than 
expected.

From: ·······@gmail.com
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 21:49:45 +0000
Message-ID: <1142891385.567319.277720@v46g2000cwv.googlegroups.com>

Mark Carter wrote:
> In the end, I hacked together an md5 DLL from some sources I found on
> the internet. You can get the package here, together with Corman Lisp
> bindings:
> http://www.markcarter.me.uk/computing/freeware/md5mc/md5mc.htm
>
> In the past, I had also employed a similar technique in order to get
> access to some console functions that I was interested in.

At the risk of tooting my own horn, might I suggest Ironclad the next
time you need to do this sort of thing:

http://cliki.net/Ironclad

MD5, SHA-1, DES, AES, and more--all in 100% pure Common Lisp.  And if
it doesn't work in Corman (I don't have a Windows box for testing
Corman), please send me mail.  (The SHA family of digests doesn't work
with clisp in the current release, but that will be corrected with a
new release shortly.)

-Nathan

From: Mark Carter
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 14:16:36 +0000
Message-ID: <44215c42$0$15790$14726298@news.sunsite.dk>

Mark Carter wrote:

> At the risk of being labelled a troll

One thing I just discovered, and by which I mean *really* discovered ... 
is that Lisp is an interactive environment. I am working on trying to 
verify the contents of disks. I noticed that the input formats are 
slightly wrong, and needed correction. In fact, there's a whole host of 
jiggery pokery that I need to do in order to massage and build up 
everything the way it needs to be.

A programmers mindset is usually geared towards "writing applications". 
What I'm currently doing in Lisp is building up functions as I need 
them. Using emacs, I can just C-x C-e to make my functions "live", and 
when it's time to stop for the day, save my working image so that I can 
use it the next day.

It seems to me that only Forth or Scheme really matches this capability. 
Ruby and Python come kinda close - they do have a REPL, but it's kinda 
clunky to try to create functions on the fly, plus of course they don't 
support the idea of an image.

From: QCD Apprentice
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 15:36:19 +0000
Message-ID: <dvrqth$5q8$1@news.doit.wisc.edu>

Mark Carter wrote:
> Mark Carter wrote:
> 
>> At the risk of being labelled a troll
> 
> 
> One thing I just discovered, and by which I mean *really* discovered ... 
> is that Lisp is an interactive environment. I am working on trying to 
> verify the contents of disks. I noticed that the input formats are 
> slightly wrong, and needed correction. In fact, there's a whole host of 
> jiggery pokery that I need to do in order to massage and build up 
> everything the way it needs to be.
> 
> A programmers mindset is usually geared towards "writing applications". 
> What I'm currently doing in Lisp is building up functions as I need 
> them. Using emacs, I can just C-x C-e to make my functions "live", and 
> when it's time to stop for the day, save my working image so that I can 
> use it the next day.
> 
> It seems to me that only Forth or Scheme really matches this capability. 
> Ruby and Python come kinda close - they do have a REPL, but it's kinda 
> clunky to try to create functions on the fly, plus of course they don't 
> support the idea of an image.
(Note:  I trimmed the other newsgroups just because I'm 
asking about Lisp and Lisp only.)
Could someone provide an explanation of what a lisp image is?
That might be a stupid question, but apparently it means 
something more than just compiling the programs you're 
working on.

From: Alexander Schmolck
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 16:29:23 +0000
Message-ID: <yfsd5gefnoc.fsf@oc.ex.ac.uk>

QCD Apprentice <··············@gmail.com> writes:

> Could someone provide an explanation of what a lisp image is?
> That might be a stupid question, but apparently it means something more than
> just compiling the programs you're working on.

An image is just the state of your interactive lisp session and you can write
it all to disk and resume the same session in the future[1].

Unlike C++, Java and co where you write some code, compile all of it, run it,
find some bugs, quit the compiled program and start over again by writing some
more code to fix the bugs etc. you can develop your code piecemeal
interactively in lisp (and some other languages), even changing the *running*
program as you go (so rather than editing function foo and recompiling and
rerunning everything; you just edit function foo, compile and update it (and
nothing else) with a keypress in your editor in your interactive lisp session
in which your program runs. No need to start over from scratch).

'as

Footnotes: 
[1] Possibly modulo things like open network connections.

From: Lars Brinkhoff
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 16:53:38 +0000
Message-ID: <851wwuv2st.fsf@junk.nocrew.org>

QCD Apprentice <··············@gmail.com> writes:
> Could someone provide an explanation of what a lisp image is?

The CLHS (which is based on the ANSI Common Lisp standard) has this
explanation:

http://clhs.lisp.se/Body/26_glo_l.htm#lisp_image

From: Lars Brinkhoff
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 14:30:43 +0000
Message-ID: <85ek0uv9f0.fsf@junk.nocrew.org>

Mark Carter <··@privacy.net> writes:
> A programmers mindset is usually geared towards "writing
> applications". What I'm currently doing in Lisp is building up
> functions as I need them. Using emacs, I can just C-x C-e to make my
> functions "live", and when it's time to stop for the day, save my
> working image so that I can use it the next day.
>
> It seems to me that only Forth or Scheme really matches this
> capability. Ruby and Python come kinda close - they do have a REPL,
> but it's kinda clunky to try to create functions on the fly, plus of
> course they don't support the idea of an image.

I believe Smalltalk is also interactive and very image-oriented.

From: Alexander Schmolck
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 16:47:56 +0000
Message-ID: <yfs8xr2fmtf.fsf@oc.ex.ac.uk>

Mark Carter <··@privacy.net> writes:
> A programmers mindset is usually geared towards "writing applications". What
> I'm currently doing in Lisp is building up functions as I need them. Using
> emacs, I can just C-x C-e to make my functions "live", and when it's time to
> stop for the day, save my working image so that I can use it the next day.
> 
> 
> It seems to me that only Forth or Scheme really matches this capability. 

Not really. I'd say matlab's and squeak's (and presumably most smalltalks')
interactive environment is quite superior to cl+slime. Emacs lisp is also
better and I'd also suspect erlang to be better in some ways, but I haven't
used it yet. APL and descendants (J and K) also tend to have quite reasonable
interactive facilities and so do CAS systems.

> Ruby and Python come kinda close - they do have a REPL, but it's kinda
> clunky to try to create functions on the fly, plus of course they don't
> support the idea of an image.

I don't think interactively creating functions is much harder in python but
changing module and class definitions is.

Actually although cl+slime is rather nice it is inferior to the much less
sophisticated and capable ipython+emacs combo in some regards (runtime error
reporting in CL tends to be pretty crap; (i)python docstrings, ease of
interactive testing and shell integration are superior). It's also much
easier to serialize stuff in python than it is in common lisp (never mind
pickle; you can't even readably print hashtables -- and unless I'm missing
something this is not user-fixable which really sucks).

Of course it's *much* easier to supply a nice interactive experience for a
quite limited language such as matlab than it is for CL (mostly because CL is
more powerful and geared towards efficient code generation).

So CL might well have the best interactiveness/(expressiveness*speed) ratio.

'as

From: ······@corporate-world.lisp.de
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 20:25:22 +0000
Message-ID: <1142886321.996690.78600@t31g2000cwb.googlegroups.com>

> I had a crack at it in Lisp. My version doesn't work - but of greater
> concern to me is that it doesn't appear nearly as compact as the C
> version. Anyway, here's my Lisp code (no prizes for guessing that I'm a
> noob to Lisp):

Lot's of things you can write more compact.
But compact is not always the best way to
write source. For me the most important
criteria is that I can return to some source
after, say, a year absence and everything
is clear and readable again.

>
> (defconstant delta 2654435769 ) ; delta= 0x9E3779B9

(defconstant +delta+ #x9E3779B9)

> (defun floorn (n) (nth-value 0 (floor n)))

is above used?

> (defun >> (val num-bytes)
>    "Right-shift positive integer val by num-bytes"
>    (let* (t1 t2)
>      (setf t1 (expt 2 num-bytes))
>      (setf t2 (/ val t1))
>      (floor t2)))

(defun >> (val num-bytes)
  "Right-shift positive integer val by num-bytes"
  (floor (/ val (expt 2 num-bytes))))

> (defun transform (v1 v2 v3 v4)
>    (let (t1 t2 t3)
>      (setf t1 (<<4 v1))
>      (setf t2 (expt v2 v1))
>      (setf t3 (expt v3 (>> v2 5)))
>      (+ t1 t2 t3 v4)))
>

(defun transform (v1 v2 v3 v4)
  (+ (<<4 v1)
     (expt v2 v1)
     (expt v3 (>> v2 5))
     v4))

and so on...

From: Alexander Schmolck
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 21:17:07 +0000
Message-ID: <yfs7j6oyfxo.fsf@oc.ex.ac.uk>

······@corporate-world.lisp.de writes:

> (defun >> (val num-bytes)
>   "Right-shift positive integer val by num-bytes"
>   (floor (/ val (expt 2 num-bytes))))

or just (floor val (expt 2 num-bytes)) 

'as

From: Marcus Breiing
Subject: TEA (was: Re: Programming challenge: wildcard exclusion in cartesian products)
Date: Mon, 20 Mar 2006 19:17:41 +0000
Message-ID: <qm1opgs29rk53@breiing.com>

* Mark Carter

> I had a crack at it in Lisp. My version doesn't work 

I wrote a version some time ago that seemed to work. (Beware: I don't
use it for serious purposes and didn't test heavily.)

(defun tea (v k)
  "Tiny Encryption Algorithm.
See http://www.ftp.cl.cam.ac.uk/ftp/papers/djw-rmn/djw-rmn-tea.html"
  (let ((delta #x9e3779b9)
        (y (ldb (byte 32 0) v))
        (z (ldb (byte 32 32) v))
        (k0 (ldb (byte 32 0) k))
        (k1 (ldb (byte 32 32) k))
        (k2 (ldb (byte 32 64) k))
        (k3 (ldb (byte 32 96) k))
        (sum 0))
    (loop for n from 0 below 32
          do (setf sum (ldb (byte 32 0) (+ sum delta))
                   y   (ldb (byte 32 0) (+ y (logxor (+ (* z 16) k0)
                                                     (+ z sum)
                                                     (+ (ldb (byte 27 5) z) k1))))
                   z   (ldb (byte 32 0) (+ z (logxor (+ (* y 16) k2)
                                                     (+ y sum)
                                                     (+ (ldb (byte 27 5) y) k3))))))
    (dpb z (byte 32 32) y)))
          
(defun tea-decode (v k)
  "Tiny Encryption Algorithm (Decryption Function)
See http://www.ftp.cl.cam.ac.uk/ftp/papers/djw-rmn/djw-rmn-tea.html"
  (let* ((delta #x9e3779b9)
         (sum (dpb delta (byte 27 5) 0))
         (y (ldb (byte 32 0) v))
         (z (ldb (byte 32 32) v))
         (k0 (ldb (byte 32 0) k))
         (k1 (ldb (byte 32 32) k))
         (k2 (ldb (byte 32 64) k))
         (k3 (ldb (byte 32 96) k)))
    (loop for n from 0 below 32
          do (setf z   (ldb (byte 32 0) (- z (logxor (+ (dpb y (byte 28 4) 0) k2)
                                                     (+ y sum)
                                                     (+ (ldb (byte 27 5) y) k3))))
                   y   (ldb (byte 32 0) (- y (logxor (+ (dpb z (byte 28 4) 0) k0)
                                                     (+ z sum)
                                                     (+ (ldb (byte 27 5) z) k1))))
                   sum (ldb (byte 32 0) (- sum delta))))
    (dpb z (byte 32 32) y)))


-- 
Marcus Breiing

From: Pascal Bourguignon
Subject: Re: TEA
Date: Mon, 20 Mar 2006 19:57:02 +0000
Message-ID: <878xr451pt.fsf@thalassa.informatimago.com>

Marcus Breiing <······@2006w11.mail.breiing.com> writes:

> * Mark Carter
>
>> I had a crack at it in Lisp. My version doesn't work 
>
> I wrote a version some time ago that seemed to work. (Beware: I don't
> use it for serious purposes and didn't test heavily.)

And you can READ this?


> (defun tea (v k)
>   "Tiny Encryption Algorithm.
> See http://www.ftp.cl.cam.ac.uk/ftp/papers/djw-rmn/djw-rmn-tea.html"
>   (let ((delta #x9e3779b9)
>         (y (ldb (byte 32 0) v))
>         (z (ldb (byte 32 32) v))
>         (k0 (ldb (byte 32 0) k))
>         (k1 (ldb (byte 32 32) k))
>         (k2 (ldb (byte 32 64) k))
>         (k3 (ldb (byte 32 96) k))
>         (sum 0))
>     (loop for n from 0 below 32
>           do (setf sum (ldb (byte 32 0) (+ sum delta))
>                    y   (ldb (byte 32 0) (+ y (logxor (+ (* z 16) k0)
>                                                      (+ z sum)
>                                                      (+ (ldb (byte 27 5) z) k1))))
>                    z   (ldb (byte 32 0) (+ z (logxor (+ (* y 16) k2)
>                                                      (+ y sum)
>                                                      (+ (ldb (byte 27 5) y) k3))))))
>     (dpb z (byte 32 32) y)))
>           
> (defun tea-decode (v k)
>   "Tiny Encryption Algorithm (Decryption Function)
> See http://www.ftp.cl.cam.ac.uk/ftp/papers/djw-rmn/djw-rmn-tea.html"
>   (let* ((delta #x9e3779b9)
>          (sum (dpb delta (byte 27 5) 0))
>          (y (ldb (byte 32 0) v))
>          (z (ldb (byte 32 32) v))
>          (k0 (ldb (byte 32 0) k))
>          (k1 (ldb (byte 32 32) k))
>          (k2 (ldb (byte 32 64) k))
>          (k3 (ldb (byte 32 96) k)))
>     (loop for n from 0 below 32
>           do (setf z   (ldb (byte 32 0) (- z (logxor (+ (dpb y (byte 28 4) 0) k2)
>                                                      (+ y sum)
>                                                      (+ (ldb (byte 27 5) y) k3))))
>                    y   (ldb (byte 32 0) (- y (logxor (+ (dpb z (byte 28 4) 0) k0)
>                                                      (+ z sum)
>                                                      (+ (ldb (byte 27 5) z) k1))))
>                    sum (ldb (byte 32 0) (- sum delta))))
>     (dpb z (byte 32 32) y)))
>
>
> -- 
> Marcus Breiing

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"Debugging?  Klingons do not debug! Our software does not coddle the
weak."

From: Marcus Breiing
Subject: Re: TEA
Date: Tue, 21 Mar 2006 12:47:53 +0000
Message-ID: <pt328rwil2miz@breiing.com>

* Pascal Bourguignon

> And you can READ this?

You mean the ldb/dpb?

IIRC, I got somewhat used to using those while writing something else,
where I found ldb/dpb more readable than shift/mask.

-- 
Marcus Breiing

From: Pascal Bourguignon
Subject: Re: TEA
Date: Tue, 21 Mar 2006 15:01:35 +0000
Message-ID: <87fylb3kq8.fsf@thalassa.informatimago.com>

Marcus Breiing <······@2006w12.mail.breiing.com> writes:

> * Pascal Bourguignon
>
>> And you can READ this?
>
> You mean the ldb/dpb?

No, I mean the long sequences of assembler style lisp.  
That is, the repeatition of the same forms again and again.
Abstraction, please!

If you have ten times (ldb (bytes 32 0) x) you should introduce some
abstraction:

(defmacro wref (x index) `(ldb (byte 32 ,(* 32 index)) ,x))
(defmacro lw  (x)  `(wref ,x 0))
(defmacro hw  (x)  `(wref ,x 1))

 (defun tea (v k)
   "Tiny Encryption Algorithm.
 See http://www.ftp.cl.cam.ac.uk/ftp/papers/djw-rmn/djw-rmn-tea.html"
   (let ((delta #x9e3779b9)
         (y  (lw v))
         (z  (hw v))
         (k0 (wref k 0))
         (k1 (wref k 1))
         (k2 (wref k 2))
         (k3 (wref k 3))
         (sum 0))
    (flet ((op (a b p q) (lw (+ a (logxor (+ (* b 16) p)
                                          (+ b sum)
                                          (+ (ldb (byte 27 5) b) q))))))
     (declare (inline op))
     (loop repeat 32 ; n wasn't used!
           do (setf sum (lw (+ sum delta))
                    y   (op y z k0 k1)
                    z   (op z y k2 k3)))
     (setf (hw y) z))))

Now, really,  the additions will be done with more than 32 bits
anyways (more than a fixnum in all cases), so ldb on (+ sum delta)
[end sum = 32*delta, only 37 bits, the logxor will also work on 35 to
38 bits] and the ldb for the shifts are not necessary, and could be
less efficient than a shift.  But it depends on the processor and  the
optimizations the implementation can do.

> IIRC, I got somewhat used to using those while writing something else,
> where I found ldb/dpb more readable than shift/mask.

The point is to introduce abstractions, this is what makes the code readable.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

PUBLIC NOTICE AS REQUIRED BY LAW: Any use of this product, in any
manner whatsoever, will increase the amount of disorder in the
universe. Although no liability is implied herein, the consumer is
warned that this process will ultimately lead to the heat death of
the universe.

From: Marcus Breiing
Subject: Re: TEA
Date: Tue, 21 Mar 2006 15:50:02 +0000
Message-ID: <pry7shg7euep7@breiing.com>

* Pascal Bourguignon

[Efficiency]
> But it depends on the processor and  the
> optimizations the implementation can do.

Nor was efficiency relevant to my application.

> The point is to introduce abstractions, this is what makes the code
> readable.

Our taste buds differ. Specifically, I don't universally enjoy the
kind of abstraction that is really just code compression. IOW, I think
that too much factoring-out can reduce readability (probably by
introducing too many named entities, clobbering short term memory).
YMMV, obviously.

-- 
Marcus Breiing

From: Rob Warnock
Subject: Re: TEA
Date: Wed, 22 Mar 2006 03:17:10 +0000
Message-ID: <1KCdncNJeqcrXL3ZRVn-qw@speakeasy.net>

Marcus Breiing  <······@2006w12.mail.breiing.com> wrote:
+---------------
| * Pascal Bourguignon
| > The point is to introduce abstractions, this is what makes the code
| > readable.
| 
| Our taste buds differ. Specifically, I don't universally enjoy the
| kind of abstraction that is really just code compression. IOW, I think
| that too much factoring-out can reduce readability (probably by
| introducing too many named entities, clobbering short term memory).
+---------------

MACROLETs or FLETs with INLINE declarations can help here, by both
introducing abstraction and yet still keeping the abstraction "nearby".
Pascal already showed the FLET/INLINE part, but a MACROLET for "wref"
would have helped as well.

I don't understand why "lw" & "hw" weren't INLINE'd FLETs, too,
since unlike "wref" they had no macro-expansion-time arithmetic
in them. And they could have used some better names, too, like
maybe "low-word" and "high-word"! ;-} 


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Pascal Bourguignon
Subject: Re: TEA
Date: Wed, 22 Mar 2006 04:44:34 +0000
Message-ID: <871wwv1425.fsf@thalassa.informatimago.com>

····@rpw3.org (Rob Warnock) writes:

> Marcus Breiing  <······@2006w12.mail.breiing.com> wrote:
> +---------------
> | * Pascal Bourguignon
> | > The point is to introduce abstractions, this is what makes the code
> | > readable.
> | 
> | Our taste buds differ. Specifically, I don't universally enjoy the
> | kind of abstraction that is really just code compression. IOW, I think
> | that too much factoring-out can reduce readability (probably by
> | introducing too many named entities, clobbering short term memory).
> +---------------
>
> MACROLETs or FLETs with INLINE declarations can help here, by both
> introducing abstraction and yet still keeping the abstraction "nearby".
> Pascal already showed the FLET/INLINE part, but a MACROLET for "wref"
> would have helped as well.
>
> I don't understand why "lw" & "hw" weren't INLINE'd FLETs, too,
> since unlike "wref" they had no macro-expansion-time arithmetic
> in them. And they could have used some better names, too, like
> maybe "low-word" and "high-word"! ;-} 

There's a reason: you will need these macros outside of the TEA
functions to build their arguments.  If you have a look at my own
code, you'll see that I had to do something similar to build the words
in the vectors from the bytes.  Since usually you don't get your data
as bignums, you'll have to do the same here.

Actually, it would be better to define them as setf expanders than
macros, but we can use macros as Q&D RAD versions of the setf
expanders.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

PUBLIC NOTICE AS REQUIRED BY LAW: Any use of this product, in any
manner whatsoever, will increase the amount of disorder in the
universe. Although no liability is implied herein, the consumer is
warned that this process will ultimately lead to the heat death of
the universe.

From: Marcus Breiing
Subject: Re: TEA
Date: Wed, 22 Mar 2006 07:46:42 +0000
Message-ID: <fzi5xyg431813@breiing.com>

* Rob Warnock

> Marcus Breiing  <······@2006w12.mail.breiing.com> wrote:
> +---------------
> | * Pascal Bourguignon
> | > The point is to introduce abstractions, this is what makes the code
> | > readable.
> | 
> | Our taste buds differ. Specifically, I don't universally enjoy the
> | kind of abstraction that is really just code compression. IOW, I think
> | that too much factoring-out can reduce readability (probably by
> | introducing too many named entities, clobbering short term memory).
> +---------------

> MACROLETs or FLETs with INLINE declarations can help here, by both
> introducing abstraction and yet still keeping the abstraction
> "nearby".

I don't disagree at all - _when_ abstracting locally, FLETs etc. are a
good idea.

> And they could have used some better names, too, like maybe
> "low-word" and "high-word"! ;-}

Absolutely. Or maybe ... not at all:-) WORD is not a universally
agreed-upon unit. For example, people inculturated into Windos/Intel
might take it as sixteen bits, instead of Pascal's thirty-two.

-- 
Marcus Breiing

From: Pascal Bourguignon
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 19:51:18 +0000
Message-ID: <87d5gg51zd.fsf@thalassa.informatimago.com>

Mark Carter <··@privacy.net> writes:

> I'd like to propose a coding challenge of my own. The challenge is to
> reproduce the TEA (Tiny Encryption Algorith):
> http://www.simonshepherd.supanet.com/tea.htm
> in your language of choice.
>
> Here's the code, just two simple functions:
>
> void encipher(unsigned long *const v,unsigned long *const w,
>     const unsigned long *const k)
> {
>     register unsigned long       y=v[0],z=v[1],sum=0,delta=0x9E3779B9,
> 				a=k[0],b=k[1],c=k[2],d=k[3],n=32;
>
>     while(n-->0)
>        {
>        sum += delta;
>        y += (z << 4)+a ^ z+sum ^ (z >> 5)+b;
>        z += (y << 4)+c ^ y+sum ^ (y >> 5)+d;
>        }
>
>     w[0]=y; w[1]=z;
> }
>
> void decipher(unsigned long *const v,unsigned long *const w,
>     const unsigned long *const k)
> {
>     register unsigned long       y=v[0],z=v[1],sum=0xC6EF3720,
> 				delta=0x9E3779B9,a=k[0],b=k[1],
> 				c=k[2],d=k[3],n=32;
>
>     /* sum = delta<<5, in general sum = delta * n */
>
>     while(n-->0)
>        {
>        z -= (y << 4)+c ^ y+sum ^ (y >> 5)+d;
>        y -= (z << 4)+a ^ z+sum ^ (z >> 5)+b;
>        sum -= delta;
>        }
>
>     w[0]=y; w[1]=z;
> }

I get it shorter than in C:

(defun op (x a b sum) (logxor (+ (ash x 4) a) (+ x sum) (+ (ash x -5) b)))
(declaim (inline op))
(defmacro ciploop ((v w k y z a b c d (sum init-sum) delta) &body body)
  `(let ((,y  (aref ,v 0)) (,z  (aref ,v 1)) (,sum  ,init-sum) (,delta  #x9E3779B9)
         (,a  (aref ,k 0)) (,b  (aref ,k 1)) (,c  (aref ,k 2)) (,d  (aref ,k 3)))
     (loop repeat 32 do ,@body finally (setf (aref ,w 0) ,y (aref ,w 1) ,z))))
(defmacro c-incf (var expr) `(setf ,var (mod (+ ,var ,expr) #x100000000)))
(defmacro c-decf (var expr) `(setf ,var (mod (- ,var ,expr) #x100000000)))
(defun encipher (v w k)
  (ciploop (v w k y z a b c d (sum 0) delta)
           (c-incf sum delta) (c-incf y (op z a b sum)) (c-incf z (op y c d sum))))
(defun decipher (v w k)
  (ciploop (v w k y z a b c d (sum #xC6EF3720) delta)
           (c-decf z (op y c d sum)) (c-decf y (op z a b sum)) (c-decf sum delta)))


You can also easily modify it to implement the improved version of TEA...
Note that this Lisp programs will work equally well on a 16-bit,
32-bit or 64-bit Common Lisp implementation.  The same cannot be said
of the C program above.



;; Let's add a testbed:

(defun word (a b c d) 
  (dpb a (byte 8 24) (dpb b (byte 8 16) (dpb c (byte 8 8) d))))

(defun read-words (bits what)
  (loop
     for bytes = (progn (format *query-io* "Please enter ~D bits of ~A: " 
                                bits what)
                        (ext:convert-string-to-bytes
                         (read-line *query-io* nil nil) ext:*TERMINAL-ENCODING*))
     while (< (* 8 (length bytes)) bits)
     finally (return
               (loop for i from 0 by 4 below (truncate (+ 7 bits) 8)
                  collect (word (aref bytes (+ i 0))
                                (aref bytes (+ i 1))
                                (aref bytes (+ i 2))
                                (aref bytes (+ i 3))) into words
                  finally (return (coerce words 'vector))))))

(defun test ()
    (loop 
       with code = (vector 0 0)
       with decr = (vector 0 0)
       for clear = (read-words  64 "clear text")
       for key   = (read-words 128 "key")
       do (progn (encipher clear code key)
                 (format t "(encipher ~S ~S)~% -->      ~S~%" clear key code)
                 (decipher code decr key)      
                 (format t "(decipher ~S ~S)~% -->      ~S~%" code key decr)
                 (unless (equalp clear decr) (format t "!!! ERROR !!!~%")))))


[11]> (test)
Please enter 64 bits of clear text: Hello World!
Please enter 128 bits of key: John McCarthy invented LISP.
(encipher #(1214606444 1864390511) #(1248815214 541942595 1634890856 2032167278))
 -->      #(913593965 183139965)
(decipher #(913593965 183139965) #(1248815214 541942595 1634890856 2032167278))
 -->      #(1214606444 1864390511)
Please enter 64 bits of clear text: Big Secret: LISP!
Please enter 128 bits of key: A very very secure key.
(encipher #(1114203936 1399153522) #(1092646501 1920540790 1702000928 1936024437))
 -->      #(3198111104 1851109064)
(decipher #(3198111104 1851109064) #(1092646501 1920540790 1702000928 1936024437))
 -->      #(1114203936 1399153522)
Please enter 64 bits of clear text: ^C





-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

ATTENTION: Despite any other listing of product contents found
herein, the consumer is advised that, in actuality, this product
consists of 99.9999999999% empty space.

From: Christophe Rhodes
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 20 Mar 2006 21:27:54 +0000
Message-ID: <squ09skdr9.fsf@cam.ac.uk>

[ note followups ]

Mark Carter <··@privacy.net> writes:

> I'd like to propose a coding challenge of my own. The challenge is to
> reproduce the TEA (Tiny Encryption Algorith):
> http://www.simonshepherd.supanet.com/tea.htm
> in your language of choice.

Here's mine, in Common Lisp.

(defmacro define-tea-pair ((encrypt decrypt) (delta n) (i1 j1 i2 j2))
  `(macrolet ((32bitize (form) `(logand ,form #xffffffff))
              (tea-kernel (x i j)
               `(logxor (+ (ash ,x 4) (aref key ,i)) (+ ,x sum)
                        (+ (ash ,x -5) (aref key ,j))))
              (tea-loop ((text n sum) &body body)
               `(let ((result (make-array 2 :element-type '(unsigned-byte 32)))
                      (y (aref ,text 0))
                      (z (aref ,text 1)))
                 (do ((n ,n (- n 1)) (sum ,sum))
                     ((<= n 0)
                      (prog1 result
                        (setf (aref result 0) y (aref result 1) z)))
                   ,@body))))
    (defun ,encrypt (plaintext key &aux (delta ,delta))
      (declare (type (simple-array (unsigned-byte 32) (2)) plaintext)
               (type (simple-array (unsigned-byte 32) (4)) key))
      (tea-loop (plaintext ,n 0)
        (setq sum (32bitize (+ sum delta))
              y (32bitize (+ y (tea-kernel z ,i1 ,j1)))
              z (32bitize (+ z (tea-kernel y ,i2 ,j2))))))
    (defun ,decrypt (ciphertext key &aux (delta ,delta))
      (declare (type (simple-array (unsigned-byte 32) (2)) ciphertext)
               (type (simple-array (unsigned-byte 32) (4)) key))
      (tea-loop (ciphertext ,n (32bitize (* ,n delta)))
        (setq z (32bitize (- z (tea-kernel y ,i2 ,j2)))
              y (32bitize (- y (tea-kernel z ,i1 ,j1)))
              sum (32bitize (- sum delta)))))))

(define-tea-pair (encipher decipher) (#x9e3779b9 32) (0 1 2 3))

So far, so ordinary; only marginally shorter than the C version; I'm
certainly nowhere near Pascal's 14 lines, although my version has the
advantage over his that each constant is mentioned only once, and all
quantities derived from it are computed at compile-time; I can define
a different pair with

  (define-tea-pair (eprime dprime) (#xabcdef01) (3 1 2 0))

and the new functions are inverses of each other as before.

The other thing that might be of interest is the inner loop.  There
are no declarations other than the argument declarations, and all the
code that I have written is portable Common Lisp, and should work in
any conforming implemenation.  In SBCL (on the PowerPC; other
platforms are similar), the inner loop for ENCIPHER is

  addis $nl0,$nl3,-25033
  addi $nl3,$nl0,31161
  rlwinm $nl5,$nl2,4,0,27
  lwz $nl6,1($fdefn)
  add $nl6,$nl5,$nl6
  add $nl0,$nl2,$nl3
  xor $cfunc,$nl6,$nl0
  rlwinm $nl0,$nl2,27,5,31
  mr $nl5,$nl0
  lwz $nl6,5($fdefn)
  add $nl6,$nl5,$nl6
  xor $nl6,$cfunc,$nl6
  add $nl1,$nl1,$nl6
  rlwinm $nl5,$nl1,4,0,27
  lwz $nl6,9($fdefn)
  add $nl6,$nl5,$nl6
  add $nl0,$nl1,$nl3
  xor $cfunc,$nl6,$nl0
  rlwinm $nl0,$nl1,27,5,31
  mr $nl5,$nl0
  lwz $nl6,13($fdefn)
  add $nl6,$nl5,$nl6
  xor $nl6,$cfunc,$nl6
  add $nl2,$nl2,$nl6
  addi $nl4,$nl4,-4
  cmpwi cr0,$nl4,0
  bt gt,l0

and while this may be opaque to some readers, the point is that it is
pretty much comparable to the C code in performance (the differences
between this disassembly and gcc -O2 lie in the fact that SBCL's
instruction scheduler is pretty much nonexistent on the PowerPC
architecture).

Christophe

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 10:55:37 +0000
Message-ID: <1143024937.827420.104100@i39g2000cwa.googlegroups.com>

OK, here's a case that will make your program run in exponential time:
S = { a, b }, W = { *a*b, *b*a } -- on my machine, it starts getting
ugly as soon as n is 15 or so.  Note that S^n - W = { a^n, b^n }.

In general, whenever all the patterns in the set match against the last
position, your current implementation is guaranteed to have to sift
through all of S^n.  I'd say the very idea of checking against a
blacklist is fundamentally flawed, as far as performance is concerned.

From: Dirk Thierbach
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 14:14:00 +0000
Message-ID: <20060322141400.1A00.1.NOFFLE@dthierbach.news.arcor.de>

[Had to drop alt.comp.lang.haskell, otherwise my newsserver doesn't accept it]

Dinko Tenev <···········@gmail.com> wrote:
> OK, here's a case that will make your program run in exponential time:
> S = { a, b }, W = { *a*b, *b*a } -- on my machine, it starts getting
> ugly as soon as n is 15 or so.  Note that S^n - W = { a^n, b^n }.

> In general, whenever all the patterns in the set match against the last
> position, your current implementation is guaranteed to have to sift
> through all of S^n.  I'd say the very idea of checking against a
> blacklist is fundamentally flawed, as far as performance is concerned.

If more time during preprocessing is allowed, another idea is to
treat the wildcard expressions as regular expressions, convert
each into a finite state machine, construct the "intersection" of
all these state machines, minimize it and then swap final and non-final
states. Then you can use the resulting automaton to efficiently 
enumerate S^n - W. In the above case, the resulting FSM would have just 
three states.

And it doesn't really matter what language you use to implement this
algorithm, it's the idea that counts. Notation aside, all
implementations will be quite similar.

- Dirk

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Wed, 22 Mar 2006 22:32:16 +0000
Message-ID: <1143066736.857495.203350@i40g2000cwc.googlegroups.com>

"And it doesn't really matter what language you use to implement this
algorithm, it's the idea that counts. Notation aside, all
implementations will be quite similar."

I'll guess I'll get out my Turing tape. ;)

What are some good references for finite state machines? Minimization?

From: Dirk Thierbach
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 23 Mar 2006 07:12:37 +0000
Message-ID: <20060323071237.52A.0.NOFFLE@dthierbach.news.arcor.de>

·········@cox.net <·········@cox.net> wrote:
> What are some good references for finite state machines? Minimization?

A classic is "Introduction to automata theory, languages and computation"
by Hopcroft and Ullman. But any other book about finite state machines
should cover these topics, too. There are good chances that you can just
google for a detailed explanation.

- Dirk

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 23 Mar 2006 10:47:52 +0000
Message-ID: <1143110872.284192.325230@u72g2000cwu.googlegroups.com>

Dirk Thierbach wrote:
> If more time during preprocessing is allowed, another idea is to
> treat the wildcard expressions as regular expressions, convert
> each into a finite state machine, construct the "intersection" of
> all these state machines, minimize it and then swap final and non-final
> states.

Given the requirements, did you mean taking the *union* and swapping
states?  Or maybe swapping states first, and then taking the
intersection?

> Then you can use the resulting automaton to efficiently
> enumerate S^n - W. In the above case, the resulting FSM would have just
> three states.

I don't see immediately how exactly this is going to work.  Unless I'm
very much mistaken, a FSA in the classical sense will accept or reject
only after the whole sequence has been consumed, and this spells
exponential times.  For improved asymptotic complexity in this case,
you need to be able to at least reject in mid-sequence, and that calls
for a slightly different concept of a FSA -- is this what you meant?

Cheers,

Dinko

From: Dirk Thierbach
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 23 Mar 2006 13:43:26 +0000
Message-ID: <20060323134326.C29.0.NOFFLE@dthierbach.news.arcor.de>

Dinko Tenev <···········@gmail.com> wrote:
> Dirk Thierbach wrote:
>> If more time during preprocessing is allowed, another idea is to
>> treat the wildcard expressions as regular expressions, convert
>> each into a finite state machine, construct the "intersection" of
>> all these state machines, minimize it and then swap final and non-final
>> states.

> Given the requirements, did you mean taking the *union* and swapping
> states?  Or maybe swapping states first, and then taking the
> intersection?

Whatever the requirements were. Take your pick. :-)

>> Then you can use the resulting automaton to efficiently
>> enumerate S^n - W. In the above case, the resulting FSM would have just
>> three states.

> I don't see immediately how exactly this is going to work.  Unless I'm
> very much mistaken, a FSA in the classical sense will accept or reject
> only after the whole sequence has been consumed, and this spells
> exponential times.  

Exponential in respect to what? You just recursively walk the
automaton for every word of length n, so at most you'll have to check
every word in S^n once. Hence, the complexity is not worse than the
"generate and test" approach (it's in fact better, since testing is
trivial).

However, if the result set is simple (as in the example), then the
result FSM will have states that won't have transitions for every
letters, so I guess the average case will be a lot better.

> For improved asymptotic complexity in this case,
> you need to be able to at least reject in mid-sequence, 

One can do that if there is no valid transition out of some state.

I guess one could even optimize on that: In the minimal automaton,
every state has some transition sequence that will end up in a final
state. Annotate each state with the minimum number of steps for
such a sequence. While walking the automaton, keep the maximum
number of letters to produce in a variable. If this number is small
then the number in the annotation, stop and backtrace.

This should guarantee that only those states are visited that really
produce some output. 

- Dirk

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 23 Mar 2006 17:26:15 +0000
Message-ID: <1143134775.887696.5000@z34g2000cwc.googlegroups.com>

Dirk Thierbach wrote:
> Dinko Tenev <···········@gmail.com> wrote:
> > Dirk Thierbach wrote:
> >> If more time during preprocessing is allowed, another idea is to
> >> treat the wildcard expressions as regular expressions, convert
> >> each into a finite state machine, construct the "intersection" of
> >> all these state machines, minimize it and then swap final and non-final
> >> states.
>
> > Given the requirements, did you mean taking the *union* and swapping
> > states?  Or maybe swapping states first, and then taking the
> > intersection?
>
> Whatever the requirements were. Take your pick. :-)

OK :)

> >> Then you can use the resulting automaton to efficiently
> >> enumerate S^n - W. In the above case, the resulting FSM would have just
> >> three states.
>
> > I don't see immediately how exactly this is going to work.  Unless I'm
> > very much mistaken, a FSA in the classical sense will accept or reject
> > only after the whole sequence has been consumed, and this spells
> > exponential times.
>
> Exponential in respect to what?

With respect to sequence length.  In the example above (S = {a, b}, W =
{*a*b, *b*a}) you have to enumerate |S^n| potential sequences to get to
the couple (a^n, b^n) that are of particular interest.

> You just recursively walk the
> automaton for every word of length n, so at most you'll have to check
> every word in S^n once.

That's right -- the sky is the limit ;)

> Hence, the complexity is not worse than the
> "generate and test" approach (it's in fact better, since testing is
> trivial).

Testing per sequence will be faster, yes, but the overall running time
will still be a disaster, just like with the "generate and test"
solutions.  The Python program tries to do better than that, and it
succeeds some of the time.

> However, if the result set is simple (as in the example), then the
> result FSM will have states that won't have transitions for every
> letters, so I guess the average case will be a lot better.

I don't believe this case should ever arise out of the current
definition of the problem: labels are specified explicitly only for the
excluded subset, and you have to accept everything else by default.

> > For improved asymptotic complexity in this case,
> > you need to be able to at least reject in mid-sequence,
>
> One can do that if there is no valid transition out of some state.

One possibly can, but such states should never be encountered (see
above.)

Looking at the minimal DFA for the above example, it may be more
appropriate to detect being in a state from which there's no path to a
final state:

S: a -> A, b -> B
A: a -> A, b -> F
B: a -> F, b -> B
F: a -> F, b -> F

Here, S is the starting state, {S, A, B} are final, and F is non-final
(after swapping.)  Obviously, the smart thing to do is to bail out as
soon as you're in F.  The point is, though, are things guaranteed to be
as simple in the general case? :)

> I guess one could even optimize on that: In the minimal automaton,
> every state has some transition sequence that will end up in a final
> state. Annotate each state with the minimum number of steps for
> such a sequence. While walking the automaton, keep the maximum
> number of letters to produce in a variable. If this number is small
> then the number in the annotation, stop and backtrace.

I don't think this could cut efficiently for patterns like *a*b.


Cheers,

Dinko

From: ·········@cox.net
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 23 Mar 2006 18:53:07 +0000
Message-ID: <1143139987.254145.267980@g10g2000cwb.googlegroups.com>

Call a wc 'free' if it satisfies the propery that every letter 'a' in
it appears only in the form '*a*', and 'anchored' otherwise. What if
all wc's are free? How does this affect the DFA? Does it minimize
nontrivially? Keep in mind I'm new to DFA theory.

Walter Kehowski

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 24 Mar 2006 10:39:40 +0000
Message-ID: <1143196780.127188.289060@v46g2000cwv.googlegroups.com>

·········@cox.net wrote:
> Call a wc 'free' if it satisfies the propery that every letter 'a' in
> it appears only in the form '*a*', and 'anchored' otherwise. What if
> all wc's are free? How does this affect the DFA? Does it minimize
> nontrivially? Keep in mind I'm new to DFA theory.

There would be no difference for single patterns, but I'm not sure into
how large a DFA a set of those would combine.


Cheers,

Dinko

From: Dirk Thierbach
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 24 Mar 2006 15:50:11 +0000
Message-ID: <20060324155011.14D0.0.NOFFLE@dthierbach.news.arcor.de>

·········@cox.net <·········@cox.net> wrote:
> Call a wc 'free' if it satisfies the propery that every letter 'a' in
> it appears only in the form '*a*', and 'anchored' otherwise. 

Would '*ab*' be free or anchored?

> What if all wc's are free? How does this affect the DFA?

I don't know. The important point here is the interaction of all
the wc's. I don't think properties like this do reduce the complexity
of interaction in an obvious way.

> Does it minimize nontrivially?

I am not sure what you mean by this. Every DFA minimizes to some
other DFA, which is unique up to renaming of states. Now the
question is if that minimization reduces the complexity enough
to be noticeable (maybe that's what you mean by "nontrivially").
In general, I don't think one can say much about that without
looking at concrete examples.

- Dirk

From: Dirk Thierbach
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Thu, 23 Mar 2006 21:28:00 +0000
Message-ID: <20060323212800.79D.0.NOFFLE@dthierbach.news.arcor.de>

Dinko Tenev <···········@gmail.com> wrote:
>> > I don't see immediately how exactly this is going to work.  Unless I'm
>> > very much mistaken, a FSA in the classical sense will accept or reject
>> > only after the whole sequence has been consumed, and this spells
>> > exponential times.

>> Exponential in respect to what?

> With respect to sequence length.  

But you cannot get rid of this. Consider S = {a, b}, W = {a}.
Then there are |S|^n result elements for n > 1, and you have to enumerate 
all of them.

The best thing one can hope for is to just actually process those
elements that are really in the result set. With the FSM, you're
getting at least close to that.

>> However, if the result set is simple (as in the example), then the
>> result FSM will have states that won't have transitions for every
>> letters, so I guess the average case will be a lot better.

> I don't believe this case should ever arise out of the current
> definition of the problem: labels are specified explicitly only for the
> excluded subset, and you have to accept everything else by default.

If the result set is {a^n, b^n | n \in N}, then you have an automaton
where exactly this happens. Since minimum automatons are unique
up to renaming of states, that's the result you'll get.

>> > For improved asymptotic complexity in this case,
>> > you need to be able to at least reject in mid-sequence,

>> One can do that if there is no valid transition out of some state.

> One possibly can, but such states should never be encountered (see
> above.)

The automaton is:

S: a -> A, b -> B
A: a -> A
B: b -> B

All the states are final. A and B have just one transition, so
you'll be able to generate either a^n or b^n efficiently.

> Looking at the minimal DFA for the above example, it may be more
> appropriate to detect being in a state from which there's no path to a
> final state:

This cannot happen, because minimizing removes all those states.
(Or, in case you have a different definition of automaton in mind:
There'll be just one "stuck" state, where all transitions that never
go to a final state end up. Remove this state, and you'll end up
with the other definition).

>> I guess one could even optimize on that: In the minimal automaton,
>> every state has some transition sequence that will end up in a final
>> state. Annotate each state with the minimum number of steps for
>> such a sequence. While walking the automaton, keep the maximum
>> number of letters to produce in a variable. If this number is small
>> then the number in the annotation, stop and backtrace.

> I don't think this could cut efficiently for patterns like *a*b.

The point is not to "cut efficiently", the point is to enumerate
only those words that are actually in the result set. If you are
enumerating all words shorter than a given length, the above method
should guarantee this.

- Dirk

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 24 Mar 2006 09:42:14 +0000
Message-ID: <1143193333.963153.23850@g10g2000cwb.googlegroups.com>

Dirk Thierbach wrote:
> Dinko Tenev <···········@gmail.com> wrote:
> >> > I don't see immediately how exactly this is going to work.  Unless I'm
> >> > very much mistaken, a FSA in the classical sense will accept or reject
> >> > only after the whole sequence has been consumed, and this spells
> >> > exponential times.
>
> >> Exponential in respect to what?
>
> > With respect to sequence length.
>
> But you cannot get rid of this. Consider S = {a, b}, W = {a}.
> Then there are |S|^n result elements for n > 1, and you have to enumerate
> all of them.

Yes, but then, they are in the target set.  The point here is whether
you can generate S^n - W in Theta( n * |S^n - W| ), which may be
dramatically different from Theta( n * |S^n| ).

> The best thing one can hope for is to just actually process those
> elements that are really in the result set. With the FSM, you're
> getting at least close to that.

If the FSA has to consume the whole sequence, you're only impoving
performace by a constant factor.

> The automaton is:
>
> S: a -> A, b -> B
> A: a -> A
> B: b -> B

The target set is specified as S^n - W, where W is everything matching
(.*a.*b|.*b.*a).  Following the construction procedure in point, this
exclusion set is matched exactly by my DFA with S initial and F final.
Then, swapping final and non-final states makes {S, A, B} final, and F
non-final.  Your DFA above may be equivalent, but to me it is far from
clear exactly what algorithm would build it from the given data.

> > Looking at the minimal DFA for the above example, it may be more
> > appropriate to detect being in a state from which there's no path to a
> > final state:
>
> This cannot happen, because minimizing removes all those states.
> (Or, in case you have a different definition of automaton in mind:
> There'll be just one "stuck" state, where all transitions that never
> go to a final state end up. Remove this state, and you'll end up
> with the other definition).

I have to admit that I've never considered this before, but as I'm
thinking of it now, a block of such states will never split by
minimization, so they're bound to end up together as a single state,
which can then be eliminated.  I can see your point now.

> >> I guess one could even optimize on that: In the minimal automaton,
> >> every state has some transition sequence that will end up in a final
> >> state. Annotate each state with the minimum number of steps for
> >> such a sequence. While walking the automaton, keep the maximum
> >> number of letters to produce in a variable. If this number is small
> >> then the number in the annotation, stop and backtrace.
>
> > I don't think this could cut efficiently for patterns like *a*b.
>
> The point is not to "cut efficiently", the point is to enumerate
> only those words that are actually in the result set.

No, this is not the point.  Naive filtering already does that.  By
"cutting efficiently" I mean skipping over search sub-trees that don't
contain any results from the target set.  If you can do this, you can
enumerate the target set in time proportional to its size, not to the
size of the search space, which is the point.

Cheers,

Dinko

From: Dirk Thierbach
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 24 Mar 2006 10:53:46 +0000
Message-ID: <20060324105346.10AF.0.NOFFLE@dthierbach.news.arcor.de>

Dinko Tenev <···········@gmail.com> wrote:
> Dirk Thierbach wrote:

[One cannot escape exponential behaviour]

>> But you cannot get rid of this. Consider S = {a, b}, W = {a}.
>> Then there are |S|^n result elements for n > 1, and you have to enumerate
>> all of them.

> Yes, but then, they are in the target set.  

Which is the point. If they are in the target set, you have to enumerate
them. If the target set is of exponential size with respect to n,
then you'll need exponential time to do that.

> The point here is whether
> you can generate S^n - W in Theta( n * |S^n - W| ), which may be
> dramatically different from Theta( n * |S^n| ).

Exactly. Hence, you use a construction that guarantees that the time
needed is proportional to n*|S^n - W|: Every step you do will be
enecessary to produce at least one word in the output set. Now, if 
|S^n - W| is still exponential, then you'll still need exponential
time. But nevertheless, that's the best you can hope for.

>> The automaton is:
>>
>> S: a -> A, b -> B
>> A: a -> A
>> B: b -> B

> The target set is specified as S^n - W, where W is everything matching
> (.*a.*b|.*b.*a).  Following the construction procedure in point, this
> exclusion set is matched exactly by my DFA with S initial and F final.
> Then, swapping final and non-final states makes {S, A, B} final, and F
> non-final.  Your DFA above may be equivalent, but to me it is far from
> clear exactly what algorithm would build it from the given data.

Well, it's just the result from the minimazation algorithm, where my
variant of the algorithm just prunes away the "stuck" state which can
never produce any output.

>> The point is not to "cut efficiently", the point is to enumerate
>> only those words that are actually in the result set.

> No, this is not the point.  Naive filtering already does that. 

No, it doesn't. Naive filtering always will look at the complete
input set, so, no matter what size |S^n - W| actually is, it will
always take time in proportion to |S^n|.

>  By "cutting efficiently" I mean skipping over search sub-trees that
> don't contain any results from the target set.

Yes. Consider a different example: With the wildcard expressions
W = { b*, aa*, ab* }, you'll get S^* - W = { a }. The resulting
minimum FSM will just accept 'a' (start state, one final state, and
the "stuck" state if you insist on it), so you skip over every
other subtree when enumerating results from that automaton.

And for the previous example, you'll need something like 2*n time
to enumerate the output set instead of 2^n, because once you're in
the "a-branch", you're producing only a's, and you're pruning
away all the subtrees that start with a "b". Similarly in the 
"b-branch".

Now clearer?

- Dirk

From: Dinko Tenev
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Fri, 24 Mar 2006 17:32:45 +0000
Message-ID: <1143221565.921043.312080@z34g2000cwc.googlegroups.com>

Dirk Thierbach wrote:

[A lot of stuff]

>
> Now clearer?
>
> - Dirk

Actually, it's getting all the messier, and we seem to be running
around in circles.  I've already lost track of the point you're trying
to make, and it seems that you're missing most of my points.

Let's leave it there, and take a break.

From: Dirk Thierbach
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Sat, 25 Mar 2006 09:29:54 +0000
Message-ID: <20060325092954.60C.0.NOFFLE@dthierbach.news.arcor.de>

Dinko Tenev <···········@gmail.com> wrote:
> Dirk Thierbach wrote:

> [A lot of stuff]

>> Now clearer?

> Let's leave it there, and take a break.

Maybe it would help to just take a concrete example, and work through
it. Then you'll see exactly what happens.

- Dirk

From: funkyj
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 27 Mar 2006 18:26:31 +0000
Message-ID: <1143483991.216107.143010@g10g2000cwb.googlegroups.com>

Going in a slightly different direction ...

There has been lots of published work on how to create efficient FSMs
from regexps.  Generally these FSMs are used for pattern matching (i.e.
"does string 's' match regexp 'e'?").

Is there any corresponding literature on the topic addressed by the
OP's challenge of generating the languaged defined by a regexp (or the
complement of that regexp)?

  --jfc

From: Chris F Clark
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Mon, 27 Mar 2006 20:01:35 +0000
Message-ID: <sddlkuvd5cw.fsf@shell01.TheWorld.com>

Yes, there is literature on the generating side of the regular
expression/FSM model.  In fact, the matching problem and the
generating problems are exactly equivalent.  A slight variation of the
definition of how a matcher works, turns it into a generator and vice
versa.  To directly generate (rather than match) from an FSM, one
simply does a walk (e.g. depth first search or breath first search)
over the machine writing out (rather than matching) the symbols used
as labels.  Thus, all the theorems about matching apply to generating
and also in reverse.

You can see this to some extent form the problem posed.  If one
generates Sigma* and subtracts out the elements from some regular
language (say by matching them), that is exactly equivalent (in
strings generated) to generating the complement of the regular
language.  

In fact, it is quite easy (with the correct regular expression tool,
i.e. one that handles regular expression differences) to take the
problems posed and generate provably minimal (i.e. provably maximally
efficient) generation (or matching) programs as FSMs.  The provably
minimal FSM won't go down any paths that don't have some sequence that
generates an "accept" value.  It is worth noting the regular languages
are closed under the difference operator, so that resulting language
from substracting one regular expression from another is still a
regular language, which can be used to prove that the machine is
minimal.

Therefore, while the output can be exponentially larger than the
input, one should expect that implementations should be able to
generate the output in linear time (relative to the size of the
output), since FSMs run in linear time relative to the string they are
processing (whether generating or matching).  Under a reasonable model
of computation, one cannot do better than linear in the size of string
processed.

I'm sure if you asked under comp.theory, you would get tons of other
relevant facts from someone who understands automata theory better
than I.  Note, if one does ask there, one should correct the notation.
The "*" symbol was used as in globbing, not as commonly used in
regular expressions.  The notation ".*" (as someone corrected in one
of their replies) is the normal notation for what the original poster
wanted by "*".

Hope this helps,
-Chris

*****************************************************************************
Chris Clark                    Internet   :  ·······@world.std.com
Compiler Resources, Inc.       Web Site   :  http://world.std.com/~compres  
23 Bailey Rd                   voice      :  (508) 435-5016
Berlin, MA  01503  USA         fax        :  (978) 838-0263  (24 hours)
------------------------------------------------------------------------------

From: funkyj
Subject: Re: Programming challenge: wildcard exclusion in cartesian products
Date: Tue, 28 Mar 2006 17:32:18 +0000
Message-ID: <1143567138.440847.34180@v46g2000cwv.googlegroups.com>

Chris F Clark wrote:
> Yes, there is literature on the generating side of the regular
> expression/FSM model.  In fact, the matching problem and the
> generating problems are exactly equivalent.  A slight variation of the
> definition of how a matcher works, turns it into a generator and vice
> versa.  To directly generate (rather than match) from an FSM, one
> simply does a walk (e.g. depth first search or breath first search)
> over the machine writing out (rather than matching) the symbols used
> as labels.  Thus, all the theorems about matching apply to generating
> and also in reverse.

 If the language is Sigma* (rather than Sigma^n in the original post)
then doing a depth first search over the FSM requires a stack to
maintain context, right?  I find it interesting that you suggest a PDA
(push down automata) is required to enumerate the language accepted by
an FSM since PDAs are strongly associated with CFGs (context free
grammars).  Does it follow then that a Turing machine is required to
enumerate the language defined by a CFG?  (that was a joke, I think).

> It is worth noting the regular languages
> are closed under the difference operator, so that resulting language
> from substracting one regular expression from another is still a
> regular language,

I was pretty sure this was the case but it has been more than a decade
since I studied computational models in school so I wasn't absolutely
certain.  While I do remember homework that called for creating FSMs
that accepted languages defined by a regexp, I don't recall having
solved the "enumeration" problem.

Your clear and concise comments did a nice job of jogging my memory.

  ===

It seems to me that the purpose of the original challenge was to
compare how different languages implement the solution to the problem.
For such a well understood problem as this the goal of the challenge
(to contrast the languages) is best served when participants have a
good understanding of the 'state of the art' abstract solutions (e.g.
using a stack to traverse a FSM in DFS fashion).