From: HStearns
Subject: Re: complete URL sample code
Date: 
Message-ID: <4g5kdb$7iv@newsbf02.news.aol.com>
     Subject: complete URL sample code
     From: Eric Lavarde <·····@hpbbrd.hp.com>
     Date: Wed, 07 Feb 1996 10:32:33 +0100
     Message-ID: <·············@hpbbrd.hp.com>

     Hi everybody,

     Does someone have sample code (in Lisp, of course) to complete a URL, i.e.
     given the complete URL of a document A and a link contained in A to
     B, find the complete URL of B?

     Thanks in advance,
     Eric

     -- 
     ***    If you're interested in Europe, ask me what AEGEE is.     ***

     Freiburger Allee 65 * D-71034 B<oe>blingen * Tel: +49-7031/289094
       (<oe> means 'o' with 2 points above it)
     HP: ·····@hpbbrd.hp.com
  Tel: +49-7031/14-1555  Fax: +49-7031/14-4631

     **  Without Sex wouldn't the censors be born; at least, one good  **
     **  reason to suppress Sex on Internet!                           **

Perhaps cl-http has something?

I'm working on a (very large) re-implementation of pathnames which:
  - supports ANSI Common Lisp pathnames properly (!)
  - handles versions for unversioned file systems (Unix), and perhaps
    SCCS/RCS as well.
  - provides hooks for extending support to other pathname types such as
    URLs.

With this, you might have:

  (merge-pathnames "/foo/bar.html"
                   "http://www.cool.com/baz/home/welcome.html")
    => #s(pathname host "http"
                   device "www.cool.com"
                   directory (:absolute "foo")
                   name "bar"
                   type "html"
                   version :unspecific)

You could write your own extension code which would actually enable
you to open such a file for reading.  Otherwise, you could just get
its namestring and do whatever you were going to do anyway.
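
For the original question, plain string handling is enough in the simple cases. Here is a minimal sketch, not taken from cl-http or from the pathname code described above. The names COMPLETE-URL and URL-ROOT-END are made up for illustration, and the code assumes that BASE is a complete scheme://host/path URL with no query or fragment, and that LINK contains no "." or ".." segments.

```lisp
;; Hypothetical helpers; BASE must look like scheme://host[/path].
(defun url-root-end (url)
  "Index just past SCHEME://HOST in URL, i.e. where the path begins."
  (let* ((scheme-end (search "://" url))
         (host-start (+ scheme-end 3)))
    (or (position #\/ url :start host-start)
        (length url))))

(defun complete-url (link base)
  "Resolve LINK, found in the document at BASE, into a complete URL."
  (cond
    ;; LINK already carries a scheme and host: it is complete.
    ((search "://" link) link)
    ;; Absolute path: keep only the scheme and host of BASE.
    ((and (plusp (length link)) (char= (char link 0) #\/))
     (concatenate 'string (subseq base 0 (url-root-end base)) link))
    ;; Relative path: replace whatever follows the last slash of BASE's path.
    (t
     (let* ((root-end (url-root-end base))
            (last-slash (position #\/ base :from-end t :start root-end)))
       (concatenate 'string
                    (if last-slash
                        (subseq base 0 (1+ last-slash))
                        (concatenate 'string base "/"))
                    link)))))
```

With this, (complete-url "/foo/bar.html" "http://www.cool.com/baz/home/welcome.html") gives "http://www.cool.com/foo/bar.html", and a link of "bar.html" against the same base gives "http://www.cool.com/baz/home/bar.html".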

Does this sound like the right thing?  Would anybody be interested in
this?

From: David J. Fiander
Subject: Re: complete URL sample code
Date: 
Message-ID: <4g9r5n$70e@deci.mks.com>
According to ········@aol.com (HStearns):
>
>  (merge-pathnames "/foo/bar.html"
>                   "http://www.cool.com/baz/home/welcome.html")
>    => #s(pathname host "http"
>                   device "www.cool.com"
>                   directory (:absolute "foo")
>                   name "bar"
>                   type "html"
>                   version :unspecific)

This looks pretty cool, but I'd recommend host be www.cool.com and
device be http.  It makes sense that ftp files and http files on
the same computer should have the same host, but different "devices",
don't you think?

- David
From: HStearns
Subject: Re: complete URL sample code
Date: 
Message-ID: <4ggo40$p73@newsbf02.news.aol.com>
     Subject: Re: complete URL sample code
     From: ······@deci.mks.com (David J. Fiander)
     Date: 19 Feb 1996 07:41:27 -0500
     Message-ID: <··········@deci.mks.com>

     According to ········@aol.com (HStearns):
     >
     >  (merge-pathnames "/foo/bar.html"
     >                   "http://www.cool.com/baz/home/welcome.html")
     >    => #s(pathname host "http"
     >                   device "www.cool.com"
     >                   directory (:absolute "foo")
     >                   name "bar"
     >                   type "html"
     >                   version :unspecific)

     This looks pretty cool, but I'd recommend host be www.cool.com and
     device be http.  It makes sense that ftp files and http files on
     the same computer should have the same host, but different "devices",
     don't you think?

     - David

Good point. My thoughts are:

- Argument for "device://host/path/file.type"
  1. "Everybody" thinks of things like www.cool.com as the host, not
     the device.  It will cause no end of confusion to label them
     otherwise. 
  2. merge-pathnames will only supply the device from the defaults when
     the hosts match.  Otherwise the "default device for the host" is
     used.  One wants:
       (merge-pathnames "ftp:file.tar.gz"
                        "http://www.cool.com/pub/welcome.html")
         => #p"ftp://www.cool.com/pub/file.tar.gz"

- Argument for "host://device/path/file.type"
  1. It could be really hard to parse otherwise.  Do I recall correctly
     that news pathnames do not have the double slash:
     "news:rec.gardening"?  It is fairly straightforward under my
     system to make a single URL pathname class which will parse all
     URLs, including news, given that the current
     *default-pathname-defaults* is itself a URL pathname.  I'm not
     sure I could do this for news using the other scheme.
  2. The "default-device" can be as kludgy as we want.  For example, it
     can be dynamic based on the "current" URL (very bad idea), or it
     can even be based on the device of the defaults for some pathname
     classes. Thus we can still get the merging behavior we want.
  3. The actual underlying access used by OPEN and friends goes through generic
     functions specialized on the pathname class.  We can cause the
     right specialized thing to happen with either explicit
     dispatching on the pathname-host or implicitly based on
     subclassing URL pathnames based on "host" (i.e. ftp-url-pathname,
     http-url-pathname, etc).  Either way, the guts need to be keyed
     off something to do with the property called "host", and for URL's
     this means "ftp", "http", etc. The rest are just parameters with
     no further specialization necessary.  Thus "www.cool.com" really
     should be thought of as a "device".
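
The implicit dispatch described in point 3 can be sketched with ordinary CLOS classes. This is only an illustration, not the actual pathname code: the class names ftp-url-pathname and http-url-pathname come from the text above, while the slots, URL-SCHEME, URL-NAMESTRING and MAKE-URL-PATHNAME are hypothetical names invented for the example.

```lisp
;; Base class: SITE and PATH are plain parameters with no further
;; specialization; the access scheme is carried by the subclass.
(defclass url-pathname ()
  ((site :initarg :site :reader url-site)    ; e.g. "www.cool.com"
   (path :initarg :path :reader url-path)))  ; e.g. "/pub/welcome.html"

(defclass ftp-url-pathname (url-pathname) ())
(defclass http-url-pathname (url-pathname) ())

(defgeneric url-scheme (url)
  (:documentation "The access scheme implied by the class of URL."))
(defmethod url-scheme ((url ftp-url-pathname)) "ftp")
(defmethod url-scheme ((url http-url-pathname)) "http")

(defgeneric url-namestring (url)
  (:documentation "Rebuild the scheme://site/path string for URL."))
(defmethod url-namestring ((url url-pathname))
  (format nil "~A://~A~A" (url-scheme url) (url-site url) (url-path url)))

(defun make-url-pathname (scheme site path)
  "Pick the subclass from SCHEME; only ftp and http are covered here."
  (make-instance (cond ((string-equal scheme "ftp") 'ftp-url-pathname)
                       ((string-equal scheme "http") 'http-url-pathname)
                       (t (error "Unsupported scheme: ~A" scheme)))
                 :site site :path path))
```

For example, (url-namestring (make-url-pathname "ftp" "www.cool.com" "/pub/file.tar.gz")) returns "ftp://www.cool.com/pub/file.tar.gz". Methods for opening or resolving would be specialized on the subclasses in the same way as URL-SCHEME.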

If I haven't grossed you out yet, read on.

Another David was kind enough to e-mail me his comments, which I
have taken the liberty of reproducing here: 
     I am not sure if pathnames are the right "base class" for this kind
     of job.  The most general seems to be some kind of distributed object
     naming scheme, of which much has been written in the literature.

     Also, there is the issue of what kinds of objects the world should be
     divvied into, and what protocols they should follow.  I don't have
     any answers to that, but here's an example of one way of doing it
     that works pretty well:  In the pathname system for Symbolics Genera,
     pathname classes correspond to different operating systems.  There are
     pathname classes for the Lisp Machine File System, MacOS, MS DOS, 4.2
     BSD UNIX, and lots of others.  When you want to do something to a
     file named by one of these pathnames you (usually implicitly) create
     a file access path object (FAP).  FAPs are of different classes
     depending on how files are accessed (local file system, FTP, NFS,
     etc.).  FAPs mediate non-single naming related issues like
     versioning:  The NFS access path class figures out on file creation
     operations when and how to bump version numbers, and takes care of
     translating between pathnames with version numbers and a convention
     for representing them in UNIX file namestrings.

     --David Gadbois

The guts of my code has various (user definable) methods specialized
on different pathname classes:
 - parsing the host from a namestring
 - parsing an entire namestring and stuffing it into the pathname
 - resolving the pathname into a truename (including version resolution)
 - opening the pathname and returning the right kind of stream.

One difference in implementation versus FAPs may be that I have the
benefit of multi-methods and EQL specialization, rather than just SEND
style method dispatch.  (It's been years since I looked at Genera, so
I may be wrong. I know generic functions were included in Genera 8,
but I assume that pathnames were implemented earlier.)
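
To make the multi-method and EQL specialization point concrete, here is a small sketch. It is not the real code: DESCRIBE-OPEN is a hypothetical stand-in that returns a string instead of doing any I/O, and the keywords :http and :ftp are assumed names for the access schemes. The point is only that one generic function can dispatch on two arguments at once, each by EQL.

```lisp
(defgeneric describe-open (scheme direction site path)
  (:documentation "Say how SITE/PATH would be opened; a stand-in for real I/O."))

;; Dispatch on both the scheme and the direction, each by EQL.
(defmethod describe-open ((scheme (eql :http)) (direction (eql :input)) site path)
  (format nil "GET ~A from ~A" path site))

(defmethod describe-open ((scheme (eql :ftp)) (direction (eql :input)) site path)
  (format nil "RETR ~A from ~A" path site))

(defmethod describe-open ((scheme (eql :ftp)) (direction (eql :output)) site path)
  (format nil "STOR ~A to ~A" path site))

;; Fallback for any combination without a more specific method.
(defmethod describe-open (scheme direction site path)
  (declare (ignore site path))
  (error "No access method for ~S with direction ~S" scheme direction))
```

Here (describe-open :ftp :output "www.cool.com" "/pub/file.tar.gz") returns "STOR /pub/file.tar.gz to www.cool.com", while (describe-open :http :output ...) falls through to the default method and signals an error.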

Clearly, trying to force pathnames to do everything is bound to lead
to kludges.  [Hey why not have pathnames represent sequences,
generators, and gathers, too! ;) ]

For what I'm doing, I just need the underlying pathname flexibility
and not the actual implementation of URL pathname classes. I just
want to know that my pathnames will be sufficiently powerful to be
extended, and, if someone wants to play with it, offer it to others to
do that extension.
From: Fernando D. Mato Mira
Subject: Re: complete URL sample code
Date: 
Message-ID: <4gik3l$b21@info.epfl.ch>
What about support for several network interfaces (i.e. devices)?
What about different ports at the host?

I think it would be better to extend the pathname spec with
 :port (eg: 80) and :protocol (eg: http, news, ftp, email)
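
As a rough illustration of what separate :port and :protocol components might hold, here is a sketch that pulls them out of the front of a URL string. The structure NET-LOCATION and the function PARSE-NET-LOCATION are hypothetical names, not part of any existing pathname implementation, and the input is assumed to start with protocol://host or protocol://host:port.

```lisp
(defstruct net-location
  protocol   ; keyword such as :HTTP, :NEWS, :FTP
  host       ; string, e.g. "www.cool.com"
  port)      ; integer, or NIL when the URL gives no port

(defun parse-net-location (url)
  "Parse PROTOCOL://HOST[:PORT] from the front of URL."
  (let* ((scheme-end (search "://" url))
         (host-start (+ scheme-end 3))
         ;; The host part ends at the first slash, or at the end of the string.
         (host-end (or (position #\/ url :start host-start) (length url)))
         (colon (position #\: url :start host-start :end host-end)))
    (make-net-location
     :protocol (intern (string-upcase (subseq url 0 scheme-end)) :keyword)
     :host (subseq url host-start (or colon host-end))
     :port (and colon
                (parse-integer url :start (1+ colon) :end host-end)))))
```

For "http://www.cool.com:8080/index.html" this yields protocol :HTTP, host "www.cool.com" and port 8080. When no port is written, the port slot is left as NIL, so a per-protocol default (80 for http, say) would still have to be supplied elsewhere.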


-- 
Fernando D. Mato Mira			 http://ligwww.epfl.ch/matomira.html
Computer Graphics Lab                         	
Swiss Federal Institute of Technology (EPFL)  Phone    : +41 (21) 693 - 5248
CH-1015 Lausanne			      FAX      : +41 (21) 693 - 5328
Switzerland				      E-mail   : ········@di.epfl.ch
                                           
From: John C. Mallery
Subject: Re: complete URL sample code
Date: 
Message-ID: <3136A38C.1544@ai.mit.edu>
http://wilson.ai.mit.edu/cl-http/sources/common-lisp/url.lisp

This implementation implements the URL standard
and has been tested extensively.

It could use a completion facility that is fast enough
for server use.