Subject: complete URL sample code
From: Eric Lavarde <·····@hpbbrd.hp.com>
Date: Wed, 07 Feb 1996 10:32:33 +0100
Message-ID: <·············@hpbbrd.hp.com>
Hi everybody,
does anyone have sample code (in LISP, of course) to complete a URL,
i.e. given the complete URL of a document A and a link contained in A
to B, work out the complete URL of B?
Thanks in advance,
Eric
--
*** If you're interested in Europe, ask me what AEGEE is. ***
Freiburger Allee 65 * D-71034 B<oe>blingen * Tel: +49-7031/289094
(<oe> means 'o' with 2 points above it)
HP: ·····@hpbbrd.hp.com
Tel: +49-7031/14-1555 Fax: +49-7031/14-4631
** Without Sex wouldn't the censors be born; at least, one good **
** reason to suppress Sex on Internet! **
Perhaps cl-http has something?
I'm working on a (very large) re-implementation of pathnames which:
- supports ANSI Common Lisp pathnames properly (!)
- handles versions for unversioned file systems (Unix), and perhaps
  SCCS/RCS as well
- provides hooks for extending support to other pathname types such as
  URLs.
With this, you might have:
  (merge-pathnames "/foo/bar.html"
                   "http://www.cool.com/baz/home/welcome.html")
  => #s(pathname host "http"
                 device "www.cool.com"
                 directory (:absolute "foo")
                 name "bar"
                 type "html"
                 version :unspecific)
You could write your own extension code which would actually enable
you to open such a file for reading. Otherwise, you could just get
its namestring and do whatever you were going to do anyway.
Does this sound like the right thing? Would anybody be interested in
this?
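For comparison, the completion asked for at the top of the thread can be
sketched with plain string operations, without any pathname machinery.
This is only a minimal sketch: COMPLETE-URL is a hypothetical name, the
base is assumed to be an absolute URL containing "://", and ".."
segments, queries, and fragments are not handled.

```lisp
;; Hypothetical helper for illustration; not part of cl-http or of any
;; pathname implementation discussed in this thread.
(defun complete-url (link base)
  "Resolve LINK against the absolute URL BASE.
Handles three cases only: LINK already carries a scheme, LINK is an
absolute path, or LINK is relative to the directory of BASE."
  (let* ((scheme-end (search "://" base))
         ;; First slash after the site name, i.e. where the path begins.
         (path-start (and scheme-end
                          (position #\/ base :start (+ scheme-end 3))))
         ;; Last slash in BASE, i.e. the end of its directory part.
         (dir-end (position #\/ base :from-end t)))
    (cond
      ;; "ftp://other.org/x": already complete, return unchanged.
      ((search "://" link) link)
      ;; "/foo/bar.html": keep scheme and site, replace the whole path.
      ((and (plusp (length link)) (char= (char link 0) #\/))
       (concatenate 'string (subseq base 0 path-start) link))
      ;; "bar.html": keep everything up to the last slash of the path.
      (path-start
       (concatenate 'string (subseq base 0 (1+ dir-end)) link))
      ;; BASE has no path at all, e.g. "http://www.cool.com".
      (t (concatenate 'string base "/" link)))))

;; Usage, mirroring the merge-pathnames example above:
;; (complete-url "/foo/bar.html"
;;               "http://www.cool.com/baz/home/welcome.html")
```

Keeping the URL as a string sidesteps the host/device question entirely,
at the price of none of the pathname accessors being available.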
From: ······@deci.mks.com (David J. Fiander)
Subject: Re: complete URL sample code
Date: 19 Feb 1996 07:41:27 -0500
Message-ID: <4g9r5n$70e@deci.mks.com>
According to ········@aol.com (HStearns):
>
> (merge-pathnames "/foo/bar.html"
>                  "http://www.cool.com/baz/home/welcome.html")
> => #s(pathname host "http"
>                device "www.cool.com"
>                directory (:absolute "foo")
>                name "bar"
>                type "html"
>                version :unspecific)
This looks pretty cool, but I'd recommend host be www.cool.com and
device be http. It makes sense that ftp files and http files on
the same computer should have the same host, but different "devices",
don't you think?
- David
Good point. My thoughts are:
- Argument for "device://host/path/file.type"
  1. "Everybody" thinks of things like www.cool.com as the host, not
     the device. It will cause no end of confusion to label them
     otherwise.
  2. Pathname merging will only supply the device from the defaults
     when the hosts match. Otherwise the "default device for the host"
     is used. One wants:
       (merge-pathnames "ftp:file.tar.gz"
                        "http://www.cool.com/pub/welcome.html")
       => #p"ftp://www.cool.com/pub/file.tar.gz"
- Argument for "host://device/path/file.type"
  1. It could be really hard to parse otherwise. Do I recall correctly
     that news pathnames do not have the double slash:
     "news:rec.gardening"? It is fairly straightforward under my
     system to make a single URL pathname class which will parse all
     URLs, including news, given that the current
     *default-pathname-defaults* is itself a URL pathname. I'm not
     sure I could do this for news using the other scheme.
  2. The "default-device" can be as kludgy as we want. For example, it
     can be dynamic based on the "current" URL (very bad idea), or it
     can even be based on the device of the defaults for some pathname
     classes. Thus we can still get the merging behavior we want.
  3. The actual underlying access functions used by OPEN and friends
     are generic functions specialized on the pathname class. We can
     cause the right specialized thing to happen either with explicit
     dispatching on the pathname-host or implicitly by subclassing
     URL pathnames based on "host" (i.e. ftp-url-pathname,
     http-url-pathname, etc). Either way, the guts need to be keyed
     off something to do with the property called "host", and for URLs
     this means "ftp", "http", etc. The rest are just parameters with
     no further specialization necessary. Thus "www.cool.com" really
     should be thought of as a "device".
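The merging behavior wanted in both arguments can be sketched without
deciding which component gets called "host". The following is only an
illustration under stated assumptions: URL-PARTS and its functions are
hypothetical names, the slots are deliberately called SCHEME and SITE
to stay neutral, and a relative directory in the link simply replaces
the default directory instead of being appended to it.

```lisp
;; Hypothetical structure for illustration; not the pathname class
;; described in this post.
(defstruct url-parts scheme site directory file)

(defun parse-url-parts (string)
  "Split STRING into scheme, site, directory and file; absent parts are NIL."
  (let* ((colon (position #\: string))
         ;; A colon marks a scheme only if no slash comes before it.
         (scheme-p (and colon (not (find #\/ string :end colon))))
         (scheme (and scheme-p (subseq string 0 colon)))
         (tail (if scheme-p (subseq string (1+ colon)) string))
         (site nil))
    ;; "//site/..." introduces the site; "news:rec.gardening" has none.
    (when (and (>= (length tail) 2) (string= "//" tail :end2 2))
      (let ((slash (position #\/ tail :start 2)))
        (setf site (subseq tail 2 slash)
              tail (if slash (subseq tail slash) ""))))
    (let ((last-slash (position #\/ tail :from-end t)))
      (make-url-parts
       :scheme scheme
       :site site
       :directory (and last-slash (subseq tail 0 (1+ last-slash)))
       :file (if last-slash (subseq tail (1+ last-slash)) tail)))))

(defun merge-url-parts (link defaults)
  "Fill every component missing from LINK with the one from DEFAULTS."
  (flet ((pick (accessor)
           (or (funcall accessor link) (funcall accessor defaults))))
    (make-url-parts :scheme (pick #'url-parts-scheme)
                    :site (pick #'url-parts-site)
                    :directory (pick #'url-parts-directory)
                    :file (pick #'url-parts-file))))

(defun url-parts-string (parts)
  "Print PARTS back as a URL string, skipping components that are NIL."
  (format nil "~@[~A:~]~@[//~A~]~@[~A~]~@[~A~]"
          (url-parts-scheme parts) (url-parts-site parts)
          (url-parts-directory parts) (url-parts-file parts)))

;; Usage, following the "ftp:file.tar.gz" example above:
;; (url-parts-string
;;  (merge-url-parts (parse-url-parts "ftp:file.tar.gz")
;;                   (parse-url-parts "http://www.cool.com/pub/welcome.html")))
```

Because every component is merged independently here, the rule that a
device is only inherited when the hosts match is not modelled; that
rule is exactly where the two labelings start to differ.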
If I haven't grossed you out yet, read on.
Another David was kind enough to e-mail me his comments, which I
have taken the liberty of reproducing here:
I am not sure if pathnames are the right "base class" for this kind
of job. The most general seems to be some kind of distributed object
naming scheme, of which much has been written in the literature.
Also, there is the issue of what kinds of objects the world should be
divvied into, and what protocols they should follow. I don't have
any answers to that, but here's an example of one way of doing it
that works pretty well: In the pathname system for Symbolics Genera,
pathname classes correspond to different operating systems. There
are pathname classes for the Lisp Machine File System, MacOS, MS DOS,
4.2 BSD UNIX, and lots of others. When you want to do something to a
file named by one of these pathnames you (usually implicitly) create
a file access path object (FAP). FAPs are of different classes
depending on how files are accessed (local file system, FTP, NFS,
etc.). FAPs mediate non-single naming related issues like
versioning: The NFS access path class figures out on file creation
operations when and how to bump version numbers, and takes care of
translating between pathnames with version numbers and a convention
for representing them in UNIX file namestrings.
--David Gadbois
The guts of my code have various (user-definable) methods specialized
on different pathname classes:
- parsing the host from a namestring
- parsing an entire namestring and stuffing it into the pathname
- resolving the pathname into a truename (including version resolution)
- opening the pathname and returning the right kind of stream.
One difference in implementation versus FAPs may be that I have the
benefit of multi-methods and EQL specialization, rather than just
SEND-style method dispatch. (It's been years since I looked at Genera,
so I may be wrong. I know generic functions were included in Genera 8,
but I assume that pathnames were implemented earlier.)
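To make the dispatch idea concrete, here is a rough CLOS sketch of
class specialization combined with EQL specialization. All names in it
(URL-PATHNAME, URL-CLASS-FOR-SCHEME, ACCESS-PLAN) are hypothetical and
are not taken from the code described above; the methods only return
descriptive strings instead of opening streams.

```lisp
;; Hypothetical classes for illustration only.
(defclass url-pathname ()
  ((site :initarg :site :reader url-site)
   (path :initarg :path :reader url-path)))
(defclass http-url-pathname (url-pathname) ())
(defclass ftp-url-pathname (url-pathname) ())

;; EQL specialization: choose the pathname class from the scheme keyword.
(defgeneric url-class-for-scheme (scheme)
  (:method ((scheme (eql :http))) 'http-url-pathname)
  (:method ((scheme (eql :ftp))) 'ftp-url-pathname)
  (:method (scheme) 'url-pathname))

;; Multi-method: dispatch on the pathname class and on the direction.
(defgeneric access-plan (pathname direction)
  (:method ((p http-url-pathname) (direction (eql :input)))
    (format nil "GET ~A from ~A" (url-path p) (url-site p)))
  (:method ((p ftp-url-pathname) (direction (eql :input)))
    (format nil "RETR ~A from ~A" (url-path p) (url-site p)))
  (:method ((p url-pathname) direction)
    (error "No access method for ~S with direction ~S."
           (class-name (class-of p)) direction)))

;; Usage:
;; (access-plan (make-instance (url-class-for-scheme :http)
;;                             :site "www.cool.com"
;;                             :path "/foo/bar.html")
;;              :input)
```

A new scheme then needs one subclass and a few methods, with no edits
to existing ones, which is the kind of extension hook meant here.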
Clearly, trying to force pathnames to do everything is bound to lead
to kludges. [Hey why not have pathnames represent sequences,
generators, and gathers, too! ;) ]
For what I'm doing, I just need the underlying pathname flexibility
and not the actual implementation of URL pathname classes. I just
want to know that my pathnames will be sufficiently powerful to be
extended and, if someone wants to play with it, to offer it to others
so they can do that extension.
From: Fernando D. Mato Mira
Subject: Re: complete URL sample code
Date:
Message-ID: <4gik3l$b21@info.epfl.ch>
What about support for several network interfaces (i.e. devices)?
What about different ports at the host?
I think it would be better to extend the pathname spec with
:port (e.g. 80) and :protocol (e.g. http, news, ftp, email).
--
Fernando D. Mato Mira http://ligwww.epfl.ch/matomira.html
Computer Graphics Lab
Swiss Federal Institute of Technology (EPFL) Phone : +41 (21) 693 - 5248
CH-1015 Lausanne FAX : +41 (21) 693 - 5328
Switzerland E-mail : ········@di.epfl.ch
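A small sketch of what separate :protocol and :port components might
look like follows. NET-LOCATION, PARSE-NET-LOCATION and
*DEFAULT-PORTS* are hypothetical names chosen for illustration, and
the table lists only a few well-known port numbers; none of this is an
actual extension of the pathname spec.

```lisp
;; Well-known default ports, used when the site names no port.
(defparameter *default-ports*
  '((:http . 80) (:ftp . 21) (:nntp . 119)))

;; Hypothetical structure holding the proposed components.
(defstruct net-location protocol host port)

(defun parse-net-location (protocol site)
  "Split SITE, written as \"host\" or \"host:port\", into its parts.
PROTOCOL is a keyword such as :HTTP and selects the default port."
  (let ((colon (position #\: site)))
    (make-net-location
     :protocol protocol
     ;; SUBSEQ with an end of NIL returns the whole string.
     :host (subseq site 0 colon)
     :port (if colon
               (parse-integer site :start (1+ colon))
               (cdr (assoc protocol *default-ports*))))))

;; Usage:
;; (parse-net-location :http "www.cool.com:8080")
;; (parse-net-location :ftp "ftp.cool.com")
```

With protocol and port held separately, the host slot is left holding
only the machine name, so ftp and http files on one computer share it.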
From: John C. Mallery
Subject: Re: complete URL sample code
Date:
Message-ID: <3136A38C.1544@ai.mit.edu>
http://wilson.ai.mit.edu/cl-http/sources/common-lisp/url.lisp
This code implements the URL standard and has been tested
extensively. It could use a completion facility that is fast enough
for server use.