misc: elisp text processing, modernization of elisp, wikipedia, xahlee.org

From: Xah Lee
Subject: misc: elisp text processing, modernization of elisp, wikipedia, xahlee.org
Date: Mon, 07 Jan 2008 01:37:54 +0000
Message-ID: <570c8a26-cdc9-4612-86ba-e4914aa7bee3@q77g2000hsh.googlegroups.com>

i just generated a file listing all links from xahlee.org to Wikipedia.

See
http://xahlee.org/wikipedia_links.html

This was generated using emacs lisp, in a style typical of Perl or
Python scripts.

The code is here:
http://xahlee.org/emacs/elisp_link_report.el

A few notes worth making:

★ I've been a professional Perl programmer since 1998. I find emacs
lisp, as a text processing system, quite a lot more powerful and
convenient than Perl. Mainly because, with a typical programming
language, one reads in a file and applies regexes to it. (here we'll
not consider parsing files as text processing, since that's quite a
specialized task) Emacs, however, being a text editing system,
provides a buffer representation of files, which allows one to move a
cursor about. This, coupled with the fact that emacs is designed for
text editing and comes with literally a few thousand functions for the
purpose, makes it far more powerful and convenient. It is also rather
interactive: one can visually watch the program in action, or
interrupt it for debugging with all intermediate data intact, etc. In
short, emacs with elisp is truly a text-processing _system_, and
perhaps practically the only one in the world.
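
As a rough illustration of the buffer-and-cursor style described
above, here is a minimal sketch that walks point through a temporary
buffer and collects Wikipedia links. The function name
my-collect-wikipedia-links and the href regex are assumptions made for
this example; they are not taken from elisp_link_report.el.

```elisp
;; Sketch only: `my-collect-wikipedia-links' is a hypothetical helper,
;; not part of the linked elisp_link_report.el.
(defun my-collect-wikipedia-links (html)
  "Return the Wikipedia URLs found in href attributes of HTML, in order."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (let ((links '()))
      ;; Each successful search leaves point after the match, so the
      ;; loop advances through the buffer like a cursor would.
      (while (re-search-forward
              "href=\"\\(https?://[a-z]+\\.wikipedia\\.org/[^\"]*\\)\""
              nil t)
        (push (match-string 1) links))
      (nreverse links))))
```

Because the text lives in a buffer, the same loop could also call any
of the editing functions (moving by line, narrowing, and so on)
between searches.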

(Side note: over the past decade, before i really got my hands dirty
with elisp in the last couple of years, i heard many times of wishes
and projects to replace elisp with Common Lisp or Scheme. Back then,
ignorant of elisp's depths, i wished along the same lines, blithely
hoping that one day we'd have an emacs with a better lisp (without me
needing to put anything in). (this is the "i want to believe" syndrome
typical of Free Software Foundation and OpenSource youngsters)
However, in the past 2 years i studied elisp, and had a chance to
spend maybe 30 minutes on this issue of modernizing elisp, looking at
some websites of projects that use Scheme or Common Lisp. I now think
these efforts are largely geekily delusional and impractical.

To write it out in full would take another essay, but basically, an
emacs with Common Lisp or Scheme as its extension language is not
going to work the way one might think. The basis of the wish is that a
programmer would learn just one language (and a better one at that),
rather than one for X lisp and another for Emacs lisp. This line of
thinking is reasonable, but has hidden, unexpected flaws. An extension
language in an editing environment necessarily needs a lot of concepts
and data types that are extraneous to the language itself: namely,
emacs's buffers, point, frames, kill-ring, and so on. So even if one
day we have an emacs with Scheme as the extension lang, the deeper
fact is that it will have perhaps hundreds of functions inherent in
the system that have nothing to do with Scheme the lang or with
general programming. (and there will be a need, with its own
complexities, to separate the editing-related functions from the
language's core functions) The final effect is that, to write any
text-processing program or editor extension for the New-Lisp Emacs,
the programmer really spends a huge amount of time learning how these
functions work in the new system, even if he is already a Scheme
expert. In short, a Scheme Emacs (or Common Lisp Emacs) would not be
much different from the current Elisp Emacs.

There would be a major practical impact, however, if such a system
came into being and became widely used: the benefit of unification, in
the general sense of that word, and the benefit of a lunchbox-packaged,
coherent IDE for lisp dev. (Note: a somewhat technically better lisp
(CL, Scheme) replacing elisp has, in itself, absolutely no practical
benefit, in my opinion. (The "in my opinion" clause in the previous
sentence is there only for idiom and ease of reading. Its connotation
of uncertainty is not intended.)) (also note: here we bypassed the
effort needed to create such a system. In actuality, that effort has
basically hampered all such projects (although such systems with CL
and Scheme already exist (and, to my knowledge, are usable to some
extent)). And getting people to switch from the existing system
(emacs) is nigh impossible. These 2 facts, the effort needed to build
such a system and the effort needed to cover legacy elisp so people
can switch, basically sound the death knell for such systems (or put
them a long, long time down the road, maybe 5 or 10 years. (and given
such a long wait and today's technological pace, i do not think the
lisp langs themselves (in particular CL, Scheme, or anything with the
cons business) are suitable or competitive as high-level general langs
of the 2010s)).))

★ This is probably the most extensive elisp programming i've done to
date. One thing i learned is that, if one needs to process thousands
of files, one had better create a temp buffer and insert each file's
text into it. This avoids a bunch of processing typically associated
with opening files for editing, such as colorization (aka
font-locking), displaying the file to the user, etc., while still
invoking the functions that handle file encoding/decoding and proper
line endings. (important since all my files are in utf8) In
particular, i learned the idiom:

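;; an elisp idiom for processing a file without user interaction
(save-current-buffer
  ;; the leading space in the buffer name hides it from buffer lists
  ;; and turns off undo recording
  (set-buffer (get-buffer-create " xahTemp"))
  ;; the final t (REPLACE) swaps out whatever the buffer held before
  (insert-file-contents filePath nil nil nil t)
  ;; process it ...
  )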

(i think this is an idiom. Correct me if i'm wrong pls.)
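
For comparison, a similar effect can be had with the built-in
with-temp-buffer macro, which creates a throwaway buffer and kills it
when the body finishes. This is a sketch of an alternative, not the
code used in elisp_link_report.el; my-count-matches-in-file is a
hypothetical name chosen for the example.

```elisp
;; Sketch only: `my-count-matches-in-file' is a hypothetical helper.
(defun my-count-matches-in-file (file regexp)
  "Return the number of matches for REGEXP in FILE.
REGEXP should not match the empty string, or the loop will not advance."
  (with-temp-buffer
    ;; Decodes the file and converts line endings as when visiting it,
    ;; but without font-locking or displaying anything.
    (insert-file-contents file)
    (goto-char (point-min))
    (let ((count 0))
      (while (re-search-forward regexp nil t)
        (setq count (1+ count)))
      count)))
```

One trade-off: with-temp-buffer creates and kills a buffer per call,
while the named-buffer version above reuses one buffer across all
files.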

★ The cons business in lisp is really a pain in the ass. car, cdr,
caar, cddar!! I have been an expert in the Mathematica language since
about 1995, and consider myself perhaps among the world's top 100. The
Mathematica language is, in a way, a modern lisp. (although i think
mentioning lisp alongside Mathematica would be taken as an insult by
its creator Stephen Wolfram) Mathematica uses lists and nested lists
heavily. In fact, the whole source code of any program is one single,
deeply nested list. Having coded in Mathematica for over a decade, i'm
thoroughly familiar with the nested syntax and its abstract tree
concept. Since Mathematica source code is all nested lists, the
language provides many facilities for manipulating such a form. I can,
for example, get all the nodes at level n. Map a function to level n.
Map a function to just the leaves (which are considered level -1 in
Mathematica). Or get a particular leaf (e.g. elisp's "(caadr mylist)"
would be "Part[mylist,{2,1,1}]"). Or get several elements at a
particular level. (elisp cannot do this, nor do i think CL or Scheme
can in any simple way) I can also get any part(s) at any level(s) that
match a particular type. (in this aspect, pattern matching, Mathematica
is as far as i know far, far more complete and flexible than any other
computer language out there)

Despite being an expert in trees, i find lisp's cons business truly a
pain to deal with. A large part of the time i spent on this
text-processing program went into debugging the cons business.

In this aspect, it is similar to Perl's list business. In Perl,
working with nested lists is an extreme pain in the ass, due to its
very screwed up syntax, which in part springs from its semantics for
nested lists. (the so-called "references". (Xah's quote of the day:
「Any high-level language that involves concepts like "references,
memory allocation, garbage collection, pointers, stacks, cons" in
order to use it, is a low-level, moronic language.」))

Working with anything slightly nested is a pain in Perl. That is why,
in Perl programs, there are not many nested data structures, except
maybe at just 2 or 3 levels deep. I think lisp's cons business
severely limits what people actually use it for. Kind of a joke,
considering the origin of the name: LISt Processing.

★ In a few days, i'm going to post a Lisp Lesson explaining the
details of this code. If you have been programming emacs lisp and have
any comments on the code, please let me know. (general lisp comments
will also be appreciated)

★ Thanks to Rainer Joswig, John Thingstad, Alex Mizrahi, and a few
others who in the past few days have answered my lisp questions about
sort and hashmap. Also thanks to Eli Zaretskii, Barry Margolin, and
the other people who answered my other posts...

★ Oh shit, i can really write like a river flows! Originally i also
wanted to write something that gives an overall view of the
significance of the links and my site. But now this post seems too
nice and cuddly, maybe i shan't spoil it. But i think i should write
on anyway.... after all, what's the modern blog spirit for!

(second Xah's quote of the day: 「When a person's sanity is at balance,
when human passion is raging, no etiquette must get in the way.」--Xah
Lee, 2001.)

★ Ok. The over 3 thousand links from my site to Wikipedia, and the
over 3 thousand pages on my site xahlee.org ... If anyone studied them
all, reading all the linked Wikipedia articles, it might take a few
years, and would be roughly equivalent to a Bachelor's degree from a
4-year college, with geometry and practical computer programming as
the focus of study (a so-called "double major") and human animal
ethology as a minor. (these seemingly disparate subjects are what's
today called interdisciplinary study, which has become more popular in
the last decades and which, in my opinion, is more important than the
typical single-subject focus)

  Xah
  ···@xahlee.org
∑ http://xahlee.org/


From: Xah Lee
Subject: Re: misc: elisp text processing, modernization of elisp, wikipedia, xahlee.org
Date: Mon, 07 Jan 2008 02:17:24 +0000
Message-ID: <3a6a1347-fec4-4961-939d-244e9d6fe26d@e6g2000prf.googlegroups.com>

Sorry, a quick correction.

Xah wrote:
e.g. elisp's “(caadr mylist)” would be “Part[mylist,{2,1,1}]”

It should be:

e.g. elisp's “(caadr mylist)” would be “·····@Part[mylist,2,1]”

for a particularly shaped tree. This is not the perfect example i
wanted for expressing the flexibility of Mathematica's tree-parts
access paradigm. (it is based on the indexes of each node. Whenever an
index position is given as a list instead, a slice of that level is
taken. Also, negative indexes count from the leaves.)

(
For more detail, see:
http://reference.wolfram.com/mathematica/ref/Part.html
http://reference.wolfram.com/mathematica/guide/ListManipulation.html
http://reference.wolfram.com/mathematica/ref/Position.html
http://reference.wolfram.com/mathematica/tutorial/LevelsInExpressions.html
)

The caadr, cddar, etc. sublist extraction functions in lisp actually
do not have a corresponding Mathematica function using Part. This is
because they just mean a sequence of First/Rest calls. From a general
list extraction point of view, a sequence of first/rest is too
idiosyncratic to form a paradigm. So, in Mathematica, the
corresponding form would literally be such a sequence, for example:

(car mylist) = ·····@mylist
(cdr mylist) = ····@mylist
(caar mylist) = ·····@·····@mylist
(cadr mylist) = ·····@····@mylist
(cdar mylist) = ····@·····@mylist
... etc.

and only in special cases, such as caaar, can they be written like
Part[mylist,1,1,1].
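
To make the contrast concrete on the elisp side, here is a small
sketch. The helper my-part is hypothetical, written only for this
illustration: it follows a path of 1-based indexes into a nested list,
loosely in the spirit of Part, and handles neither slices nor negative
indexes.

```elisp
;; Sketch only: `my-part' is a hypothetical helper, not a built-in.
(defun my-part (tree &rest indexes)
  "Return the element of nested list TREE at 1-based INDEXES, outermost first."
  (dolist (i indexes tree)
    (setq tree (nth (1- i) tree))))

;; A chain of car/cdr calls that only ever takes the head after
;; skipping elements is the same thing as an index path:
(car (car (cdr '(a (b c) d))))   ; => b
(my-part '(a (b c) d) 2 1)       ; => b

;; caaar corresponds to the index path 1, 1, 1:
(car (car (car '(((x) y) z))))   ; => x
(my-part '(((x) y) z) 1 1 1)     ; => x
```

Chains that end in cdr, such as cdar, return a tail of a list rather
than an element, so they have no single index path; that is the
idiosyncrasy described above.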

(i am not fond of tacked-on corrections to newsgroup posts. (in
general, too silly and insignificant) No more corrections will be
posted.)

  Xah
  ···@xahlee.org
∑ http://xahlee.org/

From: Xah Lee
Subject: Re: misc: elisp text processing, modernization of elisp, wikipedia, xahlee.org
Date: Mon, 07 Jan 2008 22:02:59 +0000
Message-ID: <14ef76d1-1749-401e-ab3f-0f1438fa9c13@f47g2000hsd.googlegroups.com>

I have just finished writing the tutorial. Pls see:

★ Elisp Lesson: Process A Thousand Files
http://xahlee.org/emacs/elisp_link_report.html

  Xah
  ···@xahlee.org
∑ http://xahlee.org/