Computing Industry shams

From: Xah Lee
Subject: Computing Industry shams
Date: Sat, 07 May 2005 23:40:01 +0000
Message-ID: <1115509201.680334.140840@f14g2000cwb.googlegroups.com>

Let me expose one another fucking incompetent part of Python doc, in
illustration of the Info Tech industry's masturbation and ignorant
nature.

The official Python doc on regex syntax (
http://python.org/doc/2.4/lib/re-syntax.html ) says:

--begin quote--

"|"
A|B, where A and B can be arbitrary REs, creates a regular expression
that will match either A or B. An arbitrary number of REs can be
separated by the "|" in this way. This can be used inside groups (see
below) as well. As the target string is scanned, REs separated by "|"
are tried from left to right. When one pattern completely matches, that
branch is accepted. This means that once A matches, B will not be
tested further, even if it would produce a longer overall match. In
other words, the "|" operator is never greedy. To match a literal "|",
use \|, or enclose it inside a character class, as in [|].

--end quote--
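
The behaviour the quoted paragraph describes can be checked directly. A minimal sketch, using only Python's standard re module; the patterns and the sample string are made up for illustration:

```python
import re

# Alternatives are tried left to right; the first one that lets the
# overall match succeed is taken, even if a later one would match more.
first = re.match(r"a|ab", "ab").group(0)

# Swapping the order changes the result for the same input.
second = re.match(r"ab|a", "ab").group(0)

# If the rest of the pattern fails after the first alternative,
# the engine backtracks and tries the next alternative.
third = re.match(r"(?:a|ab)c", "abc").group(0)

print(first, second, third)  # a ab abc
```

So "never greedy" here only means that the order of the alternatives, not the length of the match, decides which branch wins.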

Note: “In other words, the "|" operator is never greedy.”

Note the need to inject the high-brow jargon “greedy” here as a
latch on sentence.

“never greedy”? What is greedy anyway?

“Greedy”, when used in the context of computing, describes a
certain characteristic of algorithms. An algorithm for a
minimizing/maximizing problem is greedy if, whenever it faces a
choice, it simply takes the option that looks best at that step (for
example, the shortest path), without considering whether that choice
actually results in an optimal solution.

The rub is that such a strategy will often fail to obtain an optimal
result. If you go from New York to San Francisco and always
choose the road most directly facing your destination, you'll never get
there.
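
The kind of failure described above can be shown with a toy problem. A minimal sketch, assuming a coin-change task with an invented coin set (1, 3, 4); the function names are hypothetical and chosen only for illustration:

```python
def greedy_coins(coins, amount):
    """Always take the largest coin that still fits (the greedy choice)."""
    picked = []
    for c in sorted(coins, reverse=True):
        while amount >= c:
            picked.append(c)
            amount -= c
    return picked


def optimal_coins(coins, amount):
    """Dynamic programming: best[a] is a shortest coin list summing to a."""
    best = [[]] + [None] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] is not None:
                candidate = best[a - c] + [c]
                if best[a] is None or len(candidate) < len(best[a]):
                    best[a] = candidate
    return best[amount]


greedy_result = greedy_coins([1, 3, 4], 6)    # [4, 1, 1], three coins
optimal_result = optimal_coins([1, 3, 4], 6)  # [3, 3], two coins
print(greedy_result, optimal_result)
```

The greedy version commits to the 4 because it looks best at that step and ends up with three coins, while two coins suffice.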

For an algorithm to be greedy, it is implied that it faces choices. In
the case of alternatives in a regex "regex1|regex2|regex3", there is
really no selection involved, only the following of a given sequence.

What the writer was thinking when he latched on about greediness is
that the result may not be from the pattern that matches the longest
substring, therefore it is not “greedy”. It's not greedy Python
docer's ass.

Such blind jargon throwing, as found everywhere in tech docs, is a
significant reason why the computing industry is filled with shams the
likes of unix, Perl, Programming Patterns, eXtreme Programming,
“Unified Modeling Language”, fucking shits.

----
A better written doc for the complete regex module is at:
http://xahlee.org/perl-python/python_re-write/lib/module-re.html

See also: Responsible Software Licensing
http://xahlee.org/UnixResource_dir/writ/responsible_license.html

 Xah
 ···@xahlee.org
∑ http://xahlee.org/

From: vermicule
Subject: Language documentation ( was Re: Computing Industry shams)
Date: Sun, 08 May 2005 00:23:35 +0000
Message-ID: <874qdezh76.fsf@kafka.homenet>

"Xah Lee" <···@xahlee.org> writes:

> A|B, where A and B can be arbitrary REs, creates a regular expression
> that will match either A or B. An arbitrary number of REs can be
> separated by the "|" in this way. This can be used inside groups (see
> below) as well. As the target string is scanned, REs separated by "|"
> are tried from left to right. When one pattern completely matches, that
> branch is accepted. This means that once A matches, B will not be
> tested further, even if it would produce a longer overall match. In
> other words, the "|" operator is never greedy. To match a literal "|",
> use \|, or enclose it inside a character class, as in [|].
>
> --end quote--
>
> Note: In other words, the "|" operator is never greedy.
>
> Note the need to inject the high-brow jargon "greedy" here as a
> latch on sentence.

What is so hard to understand ?
Should be perfectly clear even to a first year undergraduate.

As for "greedy" even a minimal exposure to Dijkstra's shortest path
algorithm would have made the concept intuitive. And from memory,
that is the sort of thing done in Computing 101 and in Data Structures and
Algorithms 101.

It seems to me that you want the Python doc to be written for morons.
And that is not a valid complaint.

From: Måns Rullgård
Subject: Re: Language documentation ( was Re: Computing Industry shams)
Date: Sun, 08 May 2005 00:35:12 +0000
Message-ID: <yw1xr7gih7wf.fsf@ford.inprovide.com>

vermicule <······@bigpond.net.au> writes:

> "Xah Lee" <···@xahlee.org> writes:

[...]

>
> It seems to me that you want the Python doc to be written for morons.

Not for morons, but for trolls.  Don't feed them.

-- 
Måns Rullgård
···@inprovide.com

From: alex goldman
Subject: Re: Language documentation ( was Re: Computing Industry shams)
Date: Sun, 08 May 2005 18:53:10 +0000
Message-ID: <1415322.AzhuxSBQdk@yahoo.com>

vermicule wrote:

> 
> What is so hard to understand ?
> Should be perfectly clear even to a first year undergraduate.
> 
> As for "greedy" even a minimal exposure to Dijkstra's shortest path
> algorithm would have made the concept intuitive. And from memory,
> that is the sort of thing done in Computing 101 and in  Data Structures
> and Algorithms 101
> 
> It seems to me that you want the Python doc to be written for morons.
> And that is not a valid complaint.

He's right actually. If we understand the term "greedy" as it's used in
graph search and optimization algorithms, Python's RE matching actually IS
greedy.

From: Sean Burke
Subject: Re: Language documentation ( was Re: Computing Industry shams)
Date: Mon, 09 May 2005 22:43:37 +0000
Message-ID: <x78y2ot3zq.fsf@bolo.xenadyne.com>

alex goldman <·····@spamm.er> writes:

> vermicule wrote:
> 
> > 
> > What is so hard to understand ?
> > Should be perfectly clear even to a first year undergraduate.
> > 
> > As for "greedy" even a minimal exposure to Dijkstra's shortest path
> > algorithm would have made the concept intuitive. And from memory,
> > that is the sort of thing done in Computing 101 and in  Data Structures
> > and Algorithms 101
> > 
> > It seems to me that you want the Python doc to be written for morons.
> > And that is not a valid complaint.
> 
> He's right actually. If we understand the term "greedy" as it's used in
> graph search and optimization algorithms, Python's RE matching actually IS
> greedy.

No, you're just confused about the optimization metric.
In regexes, "greedy" match optimizes for the longest match,
not the fastest.

And this is common regex terminology - man perlre and you will
find discussion of "greedy" vs. "stingy" matching.
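
In regex terms the difference looks like this. A small sketch with Python's re module (the sample text is made up); `*` is the greedy quantifier and `*?` its non-greedy counterpart:

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so the match runs from the
# first "<" to the last ">" in the string.
greedy = re.search(r"<.*>", text).group(0)

# Non-greedy ("stingy"): .*? grabs as little as it can, so the match
# stops at the first ">" it reaches.
stingy = re.search(r"<.*?>", text).group(0)

print(greedy)  # <b>bold</b> and <i>italic</i>
print(stingy)  # <b>
```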

-SEan

From: alex goldman
Subject: Re: Language documentation ( was Re: Computing Industry shams)
Date: Tue, 10 May 2005 11:58:48 +0000
Message-ID: <1955897.My2EmLpO4s@yahoo.com>

Sean Burke wrote:

> 
> alex goldman <·····@spamm.er> writes:
> 
>> vermicule wrote:
>> 
>> > 
>> > What is so hard to understand ?
>> > Should be perfectly clear even to a first year undergraduate.
>> > 
>> > As for "greedy" even a minimal exposure to Dijkstra's shortest path
>> > algorithm would have made the concept intuitive. And from memory,
>> > that is the sort of thing done in Computing 101 and in  Data Structures
>> > and Algorithms 101
>> > 
>> > It seems to me that you want the Python doc to be written for morons.
>> > And that is not a valid complaint.
>> 
>> He's right actually. If we understand the term "greedy" as it's used in
>> graph search and optimization algorithms, Python's RE matching actually
>> IS greedy.
> 
> No, you're just confused about the optimization metric.
> In regexes, "greedy" match optimizes for the longest match,
> not the fastest.
> 
> And this is common regex terminology - man perlre and you will
> find discussion of "greedy" vs. "stingy" matching.

Read what you quoted again. Everyone (Xah, vermicule, myself) was talking
about "greedy" as it's used in graph search and optimization algorithms.

From: Lawrence Kirby
Subject: Re: Language documentation ( was Re: Computing Industry shams)
Date: Tue, 10 May 2005 14:07:12 +0000
Message-ID: <pan.2005.05.10.14.08.14.485000@netactive.co.uk>

On Tue, 10 May 2005 04:58:48 -0700, alex goldman wrote:

> Sean Burke wrote:

...

>> No, you're just confused about the optimization metric.
>> In regexes, "greedy" match optimizes for the longest match,
>> not the fastest.
>> 
>> And this is common regex terminology - man perlre and you will
>> find discussion of "greedy" vs. "stingy" matching.
> 
> Read what you quoted again. Everyone (Xah, vermicule, myself) was talking
> about "greedy" as it's used in graph search and optimization algorithms.

However the original quote was in the context of regular expressions, so
discussion of the terminology used in regular expressions is far more
relevant than the terminology used in graph search and optimisation
algorithms.

Lawrence

From: alex goldman
Subject: Re: Language documentation ( was Re: Computing Industry shams)
Date: Tue, 10 May 2005 13:52:18 +0000
Message-ID: <1212926.3xzUY5Pt1A@yahoo.com>

Lawrence Kirby wrote:

> On Tue, 10 May 2005 04:58:48 -0700, alex goldman wrote:
> 
>> Sean Burke wrote:
> 
> ...
> 
>>> No, you're just confused about the optimization metric.
>>> In regexes, "greedy" match optimizes for the longest match,
>>> not the fastest.
>>> 
>>> And this is common regex terminology - man perlre and you will
>>> find discussion of "greedy" vs. "stingy" matching.
>> 
>> Read what you quoted again. Everyone (Xah, vermicule, myself) was talking
>> about "greedy" as it's used in graph search and optimization algorithms.
> 
> However the original quote was in the context of regular expressions, so
> discussion of the terminology used in regular expressions is far more
> relevant than the terminology used in graph search and optimisation
> algorithms.

I replied to "And from memory, that is the sort of thing done in Computing
101 and in  Data Structures and Algorithms 101", and I fully explained what
I meant by "greedy" as well. There was no ambiguity.

From: Lawrence Kirby
Subject: Re: Language documentation ( was Re: Computing Industry shams)
Date: Tue, 10 May 2005 17:46:33 +0000
Message-ID: <pan.2005.05.10.17.47.38.704000@netactive.co.uk>

On Tue, 10 May 2005 06:52:18 -0700, alex goldman wrote:

> Lawrence Kirby wrote:

...

>> However the original quote was in the context of regular expressions, so
>> discussion of the terminology used in regular expressions is far more
>> relevant than the terminology used in graph search and optimisation
>> algorithms.
> 
> I replied to "And from memory, that is the sort of thing done in Computing
> 101 and in  Data Structures and Algorithms 101", and I fully explained what
> I meant by "greedy" as well. There was no ambiguity.

My response talks about relevance, not ambiguity.

Lawrence

From: alex goldman
Subject: Re: Language documentation ( was Re: Computing Industry shams)
Date: Tue, 10 May 2005 18:02:19 +0000
Message-ID: <1562690.uMvC404mSK@yahoo.com>

Lawrence Kirby wrote:

> On Tue, 10 May 2005 06:52:18 -0700, alex goldman wrote:
> 
>> Lawrence Kirby wrote:
> 
> ...
> 
>>> However the original quote was in the context of regular expressions, so
>>> discussion of the terminology used in regular expressions is far more
>>> relevant than the terminology used in graph search and optimisation
>>> algorithms.
>> 
>> I replied to "And from memory, that is the sort of thing done in
>> Computing
>> 101 and in  Data Structures and Algorithms 101", and I fully explained
>> what I meant by "greedy" as well. There was no ambiguity.
> 
> My response talks about relevance, not ambiguity.

Well, your response was irrelevant.

From: Keith Thompson
Subject: Re: Language documentation ( was Re: Computing Industry shams)
Date: Tue, 10 May 2005 18:58:34 +0000
Message-ID: <lnzmv2nc14.fsf@nuthaus.mib.org>

alex goldman <·····@spamm.er> writes:
> Lawrence Kirby wrote:
[snip]
>> My response talks about relevance, not ambiguity.
>
> Well, your response was irrelevant.

This entire discussion is irrelevant to most, if not all, of the
newsgroups to which it's being posted.  comp.lang.c, where I'm reading
this, is for discussion of the C programming language; I see nothing
about C.

-- 
Keith Thompson (The_Other_Keith) ·····@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
We must do something.  This is something.  Therefore, we must do this.

From: Keith Thompson
Subject: Re: Computing Industry shams
Date: Sun, 08 May 2005 00:50:37 +0000
Message-ID: <lnekcipmlf.fsf@nuthaus.mib.org>

"Xah Lee" <···@xahlee.org> writes:
[snip]

There's probably no point in asking Xah Lee not to cross-post his
rants, but if you feel the need to post a followup, *please* post only
to appropriate newsgroups.  I read this in comp.lang.c, where neither
the original post nor any of the followups are even vaguely topical.

If you must post a followup, limit it to newsgroups where it's
appropriate.  If there are none, just don't post.

I've directed followups on this article to /dev/null.


From: Mark McIntyre
Subject: Re: Computing Industry shams
Date: Sun, 08 May 2005 07:08:34 +0000
Message-ID: <bler71lhl9q9m8pncmv3i6enfmvb9gtq3j@4ax.com>

On 7 May 2005 16:40:01 -0700, in comp.lang.c , "Xah Lee"
<···@xahlee.org> wrote:

>Let me expose one another fucking incompetent part of Python doc,

if you really must speak in tongues, at least do it in private. 

now fsck off already.

-- 
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>


From: ···@xahlee.org
Subject: Re: Computing Industry shams
Date: Tue, 10 May 2005 21:18:29 +0000
Message-ID: <1115759909.364853.37840@f14g2000cwb.googlegroups.com>

HTML Problems in Python Doc

I don't know what kind of system is used to generate the Python docs,
but it is quite unpleasant to work with manually, as there are
egregious errors and inconsistencies.

For example, on the “Module Contents” page (
http://python.org/doc/2.4/lib/node111.html ), the closing tags for <dd>
are never used, and all the tags are in lower case. However, on the
regex syntax page ( http://python.org/doc/2.4/lib/re-syntax.html ), the
closing tags for <dd> are given, and all tags are in caps.

The doc's first lines declare a type of:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

yet in the files they use "/>" to close image tags, which is an XHTML
syntax.

The doc litters <p> tags and never closes them, making it illegal as
XML/XHTML by breaking the minimal requirement of well-formedness.
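
The well-formedness point can be checked mechanically. A rough sketch using the standard xml.etree.ElementTree parser; is_well_formed is a hypothetical helper and the markup snippets are invented, not taken from the Python docs. Note that HTML 4 itself allows the end tags of <p> and <dd> to be omitted, so this only tests the XML/XHTML reading:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Unclosed <p> elements: acceptable in HTML 4, not well-formed XML.
unclosed = is_well_formed("<div><p>one<p>two</div>")

# Every element closed, including the self-closing XHTML-style image tag.
closed = is_well_formed('<div><p>one</p><p>two <img src="x.png" /></p></div>')

print(unclosed, closed)  # False True
```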

Aside from correctness, the code is quite bloated, as is generally true
of generated HTML. For example, it is littered with: <tt id='l2h-853'
xml:id='l2h-853'> which isn't used in the style sheet, and I don't
think those ids can serve any purpose other than in a style sheet.

Although the doc uses a huge style sheet and almost every tag comes
with a class or id attribute, it also makes profuse use of hard-coded
style tags like <b>, <big> and Netscape's <nobr>.

It also abuses tables that effectively do nothing. Here's a typical
line:
<table cellpadding="0" cellspacing="0"><tr valign="baseline">
  <td><nobr><b><tt id='l2h-851' xml:id='l2h-851'
class="function">compile</tt></b>(</nobr></td>
  <td><var>pattern</var><big>[</big><var>,
flags</var><big>]</big><var></var>)</td></tr></table>


If Python is supposed to be a quality language, then its
documentation's content and code seem to indicate otherwise.

This post is archived at:
http://xahlee.org/perl-python/re-write_notes.html

 Xah
 ···@xahlee.org
∑ http://xahlee.org/

From: Keith Thompson
Subject: Re: Computing Industry shams
Date: Tue, 10 May 2005 21:47:00 +0000
Message-ID: <lnacn2n48d.fsf@nuthaus.mib.org>

···@xahlee.org writes:
> HTML Problems in Python Doc
[snip]

         +-------------------+             .:\:\:/:/:.
         |   PLEASE DO NOT   |            :.:\:\:/:/:.:
         |  FEED THE TROLLS  |           :=.' -   - '.=:
         |                   |           '=(\ 9   9 /)='
         |   Thank you,      |              (  (_)  )
         |       Management  |              /`-vvv-'\
         +-------------------+             /         \
                 |  |        @@@          / /|,,,,,|\ \
                 |  |        @@@         /_//  /^\  \\_\
   @·@@·@        |  |         |/         WW(  (   )  )WW
   \||||/        |  |        \|           __\,,\ /,,/__
    \||/         |  |         |      jgs (______Y______)
/\/\/\/\/\/\/\/\//\/\\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
==============================================================


From: CBFalconer
Subject: Re: Computing Industry shams
Date: Wed, 11 May 2005 00:23:29 +0000
Message-ID: <42812F29.FB708F6D@yahoo.com>

···@xahlee.org wrote:
> 
> HTML Problems in Python Doc
> 
> I don't know what kind of system is used to generate the Python docs,
> but it is quite unpleasant to work with manually, as there are
> egregious errors and inconsistencies.

PLONK for egregious cross-posting of off-topic nonsense.

-- 
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson