From: deech
Subject: Convert html diacritics to unicode
Date: 
Message-ID: <f5f96f0d-7e0a-489e-8255-88017cf1f0d1@g61g2000hsf.googlegroups.com>
Hi all,
I am trying to convert an HTML page that includes accented characters
into Unicode. Is there a way to do this in Common Lisp?

Here is a small snippet:
<br>sat&eacute

This snippet prints the word saté.

Thanks,
Deech

From: Jens Teich
Subject: Re: Convert html diacritics to unicode
Date: 
Message-ID: <u3aiduaku.fsf@jensteich.de>
deech <············@gmail.com> writes:

> Hi all,
> I am trying to convert an HTML page that includes accented characters
> into Unicode. Is there a way to do this in Common Lisp?
>
> Here is a small snippet:
> <br>sat&eacute
>
> This snippet prints the word saté.
>
> Thanks,
> Deech

CL-USER 1 > 
#<The CL-WHO package, 58/128 internal, 28/64 external>

CL-WHO 2 > (with-html-output (*standard-output*)
             (esc "é"))
&#xE9;
"&#xE9;"

-jens
From: deech
Subject: Re: Convert html diacritics to unicode
Date: 
Message-ID: <e61f0faf-bcfd-4324-9cd5-ee6fcb2f464f@j22g2000hsf.googlegroups.com>
I'm looking to convert HTML to Unicode. CL-WHO seems to convert
Unicode to HTML.

Thanks for your quick response,
Deech
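
Since CL-WHO's ESC emits numeric character references such as &#xE9;,
the reverse direction for that particular case can be sketched in
portable Common Lisp. This is a minimal sketch, not part of CL-WHO or
any library: PARSE-CHAR-REFERENCE and DECODE-NUMERIC-REFERENCES are
hypothetical names, and it assumes an implementation whose CODE-CHAR
maps Unicode code points (as SBCL and CCL do). Named entities such as
&eacute are not handled here; they need a lookup table.

```lisp
;;; Hypothetical helpers (not a library API). Assumes CODE-CHAR
;;; uses Unicode code points, as in SBCL and CCL.

(defun parse-char-reference (string start)
  "If a numeric reference (&#233; or &#xE9;) begins at START, return
the character and the index just past its semicolon; otherwise NIL."
  (let ((len (length string)))
    (when (and (< (+ start 2) len)
               (char= (char string start) #\&)
               (char= (char string (1+ start)) #\#))
      (let* ((hexp (char-equal (char string (+ start 2)) #\x))
             (radix (if hexp 16 10))
             (digits-start (+ start (if hexp 3 2)))
             (semi (position #\; string :start digits-start)))
        ;; Require at least one digit, and only digits, before the ";".
        (when (and semi
                   (< digits-start semi)
                   (loop for k from digits-start below semi
                         always (digit-char-p (char string k) radix)))
          (let ((code (parse-integer string :start digits-start :end semi
                                            :radix radix)))
            (when (< code char-code-limit)
              (values (code-char code) (1+ semi)))))))))

(defun decode-numeric-references (string)
  "Return a copy of STRING with numeric character references decoded;
anything that is not a well-formed reference is copied unchanged."
  (with-output-to-string (out)
    (let ((i 0))
      (loop while (< i (length string))
            do (multiple-value-bind (ch next) (parse-char-reference string i)
                 (if ch
                     (progn (write-char ch out) (setf i next))
                     (progn (write-char (char string i) out) (incf i))))))))
```

For example, (decode-numeric-references "sat&#xE9;") and
(decode-numeric-references "sat&#233;") both return "saté".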

From: Tamas K Papp
Subject: Re: Convert html diacritics to unicode
Date: 
Message-ID: <6mumnlFitlr4U1@mid.individual.net>
On Thu, 30 Oct 2008 12:33:51 -0700, deech wrote:

> Hi all,
> I am trying to convert an HTML page that includes accented characters into
> Unicode. Is there a way to do this in Common Lisp?

Yes.  Unless you need to verify the correctness of the input or need some 
output format other than HTML, a simple algorithm that replaces strings 
using a table (eg "&eacute" -> "é") should suffice.

Search for the terms "replace string" in the c.l.l archives (eg using 
Google groups).

HTH,

Tamas
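
The table-driven replacement Tamas describes can be sketched as
follows. It is a minimal illustration under stated assumptions: the
entity table is a hypothetical, deliberately tiny excerpt (a real one
would list every HTML named entity), DECODE-NAMED-ENTITIES is a
hypothetical name rather than the html-entities library's API, and
CODE-CHAR is assumed to map Unicode code points. It also accepts a
missing trailing semicolon, as in deech's "&eacute" snippet.

```lisp
;;; Hypothetical, deliberately tiny entity table: name -> Unicode code point.
(defparameter *html-entities*
  '(("amp" . 38) ("lt" . 60) ("gt" . 62) ("quot" . 34) ("nbsp" . 160)
    ("eacute" . 233) ("egrave" . 232) ("agrave" . 224) ("ccedil" . 231))
  "Alist mapping HTML entity names to Unicode code points.")

(defun decode-named-entities (string)
  "Replace &name; (or &name with no semicolon) using *HTML-ENTITIES*.
Unknown names are copied through unchanged."
  (with-output-to-string (out)
    (let ((i 0)
          (len (length string)))
      (loop while (< i len)
            do (let* ((c (char string i))
                      ;; The entity name runs up to the first non-alphanumeric.
                      (end (and (char= c #\&)
                                (or (position-if-not #'alphanumericp string
                                                     :start (1+ i))
                                    len)))
                      ;; Entity names are case-sensitive, hence STRING=.
                      (entry (and end
                                  (assoc (subseq string (1+ i) end)
                                         *html-entities* :test #'string=))))
                 (cond (entry
                        (write-char (code-char (cdr entry)) out)
                        ;; Consume an optional trailing semicolon.
                        (setf i (if (and (< end len)
                                         (char= (char string end) #\;))
                                    (1+ end)
                                    end)))
                       (t
                        (write-char c out)
                        (incf i))))))))
```

With this table, (decode-named-entities "<br>sat&eacute") returns
"<br>saté", and (decode-named-entities "a &amp; b") returns "a & b".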
From: deech
Subject: Re: Convert html diacritics to unicode
Date: 
Message-ID: <1261571d-b657-4a7c-a50f-2c01c00de1f9@u65g2000hsc.googlegroups.com>
Running html-entities:decode HTML does the trick. Thanks!
-deech

From: Harald Hanche-Olsen
Subject: Re: Convert html diacritics to unicode
Date: 
Message-ID: <pcobpx1wz6m.fsf@math.ntnu.no>
+ deech <············@gmail.com>:

> I am trying to convert an HTML page that includes accented characters
> into Unicode. Is there a way to do this in Common Lisp?
>
> Here is a small snippet:
> <br>sat&eacute
>
> This snippet prints the word saté.

Look at http://www.cliki.net/html-entities for example.
There may be more options at http://www.cliki.net/web
(where I found that one).

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
  when there is no ground whatsoever for supposing it is true.
  -- Bertrand Russell