Hello,
I'm having a problem using rndzl. I'm using it to send a string
to a simple .net application I am using as an interface. The program
works, in that all the code runs and does something, but the string I
send to rndzl is turned into what looks to be Chinese or Japanese
characters.
An example:
The string my lisp program produces:
Cyan: 37.50 %
Magenta: 37.50 %
Yellow: 37.50 %
Black: 12.50 %
Total Byte Count: 32
The string as printed by .net:
说礏 礌㠸礌㮐礐
The lisp I am using is ECL. For .net I am using Visual Studio 2005 C#,
producing code for .net version 2.
I look forward to any help that the experts here can provide, and if
my current approach is unworkable, I would like some advice on which
Windows graphics library to use.
On Fri, 01 Feb 2008 09:06:14 -0800, Mikael wrote:
> An example:
> The string my lisp program produces:
> Cyan: 37.50 %
> Magenta: 37.50 %
> Yellow: 37.50 %
> Black: 12.50 %
> Total Byte Count: 32
>
> The string as printed by .net:
> 说礏 礌㠸礌㮐礐
>
> The lisp I am using is ECL. For .net I am using Visual Studio 2005 C#,
> producing code for .net version 2.
As Edi said, this belongs to RNDZL's mailing list, but just as a hint,
this kind of corruption suggests .NET is expecting UTF-16 strings.
Windows is rather braindamaged about using Unicode, so that wouldn't be
totally unexpected.
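The arithmetic behind that hint is easy to check with a quick sketch (Python here purely for illustration; the OP's actual pipeline goes through Lisp and .NET): Latin-1/ASCII bytes, paired up as UTF-16LE code units, land mostly in the CJK ideograph range.

```python
# A sketch (not the OP's code): Latin-1/ASCII bytes misread as UTF-16LE.
text = "Cyan: 37.50 %\n"            # 14 characters, 14 bytes in Latin-1
raw = text.encode("latin-1")        # the bytes the Lisp side sends

# A consumer expecting UTF-16LE pairs adjacent bytes into 16-bit units;
# 'C' (0x43) + 'y' (0x79) become the single unit 0x7943, a CJK ideograph.
garbled = raw.decode("utf-16-le")
assert len(garbled) == len(raw) // 2            # 14 bytes -> 7 characters
assert any(0x4E00 <= ord(c) <= 0x9FFF for c in garbled)
print(garbled)
```

Which is exactly the kind of output the OP quoted.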
Cheers,
Maciej
On Fri, 1 Feb 2008 17:38:24 +0000 (UTC), Maciej Katafiasz <········@gmail.com> wrote:
> As Edi said, this belongs to RNDZL's mailing list
RDNZL.
> but just as a hint, this kind of corruption suggests .NET is
> expecting UTF-16 strings. Windows is rather braindamaged about
> using Unicode, so that wouldn't be totally unexpected.
AFAIK the API functions usually have defined and documented semantics
w.r.t. character encodings. What's the brain damaged part of it?
Edi.
--
European Common Lisp Meeting, Amsterdam, April 19/20, 2008
http://weitz.de/eclm2008/
Real email: (replace (subseq ·········@agharta.de" 5) "edi")
On Fri, 01 Feb 2008 20:42:30 +0100, Edi Weitz wrote:
>> As Edi said, this belongs to RNDZL's mailing list
>
> RDNZL.
Sorry, I tried three times to spell it properly :(
>> but just as a hint, this kind of corruption suggests .NET is expecting
>> UTF-16 strings. Windows is rather braindamaged about using Unicode, so
>> that wouldn't be totally unexpected.
>
> AFAIK the API functions usually have defined and documented semantics
> w.r.t. character encodings. What's the brain damaged part of it?
Using UTF-16 as externally visible format.
Cheers,
Maciej
On Fri, 1 Feb 2008 20:03:17 +0000 (UTC), Maciej Katafiasz <········@gmail.com> wrote:
> On Fri, 01 Feb 2008 20:42:30 +0100, Edi Weitz wrote:
>
>> What's the brain damaged part of it?
>
> Using UTF-16 as externally visible format.
Why?
On Fri, 01 Feb 2008 21:38:31 +0100, Edi Weitz wrote:
>>> What's the brain damaged part of it?
>>
>> Using UTF-16 as externally visible format.
>
> Why?
Because it's a stupid format with all the problems of UTF-8 (and then
some) and none of its advantages. It's inefficient for text that stays
within ASCII and offers no space advantage outside of it; it won't pass
cleanly through 8-bit systems (and vice versa, see the OP); it has
compatibility issues with earlier UCS-2 systems[1] and with formats that
don't understand surrogates; it can't address codepoints above the BMP
directly, so it loses the advantage of fixed width; it introduces
endianness headaches; and its byte stream doesn't resynchronise the way
UTF-8's does, so a corrupted UTF-16 stream is in principle very hard to
recover.
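Two of these complaints are easy to demonstrate concretely (a Python sketch, chosen only because it makes the byte-level behaviour visible):

```python
# Endianness: the same text has two incompatible UTF-16 byte orders, so a
# stream needs a BOM or out-of-band agreement; UTF-8 has exactly one form.
s = "Hi"
assert s.encode("utf-16-le") == b"H\x00i\x00"
assert s.encode("utf-16-be") == b"\x00H\x00i"

# ASCII inefficiency: UTF-16 doubles ASCII-only text, while UTF-8 leaves
# it byte-for-byte identical to plain ASCII.
assert len(s.encode("utf-16-le")) == 2 * len(s)
assert s.encode("utf-8") == s.encode("ascii") == b"Hi"
```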
Cheers,
Maciej
[1] Compatibility with UCS-2 is really the only reason anyone could even
consider UTF-16. I'm of the strong opinion that MSFT should have phased
out all primary UTF-16 APIs and kept them only for compatibility. They
didn't; in fact, notepad.exe will actually write out UTF-16 when you tell
it to save "unicode text", which is just ridiculous.
On Fri, 01 Feb 2008 21:49:00 +0000, Maciej Katafiasz wrote:
> (...) has compatibility issues with earlier UCS-2 systems[1] and formats
> (...)
> [1] Compatibility with UCS-2 is really the only reason anyone could even
> consider UTF-16. (...)
Hmm, I now realise it came out a bit incoherent. What I meant to say is:
UTF-16 has UCS-2 compatibility problems, in that it's generally
impossible to determine whether you're dealing with UTF-16 or UCS-2, both
in data formats and in capabilities of systems that process them. Which
is yet another reason to move away from UTF-16 as a non-deprecated
format. However, despite the aforementioned issues, compatibility with
pre-existing UCS-2 systems is the only reason anyone would ever want to
deal with UTF-16 at all; picking it up for a system with no such
constraint would be a terribly bad idea.
Cheers,
Maciej
On Sat, 02 Feb 2008 00:25:05 +0100, Maciej Katafiasz
<········@gmail.com> wrote:
> On Fri, 01 Feb 2008 21:49:00 +0000, Maciej Katafiasz wrote:
>
>> (...) has compatibility issues with earlier UCS-2 systems[1] and formats
>> (...)
>> [1] Compatibility with UCS-2 is really the only reason anyone could even
>> consider UTF-16. (...)
>
> Hmm, I now realise it came out a bit incoherent. What I meant to say is:
> UTF-16 has UCS-2 compatibility problems, in that it's generally
> impossible to determine whether you're dealing with UTF-16 or UCS-2, both
> in data formats and in capabilities of systems that process them. Which
> is yet another reason to move away from UTF-16 as a non-deprecated
> format. However, despite the aforementioned issues, compatibility with
> pre-existing UCS-2 systems is the only reason anyone would ever want to
> deal with UTF-16 at all, picking it up for a system that had no such
> consideration would be a terribly bad idea.
>
> Cheers,
> Maciej
Why do you assume most of the world deals with ASCII?
In Norway I would use an ISO-8859-1 or ISO-8859-15 encoding.
In UTF-8 the first 128 code points coincide with ASCII; in UTF-16 the
first 256 coincide with ISO-8859-1.
Anyhow, it isn't as if Windows systems can't deal with ASCII. In fact, to
convert ASCII to UTF-16 you just add a zero-filled octet to every
character before displaying the string. As it happens, this works for
ISO-8859-1 (ISO-LATIN-1) as well. As such, for Latin-1 text UTF-8 is LESS
efficient.
For Hebrew, Arabic or Japanese users the problems are even bigger.
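That widening trick is easy to verify (a Python sketch; the helper name is invented for the example):

```python
# Latin-1 -> UTF-16LE really is "append a zero octet to every byte",
# because code points U+0000..U+00FF are exactly the Latin-1 repertoire.
def latin1_to_utf16le(data):
    # Hypothetical helper: widen each Latin-1 byte with a zero high byte.
    return b"".join(bytes([b, 0]) for b in data)

s = "blåbær"   # Norwegian text, entirely within Latin-1
assert latin1_to_utf16le(s.encode("latin-1")) == s.encode("utf-16-le")

# UTF-8, by contrast, spends two bytes on each non-ASCII Latin-1 character:
assert len(s.encode("utf-8")) > len(s.encode("latin-1"))
```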
--------------
John Thingstad
MK> impossible to determine whether you're dealing with UTF-16 or UCS-2,
UCS-2 is a subset of UTF-16 that was deprecated ten years ago or so. Why
on earth do you need to distinguish between them?
Just use UTF-16.
On Sat, 02 Feb 2008 12:39:28 +0200, Alex Mizrahi wrote:
> MK> impossible to determine whether you're dealing with UTF-16 or UCS-2,
>
> UCS-2 is deprecated (10 years ago or so) subset of UTF-16. why, for
> hell, you need to distinguish between them? just use UTF-16
Because some systems/formats expect UCS-2. And you can't tell that until
you actually run into codepoints from outside the BMP, at which point
things either break or they don't. But yes, in general it's no worse than
distinguishing UTF-8 from ASCII.
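The point about not noticing anything until a non-BMP codepoint shows up can be made concrete (a Python sketch, illustration only):

```python
# Inside the BMP, UCS-2 and UTF-16 are byte-identical; the difference only
# appears for codepoints that need a surrogate pair.
bmp = "A"                # U+0041: one 16-bit unit in both encodings
astral = "\U0001D11E"    # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

assert len(bmp.encode("utf-16-le")) == 2
assert len(astral.encode("utf-16-le")) == 4   # surrogate pair: two units

# A UCS-2-only consumer sees those two units as two separate "characters"
# (the surrogates 0xD834 and 0xDD1E) -- nothing breaks until then.
assert astral.encode("utf-16-be") == b"\xd8\x34\xdd\x1e"
```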
Cheers,
Maciej
MK>>> impossible to determine whether you're dealing with UTF-16 or UCS-2,
??>>
??>> UCS-2 is deprecated (10 years ago or so) subset of UTF-16. why, for
??>> hell, you need to distinguish between them? just use UTF-16
MK> Because some systems/formats expect UCS-2. And you can't tell that
MK> until you actually run into codepoints from outside BMP, at which point
MK> things either break or not.
So the system or format doesn't support stuff like Han ideographs or
Ugaritic, and you want to use those symbols with it.
If you really need to, that's a problem of incompleteness in those
systems/formats.
Why do you think it's a problem with UCS-2 or UTF-16?
MK> But yes, in general it's no worse than distinguishing UTF-8 from
MK> ASCII.
Pure ASCII is rarely used nowadays; if we already have 8-bit
transmission, something like Latin-1 is used by default, and confusing a
charset like Latin-1 with UTF-8 is a much bigger problem.
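Both directions of that confusion can be seen directly (a Python sketch):

```python
# The Latin-1/UTF-8 confusion, in both directions.
s = "café"

# Latin-1 bytes fed to a UTF-8 decoder usually fail loudly, because a
# lone 0xE9 starts an incomplete multi-byte sequence -- at least the
# error is detectable:
try:
    s.encode("latin-1").decode("utf-8")
    raise AssertionError("should not decode")
except UnicodeDecodeError:
    pass

# UTF-8 bytes fed to a Latin-1 decoder never fail: every byte maps to
# some character, so you silently get mojibake instead.
assert s.encode("utf-8").decode("latin-1") == "cafÃ©"
```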
On 2008-02-01, Edi Weitz <········@agharta.de> wrote:
> On Fri, 1 Feb 2008 17:38:24 +0000 (UTC), Maciej Katafiasz <········@gmail.com> wrote:
>
>> As Edi said, this belongs to RNDZL's mailing list
>
> RDNZL.
>
>> but just as a hint, this kind of corruption suggests .NET is
>> expecting UTF-16 strings. Windows is rather braindamaged about
>> using Unicode, so that wouldn't be totally unexpected.
>
> AFAIK the API functions usually have defined and documented semantics
> w.r.t. character encodings. What's the brain damaged part of it?
Hmm, I ran into a similar problem with .Net once (where .Net quietly
outputs UTF-16 even when told to do something else). This supposedly
happens because .Net uses UTF-16 internally. I don't know if this has
anything to do with what the OP experienced, but have a look at
http://www.thescripts.com/forum/thread177860.html
to get an idea of the problem.
--
Oyvin
Edi Weitz wrote:
> On Fri, 1 Feb 2008 17:38:24 +0000 (UTC), Maciej Katafiasz <········@gmail.com> wrote:
>
>> As Edi said, this belongs to RNDZL's mailing list
>
> RDNZL.
Let down your hair.
I'm sorry, that was a particularly grimm joke.
-dan
On Tue, 05 Feb 2008 02:24:51 +0000, ···@telent.net wrote:
> Edi Weitz wrote:
>
>> RDNZL.
>
> Let down your hair.
>
> I'm sorry, that was a particularly grimm joke.
Not bad, not bad... :)
But you weren't the first one:
http://www.arf.ru/Notes/Stan/rdnzl.html
Edi.
Edi Weitz <········@agharta.de> writes:
> On Tue, 05 Feb 2008 02:24:51 +0000, ···@telent.net wrote:
>
> > Edi Weitz wrote:
> >
> >> RDNZL.
> >
> > Let down your hair.
> >
> > I'm sorry, that was a particularly grimm joke.
>
> Not bad, not bad... :)
>
> But you weren't the first one:
>
> http://www.arf.ru/Notes/Stan/rdnzl.html
Yeah, Dan should apologize for the repundancy of his
suggestion, no doubt made in a fit of redundant zeal.
On Fri, 1 Feb 2008 17:38:24 +0000 (UTC), Maciej Katafiasz
<········@gmail.com> wrote:
>On Fri, 01 Feb 2008 09:06:14 -0800, Mikael wrote:
>
>> An example:
>> The string my lisp program produces:
>> Cyan: 37.50 %
>> Magenta: 37.50 %
>> Yellow: 37.50 %
>> Black: 12.50 %
>> Total Byte Count: 32
>>
>> The string as printed by .net:
>> 说礏 礌㠸礌㮐礐
>>
>> The lisp I am using is ECL. For .net I am using Visual Studio 2005 C#,
>> producing code for .net version 2.
>
>As Edi said, this belongs to RNDZL's mailing list, but just as a hint,
>this kind of corruption suggests .NET is expecting UTF-16 strings.
>Windows is rather braindamaged about using Unicode, so that wouldn't be
>totally unexpected.
>
>Cheers,
>Maciej
Windows Unicode handling has nothing to do with .NET applications -
.NET operates in parallel with its own set of encoding/decoding
filters.
The .NET application is in charge of what character set it accepts,
with the default being the local code page. If the Lisp and .NET code
are out of sync as it appears, it should be easily fixable on the .NET
side by chaining the proper stream reader to the input and writer to
the output.
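In .NET terms that means constructing the reader/writer with an explicit Encoding (e.g. new StreamReader(stream, encoding)). The same idea in Python terms, as a sketch of the principle rather than of the .NET API:

```python
import io

# Sketch: chain a text layer with an EXPLICIT encoding onto the raw byte
# stream, instead of letting a platform default (local code page) decide.
raw = io.BytesIO("Cyan: 37.50 %\n".encode("latin-1"))
reader = io.TextIOWrapper(raw, encoding="latin-1")
assert reader.read() == "Cyan: 37.50 %\n"

# And symmetrically on the output side:
out = io.BytesIO()
writer = io.TextIOWrapper(out, encoding="latin-1", newline="")
writer.write("Black: 12.50 %\n")
writer.flush()
assert out.getvalue() == b"Black: 12.50 %\n"
```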
ECL can be built with internal support for Unicode. AFAIK it does not,
however, have external format support, but something like Edi's
flexi-streams might be used to cobble together something that works.
George
--
for email reply remove "/" from address
On Fri, 1 Feb 2008 09:06:14 -0800 (PST), Mikael <···········@gmail.com> wrote:
> I'm having a problem with using rndzl. I'm using it to send a string
> to a simple .net application I am using as an interface. The program
> works in that all the code runs and does something, but the string I
> send to rndzl is turned into what looks to be chinese or japanese
> characters.
>
> An example:
> The string my lisp program produces:
> Cyan: 37.50 %
> Magenta: 37.50 %
> Yellow: 37.50 %
> Black: 12.50 %
> Total Byte Count: 32
>
> The string as printed by .net:
> 说礏 礌㠸礌㮐礐
>
> The lisp I am using is ECL. For .net I am using Visual Studio 2005
> C#, producing code for .net version 2.
>
> I look forward to any help that the experts here can provide, and if
> my current approach is unworkable, I would like some advise on which
> windows graphics library to use.
Send your request to the RDNZL (not "Rndzl" by the way) mailing list,
possibly with a reproducible example, and we'll try to help you.
Edi.
On Feb 1, 12:14 pm, Edi Weitz <········@agharta.de> wrote:
> On Fri, 1 Feb 2008 09:06:14 -0800 (PST), Mikael <···········@gmail.com> wrote:
> > I'm having a problem with using rndzl. I'm using it to send a string
> > to a simple .net application I am using as an interface. The program
> > works in that all the code runs and does something, but the string I
> > send to rndzl is turned into what looks to be chinese or japanese
> > characters.
>
> > An example:
> > The string my lisp program produces:
> > Cyan: 37.50 %
> > Magenta: 37.50 %
> > Yellow: 37.50 %
> > Black: 12.50 %
> > Total Byte Count: 32
>
> > The string as printed by .net:
> > 说礏 礌㠸礌㮐礐
>
> > The lisp I am using is ECL. For .net I am using Visual Studio 2005
> > C#, producing code for .net version 2.
>
> > I look forward to any help that the experts here can provide, and if
> > my current approach is unworkable, I would like some advise on which
> > windows graphics library to use.
>
> Send your request to the RDNZL (not "Rndzl" by the way) mailing list,
> possibly with a reproducible example, and we'll try to help you.
>
> Edi.
My mistake. I'm posting this to the RDNZL (darn my dyslexia) mailing
list.
On Fri, 01 Feb 2008 18:06:14 +0100, Mikael <···········@gmail.com> wrote:
> Hello,
> I'm having a problem with using rndzl. I'm using it to send a string
> to a simple .net application I am using as an interface. The program
> works in that all the code runs and does something, but the string I
> send to rndzl is turned into what looks to be chinese or japanese
> characters.
>
> An example:
> The string my lisp program produces:
> Cyan: 37.50 %
> Magenta: 37.50 %
> Yellow: 37.50 %
> Black: 12.50 %
> Total Byte Count: 32
>
> The string as printed by .net:
> 说礏 礌㠸礌㮐礐
>
> The lisp I am using is ECL. For .net I am using Visual Studio 2005 C#,
> producing code for .net version 2.
>
> I look forward to any help that the experts here can provide, and if
> my current approach is unworkable, I would like some advise on which
> windows graphics library to use.
I ran into this one; maybe it helps.
It depends on flexi-streams and CFFI.
It uses the hack that the low 8 bits of each UTF-16 unit are Latin-1:
if you only use characters within Latin-1, you can just throw the upper
byte away and get ISO-Latin-1.
(defun utf16-string-to-byte-array (wstr length)
  "Copy LENGTH 16-bit units (2*LENGTH octets) from the foreign wide
string WSTR into a fresh octet vector."
  (let* ((size (* 2 length))
         (seq (make-array size :element-type '(unsigned-byte 8))))
    (dotimes (i size)
      (setf (aref seq i) (cffi:mem-aref wstr :uchar i)))
    seq))

;; DEFPARAMETER rather than DEFCONSTANT: constants must be EQL across
;; redefinitions, which a freshly consed array never is.
(defparameter +double-zero+
  (make-array 2 :element-type '(unsigned-byte 8) :initial-element 0))

(defun utf16-string-to-latin1 (wstr length)
  "Decode the NUL-terminated UTF-16LE buffer WSTR as a Lisp string.
Assumes the two-octet terminator is unit-aligned."
  (let* ((seq (utf16-string-to-byte-array wstr length))
         ;; SEARCH returns the index where the terminator starts; the
         ;; payload is everything before it (no 1+, or we keep a NUL).
         (end (or (search +double-zero+ seq) (length seq))))
    (flexi-streams:octets-to-string
     seq
     :external-format (flexi-streams:make-external-format
                       :utf-16 :little-endian t)
     :end end)))
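For comparison, the same low-byte hack stated outside Lisp (a Python sketch; the helper name is invented): find the unit-aligned two-zero-octet terminator, then keep the low byte of each 16-bit unit.

```python
# Sketch of the same hack: NUL-terminated UTF-16LE buffer -> Latin-1 text.
def wide_buffer_to_latin1(buf):
    # Find the 16-bit NUL terminator on an even (unit-aligned) offset.
    end = len(buf)
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            end = i
            break
    # The low byte of each little-endian unit is the Latin-1 code point.
    return bytes(buf[i] for i in range(0, end, 2)).decode("latin-1")

# A fake wide-string buffer laid out the way Windows/.NET would lay it out:
buf = "blåbær".encode("utf-16-le") + b"\x00\x00" + b"\xde\xad\xbe\xef"
assert wide_buffer_to_latin1(buf) == "blåbær"
```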
--------------
John Thingstad