Hello,
I'm having a problem using rndzl. I'm using it to send a string
to a simple .net application I am using as an interface. The program
works, in that all the code runs and does something, but the string I
send to rndzl is turned into what looks to be Chinese or Japanese
characters.
An example:
The string my lisp program produces:
Cyan: 37.50 %
Magenta: 37.50 %
Yellow: 37.50 %
Black: 12.50 %
Total Byte Count: 32
The string as printed by .net:
说礏 礌㠸礌㮐礐
The lisp I am using is ECL. For .net I am using Visual Studio 2005 C#,
producing code for .net version 2.
I look forward to any help that the experts here can provide, and if
my current approach is unworkable, I would like some advice on which
Windows graphics library to use.
On Fri, 01 Feb 2008 09:06:14 -0800, Mikael wrote:
> An example:
> The string my lisp program produces:
> Cyan: 37.50 %
> Magenta: 37.50 %
> Yellow: 37.50 %
> Black: 12.50 %
> Total Byte Count: 32
>
> The string as printed by .net:
> 说礏 礌㠸礌㮐礐
>
> The lisp I am using is ECL. For .net I am using Visual Studio 2005 C#,
> producing code for .net version 2.
As Edi said, this belongs to RNDZL's mailing list, but just as a hint,
this kind of corruption suggests .NET is expecting UTF-16 strings.
Windows is rather braindamaged about using Unicode, so that wouldn't be
totally unexpected.
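The arithmetic behind that hint is easy to check with a quick sketch (Python here purely for illustration; the OP's actual pipeline goes through Lisp and .NET): Latin-1/ASCII bytes, paired up as UTF-16LE code units, land mostly in the CJK ideograph range.

```python
# A sketch (not the OP's code): Latin-1/ASCII bytes misread as UTF-16LE.
text = "Cyan: 37.50 %\n"            # 14 characters, 14 bytes in Latin-1
raw = text.encode("latin-1")        # the bytes the Lisp side sends

# A consumer expecting UTF-16LE pairs adjacent bytes into 16-bit units;
# 'C' (0x43) + 'y' (0x79) become the single unit 0x7943, a CJK ideograph.
garbled = raw.decode("utf-16-le")
assert len(garbled) == len(raw) // 2            # 14 bytes -> 7 characters
assert any(0x4E00 <= ord(c) <= 0x9FFF for c in garbled)
print(garbled)
```

Which is exactly the kind of output the OP quoted.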
Cheers,
Maciej
On Fri, 1 Feb 2008 17:38:24 +0000 (UTC), Maciej Katafiasz <········@gmail.com> wrote:
> As Edi said, this belongs to RNDZL's mailing list
RDNZL.
> but just as a hint, this kind of corruption suggests .NET is
> expecting UTF-16 strings. Windows is rather braindamaged about
> using Unicode, so that wouldn't be totally unexpected.
AFAIK the API functions usually have defined and documented semantics
w.r.t. character encodings. What's the brain damaged part of it?
Edi.
--
European Common Lisp Meeting, Amsterdam, April 19/20, 2008
http://weitz.de/eclm2008/
Real email: (replace (subseq ·········@agharta.de" 5) "edi")
On Fri, 01 Feb 2008 20:42:30 +0100, Edi Weitz wrote:
>> As Edi said, this belongs to RNDZL's mailing list
>
> RDNZL.
Sorry, I tried three times to spell it properly :(
>> but just as a hint, this kind of corruption suggests .NET is expecting
>> UTF-16 strings. Windows is rather braindamaged about using Unicode, so
>> that wouldn't be totally unexpected.
>
> AFAIK the API functions usually have defined and documented semantics
> w.r.t. character encodings. What's the brain damaged part of it?
Using UTF-16 as externally visible format.
Cheers,
Maciej
On Fri, 1 Feb 2008 20:03:17 +0000 (UTC), Maciej Katafiasz <········@gmail.com> wrote:
> On Fri, 01 Feb 2008 20:42:30 +0100, Edi Weitz wrote:
>
>> What's the brain damaged part of it?
>
> Using UTF-16 as externally visible format.
Why?
On Fri, 01 Feb 2008 21:38:31 +0100, Edi Weitz wrote:
>>> What's the brain damaged part of it?
>>
>> Using UTF-16 as externally visible format.
>
> Why?
Because it's a stupid format with all the problems of UTF-8 (and then
some) and none of its advantages. It's inefficient for text that stays
within ASCII and offers no space advantage outside of it; it won't pass
cleanly through 8-bit systems (and vice versa, see the OP); it has
compatibility issues with earlier UCS-2 systems[1] and with formats that
don't understand surrogates; it can't address codepoints above the BMP
directly, so it loses the advantage of fixed width; it introduces
endianness headaches; and its byte stream doesn't resynchronise the way
UTF-8's does, so a corrupted UTF-16 stream is in principle very hard to
recover.
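Two of these complaints are easy to demonstrate concretely (a Python sketch, chosen only because it makes the byte-level behaviour visible):

```python
# Endianness: the same text has two incompatible UTF-16 byte orders, so a
# stream needs a BOM or out-of-band agreement; UTF-8 has exactly one form.
s = "Hi"
assert s.encode("utf-16-le") == b"H\x00i\x00"
assert s.encode("utf-16-be") == b"\x00H\x00i"

# ASCII inefficiency: UTF-16 doubles ASCII-only text, while UTF-8 leaves
# it byte-for-byte identical to plain ASCII.
assert len(s.encode("utf-16-le")) == 2 * len(s)
assert s.encode("utf-8") == s.encode("ascii") == b"Hi"
```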
Cheers,
Maciej
[1] Compatibility with UCS-2 is really the only reason anyone could even
consider UTF-16. I'm of the strong opinion that MSFT should have phased
out all primary UTF-16 APIs and kept them only for compatibility. They
didn't; in fact, notepad.exe will actually write out UTF-16 when you tell
it to save "unicode text", which is just ridiculous.
On Fri, 01 Feb 2008 21:49:00 +0000, Maciej Katafiasz wrote:
> (...) has compatibility issues with earlier UCS-2 systems[1] and formats
> (...)
> [1] Compatibility with UCS-2 is really the only reason anyone could even
> consider UTF-16. (...)
Hmm, I now realise it came out a bit incoherent. What I meant to say is:
UTF-16 has UCS-2 compatibility problems, in that it's generally
impossible to determine whether you're dealing with UTF-16 or UCS-2, both
in data formats and in capabilities of systems that process them. Which
is yet another reason to move away from UTF-16 as a non-deprecated
format. However, despite the aforementioned issues, compatibility with
pre-existing UCS-2 systems is the only reason anyone would ever want to
deal with UTF-16 at all; picking it up for a system with no such
constraint would be a terribly bad idea.
Cheers,
Maciej
On Sat, 02 Feb 2008 00:25:05 +0100, Maciej Katafiasz
<········@gmail.com> wrote:
> On Fri, 01 Feb 2008 21:49:00 +0000, Maciej Katafiasz wrote:
>
>> (...) has compatibility issues with earlier UCS-2 systems[1] and formats
>> (...)
>> [1] Compatibility with UCS-2 is really the only reason anyone could even
>> consider UTF-16. (...)
>
> Hmm, I now realise it came out a bit incoherent. What I meant to say is:
> UTF-16 has UCS-2 compatibility problems, in that it's generally
> impossible to determine whether you're dealing with UTF-16 or UCS-2, both
> in data formats and in capabilities of systems that process them. Which
> is yet another reason to move away from UTF-16 as a non-deprecated
> format. However, despite the aforementioned issues, compatibility with
> pre-existing UCS-2 systems is the only reason anyone would ever want to
> deal with UTF-16 at all, picking it up for a system that had no such
> consideration would be a terribly bad idea.
>
> Cheers,
> Maciej
Why do you assume most of the world deals with ASCII?
In Norway I would use an ISO-8859-1 or ISO-8859-15 encoding.
In UTF-8 the first 128 code points coincide with ASCII; in UTF-16 the
first 256 coincide with ISO-8859-1.
Anyhow, it isn't as if Windows systems can't deal with ASCII. In fact, to
convert ASCII to UTF-16 you just add a zero-filled octet to every
character before displaying the string. As it happens, this works for
ISO-8859-1 (ISO-LATIN-1) as well. As such, for Latin-1 text UTF-8 is LESS
efficient.
For Hebrew, Arabic or Japanese users the problems are even bigger.
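That widening trick is easy to verify (a Python sketch; the helper name is invented for the example):

```python
# Latin-1 -> UTF-16LE really is "append a zero octet to every byte",
# because code points U+0000..U+00FF are exactly the Latin-1 repertoire.
def latin1_to_utf16le(data):
    # Hypothetical helper: widen each Latin-1 byte with a zero high byte.
    return b"".join(bytes([b, 0]) for b in data)

s = "blåbær"   # Norwegian text, entirely within Latin-1
assert latin1_to_utf16le(s.encode("latin-1")) == s.encode("utf-16-le")

# UTF-8, by contrast, spends two bytes on each non-ASCII Latin-1 character:
assert len(s.encode("utf-8")) > len(s.encode("latin-1"))
```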
--------------
John Thingstad
MK> impossible to determine whether you're dealing with UTF-16 or UCS-2,
UCS-2 is a subset of UTF-16 that was deprecated ten years ago or so. Why
on earth do you need to distinguish between them?
Just use UTF-16.
On Sat, 02 Feb 2008 12:39:28 +0200, Alex Mizrahi wrote:
> MK> impossible to determine whether you're dealing with UTF-16 or UCS-2,
>
> UCS-2 is deprecated (10 years ago or so) subset of UTF-16. why, for
> hell, you need to distinguish between them? just use UTF-16
Because some systems/formats expect UCS-2. And you can't tell that until
you actually run into codepoints from outside the BMP, at which point
things either break or they don't. But yes, in general it's no worse than
distinguishing UTF-8 from ASCII.
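The point about not noticing anything until a non-BMP codepoint shows up can be made concrete (a Python sketch, illustration only):

```python
# Inside the BMP, UCS-2 and UTF-16 are byte-identical; the difference only
# appears for codepoints that need a surrogate pair.
bmp = "A"                # U+0041: one 16-bit unit in both encodings
astral = "\U0001D11E"    # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

assert len(bmp.encode("utf-16-le")) == 2
assert len(astral.encode("utf-16-le")) == 4   # surrogate pair: two units

# A UCS-2-only consumer sees those two units as two separate "characters"
# (the surrogates 0xD834 and 0xDD1E) -- nothing breaks until then.
assert astral.encode("utf-16-be") == b"\xd8\x34\xdd\x1e"
```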
Cheers,
Maciej
MK>>> impossible to determine whether you're dealing with UTF-16 or UCS-2,
??>>
??>> UCS-2 is deprecated (10 years ago or so) subset of UTF-16. why, for
??>> hell, you need to distinguish between them? just use UTF-16
MK> Because some systems/formats expect UCS-2. And you can't tell that
MK> until you actually run into codepoints from outside BMP, at which point
MK> things either break or not.
So the system or format doesn't support stuff like Han ideographs or
Ugaritic, and you want to use those symbols with it.
If you really need to, that's a problem of incompleteness in those
systems/formats.
Why do you think it's a problem with UCS-2 or UTF-16?
MK> But yes, in general it's no worse than distinguishing UTF-8 from
MK> ASCII.
Pure ASCII is rarely used nowadays; if we already have 8-bit
transmission, something like Latin-1 is used by default, and confusing a
charset like Latin-1 with UTF-8 is a much bigger problem.
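Both directions of that confusion can be seen directly (a Python sketch):

```python
# The Latin-1/UTF-8 confusion, in both directions.
s = "café"

# Latin-1 bytes fed to a UTF-8 decoder usually fail loudly, because a
# lone 0xE9 starts an incomplete multi-byte sequence -- at least the
# error is detectable:
try:
    s.encode("latin-1").decode("utf-8")
    raise AssertionError("should not decode")
except UnicodeDecodeError:
    pass

# UTF-8 bytes fed to a Latin-1 decoder never fail: every byte maps to
# some character, so you silently get mojibake instead.
assert s.encode("utf-8").decode("latin-1") == "cafÃ©"
```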
On 2008-02-01, Edi Weitz <········@agharta.de> wrote:
> On Fri, 1 Feb 2008 17:38:24 +0000 (UTC), Maciej Katafiasz <········@gmail.com> wrote:
>
>> As Edi said, this belongs to RNDZL's mailing list
>
> RDNZL.
>
>> but just as a hint, this kind of corruption suggests .NET is
>> expecting UTF-16 strings. Windows is rather braindamaged about
>> using Unicode, so that wouldn't be totally unexpected.
>
> AFAIK the API functions usually have defined and documented semantics
> w.r.t. character encodings. What's the brain damaged part of it?
Hmm, I ran into a similar problem with .Net once (where .Net quietly
outputs UTF-16 even when told to do something else). This supposedly
happens because .Net uses UTF-16 internally. I don't know if this has
anything to do with what the OP experienced, but have a look at
http://www.thescripts.com/forum/thread177860.html
to get an idea of the problem.
--
Oyvin
Edi Weitz wrote:
> On Fri, 1 Feb 2008 17:38:24 +0000 (UTC), Maciej Katafiasz <········@gmail.com> wrote:
>
>> As Edi said, this belongs to RNDZL's mailing list
>
> RDNZL.
Let down your hair.
I'm sorry, that was a particularly grimm joke.
-dan
On Tue, 05 Feb 2008 02:24:51 +0000, ···@telent.net wrote:
> Edi Weitz wrote:
>
>> RDNZL.
>
> Let down your hair.
>
> I'm sorry, that was a particularly grimm joke.
Not bad, not bad... :)
But you weren't the first one:
http://www.arf.ru/Notes/Stan/rdnzl.html
Edi.
Edi Weitz <········@agharta.de> writes:
> On Tue, 05 Feb 2008 02:24:51 +0000, ···@telent.net wrote:
>
> > Edi Weitz wrote:
> >
> >> RDNZL.
> >
> > Let down your hair.
> >
> > I'm sorry, that was a particularly grimm joke.
>
> Not bad, not bad... :)
>
> But you weren't the first one:
>
> http://www.arf.ru/Notes/Stan/rdnzl.html
Yeah, Dan should apologize for the repundancy of his
suggestion, no doubt made in a fit of redundant zeal.
On Fri, 1 Feb 2008 17:38:24 +0000 (UTC), Maciej Katafiasz
<········@gmail.com> wrote:
>On Fri, 01 Feb 2008 09:06:14 -0800, Mikael wrote:
>
>> An example:
>> The string my lisp program produces:
>> Cyan: 37.50 %
>> Magenta: 37.50 %
>> Yellow: 37.50 %
>> Black: 12.50 %
>> Total Byte Count: 32
>>
>> The string as printed by .net:
>> 说礏 礌㠸礌㮐礐
>>
>> The lisp I am using is ECL. For .net I am using Visual Studio 2005 C#,
>> producing code for .net version 2.
>
>As Edi said, this belongs to RNDZL's mailing list, but just as a hint,
>this kind of corruption suggests .NET is expecting UTF-16 strings.
>Windows is rather braindamaged about using Unicode, so that wouldn't be
>totally unexpected.
>
>Cheers,
>Maciej
Windows Unicode handling has nothing to do with .NET applications -
.NET operates in parallel with its own set of encoding/decoding
filters.
The .NET application is in charge of what character set it accepts,
with the default being the local code page. If the Lisp and .NET code
are out of sync as it appears, it should be easily fixable on the .NET
side by chaining the proper stream reader to the input and writer to
the output.
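In .NET terms that means constructing the reader/writer with an explicit Encoding (e.g. new StreamReader(stream, encoding)). The same idea in Python terms, as a sketch of the principle rather than of the .NET API:

```python
import io

# Sketch: chain a text layer with an EXPLICIT encoding onto the raw byte
# stream, instead of letting a platform default (local code page) decide.
raw = io.BytesIO("Cyan: 37.50 %\n".encode("latin-1"))
reader = io.TextIOWrapper(raw, encoding="latin-1")
assert reader.read() == "Cyan: 37.50 %\n"

# And symmetrically on the output side:
out = io.BytesIO()
writer = io.TextIOWrapper(out, encoding="latin-1", newline="")
writer.write("Black: 12.50 %\n")
writer.flush()
assert out.getvalue() == b"Black: 12.50 %\n"
```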
ECL can be built with internal support for Unicode. AFAIK it does not,
however, have external format support, but something like Edi's
flexi-streams might be used to cobble together something that works.
George
--
for email reply remove "/" from address
On Fri, 1 Feb 2008 09:06:14 -0800 (PST), Mikael <···········@gmail.com> wrote:
> I'm having a problem with using rndzl. I'm using it to send a string
> to a simple .net application I am using as an interface. The program
> works in that all the code runs and does something, but the string I
> send to rndzl is turned into what looks to be chinese or japanese
> characters.
>
> An example:
> The string my lisp program produces:
> Cyan: 37.50 %
> Magenta: 37.50 %
> Yellow: 37.50 %
> Black: 12.50 %
> Total Byte Count: 32
>
> The string as printed by .net:
> 说礏 礌㠸礌㮐礐
>
> The lisp I am using is ECL. For .net I am using Visual Studio 2005
> C#, producing code for .net version 2.
>
> I look forward to any help that the experts here can provide, and if
> my current approach is unworkable, I would like some advise on which
> windows graphics library to use.
Send your request to the RDNZL (not "Rndzl" by the way) mailing list,
possibly with a reproducible example, and we'll try to help you.
Edi.
On Feb 1, 12:14 pm, Edi Weitz <········@agharta.de> wrote:
> On Fri, 1 Feb 2008 09:06:14 -0800 (PST), Mikael <···········@gmail.com> wrote:
> > I'm having a problem with using rndzl. I'm using it to send a string
> > to a simple .net application I am using as an interface. The program
> > works in that all the code runs and does something, but the string I
> > send to rndzl is turned into what looks to be chinese or japanese
> > characters.
>
> > An example:
> > The string my lisp program produces:
> > Cyan: 37.50 %
> > Magenta: 37.50 %
> > Yellow: 37.50 %
> > Black: 12.50 %
> > Total Byte Count: 32
>
> > The string as printed by .net:
> > 说礏 礌㠸礌㮐礐
>
> > The lisp I am using is ECL. For .net I am using Visual Studio 2005
> > C#, producing code for .net version 2.
>
> > I look forward to any help that the experts here can provide, and if
> > my current approach is unworkable, I would like some advise on which
> > windows graphics library to use.
>
> Send your request to the RDNZL (not "Rndzl" by the way) mailing list,
> possibly with a reproducible example, and we'll try to help you.
>
> Edi.
My mistake. I'm posting this to the RDNZL (darn my dyslexia) mailing
list.
On Fri, 01 Feb 2008 18:06:14 +0100, Mikael <···········@gmail.com> wrote:
> Hello,
> I'm having a problem with using rndzl. I'm using it to send a string
> to a simple .net application I am using as an interface. The program
> works in that all the code runs and does something, but the string I
> send to rndzl is turned into what looks to be chinese or japanese
> characters.
>
> An example:
> The string my lisp program produces:
> Cyan: 37.50 %
> Magenta: 37.50 %
> Yellow: 37.50 %
> Black: 12.50 %
> Total Byte Count: 32
>
> The string as printed by .net:
> 说礏 礌㠸礌㮐礐
>
> The lisp I am using is ECL. For .net I am using Visual Studio 2005 C#,
> producing code for .net version 2.
>
> I look forward to any help that the experts here can provide, and if
> my current approach is unworkable, I would like some advise on which
> windows graphics library to use.
I ran into this one; maybe it helps.
It depends on flexi-streams and CFFI.
It uses the hack that the low 8 bits of each UTF-16 unit are Latin-1:
if you only use characters within Latin-1, you can just throw the upper
byte away and get ISO-Latin-1.
(defun utf16-string-to-byte-array (wstr length)
  "Copy LENGTH 16-bit units (2*LENGTH octets) from the foreign wide
string WSTR into a fresh octet vector."
  (let* ((size (* 2 length))
         (seq (make-array size :element-type '(unsigned-byte 8))))
    (dotimes (i size)
      (setf (aref seq i) (cffi:mem-aref wstr :uchar i)))
    seq))

;; DEFPARAMETER rather than DEFCONSTANT: constants must be EQL across
;; redefinitions, which a freshly consed array never is.
(defparameter +double-zero+
  (make-array 2 :element-type '(unsigned-byte 8) :initial-element 0))

(defun utf16-string-to-latin1 (wstr length)
  "Decode the NUL-terminated UTF-16LE buffer WSTR as a Lisp string.
Assumes the two-octet terminator is unit-aligned."
  (let* ((seq (utf16-string-to-byte-array wstr length))
         ;; SEARCH returns the index where the terminator starts; the
         ;; payload is everything before it (no 1+, or we keep a NUL).
         (end (or (search +double-zero+ seq) (length seq))))
    (flexi-streams:octets-to-string
     seq
     :external-format (flexi-streams:make-external-format
                       :utf-16 :little-endian t)
     :end end)))
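For comparison, the same low-byte hack stated outside Lisp (a Python sketch; the helper name is invented): find the unit-aligned two-zero-octet terminator, then keep the low byte of each 16-bit unit.

```python
# Sketch of the same hack: NUL-terminated UTF-16LE buffer -> Latin-1 text.
def wide_buffer_to_latin1(buf):
    # Find the 16-bit NUL terminator on an even (unit-aligned) offset.
    end = len(buf)
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            end = i
            break
    # The low byte of each little-endian unit is the Latin-1 code point.
    return bytes(buf[i] for i in range(0, end, 2)).decode("latin-1")

# A fake wide-string buffer laid out the way Windows/.NET would lay it out:
buf = "blåbær".encode("utf-16-le") + b"\x00\x00" + b"\xde\xad\xbe\xef"
assert wide_buffer_to_latin1(buf) == "blåbær"
```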
--------------
John Thingstad