Our development work on CLforJava (see the ILC 2005 proceedings) is at a
point where we are looking into how to embed escaped Unicode characters in
strings. In the current standard, the only escape character is '\', used
largely to escape '\' itself and '"'. We have a couple of ideas for extending
the escape mechanism to handle Unicode 4 characters.
In Unicode documentation, a character is written as U+ followed
by the hex representation of its code point. In CLforJava we already
have Reader support in the form of #\U (followed by 4 hex digits) or
#\U+ (followed by 4-6 hex digits). We have two ideas for extending this
into strings:
1. Use '\' as the escape character as it is now. But if the character
following the '\' is U (or U+), it must be followed by 4 or 6 hex digits
respectively. Since U (or u) needs no escape today, this would have little
effect on existing code.
2. Add the '#' character as a second escape character that, for now, only
supports #U and #U+ followed by 4 or 6 hex digits as above. This would
require escaping the '#' character (\#) to put a literal '#' in a string.
This has a greater probability of breaking existing code. On the other hand,
the definition of '\' is not changed.
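As a rough illustration of option 1, here is a minimal Java sketch of how a string reader might decode the extended escapes. It is not CLforJava's actual code: the names UnicodeEscapes and decode are hypothetical, and it assumes that \U takes exactly 4 hex digits, that \U+ takes exactly 6, that a lowercase u is accepted as well, and that any other escaped character stands for itself as it does today.

```java
// Sketch of option 1: '\' remains the only escape character.
// Assumptions (not from CLforJava): \Uxxxx takes exactly 4 hex digits,
// \U+xxxxxx takes exactly 6, and a lowercase 'u' is accepted as well.
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Decodes the text found between the double quotes of a string literal. */
    static String decode(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i++);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i >= body.length()) {
                throw new IllegalArgumentException("escape character at end of string");
            }
            char next = body.charAt(i++);
            if (next != 'U' && next != 'u') {
                out.append(next); // existing behaviour: an escaped character stands for itself
                continue;
            }
            int digits = 4;
            if (i < body.length() && body.charAt(i) == '+') {
                digits = 6; // the U+ form carries six hex digits
                i++;
            }
            if (i + digits > body.length()) {
                throw new IllegalArgumentException("truncated Unicode escape");
            }
            int codePoint = 0;
            for (int k = 0; k < digits; k++) {
                int d = Character.digit(body.charAt(i++), 16);
                if (d < 0) {
                    throw new IllegalArgumentException("bad hex digit in Unicode escape");
                }
                codePoint = codePoint * 16 + d;
            }
            if (!Character.isValidCodePoint(codePoint)) {
                throw new IllegalArgumentException("code point out of range");
            }
            out.appendCodePoint(codePoint); // emits a surrogate pair above U+FFFF
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = decode("caf\\U00E9 \\U+01D11E");
        // Print code points in hex to stay independent of the console encoding.
        s.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```

Under option 2 the same loop would apply with '#' as the trigger instead of '\', but every existing string containing a literal '#' would then have to be rewritten with \#.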
Comments and suggestions, please! Thanks.
On Tue, 04 Oct 2005 07:13:31 -0700, Jerry Boetje wrote:
> 1. Use '\' as the escape character as it is now. But if the character
> following the '\' is U (or U+) it will be followed by 4 or 6 hex digits
> respectively. Since U (or u) needs no escape, this would have little
> effect on existing code.
I'd suggest the same notation as Java 5.0+:
<http://java.sun.com/developer/technicalArticles/Intl/Supplementary/>
For text input, the Java 2 SDK provides a code point input method which
accepts strings of the form "\Uxxxxxx", where the uppercase "U"
indicates that the escape sequence contains six hexadecimal digits,
thus allowing for supplementary characters. A lowercase "u" indicates
the original form of the escape sequences, "\uxxxx". You can find this
input method and its documentation in the directory
demo/jfc/CodePointIM of the J2SDK.
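To illustrate why a longer escape form is needed at all: four hex digits only reach U+FFFF, while supplementary characters run up to U+10FFFF, and Java stores each of them as a pair of surrogate chars. The short sketch below shows this with standard java.lang calls. The class and method names (SupplementaryDemo, fromCodePoint, toUPlus) are made up for the example.

```java
// Sketch only: class and method names are hypothetical.
final class SupplementaryDemo {
    private SupplementaryDemo() {}

    /** Builds a Java String holding the single character with this code point. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    /** Formats a code point in the U+ notation, padded to at least 4 hex digits. */
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        String eAcute = fromCodePoint(0x00E9); // fits in 4 hex digits
        String clef = fromCodePoint(0x1D11E);  // supplementary: needs 5 hex digits

        // prints "U+00E9 uses 1 char(s)"
        System.out.println(toUPlus(0x00E9) + " uses " + eAcute.length() + " char(s)");
        // prints "U+1D11E uses 2 char(s)"
        System.out.println(toUPlus(0x1D11E) + " uses " + clef.length() + " char(s)");
        // prints "1 code point"
        System.out.println(clef.codePointCount(0, clef.length()) + " code point");
    }
}
```

Any escape decoder written in Java therefore has to append by code point rather than by char once the six-digit form is allowed.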
Regards,
Adam