byte order

From: Rainer Joswig
Subject: byte order
Date: Mon, 07 Jan 2008 17:00:00 +0000
Message-ID: <joswig-22572D.18000007012008@news-europe.giganews.com>

Hi,

is there a way to determine the byte order
of the underlying 'machine'? Portable?

Regards,

Rainer Joswig

-- 
http://lispm.dyndns.org/

Re: byte order Pekka Niiranen
- Re: byte order ·······@eurogaran.com
Re: byte order Vassil Nikolov
- Re: byte order Rainer Joswig
  - Re: byte order Vassil Nikolov
    - Re: byte order verec
- Re: byte order Pascal Bourguignon
Re: byte order Pascal Bourguignon
- Re: byte order Jeff M.
  - Re: byte order Raymond Wiker
    - Re: byte order Jeff M.
      - Re: byte order Sohail Somani
      - Re: byte order Pascal Bourguignon
        Re: byte order George Neuner
      - Re: byte order Raymond Wiker
    - Re: byte order Maciej Katafiasz
      - Re: byte order Vassil Nikolov
      - Re: byte order Raymond Toy (RT/EUS)
    - Re: byte order George Neuner
Re: byte order ·······@eurogaran.com
- Re: byte order Rainer Joswig
  - Re: byte order ·······@eurogaran.com
    - Re: byte order George Neuner
    - Re: byte order Rainer Joswig
      - Re: byte order ·······@eurogaran.com
        Re: byte order Rainer Joswig
        Re: byte order Vesa Karvonen
        Re: byte order ·······@eurogaran.com
        Re: byte order Zach Beane
        Re: byte order Rainer Joswig
        Re: byte order Thomas F. Burdick
        Re: byte order Rainer Joswig
        Re: byte order Zach Beane
        Re: byte order Thomas F. Burdick
        Re: byte order John Thingstad
        Re: byte order Duane Rettig
        Re: byte order ···············@gmail.com
        Re: byte order Victor Anyakin
        Re: byte order Casper H.S. Dik
        Re: byte order Vesa Karvonen
        Re: byte order Madhu
        Re: byte order Vassil Nikolov
        Re: byte order Pascal Bourguignon
    - Re: byte order Rob Warnock
  - Re: byte order Pascal Bourguignon
- Re: byte order Pascal Bourguignon
Re: byte order Duane Rettig
- Re: byte order Rainer Joswig

From: Pekka Niiranen
Subject: Re: byte order
Date: Mon, 07 Jan 2008 17:09:21 +0000
Message-ID: <47825E09.5050004@pp5.inet.fi>

Rainer Joswig wrote:
> Hi,
> 
> is there a way to determine the byte order
> of the underlying 'machine'? Portable?
> 
> Regards,
> 
> Rainer Joswig
> 
PythonWin 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit 
(Intel)] on win32.
Portions Copyright 1994-2006 Mark Hammond - see 'Help/About PythonWin' 
for further copyright information.
 >>> import sys
 >>> sys.byteorder
'little'
 >>>

Just read the source and port it to Lisp;)

From: ·······@eurogaran.com
Subject: Re: byte order
Date: Tue, 08 Jan 2008 09:46:41 +0000
Message-ID: <fbd1478f-58c1-49fc-a58d-f70e982ca281@d70g2000hsb.googlegroups.com>

On Jan 7, 6:09 pm, Pekka Niiranen <··············@pp5.inet.fi> wrote:

>  >>> import sys
>  >>> sys.byteorder
> 'little'
>  >>>
>
> Just read the source and port it to Lisp;)

You are quite right that many libraries are still lacking for
Lisp.
Thanks for the reminder, though.

From: Vassil Nikolov
Subject: Re: byte order
Date: Mon, 07 Jan 2008 23:36:18 +0000
Message-ID: <snwwsqlxl71.fsf@luna.vassil.nikolov.name>

Rainer Joswig <······@lisp.de> writes:
> is there a way to determine the byte order
> of the underlying 'machine'? Portable?

  You can obtain the answer to a related question by writing 1 to a
  file using an (UNSIGNED-BYTE 16) stream and then reading from the
  same file using an (UNSIGNED-BYTE 8) stream.  Yes, behavior still
  depends on the implementation, so it only gives a partial answer.
  (In practice, it is highly probable that all implementations will
  work as we'd like them to.)

  A trivial first draft (run on x86):

    * (defun little-endian-p ()
        (with-open-file (s "ub16" :direction :output :element-type '(unsigned-byte 16))
          (print (stream-element-type s) *trace-output*)  ;implementation-dependent [*]
          (write-byte 1 s))
        (with-open-file (s "ub16" :direction :input :element-type '(unsigned-byte 8))
          (print (stream-element-type s) *trace-output*)  ;implementation-dependent
          (= 1 (read-byte s))))

    little-endian-p
    * (little-endian-p)

    (unsigned-byte 16) 
    (unsigned-byte 8) 
    t

    $ od -t x1 ub16
    0000000 01 00
    0000002

  [*] See the last paragraph of the "Exceptional Situations"
      subsection of OPEN's specification.

  *        *        *

  I am tempted to think that the following is portable in C (it
  compiles cleanly with ``gcc -ansi -pedantic'', but that isn't
  everything, of course):

    ==> little-endian-p.c <==

    int main () {
      int i;
      i = 1;
      return (*(char *)&i == 0);
    }

  but I am also tempted to think that I might be missing some
  provision of the standard.

  Implementing a test that can also detect PDP-endianness is left as
  an exercise...

  ---Vassil.

-- 
Bound variables, free programmers.

From: Rainer Joswig
Subject: Re: byte order
Date: Tue, 08 Jan 2008 00:07:47 +0000
Message-ID: <joswig-3C0236.01074608012008@news-europe.giganews.com>

In article <···············@luna.vassil.nikolov.name>,
 Vassil Nikolov <···············@pobox.com> wrote:

> Rainer Joswig <······@lisp.de> writes:
> > is there a way to determine the byte order
> > of the underlying 'machine'? Portable?
> 
>   You can obtain the answer to a related question by writing 1 to a
>   file using an (UNSIGNED-BYTE 16) stream and then reading from the
>   same file using an (UNSIGNED-BYTE 8) stream.  Yes, behavior still
>   depends on the implementation, so it only gives a partial answer.
>   (In practice, it is highly probable that all implementations will
>   work as we'd like them to.)

Thanks for the answer!

>   A trivial first draft (run on x86):
> 
>     * (defun little-endian-p ()
>         (with-open-file (s "ub16" :direction :output :element-type '(unsigned-byte 16))
>           (print (stream-element-type s) *trace-output*)  ;implementation-dependent [*]
>           (write-byte 1 s))
>         (with-open-file (s "ub16" :direction :input :element-type '(unsigned-byte 8))
>           (print (stream-element-type s) *trace-output*)  ;implementation-dependent
>           (= 1 (read-byte s))))
> 
>     little-endian-p
>     * (little-endian-p)
> 
>     (unsigned-byte 16) 
>     (unsigned-byte 8) 
>     t

That looked good. Then I ran it on a Lisp Machine:

Command: (little-endian-p)
(UNSIGNED-BYTE 16) 
Error: File was created with byte size 16; it may not be opened with byte size 8.
       For RJNXP:>joswig>ub16.lisp.1

LMFS:OPEN-LOCAL-LMFS-1
   Arg 0: #P"RJNXP:>joswig>ub16.lisp.newest"
   Arg 1 (LMFS:LOGPATH): #P"RJNXP:>joswig>ub16.lisp.newest"
   Arg 2 (LMFS:ACCESS-PATH): #<FS:LOCAL-LMFS-ACCESS-PATH RJNXP using LOCAL-FILE 2050271547>
   Rest Arg (LMFS:OPTIONS): (:DIRECTION :INPUT :ELEMENT-TYPE (UNSIGNED-BYTE 8) ...)
s-A, <Resume>: Retry OPEN of RJNXP:>joswig>ub16.lisp.newest
s-B:           Retry OPEN using a different pathname
s-C, <Abort>:  Return to Lisp Top Level in a TELNET server
s-D:           Restart process TELNET terminal
->

Sigh. It will be hardcoded, then.

Btw., it is a little-endian machine, too.

I tried another variant: use an ARRAY of 32 bits and another one
of 4 times 8 bits displaced to it. Then compile the access code
with SAFETY 0 to get rid of the runtime check. That also does
not work on the platforms I'd be interested in...

More embarrassing, the software I was looking at (not written by me, I swear),
named it wrong  -  %big-endian was really %little-endian.


> 
>     $ od -t x1 ub16
>     0000000 01 00
>     0000002
> 
>   [*] See the last paragraph of the "Exceptional Situations"
>       subsection of OPEN's specification.
> 
>   *        *        *
> 
>   I am tempted to think that the following is portable in C (it
>   compiles cleanly with ``gcc -ansi -pedantic'', but that isn't
>   everything, of course):
> 
>     ==> little-endian-p.c <==
> 
>     int main () {
>       int i;
>       i = 1;
>       return (*(char *)&i == 0);
>     }
> 
>   but I am also tempted to think that I might be missing some
>   provision of the standard.
> 
> 
>   Implementing a test that can also detect PDP-endianness is left as
>   an exercise...
> 
>   ---Vassil.

-- 
http://lispm.dyndns.org/

From: Vassil Nikolov
Subject: Re: byte order
Date: Tue, 08 Jan 2008 01:47:38 +0000
Message-ID: <snwejctxf45.fsf@luna.vassil.nikolov.name>

Rainer Joswig <······@lisp.de> writes:

> In article <···············@luna.vassil.nikolov.name>,
>  Vassil Nikolov <···············@pobox.com> wrote:
> ...
>>   (In practice, it is highly probable that all implementations will
>>   work as we'd like them to.)
> ...
>>       (defun little-endian-p ()
>>         (with-open-file (s "ub16" :direction :output :element-type '(unsigned-byte 16))
>>           (print (stream-element-type s) *trace-output*)  ;implementation-dependent [*]
>>           (write-byte 1 s))
>>         (with-open-file (s "ub16" :direction :input :element-type '(unsigned-byte 8))
>>           (print (stream-element-type s) *trace-output*)  ;implementation-dependent
>>           (= 1 (read-byte s))))
>> ...
> That looked good. Then I ran it on a Lisp Machine:
>
> Command: (little-endian-p)
> (UNSIGNED-BYTE 16) 
> Error: File was created with byte size 16; it may not be opened with byte size 8.
>        For RJNXP:>joswig>ub16.lisp.1

  Good point.  I was in a unix state of mind.  "Highly probable" it
  isn't.

  If there is no way to determine endianness programmatically [*]
  _within_ a Lisp Machine, perhaps it can be done extramurally, as it
  were, by writing an (UNSIGNED-BYTE 16) value to a network
  destination and then getting an (UNSIGNED-BYTE 8) value echoed?
  (Ignoring how impractical this is for the sake of the exercise.)

  [*] At the Common Lisp level; I suppose there are low-level
      facilities that allow one to examine "raw" memory contents.

  By the way, perhaps it will help to know why exactly you need to
  detect endianness.  A specific reason might hint at a specific
  solution.

  ---Vassil.

-- 
Bound variables, free programmers.

From: verec
Subject: Re: byte order
Date: Tue, 08 Jan 2008 04:06:23 +0000
Message-ID: <4782f6bf$0$510$5a6aecb4@news.aaisp.net.uk>

On 2008-01-08 01:47:38 +0000, Vassil Nikolov <···············@pobox.com> said:

>   By the way, perhaps it will help to know why exactly you need to
>   detect endianness.  A specific reason might hint at a specific
>   solution.

Yes, this begs the question :) If it is nigh impossible to
determine endianness, what use could this information have,
apart from display purposes ("Your machine is little-endian")?

The only case that springs to mind is interoperability at the
file format level, where you'd have to write 16-, 32- or 64-bit
values "by hand" in the same way the _host_ does. But then
that file format ought to be specified at the outset, and
you would know which endianness to use regardless of the host's,
no?
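
A minimal sketch of that point, assuming a file format that specifies
big-endian 32-bit fields: the value is assembled and taken apart with
shifts, so the host's own byte order never enters into it. The helper
names put_u32_be and get_u32_be are made up for this illustration.

```c
#include <stdint.h>

/* Store v into out[0..3] most significant byte first (big-endian),
   regardless of how the host lays out integers in memory. */
void put_u32_be(unsigned char *out, uint32_t v) {
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Rebuild the value from four big-endian bytes. */
uint32_t get_u32_be(const unsigned char *in) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Code written this way needs no endianness test at all, which is
probably why such a test is so rarely needed in practice.
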
--
JFB

From: Pascal Bourguignon
Subject: Re: byte order
Date: Tue, 08 Jan 2008 19:59:36 +0000
Message-ID: <87myrg6qc7.fsf@thalassa.informatimago.com>

Vassil Nikolov <···············@pobox.com> writes:

> Rainer Joswig <······@lisp.de> writes:
>> is there a way to determine the byte order
>> of the underlying 'machine'? Portable?
>
>   You can obtain the answer to a related question by writing 1 to a
>   file using an (UNSIGNED-BYTE 16) stream and then reading from the
>   same file using an (UNSIGNED-BYTE 8) stream.  Yes, behavior still
>   depends on the implementation, so it only gives a partial answer.
>   (In practice, it is highly probable that all implementations will
>   work as we'd like them to.)

Not portably.  Clisp always writes its files in little-endian, to make
its files portable from one platform to another.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"Our users will know fear and cower before our software! Ship it!
Ship it and let them flee like the dogs they are!"

From: Pascal Bourguignon
Subject: Re: byte order
Date: Mon, 07 Jan 2008 20:02:54 +0000
Message-ID: <87r6gt76a9.fsf@thalassa.informatimago.com>

Rainer Joswig <······@lisp.de> writes:
> is there a way to determine the byte order
> of the underlying 'machine'? Portable?

Not in Common Lisp.

Not in standard C either.

In clisp, when you have clx: #+CLX-LITTLE-ENDIAN / #+CLX-BIG-ENDIAN


In gcc on normal 32-bit platforms:

#include <stdio.h>
int main(void){
  union {
     int  i;
     char c[sizeof(int)];
  } v;
  v.i=0x01020304;
  if((v.c[0]==0x01)&&(v.c[1]==0x02)&&(v.c[2]==0x03)&&(v.c[3]==0x04)){
      printf("big endian\n");
  }else if((v.c[3]==0x01)&&(v.c[2]==0x02)&&(v.c[1]==0x03)&&(v.c[0]==0x04)){
      printf("little endian\n");
  }else if((v.c[1]==0x01)&&(v.c[0]==0x02)&&(v.c[3]==0x03)&&(v.c[2]==0x04)){
      printf("pdp endian\n"); /* IIRC */
  }else{
      printf("other endian\n");
  }
  return(0);
}


You could do the same through an FFI in Lisps that have one, or something
similar in a Lisp that provides low-level access to memory.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay

From: Jeff M.
Subject: Re: byte order
Date: Tue, 08 Jan 2008 18:50:25 +0000
Message-ID: <44fc8130-d5eb-4bed-b33d-fe77359c1885@k39g2000hsf.googlegroups.com>

On Jan 7, 2:02 pm, Pascal Bourguignon <····@informatimago.com> wrote:
> Rainer Joswig <······@lisp.de> writes:
> > is there a way to determine the byte order
> > of the underlying 'machine'? Portable?
>
> Not in Common Lisp.
>
> Not in standard C either.

You're joking, right?

// returns 1 if little endian, 0 if big
int am_i_little_endian() {
    int x = 0x12345678;
    char* b = (char*)&x;
    return *b == (x & 0xFF);
}

Jeff M.

From: Raymond Wiker
Subject: Re: byte order
Date: Tue, 08 Jan 2008 20:37:53 +0000
Message-ID: <m2ir24f3z2.fsf@Macintosh-2.local>

"Jeff M." <·······@gmail.com> writes:

> On Jan 7, 2:02 pm, Pascal Bourguignon <····@informatimago.com> wrote:
>> Rainer Joswig <······@lisp.de> writes:
>> > is there a way to determine the byte order
>> > of the underlying 'machine'? Portable?
>>
>> Not in Common Lisp.
>>
>> Not in standard C either.
>
> You're joking, right?
>
> // returns 1 if little endian, 0 if big
> int am_i_little_endian() {
>     int x = 0x12345678;
>     char* b = (char*)&x;
>     return *b == (x & 0xFF);
> }

	Are you *absolutely* sure that's standard C, and not just
something that happens to work most of the time?

From: Jeff M.
Subject: Re: byte order
Date: Tue, 08 Jan 2008 21:23:50 +0000
Message-ID: <209673e3-4b2c-4353-8a0e-96b676e57668@e6g2000prf.googlegroups.com>

On Jan 8, 2:37 pm, Raymond Wiker <····@RawMBP.local> wrote:
> "Jeff M." <·······@gmail.com> writes:
> > On Jan 7, 2:02 pm, Pascal Bourguignon <····@informatimago.com> wrote:
> >> Rainer Joswig <······@lisp.de> writes:
> >> > is there a way to determine the byte order
> >> > of the underlying 'machine'? Portable?
>
> >> Not in Common Lisp.
>
> >> Not in standard C either.
>
> > You're joking, right?
>
> > // returns 1 if little endian, 0 if big
> > int am_i_little_endian() {
> >     int x = 0x12345678;
> >     char* b = (char*)&x;
> >     return *b == (x & 0xFF);
> > }
>
>         Are you *absolutely* sure that's standard C, and not just
> something that happens to work most of the time?

Tell ya what, I'm more than willing to admit being wrong - I've had to
do it many times over the course of my life (and I'm sure I have many
more ahead of me). :-)

But if you can give me an example where it doesn't work, that's
infinitely more useful (to me and the OP) than just questioning
whether or not it works. I've used the above code many, many times in
production code and on numerous platforms. I have yet to see it not
work. Does that make me *absolutely* sure? No, but let's just say...
confident. :-)

Jeff M.

From: Sohail Somani
Subject: Re: byte order
Date: Tue, 08 Jan 2008 21:34:36 +0000
Message-ID: <M%Rgj.327$yQ1.78@edtnps89>

On Tue, 08 Jan 2008 13:23:50 -0800, Jeff M. wrote:

> Tell ya what, I'm more than willing to admit being wrong - I've had to
> do it many times over the course of my life (and I'm sure I have many
> more ahead of me).
> 
> But if you can give me an example where it doesn't work, that's
> infinitely more useful (to me and the OP) than just questioning whether
> or not it works. I've used the above code many, many times in production
> code and on numerous platforms. I have yet to see it not work. Does that
> make me *absolutely* sure? No, but let's just say... confident.

Why do you use the value 0x12345678? Wouldn't the value 1 be sufficient 
and more readable? I might be missing something, but:

int am_i_little_endian()
{
  int x = 1;
  char * b = (char*)&x;
  return *b==1;
}

Didn't actually try the above though :-)

-- 
Sohail Somani
http://uint32t.blogspot.com

From: Pascal Bourguignon
Subject: Re: byte order
Date: Tue, 08 Jan 2008 21:48:16 +0000
Message-ID: <873at8gfa7.fsf@thalassa.informatimago.com>

"Jeff M." <·······@gmail.com> writes:

> On Jan 8, 2:37 pm, Raymond Wiker <····@RawMBP.local> wrote:
>> "Jeff M." <·······@gmail.com> writes:
>> > On Jan 7, 2:02 pm, Pascal Bourguignon <····@informatimago.com> wrote:
>> >> Rainer Joswig <······@lisp.de> writes:
>> >> > is there a way to determine the byte order
>> >> > of the underlying 'machine'? Portable?
>>
>> >> Not in Common Lisp.
>>
>> >> Not in standard C either.
>>
>> > You're joking, right?
>>
>> > // returns 1 if little endian, 0 if big
>> > int am_i_little_endian() {
>> >     int x = 0x12345678;
>> >     char* b = (char*)&x;
>> >     return *b == (x & 0xFF);
>> > }
>>
>>         Are you *absolutely* sure that's standard C, and not just
>> something that happens to work most of the time?
>
> Tell ya what, I'm more than willing to admit being wrong - I've had to
> do it many times over the course of my life (and I'm sure I have many
> more ahead of me). :-)
>
> But if you can give me an example where it doesn't work, that's
> infinitely more useful (to me and the OP) than just questioning
> whether or not it works. I've used the above code many, many times in
> production code and on numerous platforms. I have yet to see it not
> work. Does that make me *absolutely* sure? No, but let's just say...
> confident. :-)

Well, IIRC, there's nothing in the C standard that prevents an
implementation from boxing integers.

Assume that 0x12345678 is stored as:
+--+--+--+--+--+
|78|12|34|56|78|
+--+--+--+--+--+
(for example, there's a type tag 0x7 for integer, a size 0x4<<1 and a
bit = 0 for some garbage collector or anything else).  Then your
function will return true, when obviously the bytes are really stored
in big endian, with boxing.

Now, if you restrict yourself to implementations running on 32-bit
machines with 8-bit char and unboxed two's-complement integers, OK, it
will work.  You're just lucky that this kind of machine represents 99.999%
of the chips sold.  (In the above example, we'd have sizeof(int)==5,
assuming an 8-bit char.)

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

HEALTH WARNING: Care should be taken when lifting this product,
since its mass, and thus its weight, is dependent on its velocity
relative to the user.

From: George Neuner
Subject: Re: byte order
Date: Wed, 09 Jan 2008 03:43:54 +0000
Message-ID: <ong8o3deors31r3rott560emt1ooeqnmhb@4ax.com>

On Tue, 08 Jan 2008 22:48:16 +0100, Pascal Bourguignon
<···@informatimago.com> wrote:

>"Jeff M." <·······@gmail.com> writes:
>
>> On Jan 8, 2:37 pm, Raymond Wiker <····@RawMBP.local> wrote:
>>> "Jeff M." <·······@gmail.com> writes:
>>> > On Jan 7, 2:02 pm, Pascal Bourguignon <····@informatimago.com> wrote:
>>> >> Rainer Joswig <······@lisp.de> writes:
>>> >> > is there a way to determine the byte order
>>> >> > of the underlying 'machine'? Portable?
>>>
>>> >> Not in Common Lisp.
>>>
>>> >> Not in standard C either.
>>>
>>> > You're joking, right?
>>>
>>> > // returns 1 if little endian, 0 if big
>>> > int am_i_little_endian() {
>>> >     int x = 0x12345678;
>>> >     char* b = (char*)&x;
>>> >     return *b == (x & 0xFF);
>>> > }
>>>
>>>         Are you *absolutely* sure that's standard C, and not just
>>> something that happens to work most of the time?
>>
>> Tell ya what, I'm more than willing to admit being wrong - I've had to
>> do it many times over the course of my life (and I'm sure I have many
>> more ahead of me). :-)
>>
>> But if you can give me an example where it doesn't work, that's
>> infinitely more useful (to me and the OP) than just questioning
>> whether or not it works. I've used the above code many, many times in
>> production code and on numerous platforms. I have yet to see it not
>> work. Does that make me *absolutely* sure? No, but let's just say...
>> confident. :-)
>
>Well, IIRC, there's nothing in the C standard that prevents an
>implementation to box the integers.

That's technically true but see below.  The C standard doesn't say a
lot of things ... for example, it does not say you can't implement
integers as lines of dancing elephants.

>Assume that 0x12345678 is stored as:
>+--+--+--+--+--+
>|78|12|34|56|78|
>+--+--+--+--+--+
>(for example, there's a type tag 0x7 for integer, a size 0x4<<1 and a
>bit = 0 for some garbage collector or anything else).  Then your
>function will return true, when obviously the bytes are really stored
>in big endian, with boxing.
>
>Now, if you restrict yourself to implementations running on 32-bit
>machines with 8-bit char with unboxed 2-complement integers, ok, it
>will work.  You're just lucky this kind of machine represent 99.999%
>of the chips sold.  (In the above example, we'd have sizeof(int)==5,
>assuming a 8-bit char).

The C standard is somewhat less than consistent and hard to reason
about ... particularly where pointers, type conversions and characters
are concerned.  The whole notion of characters and strings in C was an
afterthought.  C89/C90 is the least restrictive of the existing
standards, but even there I think your example would be
non-conforming.

First, C89 does seem to require 2's complement format - it's not
stated anywhere but the need is implicit in the integer promotion
rules ($3.2.1.2).

As for your tagged integer:  

C89 specifies that sizeof() returns # of bytes and sizeof(char) == 1
($3.3.3.4).  So char == byte.  Though the bit-width of a byte is left
implementation dependent, this means that chars are atomic values
which could not have a separate tag.  

The pointer casting rules ($3.3.4) specify that "It is guaranteed that
a pointer to an object of a given alignment may be converted to a
pointer to an object of the same alignment or a less strict alignment
and back again; the result shall compare equal to the original
pointer".  char* is the least restrictive pointer type and so the int*
cast to a char* would not change the address.

char is defined to be an integer type ($3.2.1.5,6).  The section on
integral type conversions ($3.2.1.2) says "when an integer is demoted
to an unsigned integer with smaller size, the result is the
nonnegative remainder on division by the number one greater than the
largest unsigned number that can be represented in the type with
smaller size.  When an integer is demoted to a signed integer with
smaller size, or an unsigned integer is converted to its corresponding
signed integer, if the value cannot be represented the result is
implementation-defined."

Finally, the assignment operator ($3.3.16.1) specifies that "if the
value being stored in an object is accessed from another object that
overlaps in any way the storage of the first object, then the overlap
shall be exact".  Thus the value represented by the bits of the
integer that are overlapped by the char must be preserved.  Since a
char is an integer type, that value will be an integer.

Though nothing explicitly says an integer can't have a tag, it seems
to me that the char* cast from the integer's address could not legally
point to the integer's tag, but could only point to the integer
itself.

YMMV.
George
--
for email reply remove "/" from address

From: Raymond Wiker
Subject: Re: byte order
Date: Wed, 09 Jan 2008 06:04:00 +0000
Message-ID: <m2abnffsbz.fsf@Macintosh-2.local>

"Jeff M." <·······@gmail.com> writes:

> On Jan 8, 2:37 pm, Raymond Wiker <····@RawMBP.local> wrote:
>> "Jeff M." <·······@gmail.com> writes:
>> > On Jan 7, 2:02 pm, Pascal Bourguignon <····@informatimago.com> wrote:
>> >> Rainer Joswig <······@lisp.de> writes:
>> >> > is there a way to determine the byte order
>> >> > of the underlying 'machine'? Portable?
>>
>> >> Not in Common Lisp.
>>
>> >> Not in standard C either.
>>
>> > You're joking, right?
>>
>> > // returns 1 if little endian, 0 if big
>> > int am_i_little_endian() {
>> >     int x = 0x12345678;
>> >     char* b = (char*)&x;
>> >     return *b == (x & 0xFF);
>> > }
>>
>>         Are you *absolutely* sure that's standard C, and not just
>> something that happens to work most of the time?
>
> Tell ya what, I'm more than willing to admit being wrong - I've had to
> do it many times over the course of my life (and I'm sure I have many
> more ahead of me). :-)

	Oh, I'm not saying that it won't work, and I've done something
similar myself - but I'm not absolutely certain that it is guaranteed
to work. Elsewhere in this thread, Maciej Katafiasz quoted some
passages from the standard that may be a sufficiently strong
guarantee, though.

> But if you can give me an example where it doesn't work, that's
> infinitely more useful (to me and the OP) than just questioning
> whether or not it works. I've used the above code many, many times in
> production code and on numerous platforms. I have yet to see it not
> work. Does that make me *absolutely* sure? No, but let's just say...
> confident. :-)

From: Maciej Katafiasz
Subject: Re: byte order
Date: Tue, 08 Jan 2008 22:26:34 +0000
Message-ID: <40f79973-3d17-4213-b576-04ceb5aa1dd8@j78g2000hsd.googlegroups.com>

On Jan 8, 9:37 pm, Raymond Wiker <····@RawMBP.local> wrote:
> >> Not in Common Lisp.
>
> >> Not in standard C either.
>
> > You're joking, right?
>
> > // returns 1 if little endian, 0 if big
> > int am_i_little_endian() {
> >     int x = 0x12345678;
> >     char* b = (char*)&x;
> >     return *b == (x & 0xFF);
> > }
>
>         Are you *absolutely* sure that's standard C, and not just
> something that happens to work most of the time?

§6.2.6.1.4 of ISO/IEC 9899:1999 provides explicit provisions for
casting objects to arrays of bytes, and guarantees that such an array
will actually correspond to the in-memory representation.

The standard also allows integer types to contain padding bits, and
it does not specify how such bits can be detected. It is, however,
specified that value bits must be contiguous, and that char is the
smallest integer type. Therefore, assuming 8-bit bytes, the above code
is bound to be portable and to reliably detect little-endianness[1] of
the host machine.

Cheers,
Maciej

[1] Little-endianness understood as "least significant value byte
comes first". An implementation using little-endian value
representation with padding bits coming before the value bits wouldn't
be reliably detected, but then, any such padding-using implementation
would need to be treated as another type of endianness anyway by any
code that cares about endianness, so it's not really an issue.
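
As a hedged sketch of the object-representation route described above:
the bytes of an unsigned int are copied into an unsigned char array with
memcpy and then inspected. It assumes unsigned int has no padding bits.
The names byte_order and host_byte_order are invented for this example.

```c
#include <string.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* Classify the host by where the least significant byte of 1u lands
   in the object representation.  Assumes no padding bits. */
enum byte_order host_byte_order(void) {
    unsigned int x = 1u;
    unsigned char bytes[sizeof x];

    memcpy(bytes, &x, sizeof x);      /* copy the in-memory representation */
    if (bytes[0] == 1)
        return ORDER_LITTLE;          /* least significant byte first */
    if (bytes[sizeof x - 1] == 1)
        return ORDER_BIG;             /* least significant byte last */
    return ORDER_OTHER;               /* e.g. a mixed (PDP-style) layout */
}
```

Using unsigned char avoids any question about the signedness of plain
char, and memcpy avoids dereferencing a cast pointer.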

From: Vassil Nikolov
Subject: Re: byte order
Date: Wed, 09 Jan 2008 05:17:29 +0000
Message-ID: <snw1w8rzifq.fsf@luna.vassil.nikolov.name>

Maciej Katafiasz <········@gmail.com> writes:

> ...
> §6.2.6.1.4 of ISO/IEC 9899:1999 provides explicit provisions for
> casting objects to arrays of bytes, and guarantees that such an array
> will actually correspond to the in-memory representation.
>
> The standard also gives the possibility of integer types containing
> padding bits, and it is not specified how such bits can be detected.
> It is, however, specified that value bits must be continuous, and that
> char is the smallest integer type. Therefore, assuming 8-bit bits, the
> above code is bound to be portable and reliably detect little-
> endianness[1] of the host machine.

  But it seems that---at least in theory---that would fail to
  distinguish little-endian from any-kind-of-endian where there are
  (at least) eight padding bits at the low address and they happen to
  have the same value as the eight least significant bits of the
  integer value...
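
A rough way to check whether the padding-bit scenario can even arise
for unsigned int on a given platform is to count its value bits and
compare with its storage size. This is only a sketch; the function names
are made up, and the result is expected to be 0 on mainstream platforms.

```c
#include <limits.h>

/* Count the value bits of unsigned int by shifting UINT_MAX to zero. */
int uint_value_bits(void) {
    unsigned int m = UINT_MAX;
    int n = 0;

    while (m != 0) {
        m >>= 1;
        n++;
    }
    return n;
}

/* Storage bits minus value bits; anything left over is padding. */
int uint_padding_bits(void) {
    return (int)(sizeof(unsigned int) * CHAR_BIT) - uint_value_bits();
}
```

If uint_padding_bits() returns 0, every bit of the representation is a
value bit and the byte-inspection tests in this thread cannot be fooled
by padding.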

  In any case, this is interview "anti-question" material if nothing
  else...

  Thanks for the precise reference to the standard.

  ---Vassil.

-- 
Bound variables, free programmers.

From: Raymond Toy (RT/EUS)
Subject: Re: byte order
Date: Wed, 09 Jan 2008 15:39:48 +0000
Message-ID: <sxd1w8rt3cr.fsf@rtp.ericsson.se>

>>>>> "Maciej" == Maciej Katafiasz <········@gmail.com> writes:

    Maciej> On Jan 8, 9:37 pm, Raymond Wiker <····@RawMBP.local> wrote:
    >> >> Not in Common Lisp.
    >> 
    >> >> Not in standard C either.
    >> 
    >> > You're joking, right?
    >> 
    >> > // returns 1 if little endian, 0 if big
    >> > int am_i_little_endian() {
    >> >     int x = 0x12345678;
    >> >     char* b = (char*)&x;
    >> >     return *b == (x & 0xFF);
    >> > }
    >> 
    >>         Are you *absolutely* sure that's standard C, and not just
    >> something that happens to work most of the time?

    Maciej> §6.2.6.1.4 of ISO/IEC 9899:1999 provides explicit provisions for
    Maciej> casting objects to arrays of bytes, and guarantees that such an array
    Maciej> will actually correspond to the in-memory representation.

    Maciej> The standard also gives the possibility of integer types containing
    Maciej> padding bits, and it is not specified how such bits can be detected.
    Maciej> It is, however, specified that value bits must be continuous, and that
    Maciej> char is the smallest integer type. Therefore, assuming 8-bit bits, the
    Maciej> above code is bound to be portable and reliably detect little-
    Maciej> endianness[1] of the host machine.

Neat.

Off-topic, but perhaps interesting....

I remember, long ago, that I once used a Harris H800 (?) "super-mini
computer".  A rather interesting machine.  It had 24-bit words, with
word addressable memory.  

Pointers to strings were interesting.  Because the machines were only
word addressable and characters were packed 3 per word, a pointer had
to be able to point to one of the 3 characters in a word.  I forget the
exact details, but I think 2 bits out of a word were used to indicate
which of the 3 characters was being used.  Of course, this meant you
couldn't address all of the possible address space available because
you used 2 of the 24 bits for other things.  But pointers to words or
floats didn't have this, so casting pointers of these types to each
other caused problems.

And long ints were also weird.  You'd think that 2 24-bit words were
used for a long int.  That's right.  But there was some kind of gap in
the middle so the expected 48 bit long ints didn't have 48 bits of
data in them.  Normally you couldn't see this, but if you played some
games with shifting and oring/anding, you could actually see the gap
and create weird stuff.

I guess if the H800 were still alive (is it?) it probably
wouldn't be able to support an ISO/IEC 9899:1999 compiler.

I certainly learned a lot about writing portable code back then, since
my code ran on H800 and pc's and some Sun workstations.  :-)

Ray

From: George Neuner
Subject: Re: byte order
Date: Tue, 08 Jan 2008 21:58:29 +0000
Message-ID: <0fr7o3hdrn7f2f3be25mhdk0rhd6v83rpl@4ax.com>

On Tue, 08 Jan 2008 21:37:53 +0100, Raymond Wiker <···@RawMBP.local>
wrote:

>"Jeff M." <·······@gmail.com> writes:
>
>> On Jan 7, 2:02 pm, Pascal Bourguignon <····@informatimago.com> wrote:
>>> Rainer Joswig <······@lisp.de> writes:
>>> > is there a way to determine the byte order
>>> > of the underlying 'machine'? Portable?
>>>
>>> Not in Common Lisp.
>>>
>>> Not in standard C either.
>>
>> You're joking, right?
>>
>> // returns 1 if little endian, 0 if big
>> int am_i_little_endian() {
>>     int x = 0x12345678;
>>     char* b = (char*)&x;
>>     return *b == (x & 0xFF);
>> }
>
>	Are you *absolutely* sure that's standard C, and not just
>something that happens to work most of the time? 

The C standard doesn't specify whether chars are signed or unsigned so
this code might produce a compiler warning on ==, but it is guaranteed
to work: (x & 0xFF) = 0x78, *b = 0x78 if little endian or *b = 0x12 if
big endian.  

It is definitely not as straightforward as Pascal Bourguignon's
solution using unions (though Pascal's tests were overkill).
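
A variant that sidesteps the signed/unsigned char question could look
like the sketch below.  It copies the lowest-addressed byte of an
unsigned int into an unsigned char and compares it with the low-order
byte of the value.  The name is_little_endian is made up for
illustration, and the sketch assumes sizeof(unsigned int) > 1 and a
plain big- or little-endian layout; any mixed order is simply reported
as "not little endian".

```c
#include <string.h>

/* Returns 1 if the lowest-addressed byte of an unsigned int holds the
   least significant byte of its value, 0 otherwise.
   Assumption: sizeof(unsigned int) > 1. */
int is_little_endian(void)
{
    unsigned int x = 0x12345678u;
    unsigned char first;

    /* Copy only the first byte of x's object representation. */
    memcpy(&first, &x, 1);

    /* Both operands are unsigned here, so no sign-extension surprises. */
    return first == (x & 0xFFu);
}
```

This only tells you how the compiler lays out an unsigned int in memory
for this particular process, nothing more.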

George
--
for email reply remove "/" from address

From: ·······@eurogaran.com
Subject: Re: byte order
Date: Tue, 08 Jan 2008 09:53:09 +0000
Message-ID: <c9df440c-e46d-46a4-b623-bdcc10ae3e1f@m77g2000hsc.googlegroups.com>

> is there a way to determine the byte order
> of the underlying 'machine'? Portable?
>

Easiest solution would be to compare the result of
(machine-type)
with the contents of a previously made assoc. list.

Most probably a single pairings list could be constructed that would
work in every implementation.

From: Rainer Joswig
Subject: Re: byte order
Date: Tue, 08 Jan 2008 10:38:24 +0000
Message-ID: <joswig-CA4DF9.11382408012008@news-europe.giganews.com>

In article 
<····································@m77g2000hsc.googlegroups.com>,
 ·······@eurogaran.com wrote:

> > is there a way to determine the byte order
> > of the underlying 'machine'? Portable?
> >
> 
> Easiest solution would be to compare the result of
> (machine-type)
> with the contents of a previously made assoc. list.

That would not be enough. On some processors
two different operating systems (or even programs)
may use different byte-orders.

> 
> Most probably a single pairings list could be constructed that would
> work in every implementation.

-- 
http://lispm.dyndns.org/

From: ·······@eurogaran.com
Subject: Re: byte order
Date: Tue, 08 Jan 2008 10:47:08 +0000
Message-ID: <141016ec-e97c-43b1-9a14-296f39861771@v46g2000hsv.googlegroups.com>

> > Easiest solution would be to compare the result of
> > (machine-type)
> > with the contents of a previously made assoc. list.
>
> That would not be enough. On some processors
> two different operating systems (or even programs)
> may use different byte-orders.

I didn't know that. Could you give some example cases?

From: George Neuner
Subject: Re: byte order
Date: Tue, 08 Jan 2008 17:02:35 +0000
Message-ID: <rea7o316vd3la3ko55h4e10bitnb0kd4j7@4ax.com>

On Tue, 8 Jan 2008 02:47:08 -0800 (PST), ·······@eurogaran.com wrote:

>> > Easiest solution would be to compare the result of
>> > (machine-type)
>> > with the contents of a previously made assoc. list.
>>
>> That would not be enough. On some processors
>> two different operating systems (or even programs)
>> may use different byte-orders.
>
>I didn't know that. Could you give some example cases?

Just an addition to Rainer Joswig's longer response ... some of the
PowerPCs support changing byte order on a per VM page basis, so even
within a single process you could have your data either way.  It's
quite useful for (un)marshalling data in a heterogeneous environment.

George

btw: please make sure you attribute comments from other posters.  It's
very hard to follow the conversation when you don't know who wrote what.

--
for email reply remove "/" from address

From: Rainer Joswig
Subject: Re: byte order
Date: Tue, 08 Jan 2008 11:05:15 +0000
Message-ID: <joswig-D5C6E9.12051508012008@news-europe.giganews.com>

In article 
<····································@v46g2000hsv.googlegroups.com>,
 ·······@eurogaran.com wrote:

> > > Easiest solution would be to compare the result of
> > > (machine-type)
> > > with the contents of a previously made assoc. list.
> >
> > That would not be enough. On some processors
> > two different operating systems (or even programs)
> > may use different byte-orders.
> 
> I didn't know that. Could you give some example cases?

ARM6, SPARC v9, DEC Alpha, many PowerPC etc.
do support different byte-orders (to some degree).

For example the Virtual PC emulator on the PowerPC (G4)
used that to more efficiently emulate a little-endian
machine on a big-endian processor. So, you
had a Mac running Mac OS as big-endian and
the emulator running under Mac OS was running
in little-endian processing mode. Later the PowerPC 970 did not have
that capability.

http://developer.apple.com/documentation/Hardware/DeviceManagers/pci_srvcs/pci_cards_drivers/PCI_BOOK.24e.html

From some SUN paper:

However, some recent computer architectures support both big endian and
little endian modes.  Why?  The reason for a processor architecture to
support both big and little endian operation invariably derives from a
requirement to support legacy environments (operating systems and
applications) of both endiannesses.  MIPS was originally a big endian
architecture; MIPS added little endian support to induce DEC, with a
little endian legacy, to adopt MIPS processors for its desktop systems.
IBM had a big endian legacy on its workstation and server systems and a
little endian legacy on its Intel-based personal computers and wanted to
support both with the PowerPC architecture.  The IA-64 architecture
resulted from a collaboration between Intel, with a little endian
legacy, and Hewlett Packard with a big endian legacy on its
workstations, so it, too, supports both.

-- 
http://lispm.dyndns.org/

From: ·······@eurogaran.com
Subject: Re: byte order
Date: Tue, 08 Jan 2008 12:04:21 +0000
Message-ID: <d595444f-1d7a-4d3c-9a4c-5a47a2e0579d@f3g2000hsg.googlegroups.com>

> > I didn't know that. Could you give some example cases?
>
> ARM6, SPARC v9, DEC Alpha, many PowerPC etc.
> do support different byte-orders (to some degree).
>
> For example the Virtual PC emulator on the PowerPC (G4)
> used that to more efficiently emulate a little-endian
> machine on a big-endian processor. So, you
> had a Mac running Mac OS as big-endian and
> the emulator running under Mac OS was running
> in little-endian processing mode. Later the PowerPC 970 did not have
> that capability.

Interesting. So MCL running on the MacOS would see PowerPC (64bit, big-
endian) as machine type, while say CLISP running simultaneously inside
the PC emulator would probably detect i386 (32bit, little-endian).
Which one should be considered as "correct" remains a rather
metaphysical question.

Perhaps the need to know the endianness should be considered an
indication of bad programming style to begin with.

From: Rainer Joswig
Subject: Re: byte order
Date: Tue, 08 Jan 2008 12:52:59 +0000
Message-ID: <joswig-314AD4.13525908012008@news-europe.giganews.com>

In article 
<····································@f3g2000hsg.googlegroups.com>,
 ·······@eurogaran.com wrote:

> > > I didn't know that. Could you give some example cases?
> >
> > ARM6, SPARC v9, DEC Alpha, many PowerPC etc.
> > do support different byte-orders (to some degree).
> >
> > For example the Virtual PC emulator on the PowerPC (G4)
> > used that to more efficiently emulate a little-endian
> > machine on a big-endian processor. So, you
> > had a Mac running Mac OS as big-endian and
> > the emulator running under Mac OS was running
> > in little-endian processing mode. Later the PowerPC 970 did not have
> > that capability.
> 
> Interesting. So MCL running on the MacOS would see PowerPC (64bit, big-
> endian) as machine type, while say CLISP running simultaneously inside
> the PC emulator would probably detect i386 (32bit, little-endian).
> Which one should be considered as "correct" remains a rather
> metaphysical question.
> 
> Perhaps the need to know the endianness should be considered an
> indicative of bad programming style to begin with.

So your software does not exchange data with other systems?

-- 
http://lispm.dyndns.org/

From: Vesa Karvonen
Subject: Re: byte order
Date: Tue, 08 Jan 2008 13:28:51 +0000
Message-ID: <flvtqj$p5n$1@oravannahka.helsinki.fi>

Rainer Joswig <······@lisp.de> wrote:
> In article 
> <····································@f3g2000hsg.googlegroups.com>,
>  ·······@eurogaran.com wrote:
[...]
> > Perhaps the need to know the endianness should be considered an
> > indicative of bad programming style to begin with.

It can be brittle.

> So your software does not exchange data with other systems?

I would recommend doing such data exchange using machine independent data
formats as much as possible.  In particular, regardless of what endianness
your machine has, always read/write the data using one endianness (little,
big, or whatever).  You shouldn't need to know the endianness of a machine
to do that.
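
A minimal sketch of that idea in C, assuming the exchange format fixes
32-bit integers as big-endian.  The names put_be32 and get_be32 are
invented for this example.  Shifts and masks operate on values, not on
memory layout, so the same source gives the same octets on any host.

```c
#include <stdint.h>

/* Store v into out[0..3], most significant octet first. */
void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >>  8) & 0xFFu);
    out[3] = (unsigned char)( v        & 0xFFu);
}

/* Rebuild the value from in[0..3], most significant octet first. */
uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24)
         | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] <<  8)
         |  (uint32_t)in[3];
}
```

Neither function ever asks what the host byte order is.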

-Vesa Karvonen

From: ·······@eurogaran.com
Subject: Re: byte order
Date: Tue, 08 Jan 2008 13:55:44 +0000
Message-ID: <66f63567-b566-45b3-b416-8c99498dbe0b@p69g2000hsa.googlegroups.com>

> I would recommend doing such data exchange using machine independent data
> formats as much as possible.  In particular, regardless of what endianness
> your machine has, always read/write the data using one endianness (little,
> big, or whatever).  You shouldn't need to know the endianness of a machine
> to do that.
>

That's exactly what they do with images:
JPEG contains big-endian values while GIF images contain little-endian
values.

From: Zach Beane
Subject: Re: byte order
Date: Tue, 08 Jan 2008 15:44:02 +0000
Message-ID: <m3lk709vb1.fsf@unnamed.xach.com>

·······@eurogaran.com writes:

>> I would recommend doing such data exchange using machine independent data
>> formats as much as possible.  In particular, regardless of what endianness
>> your machine has, always read/write the data using one endianness (little,
>> big, or whatever).  You shouldn't need to know the endianness of a machine
>> to do that.
>>
>
> That's exactly what they do with images:
> JPEG contains big-endian values while GIF images contain little-endian
> values.

And ESRI shapefiles contain both big-endian and little-endian values,
depending on the field. Fun!
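
As a rough sketch of what that means for a reader, here is a C fragment
that pulls four fields out of the 100-byte main file header.  The
offsets follow the published shapefile layout as I understand it (file
code 9994 and file length big-endian, version 1000 and shape type
little-endian); the struct and function names are made up for this
example.

```c
#include <stdint.h>

/* Read a 32-bit value stored most significant octet first. */
static uint32_t rd_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Read a 32-bit value stored least significant octet first. */
static uint32_t rd_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] <<  8) |  (uint32_t)p[0];
}

struct shp_header {
    uint32_t file_code;         /* big-endian, expected 9994    */
    uint32_t file_length_words; /* big-endian, in 16-bit words  */
    uint32_t version;           /* little-endian, expected 1000 */
    uint32_t shape_type;        /* little-endian                */
};

/* Fills *out and returns 1 if the header looks plausible, else 0. */
int parse_shp_header(const unsigned char hdr[100], struct shp_header *out)
{
    out->file_code         = rd_be32(hdr + 0);
    out->file_length_words = rd_be32(hdr + 24);
    out->version           = rd_le32(hdr + 28);
    out->shape_type        = rd_le32(hdr + 32);
    return out->file_code == 9994u && out->version == 1000u;
}
```

Each field names its own byte order, so the host's order never enters
the picture.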

Zach

From: Rainer Joswig
Subject: Re: byte order
Date: Tue, 08 Jan 2008 14:35:57 +0000
Message-ID: <joswig-6D811F.15355608012008@news-europe.giganews.com>

In article <············@oravannahka.helsinki.fi>,
 Vesa Karvonen <·············@cs.helsinki.fi> wrote:

> Rainer Joswig <······@lisp.de> wrote:
> > In article 
> > <····································@f3g2000hsg.googlegroups.com>,
> >  ·······@eurogaran.com wrote:
> [...]
> > > Perhaps the need to know the endianness should be considered an
> > > indicative of bad programming style to begin with.
> 
> It can be brittle.
> 
> > So your software does not exchange data with other systems?
> 
> I would recommend doing such data exchange using machine independent data
> formats as much as possible.  In particular, regardless of what endianness
> your machine has, always read/write the data using one endianness (little,
> big, or whatever).  You shouldn't need to know the endianness of a machine
> to do that.
> 
> -Vesa Karvonen

But you have to know. If you want to write big-endian binary data,
you need to know if your Lisp runs little- or big-endian.
The software I'm looking at is exactly doing that. It deals
with binary data and should run on different platforms.

Example:

If you write 32 bit values to a binary stream, the result
will be different depending on what machine / OS your Lisp system
runs on.

If you use a Lisp on SPARC/Solaris, you get different
results than under, say, x86/Solaris.

So, if you need to write big-endian 32bit data,
that means on a big-endian system you can just write the binary
data with WRITE-BYTE. On a little-endian system you have
to reorder the bytes.

It starts with TCP/IP. 32bit values are big-endian. You might
need to deal with that if you are on a little-endian system.

Java is also using big-endian.

From: Thomas F. Burdick
Subject: Re: byte order
Date: Tue, 08 Jan 2008 14:58:03 +0000
Message-ID: <e993ea33-df2c-456c-80b4-ddc772dcebc1@l32g2000hse.googlegroups.com>

On Jan 8, 3:35 pm, Rainer Joswig <······@lisp.de> wrote:
> In article <············@oravannahka.helsinki.fi>,
>  Vesa Karvonen <·············@cs.helsinki.fi> wrote:
>
>
>
> > Rainer Joswig <······@lisp.de> wrote:
> > > In article
> > > <····································@f3g2000hsg.googlegroups.com>,
> > >  ·······@eurogaran.com wrote:
> > [...]
> > > > Perhaps the need to know the endianness should be considered an
> > > > indicative of bad programming style to begin with.
>
> > It can be brittle.
>
> > > So your software does not exchange data with other systems?
>
> > I would recommend doing such data exchange using machine independent data
> > formats as much as possible.  In particular, regardless of what endianness
> > your machine has, always read/write the data using one endianness (little,
> > big, or whatever).  You shouldn't need to know the endianness of a machine
> > to do that.
>
> > -Vesa Karvonen
>
> But you have to know. If you want to write big-endian binary data,
> you need to know if your Lisp runs little- or big-endian.
> The software I'm looking at is exactly doing that. It deals
> with binary data and should run on different platforms.

You can always read and write 8-bit bytes at a time, and use ldb and
(setf ldb) to arrange those into larger values in the desired byte
order.  Then your code will run unmodified under big-, little-, or
wacky-endian machines.

From: Rainer Joswig
Subject: Re: byte order
Date: Tue, 08 Jan 2008 15:29:45 +0000
Message-ID: <joswig-E2FA7B.16294508012008@news-europe.giganews.com>

In article 
<····································@l32g2000hse.googlegroups.com>,
 "Thomas F. Burdick" <········@gmail.com> wrote:

> On Jan 8, 3:35 pm, Rainer Joswig <······@lisp.de> wrote:
> > In article <············@oravannahka.helsinki.fi>,
> >  Vesa Karvonen <·············@cs.helsinki.fi> wrote:
> >
> >
> >
> > > Rainer Joswig <······@lisp.de> wrote:
> > > > In article
> > > > <····································@f3g2000hsg.googlegroups.com>,
> > > >  ·······@eurogaran.com wrote:
> > > [...]
> > > > > Perhaps the need to know the endianness should be considered an
> > > > > indicative of bad programming style to begin with.
> >
> > > It can be brittle.
> >
> > > > So your software does not exchange data with other systems?
> >
> > > I would recommend doing such data exchange using machine independent data
> > > formats as much as possible.  In particular, regardless of what endianness
> > > your machine has, always read/write the data using one endianness (little,
> > > big, or whatever).  You shouldn't need to know the endianness of a machine
> > > to do that.
> >
> > > -Vesa Karvonen
> >
> > But you have to know. If you want to write big-endian binary data,
> > you need to know if your Lisp runs little- or big-endian.
> > The software I'm looking at is exactly doing that. It deals
> > with binary data and should run on different platforms.
> 
> You can always read and write 8-bit bytes at a time, and use ldb and
> (setf ldb) to arrange those into larger values in the desired byte
> order.  Then your code will run unmodified under big-, little-, or
> wacky-endian machines.

Performance?

From: Zach Beane
Subject: Re: byte order
Date: Tue, 08 Jan 2008 17:06:41 +0000
Message-ID: <m3hcho9rha.fsf@unnamed.xach.com>

Rainer Joswig <······@lisp.de> writes:

> Performance?

Try it first, then profile, then fix it where performance is
unacceptable.

I use (unsigned-byte 8) streams and vectors to write
endian-independent values to image files, and it works fast enough for
me to generate tens of thousands of graphics every day. Your needs
might be different.

Zach

From: Thomas F. Burdick
Subject: Re: byte order
Date: Wed, 09 Jan 2008 08:47:44 +0000
Message-ID: <890cd4f9-541e-4c38-b909-3be611f70e16@i3g2000hsf.googlegroups.com>

On Jan 8, 4:29 pm, Rainer Joswig <······@lisp.de> wrote:
> In article
> <····································@l32g2000hse.googlegroups.com>,
>  "Thomas F. Burdick" <········@gmail.com> wrote:
>
>
>
> > On Jan 8, 3:35 pm, Rainer Joswig <······@lisp.de> wrote:
> > > In article <············@oravannahka.helsinki.fi>,
> > >  Vesa Karvonen <·············@cs.helsinki.fi> wrote:
>
> > > > Rainer Joswig <······@lisp.de> wrote:
> > > > > In article
> > > > > <····································@f3g2000hsg.googlegroups.com>,
> > > > >  ·······@eurogaran.com wrote:
> > > > [...]
> > > > > > Perhaps the need to know the endianness should be considered an
> > > > > > indicative of bad programming style to begin with.
>
> > > > It can be brittle.
>
> > > > > So your software does not exchange data with other systems?
>
> > > > I would recommend doing such data exchange using machine independent data
> > > > formats as much as possible.  In particular, regardless of what endianness
> > > > your machine has, always read/write the data using one endianness (little,
> > > > big, or whatever).  You shouldn't need to know the endianness of a machine
> > > > to do that.
>
> > > > -Vesa Karvonen
>
> > > But you have to know. If you want to write big-endian binary data,
> > > you need to know if your Lisp runs little- or big-endian.
> > > The software I'm looking at is exactly doing that. It deals
> > > with binary data and should run on different platforms.
>
> > You can always read and write 8-bit bytes at a time, and use ldb and
> > (setf ldb) to arrange those into larger values in the desired byte
> > order.  Then your code will run unmodified under big-, little-, or
> > wacky-endian machines.
>
> Performance?

Unless you're writing a deep-packet inspecting router, I seriously
doubt that any reasonable CL will give you performance problems with
this technique (or obvious variations on it -- think of read-sequence,
for example).

But what makes you think you have alternatives?  Okay, so you're doing
little-endian i/o and you've successfully detected that you're on a
big-endian machine; what do you do now?  You *have* to arrange the
bytes in the order you need them, there's no magical way of avoiding
it.

From: John Thingstad
Subject: Re: byte order
Date: Wed, 09 Jan 2008 11:07:11 +0000
Message-ID: <op.t4ns59o3ut4oq5@pandora.alfanett.no>

On Tue, 08 Jan 2008 16:29:45 +0100, Rainer Joswig <······@lisp.de> wrote:

>
> Performance?

What do you think buffering is for?

--------------
John Thingstad

From: Duane Rettig
Subject: Re: byte order
Date: Tue, 08 Jan 2008 15:59:02 +0000
Message-ID: <o0sl18ia0p.fsf@gemini.franz.com>

Rainer Joswig <······@lisp.de> writes:

> In article <············@oravannahka.helsinki.fi>,
>  Vesa Karvonen <·············@cs.helsinki.fi> wrote:
>
>> Rainer Joswig <······@lisp.de> wrote:
>> > In article 
>> > <····································@f3g2000hsg.googlegroups.com>,
>> >  ·······@eurogaran.com wrote:
>> [...]
>> > > Perhaps the need to know the endianness should be considered an
>> > > indicative of bad programming style to begin with.
>> 
>> It can be brittle.
>> 
>> > So your software does not exchange data with other systems?
>> 
>> I would recommend doing such data exchange using machine independent data
>> formats as much as possible.  In particular, regardless of what endianness
>> your machine has, always read/write the data using one endianness (little,
>> big, or whatever).  You shouldn't need to know the endianness of a machine
>> to do that.
>> 
>> -Vesa Karvonen
>
> But you have to know. If you want to write big-endian binary data,
> you need to know if your Lisp runs little- or big-endian.
> The software I'm looking at is exactly doing that. It deals
> with binary data and should run on different platforms.

See my response earlier on this thread.  In addition to what I said
there about read-vector/write-vector, we also supply as part of our
"osi" (operating system interface) module the four inet functions
htonl, htons, ntohl, and ntohs:

http://www.franz.com/support/documentation/8.1/doc/os-interface.htm#ntohl-op-bookmarkxx

for doing specific conversions without knowing the endianness of the
architecture you are on.
-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: ···············@gmail.com
Subject: Re: byte order
Date: Tue, 08 Jan 2008 18:48:13 +0000
Message-ID: <a184ab84-f1fc-45ac-b511-1a1883e242f8@c23g2000hsa.googlegroups.com>

On Jan 8, 9:35 am, Rainer Joswig <······@lisp.de> wrote:

>
> But you have to know. If you want to write big-endian binary data,
> you need to know if your Lisp runs little- or big-endian.
> The software I'm looking at is exactly doing that. It deals
> with binary data and should run on different platforms.
>
> Example:
>
> If you write 32 bit values to a binary stream, the result
> will be different depending what machine / OS your Lisp system
> runs on.
>
> If you use a Lisp on a SPARC/Solaris, you get different
> results, than under, say x86/Solaris.
>
> So, if you need to write big-endian 32bit data,
> that means on a big-endian system you can just write the binary
> data with WRITE-BYTE. On a little-endian system you have
> to reorder the bytes.

I think this particular passage, and this thread in general, are in
danger of confusing the separate issues of "data layout in byte-
addressed memory" and "data layout in I/O streams." I/O streams (and
files) are going to have complexities that go beyond "little endian"
and "big endian" distinctions, in particular whether the streams have
implementation-, file-system- or OS-specific support for indicating
byte order and byte packing conventions.

Writing integers to a stream of element type (UNSIGNED-BYTE 32) is not
at all checking the same thing as C code that packs a long/char[4]
union in memory and reads back the result.

> It starts with TCP/IP. 32bit values are big-endian. You might
> need to deal with that if you are on a little-endian system.
>
> Java is also using big-endian.

From: Victor Anyakin
Subject: Re: byte order
Date: Tue, 08 Jan 2008 15:13:40 +0000
Message-ID: <86y7b0qriz.fsf@victor-mobi.my.domain>

Rainer Joswig <······@lisp.de> writes:

> In article <············@oravannahka.helsinki.fi>,
>  Vesa Karvonen <·············@cs.helsinki.fi> wrote:
>
>> Rainer Joswig <······@lisp.de> wrote:
>> > In article 
>> > <····································@f3g2000hsg.googlegroups.com>,
>> >  ·······@eurogaran.com wrote:
>> [...]
>> > > Perhaps the need to know the endianness should be considered an
>> > > indicative of bad programming style to begin with.
>> 
>> It can be brittle.
>> 
>> > So your software does not exchange data with other systems?
>> 
>> I would recommend doing such data exchange using machine independent data
>> formats as much as possible.  In particular, regardless of what endianness
>> your machine has, always read/write the data using one endianness (little,
>> big, or whatever).  You shouldn't need to know the endianness of a machine
>> to do that.
>> 
>> -Vesa Karvonen
>
> It starts with TCP/IP. 32bit values are big-endian. You might
> need to deal with that if you are on a little-endian system.

I guess, there are the

htonl, htons, ntohl, ntohs -- convert values between host and network
  byte order

set of native functions (man section 3) to deal with byte order.
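
For illustration, a small C sketch of how these are typically used on a
POSIX system (<arpa/inet.h>).  The wrapper names store_network_u32 and
load_network_u32 are invented here; only htonl and ntohl are the real
library calls.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Convert a host-order value to network (big-endian) order and copy
   its four octets into out. */
void store_network_u32(unsigned char out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}

/* Copy four network-order octets into a uint32_t and convert the
   result back to host order. */
uint32_t load_network_u32(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

The library knows the host order, so the calling code does not have to:
on a big-endian host both conversions are no-ops, on a little-endian
host they swap.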

With best regards,
Victor

From: Casper H.S. Dik
Subject: Re: byte order
Date: Tue, 08 Jan 2008 15:48:08 +0000
Message-ID: <47839b38$0$85795$e4fe514c@news.xs4all.nl>

Vesa Karvonen <·············@cs.helsinki.fi> writes:

>I would recommend doing such data exchange using machine independent data
>formats as much as possible.  In particular, regardless of what endianness
>your machine has, always read/write the data using one endianness (little,
>big, or whatever).  You shouldn't need to know the endianness of a machine
>to do that.

Ah, but to do so requires some part of the software to know the endianness.

(The only way around that is using only textual representations which is not
highly efficient)

Casper
-- 
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

From: Vesa Karvonen
Subject: Re: byte order
Date: Tue, 08 Jan 2008 16:02:25 +0000
Message-ID: <fm06qh$3pl$1@oravannahka.helsinki.fi>

Casper H.S. Dik <··········@sun.com> wrote:
> Vesa Karvonen <·············@cs.helsinki.fi> writes:

> >I would recommend doing such data exchange using machine independent data
> >formats as much as possible.  In particular, regardless of what endianness
> >your machine has, always read/write the data using one endianness (little,
> >big, or whatever).  You shouldn't need to know the endianness of a machine
> >to do that.

> Ah, but to do so requires some part of the software to know the endianness.

In the case of integers it does not, because you can always easily take
them apart via repeated division.  Floating point values are more
difficult to serialize without primitive support from the compiler (but it
can be done).
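
A sketch of the repeated-division approach for unsigned integers, in C.
The function name and the choice of big-endian output are assumptions
made for the example; filling the buffer front to back instead would
give little-endian output.

```c
#include <stddef.h>

/* Write the n low-order base-256 digits of value into out, most
   significant digit first.  Only arithmetic on the value is used, so
   the host's memory layout never matters. */
void encode_be_by_division(unsigned long value, unsigned char *out, size_t n)
{
    size_t i = n;
    while (i > 0) {
        out[--i] = (unsigned char)(value % 256u); /* lowest digit */
        value /= 256u;                            /* drop it      */
    }
}
```

Digits beyond the n-th are silently dropped, so the caller has to pick
n large enough for the values being written.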

-Vesa Karvonen

From: Madhu
Subject: Re: byte order
Date: Tue, 08 Jan 2008 15:28:28 +0000
Message-ID: <m3lk70ibfn.fsf@robolove.meer.net>

* Vesa Karvonen <············@oravannahka.helsinki.fi> 
Wrote on 8 Jan 2008 13:28:51 GMT:
| Rainer Joswig <······@lisp.de> wrote:
| It can be brittle.
|
|> So your software does not exchange data with other systems?
|
| I would recommend doing such data exchange using machine independent data
| formats as much as possible.  In particular, regardless of what endianness
| your machine has, always read/write the data using one endianness (little,
| big, or whatever).  You shouldn't need to know the endianness of a machine
| to do that.

I'm sure Rainer has a case where some data is being generated in the
host byte format.  For example tcpdump(8) savefiles, intended for
dissection with pcap(3), can be read on a machine with a different byte
order than the ones they were saved on.  Some field(s) may be specified
to be host byte order, in which case it becomes necessary to swab stuff
when reading the file, and to determine whether the byte order of the
machine differs from the machine where the dump was done.

Here is how I do it: the file magic indicates the byte order of the
machine doing the dump.  When reading the file, start reading the magic
as little-endian, and then determine correct endianness to read the rest
of the file. (Any loopholes?)  Sketch:

(defvar *pcap-file-header-magic*
  '((:little-endian 	#xd4c3b2a1)	;1234
    (:pdp-endian	#xb2a1d4c3)	;3412 (BOGUS)!
    (:big-endian 	#xa1b2c3d4)))	;4321

(defun read-dump-file (...)  ...
  (let* ((*endian* :little-endian) ...
         (magic (read-binary ....)))
   ;; Then set endian again depending on the value if swapped
     (let ((*endian*
            (car (rassoc magic *pcap-file-header-magic* :key #'car))))
        ...
        (read-binary ...) ...)))
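
For comparison, a C sketch of the same magic-number check that works
directly on the first four octets of the file.  It assumes the classic
pcap magic 0xa1b2c3d4 written in the dumping machine's native order;
the enum and function names are made up for this example.

```c
#include <stdint.h>

enum byte_order { ORDER_UNKNOWN, ORDER_LITTLE, ORDER_BIG };

/* Guess the byte order of the machine that wrote a pcap savefile from
   the first four octets of the file. */
enum byte_order pcap_writer_order(const unsigned char magic[4])
{
    /* Assemble the octets as a little-endian number. */
    uint32_t as_le = (uint32_t)magic[0]
                   | ((uint32_t)magic[1] << 8)
                   | ((uint32_t)magic[2] << 16)
                   | ((uint32_t)magic[3] << 24);

    if (as_le == UINT32_C(0xa1b2c3d4))
        return ORDER_LITTLE;   /* file starts d4 c3 b2 a1 */
    if (as_le == UINT32_C(0xd4c3b2a1))
        return ORDER_BIG;      /* file starts a1 b2 c3 d4 */
    return ORDER_UNKNOWN;      /* not a magic this sketch knows */
}
```

The result describes the file, not the reading machine, so the reader
can assemble the remaining fields octet by octet in that order.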

--
Madhu

From: Vassil Nikolov
Subject: Re: byte order
Date: Wed, 09 Jan 2008 04:57:51 +0000
Message-ID: <snw63y3zjcg.fsf@luna.vassil.nikolov.name>

Madhu <·······@meer.net> writes:

> ...
> Here is how I do it: the file magic indicates the byte order of the
> machine doing the dump.

  And then there is the zero-width non-breaking space (U+FFFE, or is
  it U+FEFF), and... UTF-8 rules.  I don't know what the moral of this
  story is---I wonder what the Duchess would say...

  ---Vassil.

-- 
Bound variables, free programmers.

From: Pascal Bourguignon
Subject: Re: byte order
Date: Tue, 08 Jan 2008 20:19:44 +0000
Message-ID: <87bq7w6pen.fsf@thalassa.informatimago.com>

Rainer Joswig <······@lisp.de> writes:

> In article 
> <····································@f3g2000hsg.googlegroups.com>,
>  ·······@eurogaran.com wrote:
>
>> > > I didn't know that. Could you give some example cases?
>> >
>> > ARM6, SPARC v9, DEC Alpha, many PowerPC etc.
>> > do support different byte-orders (to some degree).
>> >
>> > For example the Virtual PC emulator on the PowerPC (G4)
>> > used that to more efficiently emulate a little-endian
>> > machine on a big-endian processor. So, you
>> > had a Mac running Mac OS as big-endian and
>> > the emulator running under Mac OS was running
>> > in little-endian processing mode. Later the PowerPC 970 did not have
>> > that capability.
>> 
>> Interesting. So MCL running on the MacOS would see PowerPC (64bit, big-
>> endian) as machine type, while say CLISP running simultaneously inside
>> the PC emulator would probably detect i386 (32bit, little-endian).
>> Which one should be considered as "correct" remains a rather
>> metaphysical question.
>> 
>> Perhaps the need to know the endianness should be considered an
>> indicative of bad programming style to begin with.
>
> So your software does not exchange data with other systems?

Ah, but this is a totally different question!

You need to know the byte order of your file and transmission media.
Nothing to do with the bytesex of the processor/system.

You should write byte sequences.  If your file format specifies a
24-bit integer I=(A*256+B)*256+C (with A, B, C in [0..255]) stored as:
+---+---+---+
| A | B | C | "big endian"
+---+---+---+ 
then you write it with:

(with-open-file (stream path :direction :output
                             :element-type '(unsigned-byte 8))
   (write-byte (ldb (byte 8 16) i) stream)   ; A, most significant octet
   (write-byte (ldb (byte 8  8) i) stream)   ; B
   (write-byte (ldb (byte 8  0) i) stream))  ; C, least significant octet


On the other hand, if it specifies this order:
+---+---+---+
| C | A | B | "random endian"
+---+---+---+ 
then you write it with:

(with-open-file (stream path :direction :output
                             :element-type '(unsigned-byte 8))
   (write-byte (ldb (byte 8  0) i) stream)   ; C
   (write-byte (ldb (byte 8 16) i) stream)   ; A
   (write-byte (ldb (byte 8  8) i) stream))  ; B


etc.


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"Our users will know fear and cower before our software! Ship it!
Ship it and let them flee like the dogs they are!"

From: Rob Warnock
Subject: Re: byte order
Date: Wed, 09 Jan 2008 08:33:02 +0000
Message-ID: <EtudnTg_cagjGxnanZ2dnUVZ_vjinZ2d@speakeasy.net>

<·······@eurogaran.com> wrote:
+---------------
| > On some processors two different operating systems
| > (or even programs) may use different byte-orders.
| 
| I didn't know that. Could you give some example cases?
+---------------

SGI's MIPS-based Irix workstations were big-endian;
DEC's MIPS-based Ultrix workstations were little-endian.


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Pascal Bourguignon
Subject: Re: byte order
Date: Wed, 09 Jan 2008 20:09:04 +0000
Message-ID: <87ve62g3rz.fsf@thalassa.informatimago.com>

Madhu <·······@meer.net> writes:

> * Rainer Joswig <····························@news-europe.giganews.com> :
> Wrote on Tue, 08 Jan 2008 11:38:24 +0100:
>
> | In article 
> | <····································@m77g2000hsc.googlegroups.com>,
> |  ·······@eurogaran.com wrote:
> |
> |> > is there a way to determine the byte order
> |> > of the underlying 'machine'? Portable?
> |> >
> |> 
> |> Easiest solution would be to compare the result of
> |> (machine-type)
> |> with the contents of a previously made assoc. list.
> |
> | That would not be enough. On some processors
> | two different operating systems (or even programs)
> | may use different byte-orders.
>
> Can you use the technique I outlined in another post in this thread?  It
> would involve creating 2 files with a certain magic, one on a big endian
> machine and one on a little endian machine.  Copying the files to the
> target machine.  Reading the files would be sufficient to determine
> endianness.  I think this should be portable enough for use inside an
> application if you have control over configuration management.

The Common Lisp standard doesn't define any mapping of its file
formats (external-format) to the host files.  The only standard
external-format is :DEFAULT, and nothing is specified about it!

However most implementations on 8-bit addressable systems do implement
:element-type '(unsigned-byte 8) :external-format :default sanely,
that is, mapping each byte to an octet in the host file, without any
overhead.  This would be the most you can do to read and write files
portably.  However, it would be totally useless to infer byte order
since you have to explicitly write the bytes in the order you want.

For :element-type (unsigned-byte N) with (/= N 8), anything can
happen.  To implement the requirements of the standard, most
implementations will add a header or a trailer to the file when N is
not a multiple of 8.  Some implementations will write the bytes in an
order that depends on the host system, but some won't.

For example, clisp ALWAYS writes binary files in little-endian order,
to ensure portability of the files from all the platforms it works on.
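
To illustrate the point about writing the bytes explicitly, here is a
minimal sketch that assumes an (unsigned-byte 8) stream maps one byte
to one octet, as described above.  WRITE-U32 and READ-U32 are
hypothetical helper names, not part of any implementation; only LDB,
BYTE, WRITE-BYTE and READ-BYTE come from the standard.

```lisp
;; Sketch: serialise an unsigned 32-bit integer octet by octet, so the
;; byte order in the file is chosen by the program, not by the host.
;; WRITE-U32 and READ-U32 are hypothetical names used for illustration.

(defun write-u32 (value stream &key (order :big-endian))
  "Write VALUE as four octets to STREAM, an (unsigned-byte 8) output stream."
  (let ((positions (ecase order
                     (:big-endian    '(24 16 8 0))
                     (:little-endian '(0 8 16 24)))))
    ;; Each position selects one 8-bit field of VALUE.
    (dolist (pos positions value)
      (write-byte (ldb (byte 8 pos) value) stream))))

(defun read-u32 (stream &key (order :big-endian))
  "Read four octets from STREAM and assemble them according to ORDER."
  (let ((positions (ecase order
                     (:big-endian    '(24 16 8 0))
                     (:little-endian '(0 8 16 24))))
        (value 0))
    ;; Deposit each octet read into the matching 8-bit field.
    (dolist (pos positions value)
      (setf (ldb (byte 8 pos) value) (read-byte stream)))))
```

With :order :big-endian, #x12345678 goes out as the octets 12 34 56 78
(hex) whatever the host is, which is also why reading such a file back
tells you nothing about the machine's own byte order.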

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
        Un chat errant
se soulage
        dans le jardin d'hiver
                                        Shiki

From: Pascal Bourguignon
Subject: Re: byte order
Date: Tue, 08 Jan 2008 20:02:25 +0000
Message-ID: <87ir246q7i.fsf@thalassa.informatimago.com>

·······@eurogaran.com writes:

>> is there a way to determine the byte order
>> of the underlying 'machine'? Portable?
>>
>
> Easiest solution would be to compare the result of
> (machine-type)
> with the contents of a previously made assoc. list.
>
> Most probably a single pairings list could be constructed that would
> work in every implementation.

AFAIK, PowerPC, ARM, etc, can work either in little endian or in big
endian.  Knowing only the processor name is not enough to know what
endianness is used by the system.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"Our users will know fear and cower before our software! Ship it!
Ship it and let them flee like the dogs they are!"

From: Duane Rettig
Subject: Re: byte order
Date: Tue, 08 Jan 2008 15:45:43 +0000
Message-ID: <o0wsqkiamw.fsf@gemini.franz.com>

Rainer Joswig <······@lisp.de> writes:

> Hi,
>
> is there a way to determine the byte order
> of the underlying 'machine'? Portable?

From the rest of this thread, you obviously understand that endianness
is not a characteristic of a machine but of a state of a machine (I
presume that's why you put quotes around 'machine').  Thus it tends to
be more a characteristic of an operating system than of a machine.  I
suppose it could even be decided on a per-program basis, like the
distinction between 32 and 64 bits on machines which support both.
But usually those systems have separate libraries for each (or
libraries which can be configured for each), and I've never seen a set
of libraries that differ only in endianness - in that sense the
endianness tends to get paired with the operating system itself.

What operations are you trying to perform?  In Allegro CL, we do two
things: We provide either :big-endian or :little-endian on *features*,
and we also implement simple-streams, which has an extended function
pair called read-vector/write-vector:

http://www.franz.com/support/documentation/8.1/doc/operators/excl/read-vector.htm
http://www.franz.com/support/documentation/8.1/doc/operators/excl/write-vector.htm

These are similar to read-sequence/write-sequence, but they are more
octet-fill oriented rather than element oriented.  Each of these
functions has an :endian-swap keyword argument - you can specify a bit
pattern or certain keywords to indicate the kind of swapping, which
includes the :network-order keyword.  Thus any files written with
write-vector with :endian-swap :network-order can be read from any
lisp with a read-vector call with :endian-swap :network-order.
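
For the *features* part, a check could look roughly like the sketch
below.  It assumes only what is stated above, namely that the
implementation pushes :big-endian or :little-endian onto *features*;
other implementations may not provide either keyword, hence the
:unknown fallback.  ENDIANNESS-FROM-FEATURES is a hypothetical name.

```lisp
;; Sketch: report the byte order advertised on *FEATURES*, if any.
;; ENDIANNESS-FROM-FEATURES is a hypothetical helper, not a vendor API.

(defun endianness-from-features ()
  "Return :BIG-ENDIAN, :LITTLE-ENDIAN, or :UNKNOWN based on *FEATURES*."
  (cond ((member :big-endian *features*)    :big-endian)
        ((member :little-endian *features*) :little-endian)
        ;; Neither keyword present: this Lisp does not say.
        (t :unknown)))
```

Calling (endianness-from-features) at run time avoids read-time
conditionals, so the same source or compiled file can report whatever
the running image advertises.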

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Rainer Joswig
Subject: Re: byte order
Date: Tue, 08 Jan 2008 16:21:52 +0000
Message-ID: <joswig-61580E.17213708012008@news-europe.giganews.com>

In article <··············@gemini.franz.com>,
 Duane Rettig <·····@franz.com> wrote:

> Rainer Joswig <······@lisp.de> writes:
> 
> > Hi,
> >
> > is there a way to determine the byte order
> > of the underlying 'machine'? Portable?
> 
> From the rest of this thread, you obviously understand that endianness
> is not a characteristic of a machine but of a state of a machine (I
> presume that's why you put quotes around 'machine').  Thus it tends to
> be more a characteristic of an operating system than of a machine.  I
> suppose it could even be made a per-program basis, like the
> distinction between 32 and 64 bits on machines which support both.
> But usually those systems have separate libraries for each (or
> libraries which can be configured for each), and I've never seen a set
> of libraries that differ only in endianness - in that sense the
> endianness tends to get paired with the operating system itself.
> 
> What operations are you trying to perform?  In Allegro CL, we do two
> things: We provide either :big-endian or :little-endian on *features*,
> and we also implement simple-streams, which has an extended function
> pair called read-vector/write-vector:
> 
> http://www.franz.com/support/documentation/8.1/doc/operators/excl/read-vector.htm
> http://www.franz.com/support/documentation/8.1/doc/operators/excl/write-vector.htm
> 
> These are similar to read-sequence/write-sequence, but they are more
> octet-fill oriented rather than element oriented.  Each of these
> functions has an :endian-swap keyword argument - you can specify a bit
> pattern or certain keywords to indicate the kind of swapping, which
> includes the :network-order keyword.  Thus any files written with
> write-vector with :endian-swap :network-order can be read from any
> lisp with read-vector call with :endian-swap :network-order.

Thanks for the info!