From: Brian Seitz
Subject: Testing for binary file
Date: 
Message-ID: <Pine.GSO.4.21.0203041655180.16466-100000@ywing.stsci.edu>
Is there a standard way to test a file for being binary or
text?  Something to the effect of -B in Perl.  I would be using either
Allegro 5.01 or CMUCL (whatever version is in Debian unstable).

Thanks,

Brian  

From: Erik Naggum
Subject: Re: Testing for binary file
Date: 
Message-ID: <3224279457059736@naggum.net>
* Brian Seitz <······@stsci.edu>
| Is there a standard way to test a file for being binary or
| text?  Something to the effect of -B in Perl.  I would be using either
| Allegro 5.01 or CMUCL (whatever version is in Debian unstable).

  I have no idea what -B does in Perl, but a text file is generally
  understood to be a file that lacks any other control characters than the
  horizontal (CR, HT, SP) and vertical (LF, VT) format effectors.  If you
  have a decently encoded character set, that means very few characters in
  the ranges #x00-#x1f and #x7f-#x9f.

  If you have some IBM-based crud page or any one of the usual Microsoft
  disasters, there is no way to tell for real, except you would probably
  find periodic line breaks with CRLF in text files.

  A common and very simple negative test for a text file is if the last
  character in the file is not a line feed.
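
  For illustration, the two checks described above (few control characters
  outside the format effectors, and a trailing line feed) might be sketched
  in Python roughly as follows.  This is only a sketch under assumptions: it
  treats the input as bytes in a single-byte encoding such as ISO 8859-1, the
  function names are hypothetical, and the 1% cutoff for "very few" is an
  arbitrary choice that the post does not specify.

```python
# Format effectors named in the post that fall inside the control ranges:
# HT (0x09), LF (0x0A), VT (0x0B), CR (0x0D). SP (0x20) lies outside them.
FORMAT_EFFECTORS = {0x09, 0x0A, 0x0B, 0x0D}


def is_control(byte):
    """True for bytes in 0x00-0x1F or 0x7F-0x9F that are not format effectors."""
    in_range = byte <= 0x1F or 0x7F <= byte <= 0x9F
    return in_range and byte not in FORMAT_EFFECTORS


def control_fraction(data):
    """Fraction of the bytes in data that are non-format control characters."""
    if not data:
        return 0.0
    return sum(1 for b in data if is_control(b)) / len(data)


def looks_like_text(data, max_fraction=0.01):
    """Heuristic text test; max_fraction is an assumed value for 'very few'."""
    # Negative test from the post: a non-empty file not ending in LF fails.
    if data and not data.endswith(b"\n"):
        return False
    return control_fraction(data) <= max_fraction
```

  Both checks are heuristics, so a false verdict either way is possible; the
  threshold would need tuning against the files actually encountered.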

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Larry Clapp
Subject: Re: Testing for binary file
Date: 
Message-ID: <du246a.bg7.ln@rabbit.ddts.net>
In article <········································@ywing.stsci.edu>, Brian Seitz wrote:
> Is there a standard way to test a file for being binary or
> text?  Something to the effect of -B in Perl.  I would be using either
> Allegro 5.01 or CMUCL (whatever version is in Debian unstable).

I dunno about a *standard* way, but you could always rewrite Perl's -B
operator.  From perlfunc(1):

   The "-T" and "-B" switches work as follows.  The first block or so of the
   file is examined for odd characters such as strange control codes or
   characters with the high bit set.  If too many strange characters (>30%)
   are found, it's a "-B" file, otherwise it's a "-T" file.  Also, any file
   containing null in the first block is considered a binary file.

(-T, for you non-Perl-ers, tests for text files.)
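
As a rough illustration of the heuristic quoted above (not Perl's actual
implementation, and in Python rather than Lisp), a sketch might look like the
following.  The block size of 512 bytes, the set of control characters that
are not counted as odd, the treatment of an empty file as text, and all of the
names are assumptions; the quoted text only says "the first block or so" and
"strange control codes".

```python
BLOCK_SIZE = 512  # assumption: the quoted text only says "first block or so"
# Assumption: these common control characters are not counted as odd.
ORDINARY_CONTROLS = set(b"\t\n\r\f\b")


def is_odd(byte):
    """True for bytes with the high bit set or for uncommon control codes."""
    if byte >= 0x80:
        return True
    return byte < 0x20 and byte not in ORDINARY_CONTROLS


def block_is_binary(block, threshold=0.30):
    """Apply the quoted rule to one block of bytes."""
    if not block:
        return False  # assumption: an empty file is treated as text
    if 0 in block:
        return True  # any null byte in the first block means binary
    odd = sum(1 for b in block if is_odd(b))
    return odd / len(block) > threshold


def file_is_binary(path):
    """Read the first block of the file at path and classify it."""
    with open(path, "rb") as f:
        return block_is_binary(f.read(BLOCK_SIZE))
```

The same structure should carry over to Lisp by opening the file with an
element type of (unsigned-byte 8) and reading the first block into a vector.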

-- Larry
From: Marco Antoniotti
Subject: Re: Testing for binary file
Date: 
Message-ID: <y6c4rjt1x97.fsf@octagon.mrl.nyu.edu>
Larry Clapp <·····@theclapp.org> writes:

> In article <········································@ywing.stsci.edu>, Brian Seitz wrote:
> > Is there a standard way to test a file for being binary or
> > text?  Something to the effect of -B in Perl.  I would be using either
> > Allegro 5.01 or CMUCL (whatever version is in Debian unstable).
> 
> I dunno about a *standard* way, but you could always rewrite Perl's -B
> operator.  From perlfunc(1):
> 
>    The "-T" and "-B" switches work as follows.  The first block or so of the
>    file is examined for odd characters such as strange control codes or
>    characters with the high bit set.  If too many strange characters (>30%)
>    are found, it's a "-B" file, otherwise it's a "-T" file.  Also, any file
>    containing null in the first block is considered a binary file.

Sorry.  Binary files are defined by having 42% of "strange" characters
in the first 4242 sextets (4+2 bits).

Perl got this wrong. :)

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Tim Bradshaw
Subject: Re: Testing for binary file
Date: 
Message-ID: <ey3pu2hejrj.fsf@cley.com>
* Marco Antoniotti wrote:

> Sorry.  Binary files are defined by having 42% of "strange" characters
> in the first 4242 sextets (4+2 bits).

> Perl got this wrong. :)

Rubbish.  Files are binary if more than 17 of the first 23 5-bit bytes
are not legal BAUDOT, with the exception that, if they spell
'EWIGE BLUMENKRAFT FNORD', the file is considered binary anyway.

--tim