Date:      Wed, 09 Sep 2009 08:16:09 -0700
From:      Tim Kientzle <kientzle@freebsd.org>
To:        Andrey Chernov <ache@nagual.pp.ru>, Roman Divacky <rdivacky@freebsd.org>,  src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r196981 - head/usr.bin/unzip
Message-ID:  <4AA7C6B9.1020600@freebsd.org>
In-Reply-To: <20090909132616.GA35808@nagual.pp.ru>
References:  <200909081555.n88FtDwe052523@svn.freebsd.org> <20090909132616.GA35808@nagual.pp.ru>

Andrey Chernov wrote:
> On Tue, Sep 08, 2009 at 03:55:13PM +0000, Roman Divacky wrote:
>> +	 * Detect whether this is a text file.  ...  but libarchive
>> +	 * does not read the central directory, so we have to
>> +	 * guess ...
>> +	 */
>> +	if (a_opt && n == 0) {
>> +		for (p = buffer; p < end; ++p) {
>> +			if (!isascii((unsigned char)*p)) {
>> +				text = 0;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
> 
> If I understand the purpose of this code right, better use
> isalnum()+ispunct()+ispace()
> combination to count non-ASCII people too.
> Also setlocale() call must be added to the main() for that.

Personally, I would rather see unzip just ignore the -a
option entirely, but I suppose that's probably infeasible.

Since this is only to support -a (which does end-of-line
conversions), I would suggest using a rather different
set of heuristics that examines end-of-line sequences
and control characters only:
   * Any byte value <31 that's not CR or LF: not text
   * LF not preceded by CR: not text
   * CR not followed by LF: not text (or at least, not DOS text)
   * Otherwise, it is text.

At a minimum, this dodges the locale issue.
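As a rough sketch of the heuristics listed above (not code from unzip or libarchive), the check could look something like the following. The function name `looks_like_dos_text` and its signature are made up for illustration. The "<31" threshold is taken literally from the list, which means TAB (9) also counts as "not text"; a real implementation might want to exempt it. A CR that falls on the last byte of the buffer is treated as "not text" here for simplicity, although in practice the matching LF could be in the next block.

```c
#include <stddef.h>

/*
 * Hypothetical helper (name and signature are illustrative only).
 * Returns 1 if buf[0..len) looks like DOS text under the heuristics
 * above, 0 otherwise.  Only raw byte values are examined, so no
 * <ctype.h> classification and no setlocale() call is involved.
 */
int
looks_like_dos_text(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; ++i) {
		unsigned char c = buf[i];

		if (c == '\r') {
			/* CR not followed by LF: not (DOS) text. */
			if (i + 1 >= len || buf[i + 1] != '\n')
				return (0);
			++i;	/* Skip the LF that completes the pair. */
		} else if (c == '\n') {
			/* LF not preceded by CR: not text. */
			return (0);
		} else if (c < 31) {
			/* Any other low control byte: not text. */
			return (0);
		}
	}
	/* Otherwise, it is text (an empty buffer counts as text). */
	return (1);
}
```

Bytes with the high bit set pass through untouched, so UTF-8 or 8-bit national text with CRLF line endings is still classified as text without consulting the locale.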

Someday, I'll get around to filling in the seek support
that libarchive needs for reading central directories.
Then unzip can look at the "text file" bit (which
is no more reliable than anything described above) and
this code can just go away.

Tim