From: Tim Kientzle
Date: Wed, 09 Sep 2009 08:16:09 -0700
To: Andrey Chernov, Roman Divacky, src-committers@freebsd.org,
    svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r196981 - head/usr.bin/unzip

Andrey Chernov wrote:
> On Tue, Sep 08, 2009 at 03:55:13PM +0000, Roman Divacky wrote:
>> +     * Detect whether this is a text file. ... but libarchive
>> +     * does not read the central directory, so we have to
>> +     * guess ...
>> +     */
>> +    if (a_opt && n == 0) {
>> +        for (p = buffer; p < end; ++p) {
>> +            if (!isascii((unsigned char)*p)) {
>> +                text = 0;
>> +                break;
>> +            }
>> +        }
>> +    }
>> +
>
> If I understand the purpose of this code right, better use
> isalnum()+ispunct()+isspace()
> combination to count non-ASCII people too.
> Also setlocale() call must be added to the main() for that.

Personally, I would rather see unzip just ignore the -a option
entirely, but I suppose that's probably infeasible.

Since this is only to support -a (which does end-of-line
conversions), I would suggest using a rather different set of
heuristics that examines end-of-line sequences and control
characters only:

 * Any byte value <31 that's not CR or LF: not text
 * LF not preceded by CR: not text
 * CR not followed by LF: not text (or at least, not DOS text)
 * Otherwise, it is text.

At a minimum, this dodges the locale issue.

Someday, I'll get around to filling in the seek support that
libarchive needs for reading central directories. Then unzip can
look at the "text file" bit (which is no more reliable than
anything described above) and this code can just go away.

Tim
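
For illustration only, the heuristics listed above could be sketched roughly as follows. This is not code from unzip or libarchive; the function name looks_like_dos_text() and its signature are hypothetical. It takes the "<31" threshold literally from the message (so a tab byte counts as "not text"), and it assumes the whole sample is in one buffer, so a CR in the last position is treated as not followed by LF.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of unzip or libarchive): returns 1 if
 * the bytes in buf[0..len) pass the end-of-line and control-character
 * checks listed above, 0 otherwise.  No locale or ctype calls are used.
 */
int
looks_like_dos_text(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];

		if (c == '\r') {
			/* CR not followed by LF: not (DOS) text. */
			if (i + 1 >= len || buf[i + 1] != '\n')
				return (0);
		} else if (c == '\n') {
			/* LF not preceded by CR: not text. */
			if (i == 0 || buf[i - 1] != '\r')
				return (0);
		} else if (c < 31) {
			/* Any other low control byte: not text. */
			return (0);
		}
	}
	/* Otherwise, it is text. */
	return (1);
}
```

Because only bytes below 31 are inspected, bytes with the high bit set pass through untouched, which is how this approach avoids depending on setlocale().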