Message-ID: <486159D1.3060704@FreeBSD.org>
Date: Tue, 24 Jun 2008 22:32:17 +0200
From: Gabor Kovesdan
To: Andrey Chernov
Cc: hackers@FreeBSD.org, current@FreeBSD.org
In-Reply-To: <20080622135343.GA72068@nagual.pp.ru>
Subject: Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

> 1) You can't convert just whole buffer after fread() since it can be
> ended in the middle of multibyte sequence on BUFSIZ edge.
> Look how GNU utils do it.
>
OK, I hadn't thought of this aspect. What about this?

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

#define iswbinary(ch)	(!iswspace((ch)) && iswcntrl((ch)))

int
bin_file(FILE *f)
{
	wint_t ch = L'\0';
	size_t i;
	int ret = 0;

	if (fseek(f, 0L, SEEK_SET) == -1)
		return (0);

	/* Check at most BUFSIZ characters from the start of the file. */
	for (i = 0; (i < BUFSIZ) && (ch != WEOF); i++) {
		ch = fgetwc(f);
		if (iswbinary(ch)) {
			ret = 1;
			break;
		}
	}

	rewind(f);
	return (ret);
}

int
mmbin_file(struct mmfile *f)
{
	size_t i, s;
	wchar_t *wbuf;
	int ret = 0;

	/* mbstowcs() expects a NUL-terminated string and stops at the first NUL. */
	if ((s = mbstowcs(NULL, f->base, 0)) == (size_t)-1)
		return (0);

	wbuf = grep_malloc((s + 1) * sizeof(wchar_t));

	if (mbstowcs(wbuf, f->base, s) == (size_t)-1) {
		free(wbuf);	/* don't leak the buffer on failure */
		return (0);
	}

	/* XXX knows too much about mmf internals */
	/* wbuf holds s wide characters, so bound the scan by s, not f->len. */
	for (i = 0; i < BUFSIZ && i < s; i++)
		if (iswbinary(wbuf[i])) {
			ret = 1;
			break;
		}

	free(wbuf);
	return (ret);
}

This should be ok, right?

> 2) Better use iswspace and iswcntrl instead of iswctype.
>
Ok, changed, thanks. I had been looking for such functions, but the
wctype man page doesn't mention them.

> 3) util.c needs to be fixed in several places too.
>
Yes, I know, I'm just advancing step by step. The next item will be to
fix the word boundary handling.

Regards,
Gabor