From: Chris Rees
Date: Tue, 01 Oct 2013 21:01:43 +0100
To: Damian Weber
Subject: Re: bug with special bracket expressions in regular expressions
Cc: FreeBSD Current, freebsd-standards@FreeBSD.org, Andriy Gapon

On 02/09/2013 16:09, Damian Weber wrote:
>
> On Mon, 2 Sep 2013, Andriy Gapon wrote:
>
>> re_format(7) says:
>>      There are two special cases of bracket expressions: the bracket
>>      expressions '[[:<:]]' and '[[:>:]]' match the null string at the
>>      beginning and end of a word respectively. A word is defined as a
>>      sequence of word characters which is neither preceded nor followed
>>      by word characters. A word character is an alnum character (as
>>      defined by ctype(3)) or an underscore. This is an extension,
>>      compatible with but not specified by IEEE Std 1003.2 ("POSIX.2"),
>>      and should be used with caution in software intended to be
>>      portable to other systems.
>>
>> However I observe the following:
>> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
>> xx
>> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
>> cd1 xx
>>
>> In my opinion '[[:<:]]' should not affect how the pattern is matched in this case.
>>
>> Any thoughts, suggestions?
>
> there are two simpler expressions, whose difference I don't understand either
> (tested on 8.4-PRERELEASE)
>
> $ echo "cd0 cd1 xx" | sed 's/cd[0-9] //g'
> xx
> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9] //g'
> cd1 xx

Well, I agree with your analysis, and I think it's certainly a bug.

Do you think the BUGS line in regex(3) should perhaps be extended to say
that it never works properly?

"""
Word-boundary matching does not work properly in multibyte locales.
"""

In a PCRE, [[:<:]] can be replaced by \b (which matches at either end of
a word), and that works perfectly fine, of course:

$ echo "this word word should be deleted" | perl -pe 's,\bword ,,g'
this should be deleted

Chris
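For scripts that cannot rely on a fixed regex(3) or on perl being installed, one possible workaround (not something proposed in the thread) is to avoid the '[[:<:]]' extension altogether and filter whole fields with awk. This is only a sketch: it assumes words are separated by blanks alone, so unlike '[[:<:]]' it will not treat "cd0" in "(cd0" as the start of a word, and it rejoins the surviving fields with single spaces.

```shell
# Hypothetical workaround (illustration only): drop every blank-separated
# field that starts with "cd" followed by a digit, without using [[:<:]].
result=$(echo "cd0 cd1 xx" | awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
        # Keep only fields that do not begin with cd<digit>.
        if ($i !~ /^cd[0-9]/)
            out = (out == "" ? $i : out " " $i)
    }
    print out
}')
echo "$result"    # prints: xx
```

The output matches what the re_format(7) description suggests the '[[:<:]]' version of the sed command should produce.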