In-Reply-To: <5224C08E.1070404@FreeBSD.org>
References: <5224A693.3000904@FreeBSD.org> <5224C08E.1070404@FreeBSD.org>
Date: Mon, 2 Sep 2013 20:52:18 +0300
Subject: Re: bug with special bracket expressions in regular expressions
From: Kimmo Paasiala
To: Andriy Gapon
Cc: FreeBSD Current
List-Id: Discussions about the use of FreeBSD-current

On Mon, Sep 2, 2013 at 7:45 PM, Andriy Gapon wrote:
> on 02/09/2013 17:54 Andriy Gapon said the following:
>>
>> re_format(7) says:
>>      There are two special cases‡ of bracket expressions: the bracket
>>      expressions ‘[[:<:]]’ and ‘[[:>:]]’ match the null string at the
>>      beginning and end of a word respectively.  A word is defined as a
>>      sequence of word characters which is neither preceded nor followed by
>>      word characters.  A word character is an alnum character (as defined
>>      by ctype(3)) or an underscore.  This is an extension, compatible with
>>      but not specified by IEEE Std 1003.2 (“POSIX.2”), and should be used
>>      with caution in software intended to be portable to other systems.
>>
>> However I observe the following:
>> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
>> xx
>> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
>> cd1 xx
>>
>> In my opinion '[[:<:]]' should not affect how the pattern is matched in
>> this case.
>
> It seems that the code works like this:
> - first it matches "cd0 " and "removes" it
> - then it passes "cd1 xx" for matching with a flag that tells that this is
>   not a real start of the string
> - thus the matching code
>   o knows that this is not a real line start, so it can't match [[:<:]]
>     just for that reason
>   o it does _not_ know what was the character before the start of the
>     given substring, so it can not know if it could match [[:<:]]
>
> So matching fails.
> Not sure if this is an internal problem of regex(3) or a problem of how
> sed(1) uses regex(3).
>
> --
> Andriy Gapon

In my opinion this is a bug.
The [[:<:]] operator is documented as matching the empty string at the
beginning of a word; nothing says that the word has to be at the beginning
of the whole string being matched.

The OS X version of sed(1) behaves differently:

$ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
xx
$ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
xx

-Kimmo
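
A rough way to see the difference between the two results is to model the
substitution loop by hand. The sketch below is Python, not the regex(3) or
sed(1) source. The names delete_all, find and at_word_start are made up for
illustration, and the pass_context switch is only an assumption about where
the two implementations differ, taken from Andriy's description of the loop
quoted above.

```python
import re

# The pattern from the thread without its leading [[:<:]] assertion.
BODY = re.compile(r"cd[0-9][^ ]* *")


def is_word_char(ch):
    # re_format(7): a word character is an alnum character or an underscore.
    return ch.isalnum() or ch == "_"


def at_word_start(text, i, before, not_bol):
    """Hypothetical model of [[:<:]] at offset i of text.

    before  -- the character preceding text, or None if it is unknown
    not_bol -- True when text does not begin at the real start of the line
    """
    if i >= len(text) or not is_word_char(text[i]):
        return False
    prev = text[i - 1] if i > 0 else before
    if prev is None:
        # Nothing is known about the preceding character, so only a real
        # line start can count as the beginning of a word.
        return not not_bol
    return not is_word_char(prev)


def find(text, before, not_bol):
    # Return (start, end) of the first match of [[:<:]] + BODY, or None.
    for i in range(len(text)):
        if at_word_start(text, i, before, not_bol):
            m = BODY.match(text, i)
            if m:
                return i, m.end()
    return None


def delete_all(line, pass_context):
    # Model of s/[[:<:]]cd[0-9][^ ]* *//g: match, drop, retry on the rest.
    out, pos = [], 0
    while True:
        # Assumption: the only difference between the two behaviours is
        # whether the character before the remainder is made available.
        before = line[pos - 1] if pass_context and pos > 0 else None
        hit = find(line[pos:], before, not_bol=pos > 0)
        if hit is None:
            out.append(line[pos:])
            return "".join(out)
        start, end = hit
        out.append(line[pos:pos + start])
        pos += end


print(delete_all("cd0 cd1 xx", pass_context=False))  # cd1 xx
print(delete_all("cd0 cd1 xx", pass_context=True))   # xx
```

With pass_context=False the second attempt starts at "cd1 xx" with no
knowledge of the preceding space, so the word-start check at offset 0 fails
and the result is the "cd1 xx" seen on FreeBSD. With pass_context=True the
preceding space is visible and the result is the "xx" seen on OS X.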