From owner-freebsd-current@FreeBSD.ORG  Mon Sep  2 16:46:27 2013
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@FreeBSD.org
Message-ID: <5224C08E.1070404@FreeBSD.org>
Date: Mon, 02 Sep 2013 19:45:02 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:17.0) Gecko/20130810 Thunderbird/17.0.8
MIME-Version: 1.0
To: FreeBSD Current <freebsd-current@FreeBSD.org>
Subject: Re: bug with special bracket expressions in regular expressions
References: <5224A693.3000904@FreeBSD.org>
In-Reply-To: <5224A693.3000904@FreeBSD.org>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

on 02/09/2013 17:54 Andriy Gapon said the following:
> 
> re_format(7) says:
>      There are two special cases‡ of bracket expressions: the bracket expres‐
>      sions ‘[[:<:]]’ and ‘[[:>:]]’ match the null string at the beginning and
>      end of a word respectively.  A word is defined as a sequence of word
>      characters which is neither preceded nor followed by word characters.  A
>      word character is an alnum character (as defined by ctype(3)) or an
>      underscore.  This is an extension, compatible with but not specified by
>      IEEE Std 1003.2 (“POSIX.2”), and should be used with caution in software
>      intended to be portable to other systems.
> 
> However I observe the following:
> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
> xx
> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
> cd1 xx
> 
> In my opinion '[[:<:]]' should not affect how the pattern is matched in this case.

It seems that the code works like this:
- first it matches "cd0 " and "removes" it
- then it passes the remainder, "cd1 xx", to the matcher with a flag
  (presumably REG_NOTBOL) saying that this is not the real start of the line
- thus the matching code
 o knows that this is not a real line start, so it cannot match [[:<:]]
   on that ground alone
 o does _not_ know which character preceded the start of the given
   substring, so it cannot tell whether [[:<:]] could match there

So the match at "cd1" fails and only "cd0 " is removed.
I am not sure whether this is an internal problem of regex(3) or a problem with
how sed(1) uses regex(3).
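
To make the suspected sequence concrete, here is a toy model of it in Python.
It is only a sketch of the behaviour guessed at above, not the actual regex(3)
or sed(1) code: the names sub_all, find_match and keep_context are made up for
illustration, and the [[:<:]] check is done by hand instead of by the regex
library.

```python
import re

# Body of the pattern without the leading [[:<:]] assertion.
BODY = re.compile(r"cd[0-9][^ ]* *")


def is_word_char(ch):
    # re_format(7): a word character is an alnum character or an underscore.
    return ch.isalnum() or ch == "_"


def find_match(text, pos, keep_context):
    """Find the first match of '[[:<:]]' + BODY at or after pos, or None."""
    for start in range(pos, len(text)):
        if not is_word_char(text[start]):
            continue  # a word cannot begin on a non-word character
        if start == 0:
            at_word_start = True  # real beginning of the line
        elif start == pos and not keep_context:
            # Assumed failure mode: the matcher is handed only text[pos:],
            # is told this is not the real line start, and cannot see
            # text[pos - 1], so it refuses to match [[:<:]] here.
            at_word_start = False
        else:
            at_word_start = not is_word_char(text[start - 1])
        if at_word_start:
            m = BODY.match(text, start)
            if m:
                return m
    return None


def sub_all(text, keep_context):
    """Delete every match, in the style of sed 's/...//g'."""
    out = []
    pos = 0
    while pos < len(text):
        m = find_match(text, pos, keep_context)
        if m is None:
            out.append(text[pos:])
            break
        out.append(text[pos:m.start()])
        pos = m.end()  # BODY never matches the empty string
    return "".join(out)


print(sub_all("cd0 cd1 xx", keep_context=False))  # cd1 xx
print(sub_all("cd0 cd1 xx", keep_context=True))   # xx
```

With keep_context=False the model reproduces the "cd1 xx" output I observe;
with keep_context=True, where the matcher can look at the character before the
restart position, it gives the "xx" that I would expect.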

-- 
Andriy Gapon