Date: Mon, 02 Sep 2013 19:45:02 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: FreeBSD Current <freebsd-current@FreeBSD.org>
Subject: Re: bug with special bracket expressions in regular expressions
Message-ID: <5224C08E.1070404@FreeBSD.org>
In-Reply-To: <5224A693.3000904@FreeBSD.org>
References: <5224A693.3000904@FreeBSD.org>

on 02/09/2013 17:54 Andriy Gapon said the following:
>
> re_format(7) says:
> There are two special cases‡ of bracket expressions: the bracket
> expressions ‘[[:<:]]’ and ‘[[:>:]]’ match the null string at the beginning
> and end of a word respectively. A word is defined as a sequence of word
> characters which is neither preceded nor followed by word characters. A
> word character is an alnum character (as defined by ctype(3)) or an
> underscore. This is an extension, compatible with but not specified by
> IEEE Std 1003.2 (“POSIX.2”), and should be used with caution in software
> intended to be portable to other systems.
>
> However I observe the following:
> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
> xx
> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
> cd1 xx
>
> In my opinion '[[:<:]]' should not affect how the pattern is matched in
> this case.

It seems that the code works like this:
- first it matches "cd0 " and "removes" it
- then it passes "cd1 xx" for matching, with a flag that says that this is
  not a real start of the string
- thus the matching code
  o knows that this is not a real line start, so it can't match [[:<:]] just
    for that reason
  o does _not_ know what the character before the start of the given
    substring was, so it cannot know whether it could match [[:<:]]

So matching fails.
I am not sure whether this is an internal problem of regex(3) or a problem of
how sed(1) uses regex(3).

-- 
Andriy Gapon
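
The hypothesis above can be sketched as a small model. This is only an illustration of the suspected behaviour, not the actual regex(3) or sed(1) code: the names `word_start_ok`, `gsub`, `notbol` and `pass_prev` are hypothetical, and Python's `re` is used only for the pattern body `cd[0-9][^ ]* *`. The `[[:<:]]` check is written by hand so that it can be run both with and without knowledge of the character preceding the substring.

```python
import re

# Pattern body without the [[:<:]] assertion.
BODY = re.compile(r"cd[0-9][^ ]* *")


def is_word_char(c):
    # re_format(7): a word character is an alnum character or an underscore.
    return c.isalnum() or c == "_"


def word_start_ok(text, i, notbol, prev):
    """Hypothetical model of the [[:<:]] check at offset i of text."""
    if i >= len(text) or not is_word_char(text[i]):
        return False
    if i > 0:
        return not is_word_char(text[i - 1])
    if not notbol:
        return True  # real start of the line
    if prev is None:
        return False  # assumed behaviour: preceding character is unknown
    return not is_word_char(prev)


def gsub(line, pass_prev):
    """Delete every match of [[:<:]] + BODY, re-matching on the remainder."""
    out = []
    pos = 0
    while pos < len(line):
        rest = line[pos:]
        notbol = pos > 0  # the remainder is not a real line start
        prev = line[pos - 1] if (pass_prev and pos > 0) else None
        hit = None
        for i in range(len(rest)):
            if word_start_ok(rest, i, notbol, prev):
                hit = BODY.match(rest, i)
                if hit:
                    break
        if hit is None:
            out.append(rest)
            break
        out.append(rest[: hit.start()])
        pos += hit.end()
    return "".join(out)


print(gsub("cd0 cd1 xx", pass_prev=False))  # cd1 xx (the observed output)
print(gsub("cd0 cd1 xx", pass_prev=True))   # xx (the expected output)
```

In this model the only difference between the two runs is whether the matcher is told the character before the substring. That is consistent with the guess above, but it does not show which of regex(3) or sed(1) is responsible.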