Date:      Tue, 27 Sep 2011 15:36:24 +0930
From:      Wayne Sierke <ws@au.dyndns.ws>
To:        grarpamp <grarpamp@gmail.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Regex Wizards
Message-ID:  <1317103584.2326.48.camel@predator-ii.buffyverse>
In-Reply-To: <CAD2Ti29Uvz6tBp60SYnD-5bJ8Jf=ThbVG5UUU21NWmmqOrO5SA@mail.gmail.com>
References:  <CAD2Ti29Uvz6tBp60SYnD-5bJ8Jf=ThbVG5UUU21NWmmqOrO5SA@mail.gmail.com>

On Mon, 2011-09-26 at 22:02 -0400, grarpamp wrote:
> Under the ERE implementation in RELENG_8, I'm having
> trouble figuring out how to group and backreference this.
> 
> Given a line, where:
>  If AAA is present, CCC will be too, and B may appear in between.
>  If AAA is not present, neither CCC nor B will be present.
>  DDDD is always present.
>  Junk may be present.
>  Match good lines and output in chunks.
> 
> echo junkAAAABCCCDDDDjunk | \
> 
> This works as expected:
> sed -E -n 's,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p'
> 1 AAABCCC 2 DDDD
> 
> But making the leading bits optional per spec does not work:
> sed -E -n 's,^.*(AAAB?CCC)?(DDDD).*$,1 \1 2 \2,p'
> 1  2 DDDD
> 
> Nor does adding the usual grouping parens:
> sed -E -n 's,^.*((AAAB?CCC)?)(DDDD).*$,1 \1 2 \2,p'
> 1 2
> 
> How do I group off the leading bits?
> Or is this a limitation of ERE's?
> Or a bug?
> Thanks.

I believe the problem is the greediness of the leading '.*'. With the
first group optional, the '.*' consumes the text that group would
otherwise have matched, so the group is left matching nothing.
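
To see the effect side by side, here is a small sketch that runs the two
patterns from the original question against the same input. The variable
names (line, required, optional) are only for illustration, and it
assumes a sed that accepts -E:

```shell
#!/bin/sh
line=junkAAAABCCCDDDDjunk

# Group 1 is mandatory: '.*' has to back off far enough for it to match.
required=$(echo "$line" | sed -E -n 's,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p')

# Group 1 is optional: '.*' is free to swallow AAABCCC, so \1 ends up empty.
optional=$(echo "$line" | sed -E -n 's,^.*(AAAB?CCC)?(DDDD).*$,1 \1 2 \2,p')

echo "required: [$required]"
echo "optional: [$optional]"
```

The brackets in the output just make the empty \1 in the second case
easier to spot.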

This seems to work:

sed -E -n -e '/AAAB?CCC/!s,.*(DDDD).*,1 \1,p' -e 's,.*(AAAB?CCC)(DDDD).*,1 \1 2 \2,p'

%echo junkAABCCCDDDDjunk | sed ...
1 DDDD

%echo junkAAAABCCCDDDDjunk | sed ...
1 AAABCCC 2 DDDD

%echo junkAAAACCCDDDDjunk | sed ...
1 AAACCC 2 DDDD

%echo junkAAAABCCDDDDjunk | sed ...
1 DDDD
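
For anyone who wants to repeat the four runs above in one go, a minimal
sketch follows. The function name extract is hypothetical, chosen here
only to avoid retyping the command; the two -e expressions are the ones
from the command above. The first expression handles lines that do not
contain AAAB?CCC (the '!' negates the address), the second handles lines
that do:

```shell
#!/bin/sh
# extract: hypothetical wrapper around the two-expression sed command.
extract() {
    sed -E -n \
        -e '/AAAB?CCC/!s,.*(DDDD).*,1 \1,p' \
        -e 's,.*(AAAB?CCC)(DDDD).*,1 \1 2 \2,p'
}

# Feed each sample line through the wrapper, one per invocation.
for line in junkAABCCCDDDDjunk junkAAAABCCCDDDDjunk \
            junkAAAACCCDDDDjunk junkAAAABCCDDDDjunk; do
    echo "$line" | extract
done
```

Because the first expression rewrites a non-matching line to "1 DDDD",
the second expression no longer finds AAAB?CCC in it and prints nothing
further, so each input yields exactly one output line.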


Wayne




