From: Wayne Sierke
To: grarpamp
Cc: freebsd-questions@freebsd.org
Subject: Re: Regex Wizards
Date: Tue, 27 Sep 2011 15:36:24 +0930
Message-ID: <1317103584.2326.48.camel@predator-ii.buffyverse>

On Mon, 2011-09-26 at 22:02 -0400, grarpamp wrote:
> Under the ERE implementation in RELENG_8, I'm having
> trouble figuring out how to group and backreference this.
>
> Given a line, where:
> If AAA is present, CCC will be too, and B may appear in between.
> If AAA is not present, neither CCC nor B will be present.
> DDDD is always present.
> Junk may be present.
> Match good lines and output in chunks.
>
> echo junkAAAABCCCDDDDjunk | \
>
> This works as expected:
> sed -E -n 's,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p'
> 1 AAABCCC 2 DDDD
>
> But making the leading bits optional per spec does not work:
> sed -E -n 's,^.*(AAAB?CCC)?(DDDD).*$,1 \1 2 \2,p'
> 1 2 DDDD
>
> Nor does adding the usual grouping parens:
> sed -E -n 's,^.*((AAAB?CCC)?)(DDDD).*$,1 \1 2 \2,p'
> 1 2
>
> How do I group off the leading bits?
> Or is this a limitation of ERE's?
> Or a bug?
> Thanks.

I believe that the problem is the greediness of the leading '.*'. Once
the first grouping is optional, the text it would have matched is
consumed by the '.*' instead, and the group is left empty.

This seems to work:

sed -E -n -e '/AAAB?CCC/!s,.*(DDDD).*,1 \1,p' -e 's,.*(AAAB?CCC)(DDDD).*,1 \1 2 \2,p'

%echo junkAABCCCDDDDjunk | sed ...
1 DDDD
%echo junkAAAABCCCDDDDjunk | sed ...
1 AAABCCC 2 DDDD
%echo junkAAAACCCDDDDjunk | sed ...
1 AAACCC 2 DDDD
%echo junkAAAABCCDDDDjunk | sed ...
1 DDDD

Wayne
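
For anyone wanting to try this, the two-expression approach can be wrapped in a small shell script along the following lines. This is only a sketch: the function names greedy_attempt and chunk are hypothetical (they do not appear in the thread), and it assumes a sed that accepts -E, as the one discussed here does.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical, not from the thread.

# The single-expression attempt from the question. The leading '.*' is
# greedy, so the optional group is left with nothing to match and \1
# comes back empty, as reported in the thread.
greedy_attempt() {
    sed -E -n 's,^.*(AAAB?CCC)?(DDDD).*$,1 \1 2 \2,p'
}

# The two-expression workaround from the reply.
# First -e:  lines that do NOT contain AAAB?CCC emit only the DDDD chunk.
# Second -e: lines that do contain AAAB?CCC before DDDD emit both chunks.
chunk() {
    sed -E -n \
        -e '/AAAB?CCC/!s,.*(DDDD).*,1 \1,p' \
        -e 's,.*(AAAB?CCC)(DDDD).*,1 \1 2 \2,p'
}

# Run the four sample lines from the reply through the workaround.
for line in junkAABCCCDDDDjunk junkAAAABCCCDDDDjunk \
            junkAAAACCCDDDDjunk junkAAAABCCDDDDjunk; do
    printf '%s\n' "$line" | chunk
done
```

The workaround sidesteps the greediness instead of fighting it: the address /AAAB?CCC/! decides up front which substitution applies, so neither expression needs an optional group after '.*'. After the first substitution rewrites a line to "1 DDDD", the second expression no longer matches it, so each input line is printed at most once.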