FreeBSD Mail Archives

Date:      Tue, 04 Jan 2011 16:15:01 -0800
From:      Devin Teske <dteske@vicor.com>
To:        RW <rwmaillists@googlemail.com>
Cc:        Devin Teske <dteske@vicor.com>, freebsd-questions@freebsd.org
Subject:   Re: a perl question
Message-ID:  <1294186501.26849.73.camel@localhost.localdomain>
In-Reply-To: <20110104221242.35a1710f@gumby.homeunix.com>
References:  <117654.42578.qm@web121409.mail.ne1.yahoo.com> <AANLkTinEksoXQAA4ZAziE59h+LRTxSgSy2WZy6UaQne+@mail.gmail.com> <4D231CB7.2060902@teambox.fr> <86pqsc3774.fsf@red.stonehenge.com> <16ABD485-4B26-47C4-AD19-6B84AB497874@vicor.com> <20110104221242.35a1710f@gumby.homeunix.com>

On Tue, 2011-01-04 at 22:12 +0000, RW wrote:
> On Tue, 4 Jan 2011 10:01:47 -0800
> Devin Teske <dteske@vicor.com> wrote:
> 
> > 
> > On Jan 4, 2011, at 9:33 AM, Randal L. Schwartz wrote:
> > 
> > >>>>>> "Patrick" == Patrick Bihan-Faou
> > >>>>>> <patrick.bihan-faou@teambox.fr> writes:
> > > 
> > > Patrick> cat asdf.txt | grep -v XYZ | grep -v bla
> > > 
> > > And yet, you still have the "Useless Use of Cat".
> > 
> > I know I'm joining the party late, but... what about:
> > 
> > grep -Ev '(XYZ|bla)' asdf.txt
> > 
> > or
> > 
> > awk '!/XYZ/ && !/bla/ {print}' asdf.txt
> > 
> > ok... end useless contribution.
> 
> It's odd that people seem to be taking bla-bla so literally, when it's
> clearly a placeholder for arbitrary text.

Maybe because the OP should have said:

"How do I get the text between [XYZ] and [/XYZ]"

A demarcating field search (extracting what lies between two markers)
is different from a pruning line search (dropping lines that match).
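
As a quick illustration (the two-line sample file below is made up, not
the OP's data), a pruning search such as grep -v throws away every line
that mentions the tag, wanted text included:

```shell
# Hypothetical sample input, for illustration only.
sample=$(mktemp /tmp/prune.XXXXXX)
printf '%s\n' 'keep me' '[XYZ]wanted[/XYZ]' > "$sample"

# Pruning line search: the whole second line is dropped, so "wanted"
# disappears together with its tags.
pruned=$(grep -v XYZ "$sample")
echo "$pruned"
```

Only "keep me" survives, which is the opposite of what someone asking
for the text between the tags wants.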

This is what the OP was looking for:

awk -v tag=XYZ '
BEGIN { buf = "" }
$0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" \
{
	if ( match($0, "\\[/"tag"\\]") ) \
	{
		buf = buf substr($0, 1, RSTART - 1)
		sub(".*\\["tag"\\]", "", buf)
		sub(/^\n*/, "", buf)
		sub(/\n*$/, "", buf)
		print buf
		buf = ""
		next
	} else
		buf = buf $0"\n"
}
END { if ( length(buf) ) print buf }' asdf.txt
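
To make the behaviour concrete, here is a small sketch that runs the
same program against a made-up asdf.txt. The sample contents and the
extract_xyz wrapper name are my assumptions, not anything the OP
posted, and substr() is called with a start index of 1 so that the
result is the same across awk implementations:

```shell
# Hypothetical sample input: one tag pair on a single line and one
# pair spanning several lines.
sample=$(mktemp /tmp/asdf.XXXXXX)
cat > "$sample" <<'EOF'
before [XYZ]first[/XYZ] after
noise
[XYZ]
second
third
[/XYZ]
EOF

# Hypothetical wrapper around the awk program, for easy reuse.
extract_xyz()
{
	awk -v tag=XYZ '
	BEGIN { buf = "" }
	$0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" \
	{
		if ( match($0, "\\[/"tag"\\]") ) \
		{
			# substr() is 1-based; keep everything before the closing tag
			buf = buf substr($0, 1, RSTART - 1)
			# drop the opening tag and anything in front of it
			sub(".*\\["tag"\\]", "", buf)
			sub(/^\n*/, "", buf)
			sub(/\n*$/, "", buf)
			print buf
			buf = ""
			next
		} else
			buf = buf $0"\n"
	}
	END { if ( length(buf) ) print buf }' "$@"
}

extract_xyz "$sample"
```

With that input the output should be the three lines "first", "second"
and "third": the "noise" line and everything outside the tags is
skipped.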

or, if you would prefer to have it all on one line:

awk -v tag=XYZ 'BEGIN { buf = "" } $0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" { if ( match($0, "\\[/"tag"\\]") ) { buf = buf substr($0, 1, RSTART - 1); sub(".*\\["tag"\\]", "", buf); sub(/^\n*/, "", buf); sub(/\n*$/, "", buf); print buf; buf = ""; next } else buf = buf $0"\n" } END { if ( length(buf) ) print buf }' asdf.txt

or, if you would like it as an alias:

for bash...

alias between_xyz='awk -v tag=XYZ '\''BEGIN { buf = "" } $0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" { if ( match($0, "\\[/"tag"\\]") ) { buf = buf substr($0, 1, RSTART - 1); sub(".*\\["tag"\\]", "", buf); sub(/^\n*/, "", buf); sub(/\n*$/, "", buf); print buf; buf = ""; next } else buf = buf $0"\n" } END { if ( length(buf) ) print buf }'\'

for csh:

alias between_xyz 'awk -v tag=XYZ '\''BEGIN { buf = "" } $0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" { if ( match($0, "\\[/"tag"\\]") ) { buf = buf substr($0, 1, RSTART - 1); sub(".*\\["tag"\\]", "", buf); sub(/^\n*/, "", buf); sub(/\n*$/, "", buf); print buf; buf = ""; next } else buf = buf $0"\n" } END { if ( length(buf) ) print buf }'\'

Usage:

between_xyz asdf.txt

Of course, this can even be improved upon further...

As a shell function:

# between $what $file [$file ...]
#
# Split out lines between [$what] and [/$what] using awk(1).
#
between()
{
	# Consume $what so that only file names remain in "$@";
	# otherwise awk would try to open the tag as an input file.
	local tag="$1"
	shift
	awk -v tag="$tag" '
		BEGIN { buf = "" }
		$0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" \
		{
			if ( match($0, "\\[/"tag"\\]") ) \
			{
				buf = buf substr($0, 1, RSTART - 1)
				sub(".*\\["tag"\\]", "", buf)
				sub(/^\n*/, "", buf)
				sub(/\n*$/, "", buf)
				print buf
				buf = ""
				next
			} else
				buf = buf $0"\n"
		}
		END { if ( length(buf) ) print buf }
	' "$@"
}

Or, for csh users, how about a fancy alias:

alias between 'awk -v tag="\!^" '\''BEGIN { buf = "" } $0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" { if ( match($0, "\\[/"tag"\\]") ) { buf = buf substr($0, 1, RSTART - 1); sub(".*\\["tag"\\]", "", buf); sub(/^\n*/, "", buf); sub(/\n*$/, "", buf); print buf; buf = ""; next } else buf = buf $0"\n" } END { if ( length(buf) ) print buf }'\'' \!:2-$'

Usage:

between XYZ asdf.txt

AND... (lol)... last but not least...

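If you want case-insensitivity and are running GNU awk (IGNORECASE is a
gawk extension; other awks, including the one in the FreeBSD base
system, treat it as an ordinary variable and keep matching
case-sensitively), you'll have to change: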

	BEGIN { buf = "" }

to:

	BEGIN { IGNORECASE = 1; buf = "" }
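
For awks without IGNORECASE, a rough portable sketch is to search a
lowercased copy of each line with index() and cut the result out of
the original text. The between_ci name and the sample file are
hypothetical, and this is a rewrite rather than a drop-in tweak of the
program above:

```shell
# between_ci $what $file [$file ...]
#
# Hypothetical helper: like between, but tag matching ignores case and
# relies only on tolower(), index() and substr().
between_ci()
{
	local tag="$1"
	shift
	awk -v tag="$tag" '
	BEGIN { otag = "[" tolower(tag) "]"; ctag = "[/" tolower(tag) "]" }
	{
		line = $0
		while ( 1 ) {
			low = tolower(line)	# search copy; output comes from line
			if ( !inside ) {
				i = index(low, otag)
				if ( i == 0 ) break
				line = substr(line, i + length(otag))
				inside = 1
			} else {
				i = index(low, ctag)
				if ( i == 0 ) { buf = buf line "\n"; break }
				buf = buf substr(line, 1, i - 1)
				sub(/^\n*/, "", buf)
				sub(/\n*$/, "", buf)
				print buf
				buf = ""
				inside = 0
				line = substr(line, i + length(ctag))
			}
		}
	}
	END { if ( inside && length(buf) ) print buf }
	' "$@"
}

# Made-up input with mixed-case tags.
sample=$(mktemp /tmp/asdf.XXXXXX)
cat > "$sample" <<'EOF'
before [xyz]first[/XYZ] after
noise
[Xyz]
Second
third
[/xYZ]
EOF

between_ci xyz "$sample"
```

Because index() does plain string matching, a tag containing regex
metacharacters needs no escaping. Unlike the program above, this
sketch also handles several tag pairs on one line, and an unterminated
block at end of input is printed without its opening tag.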


NOTE: FYI, when you need to grab text that spans multiple lines between
two field delimiters, C/C++ is superior to perl/awk, which excel at
line-based I/O rather than block I/O. However, I conclude that the OP
wanted something executable from the command line, considering that
he/she actually gave a basic construct for a perl one-liner (which might
as well be an awk one-liner, considering FreeBSD doesn't come with Perl
in the base anymore and thus not every machine is guaranteed to have
perl, while every machine has awk).

ANOTHER NOTE: The above is not intended to start a language flame-war...
just an observation. If you have observed an easy _and_ convenient
method that _does_ use perl/awk (in a manner more efficient than the
above), I'm sure the OP/list would love it. Otherwise, I really do view
this operation as being easier in C using functions like strchr,
strrchr, etc.
-- 
Cheers,
Devin Teske


Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1294186501.26849.73.camel>
