Date: Tue, 04 Jan 2011 16:15:01 -0800
From: Devin Teske <dteske@vicor.com>
To: RW <rwmaillists@googlemail.com>
Cc: Devin Teske <dteske@vicor.com>, freebsd-questions@freebsd.org
Subject: Re: a perl question
Message-ID: <1294186501.26849.73.camel@localhost.localdomain>
In-Reply-To: <20110104221242.35a1710f@gumby.homeunix.com>
References: <117654.42578.qm@web121409.mail.ne1.yahoo.com> <AANLkTinEksoXQAA4ZAziE59h%2BLRTxSgSy2WZy6UaQne%2B@mail.gmail.com> <4D231CB7.2060902@teambox.fr> <86pqsc3774.fsf@red.stonehenge.com> <16ABD485-4B26-47C4-AD19-6B84AB497874@vicor.com> <20110104221242.35a1710f@gumby.homeunix.com>

On Tue, 2011-01-04 at 22:12 +0000, RW wrote:
> On Tue, 4 Jan 2011 10:01:47 -0800
> Devin Teske <dteske@vicor.com> wrote:
>
> >
> > On Jan 4, 2011, at 9:33 AM, Randal L. Schwartz wrote:
> >
> > >>>>>> "Patrick" == Patrick Bihan-Faou
> > >>>>>> <patrick.bihan-faou@teambox.fr> writes:
> > >
> > > Patrick> cat asdf.txt | grep -v XYZ | grep -v bla
> > >
> > > And yet, you still have the "Useless Use of Cat".
> >
> > I know I'm joining the party late, but... what about:
> >
> > grep -Ev '(XYZ|bla)' asdf.txt
> >
> > or
> >
> > awk '!/XYZ/ && !/bla/ {print}' asdf.txt
> >
> > ok... end useless contribution.
>
> It's odd that people seem to be taking bla-bla so literally, when it's
> clearly a place holder for arbitary text.

Maybe because the OP should have said:

"How do I get the text between [XYZ] and [/XYZ]"

A search that extracts the text between two delimiters is different from
a search that prunes whole lines.

This is what the OP was looking for:

awk -v tag=XYZ '
	BEGIN { buf = "" }
	$0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" \
	{
		if ( match($0, "\\[/"tag"\\]") ) \
		{
			buf = buf substr($0, 1, RSTART - 1)
			sub(".*\\["tag"\\]", "", buf)
			sub(/^\n*/, "", buf)
			sub(/\n*$/, "", buf)
			print buf
			buf = ""
			next
		}
		else buf = buf $0"\n"
	}
	END { if ( length(buf) ) print buf }
' asdf.txt

or, if you would prefer to have it all on one line:

awk -v tag=XYZ 'BEGIN { buf = "" } $0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" { if ( match($0, "\\[/"tag"\\]") ) { buf = buf substr($0, 1, RSTART - 1); sub(".*\\["tag"\\]", "", buf); sub(/^\n*/, "", buf); sub(/\n*$/, "", buf); print buf; buf = ""; next } else buf = buf $0"\n" } END { if ( length(buf) ) print buf }' asdf.txt

or, if you would like it as an alias:

for bash...

alias between_xyz='awk -v tag=XYZ '\''BEGIN { buf = "" } $0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" { if ( match($0, "\\[/"tag"\\]") ) { buf = buf substr($0, 1, RSTART - 1); sub(".*\\["tag"\\]", "", buf); sub(/^\n*/, "", buf); sub(/\n*$/, "", buf); print buf; buf = ""; next } else buf = buf $0"\n" } END { if ( length(buf) ) print buf }'\'

for csh:

alias between_xyz 'awk -v tag=XYZ '\''BEGIN { buf = "" } $0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" { if ( match($0, "\\[/"tag"\\]") ) { buf = buf substr($0, 1, RSTART - 1); sub(".*\\["tag"\\]", "", buf); sub(/^\n*/, "", buf); sub(/\n*$/, "", buf); print buf; buf = ""; next } else buf = buf $0"\n" } END { if ( length(buf) ) print buf }'\'

Usage:

between_xyz asdf.txt

Of course, this can even be improved upon further...

As a shell function:

# between $what $file [$file ...]
#
# Split out lines between [$what] and [/$what] using awk(1).
#
between()
{
	local tag="$1"
	shift # drop the tag so that only the file names are passed to awk
	awk -v tag="$tag" '
		BEGIN { buf = "" }
		$0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" \
		{
			if ( match($0, "\\[/"tag"\\]") ) \
			{
				buf = buf substr($0, 1, RSTART - 1)
				sub(".*\\["tag"\\]", "", buf)
				sub(/^\n*/, "", buf)
				sub(/\n*$/, "", buf)
				print buf
				buf = ""
				next
			}
			else buf = buf $0"\n"
		}
		END { if ( length(buf) ) print buf }
	' "$@"
}

Or, for those csh users, how about a fancy alias?:

alias between 'awk -v tag="\!^" '\''BEGIN { buf = "" } $0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" { if ( match($0, "\\[/"tag"\\]") ) { buf = buf substr($0, 1, RSTART - 1); sub(".*\\["tag"\\]", "", buf); sub(/^\n*/, "", buf); sub(/\n*$/, "", buf); print buf; buf = ""; next } else buf = buf $0"\n" } END { if ( length(buf) ) print buf }'\'' \!:2-$'

Usage:

between XYZ asdf.txt

AND... (lol)... last but not least...

If you want to have case-insensitivity, you'll have to change:

	BEGIN { buf = "" }

to:

	BEGIN { IGNORECASE = 1; buf = "" }

(IGNORECASE is a GNU awk extension, so this needs gawk; the awk in the
FreeBSD base system does not honor it.)

NOTE: FYI, when you need to grab text that spans multiple lines between
two field delimiters, C/C++ is better suited than perl/awk, which excel
at line-based I/O rather than block I/O.
However, I conclude that the OP wanted something that was executable
from the command-line, considering that he/she actually gave a basic
construct for a perl one-liner (which might as well be an awk one-liner,
considering FreeBSD doesn't come with Perl in the base anymore and thus
not every machine is guaranteed to have perl, while every machine has
awk).

ANOTHER NOTE: The above is not intended to start a language flame-war...
just an observation. If you have observed an easy _and_ convenient
method that _does_ use perl/awk (in a manner more efficient than the
above), I'm sure the OP/list would love it. Otherwise, I really do view
this operation as being easier in C using functions like strchr,
strrchr, etc.

-- 
Cheers,
Devin Teske

-> CONTACT INFORMATION <-
Business Solutions Consultant II
FIS - fisglobal.com
510-735-5650 Mobile
510-621-2038 Office
510-621-2020 Office Fax
909-477-4578 Home/Fax
devin.teske@fisglobal.com

-> LEGAL DISCLAIMER <-
This message contains confidential and proprietary information
of the sender, and is intended only for the person(s) to whom it
is addressed. Any use, distribution, copying or disclosure by any
other person is strictly prohibited. If you have received this
message in error, please notify the e-mail sender immediately,
and delete the original message without making a copy.

-> FUN STUFF <-
-----BEGIN GEEK CODE BLOCK-----
Version 3.1
GAT/CS d(+) s: a- C++(++++) UB++++$ P++(++++) L++(++++) !E--- W++ N? o? K-
w O M+ V- PS+ PE Y+ PGP- t(+) 5? X+(++) R>++ tv(+) b+(++) DI+(++) D(+)
G+>++ e>+ h r>++ y+
------END GEEK CODE BLOCK------
http://www.geekcode.com/

-> END TRANSMISSION <-
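
For anyone who wants to try the tag-extraction approach from this message, here is a minimal sketch of the shell function run against a small input. The file name sample.txt and its contents are made up for illustration. The sketch also assumes two details: it calls shift before awk so the tag is not treated as a file name, and it starts substr at position 1, since awk strings are 1-indexed. It is one way to exercise the idea, not a definitive version.

```shell
# Sketch only: sample.txt and its contents are hypothetical test data.
between()
{
	local tag="$1"
	shift	# keep the tag out of the file list handed to awk
	awk -v tag="$tag" '
	BEGIN { buf = "" }
	# Range pattern: active from a line holding [tag] through one holding [/tag]
	$0 ~ "\\["tag"\\]", $0 ~ "\\[/"tag"\\]" {
		if ( match($0, "\\[/"tag"\\]") ) {
			# Closing tag found: keep only the text before it
			buf = buf substr($0, 1, RSTART - 1)
			sub(".*\\["tag"\\]", "", buf)	# drop everything up to [tag]
			sub(/^\n*/, "", buf)		# trim leading newlines
			sub(/\n*$/, "", buf)		# trim trailing newlines
			print buf
			buf = ""
			next
		}
		else buf = buf $0"\n"	# still inside the block: accumulate
	}
	END { if ( length(buf) ) print buf }
	' "$@"
}

cat > sample.txt <<'EOF'
header line
[XYZ]first block[/XYZ]
noise
[XYZ]
line one
line two
[/XYZ]
trailer
EOF

between XYZ sample.txt
```

With this input the function should print "first block", then "line one" and "line two" on separate lines; the header, noise and trailer lines are skipped because they fall outside any [XYZ] ... [/XYZ] range.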