Date: Mon, 23 Aug 2010 02:00:14 +0930
From: Wayne Sierke <ws@au.dyndns.ws>
To: Paul Schmehl <pschmehl_lists@tx.rr.com>
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: Any awk gurus on the list?
Message-ID: <1282494614.58781.15759.camel@predator-ii.buffyverse>
In-Reply-To: <23BA961B74BA2B5CA8B523F9@utd65257.utdallas.edu>
References: <23BA961B74BA2B5CA8B523F9@utd65257.utdallas.edu>
On Fri, 2010-08-20 at 12:12 -0500, Paul Schmehl wrote:
> I'm trying to figure out how to use awk to parse values from a string of
> unknown length and unknown fields using awk, from within a shell script, and
> write those values to a file in a certain order.
>
> Here's a typical string that I want to parse:
>
> alert ip
> [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6]
> any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2";
> classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html;
> threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)
>
> What I want to do is extract the value after "sid:", the value after
> "reference:" and the value after "msg:" and insert them into a file that would
> look like this:
>
> 2002750 || "ET POLICY Reserved IP Space Traffic - Bogon Nets 2" ||
> url,www.cymru.com/Documents/bogon-list.html

Probably not a complete solution for your problem domain but you might
glean an idea or two from this:

awk 'BEGIN {FS="\\(|; *"} /#/ {next} {for(i=1;i<=NF;i++) print $i}' mtc.rules.test |
awk 'BEGIN {FS=":" ; OFS=" || "} $1 == "sid" {sid=$2} $1 == "msg" {msg=$2} $1 == "reference" {ref=$2} $1 == ")" {print sid,msg,ref}'

> Yes, I know I could do this easily in Perl. I'm doing this to try and improve
> my understanding of awk. I *think* I've figured out that the right approach is
> to use an associative array, and this command:

No need for an array unless you want to retain the records for later
processing. For simple record-by-record processing scalar vars suffice.
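
For illustration only, here is a rough sketch of the scalar-variable idea
folded into a single awk process rather than a two-stage pipeline. The
function name extract_rule_fields is hypothetical, and the sketch assumes
one complete rule per input line with options separated by "; " as in the
sample above. It splits each option at its first colon, so a value that
itself contains a colon is kept whole.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): reads rules on stdin
# and prints "sid || msg || reference" for each uncommented rule.
extract_rule_fields() {
    awk '
        /#/ { next }                        # skip any line containing "#"
        {
            sid = msg = ref = ""            # plain scalars, reset per record
            n = split($0, part, /\(|; */)   # same separators as the FS above
            for (i = 1; i <= n; i++) {
                colon = index(part[i], ":") # position of the first colon
                if (colon == 0) continue    # not a key:value option
                key = substr(part[i], 1, colon - 1)
                val = substr(part[i], colon + 1)
                if (key == "sid") sid = val
                else if (key == "msg") msg = val
                else if (key == "reference") ref = val
            }
            if (sid != "") print sid " || " msg " || " ref
        }'
}

# Usage (assumed file name, as in the commands above):
#   extract_rule_fields < mtc.rules.test
```

A rule with several reference: options would keep only the last one here,
which is the same behaviour as the pipeline above.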
> # awk '!/#/ { for (i=1; i<=NF; i++) { if ( $i ~ /sid/) {mtcmsg[sid]=$i; print
> mtcmsg[sid]}}}' < /usr/local/etc/snort/rules/mtc.rules.test

A couple of things to note:

$i ~ /sid/  : will match the string "sid" anywhere within the field -
              either use ~ /^sid$/ or == "sid" for exact matching

mtcmsg[sid] : references the scalar var named "sid" (which is empty =>
              mtcmsg[""])

Of course, if you choose to you can also just execute some
run-of-the-mill regex matches and string manipulations in awk:

awk '!/#/ {s1=match($0, "sid:[^;]*"); if (s1) sid=substr($0, RSTART+4, RLENGTH-4);
           s2=match($0, "msg:[^;]*"); if (s2) msg=substr($0, RSTART+4, RLENGTH-4);
           s3=match($0, "reference:[^;]*"); if (s3) ref=substr($0, RSTART+10, RLENGTH-10);
           if (s1*s2*s3) print sid" || "msg" || "ref}' mtc.rules.test

but that lacks any real awk-ness.

Wayne
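
The two pitfalls noted above (a loose regex match and an unset variable
used as an array subscript) can be seen in isolation with a small sketch.
The function names and the sample words are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical demo helpers, not part of the original thread.

# Prints every whitespace-separated field that contains "sid" anywhere.
loose_match() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /sid/) print $i }'
}

# Prints only fields that are exactly the string "sid".
exact_match() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "sid") print $i }'
}

# Stores a value under the subscript held in the never-assigned variable
# sid, then prints each subscript of the array between square brackets.
unset_subscript() {
    awk 'BEGIN { a[sid] = "x"; for (k in a) printf "[%s]\n", k }'
}

# Usage:
#   printf '%s\n' 'sid inside considered' | loose_match
#   printf '%s\n' 'sid inside considered' | exact_match
#   unset_subscript
```

With the sample words, loose_match reports all three fields because
"inside" and "considered" both contain "sid", exact_match reports only the
first, and unset_subscript shows that the value was stored under the empty
string.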
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1282494614.58781.15759.camel>