From: Wayne Sierke
To: Paul Schmehl
Cc: FreeBSD Questions
Date: Mon, 23 Aug 2010 02:00:14 +0930
Subject: Re: Any awk gurus on the list?

On Fri, 2010-08-20 at 12:12 -0500, Paul Schmehl wrote:
> I'm trying to figure out how to use awk to parse values from a string of
> unknown length and unknown fields using awk, from within a shell script, and
> write those values to a file in a certain order.
>
> Here's a typical string that I want to parse:
>
> alert ip
> [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6]
> any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2";
> classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html;
> threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)
>
> What I want to do is extract the value after "sid:", the value after
> "reference:" and the value after "msg:" and insert them into a file that would
> look like this:
>
> 2002750 || "ET POLICY Reserved IP Space Traffic - Bogon Nets 2" ||
> url,www.cymru.com/Documents/bogon-list.html

Probably not a complete solution for your problem domain, but you might glean
an idea or two from this:

  awk 'BEGIN {FS="\\(|; *"} /#/ {next} {for (i=1; i<=NF; i++) print $i}' mtc.rules.test |
  awk 'BEGIN {FS=":"; OFS=" || "}
       $1 == "sid"       {sid=$2}
       $1 == "msg"       {msg=$2}
       $1 == "reference" {ref=$2}
       $1 == ")"         {print sid, msg, ref}'

The first awk splits each rule on "(" or on ";" plus any following spaces and
prints one option per line (lines containing "#" are skipped). The second
splits those lines on ":" and prints the collected values when it reaches the
lone ")" left over from the end of the rule. Since only $2 is kept, a value
that itself contains a ":" gets cut off at that colon.

> Yes, I know I could do this easily in Perl. I'm doing this to try and improve
> my understanding of awk. I *think* I've figured out that the right approach is
> to use an associative array, and this command:

No need for an array unless you want to retain the records for later
processing. For simple record-by-record processing, scalar variables suffice.
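If you do want to retain the records, a rough sketch along these lines (POSIX awk only; the function name rules_summary and the second sample rule are made up for illustration, not taken from your rules file) stores msg and reference in arrays keyed by sid. Holding everything until END is what lets it print a count line ahead of the rows, which a record-by-record pass cannot know in advance.

```shell
#!/bin/sh
# Sketch only: rules_summary is a hypothetical name. It keeps each rule's
# msg and reference in awk arrays keyed by sid, so the END block can print
# a count line before the rows.
rules_summary() {
    awk '
    !/#/ {                                        # skip lines containing a hash
        if (!match($0, "sid:[^;]*")) next         # no sid: not a rule line
        sid = substr($0, RSTART + 4, RLENGTH - 4) # drop the "sid:" prefix
        if (!(sid in msg)) order[++n] = sid       # remember first-seen order
        msg[sid] = match($0, "msg:[^;]*") ? substr($0, RSTART + 4, RLENGTH - 4) : ""
        ref[sid] = match($0, "reference:[^;]*") ? substr($0, RSTART + 10, RLENGTH - 10) : ""
    }
    END {
        total = n + 0                             # n is unset if nothing matched
        print total " rule(s)"
        for (i = 1; i <= n; i++) {
            s = order[i]
            print s " || " msg[s] " || " ref[s]
        }
    }'
}

# Usage: the first rule is the one quoted in this thread, the commented-out
# line is skipped, and the last rule is a hypothetical example.
rules_summary <<'EOF'
alert ip [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6] any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)
# alert ip any any -> any any (msg:"disabled"; sid:1; rev:1;)
alert tcp any any -> $HOME_NET 22 (msg:"hypothetical example"; reference:url,www.example.com; sid:1000001; rev:1;)
EOF
# Prints:
# 2 rule(s)
# 2002750 || "ET POLICY Reserved IP Space Traffic - Bogon Nets 2" || url,www.cymru.com/Documents/bogon-list.html
# 1000001 || "hypothetical example" || url,www.example.com
```

Keying by sid also means a rule that appears twice keeps only its last msg and reference. The separate order array is there because for (k in arr) visits keys in no particular order.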
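> # awk '!/#/ { for (i=1; i<=NF; i++) { if ( $i ~ /sid/) {mtcmsg[sid]=$i; print
> mtcmsg[sid]}}}' < /usr/local/etc/snort/rules/mtc.rules.test

A couple of things to note:

  $i ~ /sid/  : matches the string "sid" anywhere within the field; use
                ~ /^sid$/ or == "sid" for an exact match. (With the default
                whitespace FS your command uses, the field is actually
                "sid:2002750;", so an anchored /^sid:/ is what fits there.)

  mtcmsg[sid] : indexes the array by the value of the scalar variable named
                sid, which is never set and so is empty; every assignment
                lands in mtcmsg[""]. Write mtcmsg["sid"] for a literal key.

Of course, if you prefer, you can also just do some run-of-the-mill regex
matching and string manipulation in awk:

  awk '!/#/ {
    s1 = match($0, "sid:[^;]*");       if (s1) sid = substr($0, RSTART+4, RLENGTH-4)
    s2 = match($0, "msg:[^;]*");       if (s2) msg = substr($0, RSTART+4, RLENGTH-4)
    s3 = match($0, "reference:[^;]*"); if (s3) ref = substr($0, RSTART+10, RLENGTH-10)
    if (s1*s2*s3) print sid " || " msg " || " ref
  }' mtc.rules.test

but that lacks any real awk-ness. It works because match() sets RSTART and
RLENGTH to the start and length of each match, substr() then skips the 4-, 4-
or 10-character "sid:", "msg:" or "reference:" prefix, and the final test
prints only when all three matches succeeded.

Wayne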