From owner-freebsd-questions@FreeBSD.ORG  Sun Aug 22 16:45:36 2010
From: Wayne Sierke <ws@au.dyndns.ws>
To: Paul Schmehl <pschmehl_lists@tx.rr.com>
In-Reply-To: <23BA961B74BA2B5CA8B523F9@utd65257.utdallas.edu>
References: <23BA961B74BA2B5CA8B523F9@utd65257.utdallas.edu>
Content-Type: text/plain; charset="ASCII"
Date: Mon, 23 Aug 2010 02:00:14 +0930
Message-ID: <1282494614.58781.15759.camel@predator-ii.buffyverse>
Mime-Version: 1.0
X-Mailer: Evolution 2.30.2 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: Any awk gurus on the list?

On Fri, 2010-08-20 at 12:12 -0500, Paul Schmehl wrote:
> I'm trying to figure out how to use awk to parse values from a string of 
> unknown length and unknown fields using awk, from within a shell script, and 
> write those values to a file in a certain order.
> 
> Here's a typical string that I want to parse:
> 
> alert ip 
> [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6] 
> any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; 
> classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; 
> threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)
> 
> What I want to do is extract the value after "sid:", the value after 
> "reference:" and the value after "msg:" and insert them into a file that would 
> look like this:
> 
> 2002750 || "ET POLICY Reserved IP Space Traffic - Bogon Nets 2" || 
> url,www.cymru.com/Documents/bogon-list.html

Probably not a complete solution for your problem domain but you might
glean an idea or two from this:

        awk 'BEGIN {FS="\\(|; *"} /#/ {next} {for(i=1;i<=NF;i++) print $i}' \
            mtc.rules.test |
        awk 'BEGIN {FS=":" ; OFS=" || "}
             $1 == "sid" {sid=$2}
             $1 == "msg" {msg=$2}
             $1 == "reference" {ref=$2}
             $1 == ")" {print sid,msg,ref}'
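
Two gaps in that pipeline are worth knowing about: sid, msg and ref are
never cleared, so a rule that lacks one of them inherits the value from
the previous rule, and a value that itself contains ":" is cut short at
$2. A rough single-pass sketch that addresses both follows. The two
sample rules and the names key, val and p are made up for illustration,
and it assumes each rule sits on one line:

```shell
# Hypothetical two-rule input; the second rule has no reference option.
rules='alert ip any any -> $HOME_NET any (msg:"Rule A"; reference:url,example.com/a; sid:2001; rev:3;)
alert ip any any -> $HOME_NET any (msg:"Rule B"; sid:2002; rev:1;)'

result=$(printf '%s\n' "$rules" | awk '
    BEGIN { FS = "\\(|; *"; OFS = " || " }
    /#/ { next }                             # skip comment lines, as before
    {
        sid = msg = ref = ""                 # reset for every rule
        for (i = 2; i <= NF; i++) {          # field 1 is the rule header
            p = index($i, ":")               # split at the first colon only
            if (!p) continue
            key = substr($i, 1, p - 1)
            val = substr($i, p + 1)
            if (key == "sid") sid = val
            else if (key == "msg") msg = val
            else if (key == "reference") ref = val
        }
        print sid, msg, ref
    }')
printf '%s\n' "$result"
```

With the reset in place the second rule prints an empty third column
rather than repeating the first rule's reference.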

> Yes, I know I could do this easily in Perl.  I'm doing this to try and improve 
> my understanding of awk.  I *think* I've figured out that the right approach is 
> to use an associative array, and this command:

No need for an array unless you want to retain the records for later
processing. For simple record-by-record processing scalar vars suffice.
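
For the case where an array does earn its keep, here is a minimal
sketch that stores each msg keyed by its sid and prints nothing until
the END block. The sample rules and the names order and n are invented
for the example, not taken from your rules file:

```shell
# Hypothetical sample input: two rules, each on a single line.
rules='alert ip any any -> $HOME_NET any (msg:"Rule B"; reference:url,example.com/b; sid:2002; rev:1;)
alert ip any any -> $HOME_NET any (msg:"Rule A"; reference:url,example.com/a; sid:2001; rev:3;)'

result=$(printf '%s\n' "$rules" | awk '
    !/#/ && match($0, /sid:[^;]*/) {
        sid = substr($0, RSTART+4, RLENGTH-4)
        if (!(sid in msg)) order[++n] = sid   # remember first-seen order
        msg[sid] = ""
        if (match($0, /msg:[^;]*/))
            msg[sid] = substr($0, RSTART+4, RLENGTH-4)
    }
    END {
        # Later processing: every stored record is still available here.
        print n " rules"
        for (i = 1; i <= n; i++) print order[i] " || " msg[order[i]]
    }')
printf '%s\n' "$result"
```

The order array is only there because "for (k in msg)" visits keys in
an unspecified order.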

> #  awk '!/#/ { for (i=1; i<=NF; i++) { if ( $i ~ /sid/) {mtcmsg[sid]=$i; print 
> mtcmsg[sid]}}}' < /usr/local/etc/snort/rules/mtc.rules.test

A couple of things to note:

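        $i ~ /sid/ : matches the string "sid" anywhere within the
        field. For an exact match use  ~ /^sid$/  or  == "sid"
        (with the default whitespace FS the field is "sid:2002750;",
        so there you would anchor only the start:  ~ /^sid:/ ).

        mtcmsg[sid] : the subscript is the scalar var named "sid",
        which is unset, so this is mtcmsg[""]. Quote it to use the
        literal key: mtcmsg["sid"].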
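
Both points are easy to check from the command line. In this small
sketch the field "classid:7;" is a made-up value chosen only because it
happens to contain "sid", and the array name a is arbitrary:

```shell
# First field is what the default whitespace FS yields for the sid option;
# the second is a hypothetical field that merely contains "sid".
result=$(printf '%s\n' 'sid:2002750;' 'classid:7;' | awk '
    {
        loose = ($1 ~ /sid/)      # "sid" anywhere in the field
        exact = ($1 == "sid")     # whole field must be "sid"
        start = ($1 ~ /^sid:/)    # field must begin with "sid:"
        print $1, loose, exact, start
    }
    END {
        a[sid] = "x"              # sid is an unset variable, so the key is ""
        a["sid"] = "y"            # string constant, so the key is "sid"
        k1 = ("" in a) ? "empty-key" : "-"
        k2 = ("sid" in a) ? "sid-key" : "-"
        print k1, k2
    }')
printf '%s\n' "$result"
```

The three numbers on each line are the results of the loose, exact and
start-anchored tests (1 for a match, 0 otherwise).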


Of course, if you prefer, you can also just use run-of-the-mill regex
matches and string manipulation in awk:

        awk '!/#/ {
            s1 = match($0, "sid:[^;]*")
            if (s1) sid = substr($0, RSTART+4, RLENGTH-4)
            s2 = match($0, "msg:[^;]*")
            if (s2) msg = substr($0, RSTART+4, RLENGTH-4)
            s3 = match($0, "reference:[^;]*")
            if (s3) ref = substr($0, RSTART+10, RLENGTH-10)
            if (s1*s2*s3) print sid" || "msg" || "ref
        }' mtc.rules.test

but that lacks any real awk-ness.


Wayne