Date:      Mon, 23 Aug 2010 02:00:14 +0930
From:      Wayne Sierke <ws@au.dyndns.ws>
To:        Paul Schmehl <pschmehl_lists@tx.rr.com>
Cc:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: Any awk gurus on the list?
Message-ID:  <1282494614.58781.15759.camel@predator-ii.buffyverse>
In-Reply-To: <23BA961B74BA2B5CA8B523F9@utd65257.utdallas.edu>
References:  <23BA961B74BA2B5CA8B523F9@utd65257.utdallas.edu>

On Fri, 2010-08-20 at 12:12 -0500, Paul Schmehl wrote:
> I'm trying to figure out how to use awk to parse values from a string of 
> unknown length and unknown fields using awk, from within a shell script, and 
> write those values to a file in a certain order.
> 
> Here's a typical string that I want to parse:
> 
> alert ip 
> [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6] 
> any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; 
> classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; 
> threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)
> 
> What I want to do is extract the value after "sid:", the value after 
> "reference:" and the value after "msg:" and insert them into a file that would 
> look like this:
> 
> 2002750 || "ET POLICY Reserved IP Space Traffic - Bogon Nets 2" || 
> url,www.cymru.com/Documents/bogon-list.html

This is probably not a complete solution for your problem domain, but you
might glean an idea or two from it:

        awk 'BEGIN { FS = "\\(|; *" }
             /#/   { next }     # skip lines containing "#" (commented-out rules)
                   { for (i = 1; i <= NF; i++) print $i }' mtc.rules.test |
        awk 'BEGIN             { FS = ":"; OFS = " || " }
             $1 == "sid"       { sid = $2 }
             $1 == "msg"       { msg = $2 }
             $1 == "reference" { ref = $2 }
             $1 == ")"         { print sid, msg, ref; sid = msg = ref = "" }'
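To try the pipeline without a real Snort rules file, the sketch below writes the sample rule from the question to a temporary file and runs the same two awk stages over it. Assumptions: the rule sits on a single line, `extract_fields` is a made-up helper name, and the variables are cleared after each rule so that a rule missing a field does not inherit the previous rule's value.

```shell
#!/bin/sh
# Sketch: run the two-stage awk pipeline on the sample rule from the question.
# extract_fields is a hypothetical helper name; the rule is assumed to be on one line.
#
# Stage 1 splits each rule at "(" or at ";" plus any spaces and prints one
# clause per line; stage 2 splits each clause at ":" and prints
# sid || msg || reference when it reaches the closing ")".
extract_fields() {
    awk 'BEGIN { FS = "\\(|; *" }
         /#/   { next }
               { for (i = 1; i <= NF; i++) print $i }' "$1" |
    awk 'BEGIN { FS = ":"; OFS = " || " }
         $1 == "sid"       { sid = $2 }
         $1 == "msg"       { msg = $2 }
         $1 == "reference" { ref = $2 }
         $1 == ")"         { print sid, msg, ref; sid = msg = ref = "" }'
}

# Temporary rules file holding the sample rule (removed when the script exits).
rules=$(mktemp)
trap 'rm -f "$rules"' EXIT
cat > "$rules" <<'EOF'
alert ip [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6] any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)
EOF

extract_fields "$rules"
```

Splitting on ":" in the second stage keeps only the text up to the next colon, so a value that itself contains a colon (a reference such as url,http://...) would be truncated.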

> Yes, I know I could do this easily in Perl.  I'm doing this to try and improve 
> my understanding of awk.  I *think* I've figured out that the right approach is 
> to use an associative array, and this command:

No need for an array unless you want to retain the records for later
processing. For simple record-by-record processing, scalar variables suffice.
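If you do want to retain the records, for example to report duplicate sids once the whole file has been read, an associative array keyed on the sid is the natural fit. This is a minimal sketch rather than a drop-in solution: it assumes one rule per line with the clauses inside "(...)" separated by ";", and `collect_rules`, `msgs`, `refs` and `dup` are made-up names. Each clause is split at its first colon only, so a reference such as url,http://... keeps its colons.

```shell
#!/bin/sh
# Sketch: keep every rule in arrays keyed by sid and report in END, once the
# whole input has been read (here: to flag sids that appear more than once).
# Assumptions: one rule per line; clauses inside (...) are separated by ";".
collect_rules() {
    awk '
    !/#/ {
        sid = msg = ref = ""
        rule = $0
        sub(/^[^(]*\(/, "", rule)              # drop the rule header up to "("
        n = split(rule, clause, /; */)         # clause[1..n], split at ";"
        for (i = 1; i <= n; i++) {
            c = index(clause[i], ":")          # split at the FIRST colon only
            if (c == 0) continue
            key = substr(clause[i], 1, c - 1)
            val = substr(clause[i], c + 1)
            if (key == "sid") sid = val
            else if (key == "msg") msg = val
            else if (key == "reference") ref = val
        }
        if (sid == "") next
        if (sid in msgs) dup[sid] = 1          # seen this sid before
        msgs[sid] = msg
        refs[sid] = ref
    }
    END {
        for (s in msgs) {
            line = s " || " msgs[s] " || " refs[s]
            if (s in dup) line = line " (duplicate sid)"
            print line
        }
    }'
}

# Made-up sample rules; the third reuses sid 2.
printf '%s\n' \
  'alert ip any any -> any any (msg:"first"; reference:url,a.example; sid:2; rev:1;)' \
  'alert ip any any -> any any (msg:"second"; reference:url,http://b.example/; sid:1; rev:1;)' \
  'alert ip any any -> any any (msg:"again"; reference:url,c.example; sid:2; rev:2;)' |
  collect_rules | sort -n
```

Because `for (s in msgs)` visits keys in an unspecified order, the output is piped through `sort -n`.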

> #  awk '!/#/ { for (i=1; i<=NF; i++) { if ( $i ~ /sid/) {mtcmsg[sid]=$i; print 
> mtcmsg[sid]}}}' < /usr/local/etc/snort/rules/mtc.rules.test

A couple of things to note:

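        $i ~ /sid/ : matches the string "sid" anywhere within the
        field (a field such as "consider" would match too); use
        ~ /^sid$/ or == "sid" for an exact match. Note that with the
        default FS the field here is "sid:2002750;", so ~ /^sid:/ is
        the anchored test that actually finds it.

        mtcmsg[sid] : uses the value of the variable sid as the
        subscript; sid is never assigned, so it is empty and every
        match is stored in mtcmsg[""] (mtcmsg["sid"] would use the
        literal string as the key)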
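Both points can be seen on a small, made-up rule. The sketch below contrasts an unanchored and an anchored match on whitespace-separated fields, and shows that assigning through mtcmsg[sid] with sid unset keeps overwriting the single element mtcmsg[""]. The function names are illustrative only.

```shell
#!/bin/sh
# Sketch illustrating the two points above; the input rule is made up.
line='alert ip any any -> any any (msg:"consider this"; sid:2002750; rev:10;)'

# Unanchored: "sid" anywhere in a whitespace-separated field matches,
# so the field (msg:"consider is caught along with sid:2002750;
loose_matches() {
    echo "$line" | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /sid/) print $i }'
}

# Anchored at the start of the field (with the colon): only sid:2002750;
anchored_matches() {
    echo "$line" | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^sid:/) print $i }'
}

# mtcmsg[sid] uses the value of the unset variable sid, i.e. "", as the key,
# so every assignment overwrites mtcmsg[""] and no "sid" key is ever created.
empty_subscript_demo() {
    awk 'BEGIN {
        mtcmsg[sid] = "first"
        mtcmsg[sid] = "second"
        print "mtcmsg[\"\"] = " mtcmsg[""]
        print (("sid" in mtcmsg) ? "has key sid" : "no key sid")
    }'
}

loose_matches
anchored_matches
empty_subscript_demo
```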


Of course, if you prefer, you can also just do some run-of-the-mill
regex matching and string manipulation in awk:

        awk '!/#/ {
            # match() sets RSTART/RLENGTH; substr() then skips the "key:" prefix
            s1 = match($0, /sid:[^;]*/)
            if (s1) sid = substr($0, RSTART + 4, RLENGTH - 4)
            s2 = match($0, /msg:[^;]*/)
            if (s2) msg = substr($0, RSTART + 4, RLENGTH - 4)
            s3 = match($0, /reference:[^;]*/)
            if (s3) ref = substr($0, RSTART + 10, RLENGTH - 10)
            if (s1 && s2 && s3) print sid " || " msg " || " ref
        }' mtc.rules.test

but that lacks any real awk-ness.
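The offsets 4 and 10 in that command are just the lengths of "sid:"/"msg:" and "reference:". A small user-defined awk function can compute them from the key instead, so adding another field is a one-line change. A sketch, assuming values contain no ";" and using the hypothetical names `extract` and `getval`; unlike the original, it treats an empty value the same as a missing field.

```shell
#!/bin/sh
# Sketch: the match()/substr() approach with the prefix length computed from
# the key rather than hard-coded. extract and getval are hypothetical names;
# assumes values contain no ";".
extract() {
    awk '
    # getval(key): text following "key:" up to the next ";" ("" if absent)
    function getval(key,    plen) {
        if (!match($0, key ":[^;]*")) return ""
        plen = length(key) + 1                 # length of "key:"
        return substr($0, RSTART + plen, RLENGTH - plen)
    }
    !/#/ {
        sid = getval("sid"); msg = getval("msg"); ref = getval("reference")
        if (sid != "" && msg != "" && ref != "") print sid " || " msg " || " ref
    }' "$@"
}

# Usage: pass rule files as arguments, or pipe rules in on stdin.
printf '%s\n' 'alert ip [50.0.0.0/8,100.0.0.0/6,104.0.0.0/5,112.0.0.0/6,173.0.0.0/8,174.0.0.0/7,176.0.0.0/5,184.0.0.0/6] any -> $HOME_NET any (msg:"ET POLICY Reserved IP Space Traffic - Bogon Nets 2"; classtype:bad-unknown; reference:url,www.cymru.com/Documents/bogon-list.html; threshold: type limit, track by_src, count 1, seconds 360; sid:2002750; rev:10;)' | extract
```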


Wayne




