Date: Thu, 23 Jan 2014 14:45:16 -0600
From: Paul Schmehl
To: dteske@FreeBSD.org, 'RW', freebsd-questions@freebsd.org
Subject: RE: awk programming question
In-Reply-To: <04a201cf1878$8ebce540$ac36afc0$@FreeBSD.org>
References: <20140123185604.4cbd7611@gumby.homeunix.com> <04a201cf1878$8ebce540$ac36afc0$@FreeBSD.org>

--On January 23, 2014 at 12:20:26 PM -0800 dteske@FreeBSD.org wrote:

>
>
>> -----Original Message-----
>> From: RW [mailto:rwmaillists@googlemail.com]
>> Sent: Thursday, January 23, 2014 10:56 AM
>> To: freebsd-questions@freebsd.org
>> Subject: Re: awk programming question
>>
>> On Thu, 23 Jan 2014 09:30:35 -0700 (MST) Warren Block wrote:
>>
>> > On Thu, 23 Jan 2014, Paul Schmehl wrote:
>> >
>> > > I'm kind of stubborn. There are lots of different ways to skin a cat,
>> > > but I like to force myself to use the built-in utilities to do
>> > > things so I can learn more about them and better understand how they
>> > > work.
>> > >
>> > > So, I'm trying to parse a file of snort rules, extract two string
>> > > values and insert a double pipe between them to create a sig-msg.map
>> > > file.
>> > >
>> > > Here's a typical rule:
>> > >
>> > > alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY
>> > > Outbound TFTP Read Request"; content:"|00 01|"; depth:2;
>> > > classtype:bad-unknown; sid:2008120; rev:1;)
>> > >
>> > > Here's a typical sig-msg.map file entry:
>> > >
>> > > 9624 || RPC UNIX authentication machinename string overflow attempt
>> > > UDP
>> > >
>> > > So, from the above rule I would want to create a single line like
>> > > this:
>> > >
>> > > 2008120 || E3[rb] ET POLICY Outbound TFTP Read Request
>> > >
>> > > There are several ways I can extract one or the other value, and
>> > > I've figured out how to extract the sid and add the double pipe, but
>> > > for the life of me I can't figure out how to extract and print out
>> > > sid || msg.
>> > >
>> > > This prints out the sid and the double pipe:
>> > >
>> > > echo `awk 'match($0,/sid:[0-9]*;/) {print substr($0,RSTART,RLENGTH)" || "}' /tmp/mtc.rules | tr -d ";sid"`
>> > >
>> > > It seems I could put the results into a variable rather than
>> > > printing them out, and then print var1 || var2, but my Google-fu
>> > > hasn't found a useful example.
>> > >
>> > > Surely there's a way to do this using awk? I can use tr for
>> > > cleanup. I just need to get close to the right result.
>> > >
>> > > How about it, awk experts? What's the cleanest way to get this done?
>> >
>> > Not an awk expert, but you can do math on the start and length
>> > variables to get just the numeric part:
>> >
>> > echo "sid:2008120;" \
>> >   | awk '{ match($0, /sid:[0-9]*;/) ; \
>> >     ymd=substr($0, RSTART+4, RLENGTH-5) ; print ymd }'
>> >
>> > Closer to what you want:
>> >
>> > echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' \
>> >   | awk '{ match($0, /sid:[0-9]*;/) ; \
>> >     ymd=substr($0, RSTART+4, RLENGTH-5) ; \
>> >     match($0, /msg:.*;/) ; \
>> >     msg = substr($0, RSTART+4, RLENGTH-5) ; \
>> >     print ymd, "||", msg }'
>> >
>> > Note the error that the too-greedy regex creates, and the inability of
>> > awk to capture regex sub-expressions. awk does not have a way to
>> > reduce the greediness, at least that I'm aware of. You may be able to
>> > work around that, like if the message is always the same length.
>>
>>
>> $ echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' | \
>>   awk '{ match($0, /sid:[0-9]+;/) ; ymd=substr($0, RSTART+4, RLENGTH-5) ; \
>>   match($0, /msg:[^;]+;/) ; msg = substr($0, RSTART+4, RLENGTH-5) ; \
>>   print ymd, "||", msg }'
>>
>> 2008120 || "E3[rb] ET POLICY Outbound TFTP Read Request"
>>
>> Note that awk supports +, but not newfangled things like *.
>
> With respect to regex, what awk really needs is the quantifier syntax...
>
> * = {0,} = zero or more
> + = {1,} = one or more
> {x,y} = any quantity from x inclusively up to y
> {x,} = any quantity from x or more
>
> sed supports it -- e.g., echo "aaa" | sed -e 's/a\{1,2\}//' # produces "a"
> sed -E (aka sed -r) supports it -- e.g., echo "aaa" | sed -E 's/a{1,2}//' # produces "a"
> grep supports it -- e.g., echo "aaa" | grep 'a\{2,\}' # match printed
> grep -E (aka egrep) supports it -- e.g., echo "aaa" | grep -E 'a{2,}' # match printed
> perl supports it -- obviously (in the modern regex form, lacking backslash)
> nvi supports it -- e.g., :%s/a\{1,2\}//
> vim supports it -- obviously (and uses the backslash form; even with
> noncompatible set)
>
> onetrueawk however does NOT support it -- example given...
> echo aaa | awk '/a{2,}/{print}' # no match printed
> echo aaa | awk '/a\{2,\}/{print}' # no match printed
>
> There's a couple of other nits here...
>
> 1. sig-msg.map file according to OP shouldn't have the quotes that are
>    present from the snort rule input
> 2. Doesn't ignore lines of disinterest (See http://oreilly.com/pub/h/1393)
>
> NB: The result code of match() is ignored; I don't think the program
> should output known bad sig-msg.map lines (where an sid is not given, for
> example; which appears to be the key for the sig-msg.map file).
>
> I gather that a more complete solution would be as follows:
>
> awk '!/^[[:space:]]*(#|$)/{if (!match($0,
> /[[:space:](;]sid:[[:space:]]*[0-9]/)) next;
> sid = substr($0, RSTART + RLENGTH - 1); sub(/[^0-9].*/, "", sid);
> if (!match($0, /[[:space:](;]msg:[[:space:]]*/)) next;
> buf = substr($0, RSTART + RLENGTH); quoted = substr(buf, 1, 1) == "\"";
> split(buf, msg, quoted ? "\"" : FS);
> print sid, "||", msg[quoted ? 2 : 1]}' rules_file
>
> Where "rules_file" is the name of the file you want to parse.
>
> Putting this into a script, we can clean it up so that it's readable...
>
> #!/bin/sh
> awk '
> # Skip blank lines and comment lines
> !/^[[:space:]]*(#|$)/ {
>         if (!match($0, /[[:space:](;]sid:[[:space:]]*[0-9]/)) next
>         # Start at the first digit of the sid, then trim what follows it
>         sid = substr($0, RSTART + RLENGTH - 1)
>         sub(/[^0-9].*/, "", sid)
>         if (!match($0, /[[:space:](;]msg:[[:space:]]*/)) next
>         buf = substr($0, RSTART + RLENGTH)
>         # substr() positions are 1-based
>         quoted = substr(buf, 1, 1) == "\""
>         split(buf, msg, quoted ? "\"" : FS)
>         print sid, "||", msg[quoted ? 2 : 1]
> }' "$@"

Thanks so much! In the end I opted to use perl, because I had more
pressing matters to attend to, but I'm pleased to know that it's doable
with awk, and I will test your script (and endeavor to more fully
understand it) when I have the time to do so.

--
Paul Schmehl, Senior Infosec Analyst
As if it wasn't already obvious, my opinions
are my own and not those of my employer.
*******************************************
"It is as useless to argue with those who have
renounced the use of reason as to administer
medication to the dead." Thomas Jefferson
"There are some ideas so wrong that only a very
intelligent person could believe in them." George Orwell
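
For reference, the match()/substr() technique discussed in this thread can
be tried against the sample rule from the original question. What follows
is only a minimal sketch, not the code posted above: the function name
sigmap is hypothetical, it assumes one rule per line and a msg value that
is always double-quoted, and it uses [ \t] rather than [[:space:]] in case
the local awk lacks POSIX character classes.

```shell
#!/bin/sh
# Sketch only: "sigmap" is a hypothetical helper name, not from the thread.
# Prints "sid || msg" for each snort rule read from the named files (or
# stdin), skipping comments, blank lines, and rules without a sid or msg.
sigmap() {
    awk '
    # Ignore comment lines and blank lines
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    {
        # match() sets RSTART and RLENGTH; the pattern ends on the first digit
        if (!match($0, /[ \t(;]sid:[ \t]*[0-9]/)) next
        sid = substr($0, RSTART + RLENGTH - 1)
        sub(/[^0-9].*/, "", sid)    # keep only the leading run of digits

        # Assumption: the msg value is always wrapped in double quotes
        if (!match($0, /[ \t(;]msg:[ \t]*"[^"]*"/)) next
        msg = substr($0, RSTART, RLENGTH)
        sub(/^[^"]*"/, "", msg)     # drop everything through the opening quote
        sub(/"$/, "", msg)          # drop the closing quote

        print sid, "||", msg
    }' "$@"
}

# Usage with the sample rule quoted earlier in the thread
printf '%s\n' 'alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; content:"|00 01|"; depth:2; classtype:bad-unknown; sid:2008120; rev:1;)' | sigmap
# prints: 2008120 || E3[rb] ET POLICY Outbound TFTP Read Request
```

Anchoring the msg pattern on the closing quote sidesteps the greedy-regex
problem noted earlier in the thread, at the cost of skipping any rule whose
msg is unquoted.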