Date: Thu, 23 Jan 2014 18:56:04 +0000
From: RW <rwmaillists@googlemail.com>
To: freebsd-questions@freebsd.org
Subject: Re: awk programming question
Message-ID: <20140123185604.4cbd7611@gumby.homeunix.com>
In-Reply-To: <alpine.BSF.2.00.1401230900270.76961@wonkity.com>
References: <F01EB9CE742DEB17DB6B51C7@localhost> <alpine.BSF.2.00.1401230900270.76961@wonkity.com>

On Thu, 23 Jan 2014 09:30:35 -0700 (MST)
Warren Block wrote:

> On Thu, 23 Jan 2014, Paul Schmehl wrote:
>
> > I'm kind of stubborn.  There's lots of different ways to skin a
> > cat, but I like to force myself to use the built-in utilities to do
> > things so I can learn more about them and better understand how
> > they work.
> >
> > So, I'm trying to parse a file of snort rules, extract two string
> > values and insert a double pipe between them to create a
> > sig-msg.map file
> >
> > Here's a typical rule:
> >
> > alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY
> > Outbound TFTP Read Request"; content:"|00 01|"; depth:2;
> > classtype:bad-unknown; sid:2008120; rev:1;)
> >
> > Here's a typical sig-msg.map file entry:
> >
> > 9624 || RPC UNIX authentication machinename string overflow attempt
> > UDP
> >
> > So, from the above rule I would want to create a single line like
> > this:
> >
> > 2008120 || E3[rb] ET POLICY Outbound TFTP Read Request
> >
> > There are several ways I can extract one or the other value, and
> > I've figured out how to extract the sid and add the double pipe,
> > but for the life of me I can't figure out how to extract and print
> > out sid || msg.
> >
> > This prints out the sid and the double pipe:
> >
> > echo `awk 'match($0,/sid:[0-9]*;/) {print
> > substr($0,RSTART,RLENGTH)" || "}' /tmp/mtc.rules | tr -d ";sid"
> >
> > It seems I could put the results into a variable rather than
> > printing them out, and then print var1 || var2, but my google foo
> > hasn't found a useful example.
> >
> > Surely there's a way to do this using awk?  I can use tr for
> > cleanup.  I just need to get close to the right result.
> >
> > How about it awk experts?  What's the cleanest way to get this done?
>
> Not an awk expert, but you can do math on the start and length
> variables to get just the date part:
>
> echo "sid:2008120;" \
>   | awk '{ match($0, /sid:[0-9]*;/) ; \
>     ymd=substr($0, RSTART+4, RLENGTH-5) ; print ymd }'
>
> Closer to what you want:
>
> echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' \
>   | awk '{ match($0, /sid:[0-9]*;/) ; \
>     ymd=substr($0, RSTART+4, RLENGTH-5) ; \
>     match($0, /msg:.*;/) ; \
>     msg = substr($0, RSTART+4, RLENGTH-5) ; \
>     print ymd, "||", msg }'
>
> Note the error that the too-greedy regex creates, and the inability
> of awk to capture regex sub-expressions.  awk does not have a way to
> reduce the greediness, at least that I'm aware.  You may be able to
> work around that, like if the message is always the same length.

$ echo 'msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; sid:2008120;' |\
awk '{ match($0, /sid:[0-9]+;/) ; ymd=substr($0, RSTART+4, RLENGTH-5) ; \
match($0, /msg:[^;]+;/) ; msg = substr($0, RSTART+4, RLENGTH-5) ; \
print ymd, "||", msg }'

2008120 || "E3[rb] ET POLICY Outbound TFTP Read Request"

Note that awk supports +, but not newfangled things like the non-greedy *?.
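
The [^;]+ version keeps the double quotes around the message and would cut
the message short if it ever contained a semicolon. One possible variation
is to anchor the match on the quotes instead. This is only a minimal sketch:
it assumes the msg text never contains an escaped double quote, and the
function name sid_msg_map is made up for illustration.

```shell
# Hypothetical helper: read snort rules on stdin, print "sid || msg" lines.
# Assumption: msg values contain no escaped double quotes (\").
sid_msg_map() {
    awk '
    # Only handle lines that carry a sid; RSTART/RLENGTH describe the match.
    match($0, /sid:[0-9]+;/) {
        # Skip the 4 chars of "sid:" and drop the trailing ";".
        sid = substr($0, RSTART + 4, RLENGTH - 5)
        # [^"]* stops at the closing quote, so no greediness problem.
        if (match($0, /msg:"[^"]*"/)) {
            # Skip the 5 chars of msg:" and drop the closing quote.
            msg = substr($0, RSTART + 5, RLENGTH - 6)
            print sid, "||", msg
        }
    }'
}

rule='alert udp $HOME_NET any -> $EXTERNAL_NET 69 (msg:"E3[rb] ET POLICY Outbound TFTP Read Request"; content:"|00 01|"; depth:2; classtype:bad-unknown; sid:2008120; rev:1;)'
printf '%s\n' "$rule" | sid_msg_map
```

For the sample rule this prints the line without the quotes, so no tr
cleanup should be needed. Against the rules file from the original question
it could be run as: sid_msg_map < /tmp/mtc.rules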