Date: Tue, 18 Apr 2017 02:19:26 +0200
From: Polytropon <freebsd@edvax.de>
To: Ernie Luzar <luzar722@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: awk help
Message-ID: <20170418021926.8410148b.freebsd@edvax.de>
In-Reply-To: <58F53EEA.2030206@gmail.com>
References: <58F25A01.1060208@gmail.com> <7951DF71-5CD3-4B53-9CB4-13CAA8945983@huiekin.org> <58F4CD14.7090008@gmail.com> <c95e03d2-986d-3c3c-198a-a28ab862dc70@gmail.com> <58F53EEA.2030206@gmail.com>
On Mon, 17 Apr 2017 18:17:14 -0400, Ernie Luzar wrote:
> In general I am experimenting with ipfilter ippools {IE; in-core
> I have written a csh "process hits" script that takes 5+ minutes to
> process that report. I have seen awk used in some public scripts but I
> have never used it before. I wanted to learn awk and thought it would be
> a good idea to rewrite my csh "process hits" script in awk. To have a
> fair comparison I needed the awk version to do the rm & touch on the
> files that the csh version does.

Allow me a short side note: What you've written (and presented to the
list) is not a csh script. It's a sh script. FreeBSD's default
interactive shell is csh, the C shell, but the default scripting shell
is sh, a "kind of" Bourne shell. Using sh for scripting is something
like an "industry standard". Nobody likes to write scripts for the
C shell. :-)

> It's obvious that awk is far superior in performance over native csh
> programming.

It is. The awk scripting language is intended for text processing,
pattern matching, output manipulation and "text-related" programming,
while sh (not csh!) is much better for "general" programming, and of
course as "all purpose programming glue". :-)

> I have another csh script to expire records from the master file that
> runs a long time. The csh script follows;
>
> # The following logic removes expired records
> for line in `cat $temp_master_db`; do
>     ip=`echo -n $line | cut -w -f 2`
>     date=`echo -n $line | cut -w -f 1`
>
>     if [ "$on_one" = "YES" ]; then
>         on_one="NO"
>         previous_ip="$ip"
>         previous_date="$date"
>         continue
>     fi
>
>     if [ "$ip" != "$previous_ip" ]; then
>
>         if [ $previous_date -le $expire_date ]; then
>             # Drop the record from the master db file as expired.
>             previous_ip="$ip"
>             previous_date="$date"
>             continue
>         else
>             db_rec="$previous_date $previous_ip"
>             echo "${db_rec}" >> $master_db_new
>             previous_ip="$ip"
>             previous_date="$date"
>         fi
>     else
>         # Here current ip and previous_ip are the same.
>         # Check if expired.
>         if [ $previous_date -le $expire_date ]; then
>             # Drop the record from the master db file as expired.
>             previous_ip="$ip"
>             previous_date="$date"
>             continue
>         fi
>         if [ $previous_date -le $date ]; then
>             # Drop the record from the master db file as expired.
>             previous_ip="$ip"
>             previous_date="$date"
>             continue
>         fi
>
>         db_rec="$previous_date $previous_ip"
>         echo "${db_rec}" >> $master_db_new
>         previous_ip="$ip"
>         previous_date="$date"
>
>     fi
> done
>
> # At EOF, must still process previous record.
> if [ $previous_date -le $expire_date ]; then
>     db_rec="$previous_date $previous_ip"
>     echo "${db_rec}" >> $master_db_new
> fi
>
>
> Is there some standard awk model to achieve this previous-save logic?

From quickly reading that code, it should be possible to re-implement
this with awk. I'm currently not aware of a "pattern name" for what
you're trying to accomplish, but it should be possible to "translate"
the sh code into awk code.

> Also can a csh $variable be used inside of an awk program?

Not directly. A sh (not csh!) variable is prefixed by $, but the awk
program is typically enclosed in single quotes, which prevent the shell
from expanding $FOO or ${FOO}; awk uses $ itself, for example in field
identifiers like $0, $1, $2 and so on. If you had _no_ $ in your awk
code, you could probably do something like this:

    #!/bin/sh
    FOO=100
    awk "BEGIN { print $FOO }"

But of course, now you'll get problems using double quotes in awk.

However, there is (at least) one way to deal with this problem:
Prefix the data you're going to process with "special lines". Let's
say they start with #, followed by a name (the "variable name"), a =,
and the "value". You can easily generate this as a temporary file from
your "glue" script.
Example:

    #!/bin/sh

    # variables and values
    FOO="100"
    BAR="123.456.789.0"

    # file names
    CONFIGFILE="/tmp/config.tmp"
    DATA_IN="ip_in.txt"
    DATA_OUT="ip_out.txt"

    echo "#FOO=${FOO}" > ${CONFIGFILE}
    echo "#BAR=${BAR}" >> ${CONFIGFILE}

    cat ${CONFIGFILE} ${DATA_IN} | awk -F "=" '
    /^#[A-Z]/ {
        if ($1 == "#FOO")
            foo = $2
        if ($1 == "#BAR")
            bar = $2
    }

    /Address/ {
        ... # something that uses foo
    }

    /Hits/ {
        ... # something else that uses bar
    }
    ' > ${DATA_OUT}

    rm ${CONFIGFILE}

In case you want to "filter out" those "special lines", you can for
example use

    | grep -v "^#" |

in your processing pipeline.

Another option would be a "search and replace" mechanism that modifies
the awk program itself. That can be done with awk or sed. (NB: sed, the
stream editor, is one of the most convenient ways to do a "search and
replace" operation:

    | sed "s/from/to/g" |

in your pipeline. As you can see from the " quotes, using shell
variables is no problem here.)

Let's say your awk script has two "placeholders" called FOO and BAR
(make sure they're unique!). You simply replace them with the values
present in the sh "glue".

Example:

    #!/bin/sh

    # variables and values
    FOO="100"
    BAR="123.456.789.0"

    # file names
    DATA_IN="ip_in.txt"
    DATA_OUT="ip_out.txt"
    SCRIPT_ORIG="process_ip_orig.awk"
    SCRIPT_MOD="process_ip.awk"

    sed "s/FOO/${FOO}/g; s/BAR/${BAR}/g" < ${SCRIPT_ORIG} > ${SCRIPT_MOD}

    cat ${DATA_IN} | awk -f ${SCRIPT_MOD} > ${DATA_OUT}

    rm ${SCRIPT_MOD}

NB: Useless use of cat. :-)

I'm sure there are several other ways of doing this, but maybe those
two examples can help or at least inspire you. :-)

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
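
One further possibility, not among the two approaches described above:
POSIX awk accepts -v name=value assignments on the command line, which
lets a sh variable reach the awk program without quoting tricks. What
follows is only a minimal sketch of the previous-save logic under stated
assumptions. The function name expire_records is hypothetical; the input
is assumed to hold lines of the form "date ip" with numeric dates; lines
for the same IP are assumed to be adjacent; and the rule "keep the newest
date per IP unless it is on or before the expire date" is one reading of
the intent, not a line-by-line translation of the quoted sh code.

```shell
#!/bin/sh
# Sketch only: expire_records and its arguments are hypothetical names.
# Assumed input format: "<date> <ip>" per line, numeric dates,
# lines for the same IP adjacent to each other.

expire_records() {
    # $1 = expire date (records dated on or before it are dropped)
    # $2 = input file
    awk -v expire="$1" '
        # Print the held record unless it has expired.
        function emit_prev() {
            if (have_prev && prev_date + 0 > expire + 0)
                print prev_date, prev_ip
        }
        {
            if (have_prev && $2 == prev_ip) {
                # Same IP as the held record: keep only the newer date.
                if ($1 + 0 > prev_date + 0)
                    prev_date = $1
                next
            }
            # New IP: settle the held record, then hold the current one.
            emit_prev()
            prev_date = $1
            prev_ip = $2
            have_prev = 1
        }
        # At EOF the last held record still has to be settled.
        END { emit_prev() }
    ' "$2"
}
```

Called as expire_records "$expire_date" "$temp_master_db" > "$master_db_new"
(variable names taken from the quoted script), this would stand in for
the whole loop. The single awk process reads the file once instead of
starting echo and cut for every line, which is likely where most of the
run time of the sh version goes.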