Date: Mon, 17 Apr 2017 20:06:16 -0400
From: Mike Jeays <mj001@rogers.com>
To: Ernie Luzar <luzar722@gmail.com>
Cc: "freebsd-questions@FreeBSD.ORG" <freebsd-questions@FreeBSD.ORG>
Subject: Re: awk help
Message-ID: <7b381f8f-e2a5-26ea-075e-96ae35efb25d@rogers.com>
In-Reply-To: <58F53EEA.2030206@gmail.com>
References: <58F25A01.1060208@gmail.com> <7951DF71-5CD3-4B53-9CB4-13CAA8945983@huiekin.org> <58F4CD14.7090008@gmail.com> <c95e03d2-986d-3c3c-198a-a28ab862dc70@gmail.com> <58F53EEA.2030206@gmail.com>
On 17-04-17 06:17 PM, Ernie Luzar wrote:
> Andreas Perstinger wrote:
>> On 2017-04-17 16:11, Ernie Luzar wrote:
>>> When I first tested, /^Address/ and /^ Hits/ produced no output. I
>>> changed them to /Address/ and /Hits/ and this produced output. I
>>> could not find any reference to the ^ sign, so I would like to know
>>> what it is supposed to do.
>>
>> "^" inside a regular expression is an anchor and matches the beginning
>> of the line. (See "man re_format" or e.g.
>> http://www.regular-expressions.info/anchors.html ). In the example
>> you've posted, the lines containing "Address" and "Hits" are indented,
>> which means there are spaces/tabs between the beginning of the line and
>> these words. Thus the patterns don't match.
>>
>>> I am not having success using the system commands rm & touch as shown
>>> in the following example.
>>>
>>> awk 'BEGIN {
>>>     "date +%Y%m%d" | getline date
>>>     hits_yes = "/etc/ipf_pool_awk_hits_yes"
>>>     hits_no = "/etc/ipf_pool_awk_hits_no"
>>>     rm hits_yes
>>>     rm hits_no
>>>     "touch hits_yes"
>>>     "touch hits_no"
>>> }' $hits_rpt
>>
>> You need to use the built-in function "system" in order to use system
>> commands, e.g.
>>
>>     system("rm " hits_yes)
>>
>> This concatenates the literal string "rm " with the content of the awk
>> variable "hits_yes", which results in the string
>> "rm /etc/ipf_pool_awk_hits_yes", and this command is then executed.
>>
>>> I know the date system command is working, but can't figure out how
>>> to code rm & touch to get them to work. Is this even possible?
>>
>> The "date" command works without using the "system" function because it
>> is part of the special syntax for the "getline" function.
>>
>> But I wonder whether you really need to use commands like "rm" and
>> "touch" inside an awk script. What are you trying to accomplish?
>>
>> Bye, Andreas
>
>
> This is what I am trying to accomplish.
>
> In general I am experimenting with ipfilter ippools (i.e., in-core
> tables).
> I used an ippool command that generates the two-line record-pair
> report I posted about in my first post.
>
> I have written a csh "process hits" script that takes 5+ minutes to
> process that report. I have seen awk used in some public scripts, but I
> had never used it before. I wanted to learn awk and thought it would
> be a good idea to rewrite my csh "process hits" script in awk. To have
> a fair comparison, I needed the awk version to do the rm & touch on the
> files that the csh version does.
>
> Well, to say the least, I was shocked at the run-time results. Using
> the same hits.rpt file as input, the csh script took 5 minutes to
> complete and the awk script took less than 1 second. They both output
> the same file of IP addresses that have a hit count greater than zero.
> These two files have the same size and contain the same number of
> lines, and diff shows no differences between them.
>
> It's obvious that awk is far superior in performance to native csh
> programming.
>
> I have another csh script to expire records from the master file that
> runs a long time. The script follows:
>
> # The following logic removes expired records
> for line in `cat $temp_master_db`; do
>     ip=`echo -n $line | cut -w -f 2`
>     date=`echo -n $line | cut -w -f 1`
>
>     if [ "$on_one" = "YES" ]; then
>         on_one="NO"
>         previous_ip="$ip"
>         previous_date="$date"
>         continue
>     fi
>
>     if [ "$ip" != "$previous_ip" ]; then
>
>         if [ $previous_date -le $expire_date ]; then
>             # Drop the record from the master db file as expired.
>             previous_ip="$ip"
>             previous_date="$date"
>             continue
>         else
>             db_rec="$previous_date $previous_ip"
>             echo "${db_rec}" >> $master_db_new
>             previous_ip="$ip"
>             previous_date="$date"
>         fi
>     else
>         # Here current ip and previous_ip are the same.
>         # Check if expired.
>         if [ $previous_date -le $expire_date ]; then
>             # Drop the record from the master db file as expired.
>             previous_ip="$ip"
>             previous_date="$date"
>             continue
>         fi
>         if [ $previous_date -le $date ]; then
>             # Drop the record from the master db file as expired.
>             previous_ip="$ip"
>             previous_date="$date"
>             continue
>         fi
>
>         db_rec="$previous_date $previous_ip"
>         echo "${db_rec}" >> $master_db_new
>         previous_ip="$ip"
>         previous_date="$date"
>
>     fi
> done
>
> # At EOF, must still process previous record.
> if [ $previous_date -le $expire_date ]; then
>     db_rec="$previous_date $previous_ip"
>     echo "${db_rec}" >> $master_db_new
> fi
>
>
> Is there some standard awk model to achieve this previous-save logic?
>
> Also, can a csh $variable be used inside an awk program?
>
> Thanks

That is an amazing difference in performance. I might have expected a
five- to ten-fold improvement, but not 300+ times. I don't see anything
very time-consuming in the script above. Is it possible for you to post
the equivalent csh and awk scripts? Either I or someone with more
experience with csh might be able to spot the problem.
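Andreas's explanation of the "^" anchor can be checked with a tiny sample. The two indented lines below are made up and only stand in for the ippool report, whose exact layout is not reproduced in this thread; the function names are hypothetical. The sketch shows that /^Address/ misses an indented line, while allowing optional blanks after "^" matches it again.

```sh
#!/bin/sh
# Hypothetical stand-in for the ippool report: both lines are indented.
sample_report() {
    printf '   Address: 192.0.2.1\n\tHits: 3\n'
}

# "^" anchors the match at column 1, so indented lines never match.
strict_matches() {
    sample_report | awk '/^Address/ || /^Hits/'
}

# Allowing optional blanks (spaces or tabs) after "^" restores the match
# while still rejecting the keyword when it appears mid-line.
tolerant_matches() {
    sample_report | awk '/^[[:blank:]]*(Address|Hits)/'
}
```

Because awk's default field splitting skips leading blanks, a field test such as `$1 == "Hits:"` is another way to ignore the indentation, provided the keyword is the first word on the line.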
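Following Andreas's `system()` advice, the rm/touch step from Ernie's attempt could look like the sketch below. The directory parameter and function names are hypothetical (the original hard-coded /etc). A second variant gets the same empty files without starting any external command, because awk's `>` redirection creates or truncates a file the first time the program opens it.

```sh
#!/bin/sh
# reset_hit_files DIR: empty the two hit files by running rm and touch
# through awk's system(). Paths containing spaces would need extra quoting.
reset_hit_files() {
    awk -v dir="$1" 'BEGIN {
        hits_yes = dir "/ipf_pool_awk_hits_yes"
        hits_no  = dir "/ipf_pool_awk_hits_no"
        # Keep the variable outside the string literal so awk concatenates
        # its value; system("touch hits_yes") would create a file literally
        # named hits_yes in the current directory.
        system("rm -f " hits_yes)
        system("touch " hits_yes)
        system("rm -f " hits_no)
        system("touch " hits_no)
    }'
}

# truncate_hit_files DIR: same result without running rm or touch.
# ">" creates or truncates the file on first open; close() releases it.
truncate_hit_files() {
    awk -v dir="$1" 'BEGIN {
        split("ipf_pool_awk_hits_yes ipf_pool_awk_hits_no", names, " ")
        for (i = 1; i <= 2; i++) {
            f = dir "/" names[i]
            printf "%s", "" > f
            close(f)
        }
    }'
}
```

In the original attempt, a line such as `"touch hits_yes"` is only a string constant, so awk evaluates it and does nothing, and `rm hits_yes` is parsed as the concatenation of two uninitialized variables; neither runs a command.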
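The usual awk idiom for "previous-save" logic is to hold the last record in ordinary variables, decide its fate when the next record arrives, and make one final decision in an `END` block. The sketch below assumes, hypothetically, one record per line as `YYYYMMDD ip`, with records for the same ip adjacent; `expire_records` is an illustrative name. It keeps a saved record only if it is after the cutoff and is not followed by a record for the same ip with an equal or later date, which mirrors the loop in the quoted script. One assumption to flag: the quoted end-of-file test writes the last record when `$previous_date -le $expire_date`, the opposite of the test inside the loop, so the sketch assumes that was unintended and keeps the last record only when it is unexpired. The second question is answered by awk's `-v name=value` option, which copies a shell value into an awk variable before the program starts; it is an awk option, so it is available from csh as well as sh. Note also that each pass of the shell loop runs two `echo | cut` pipelines inside backquotes, starting several processes per record, whereas awk handles the whole file in a single process.

```sh
#!/bin/sh
# expire_records EXPIRE_DATE [FILE...]
# Reads FILE(s), or standard input when no file is given.
expire_records() {
    _expire=$1
    shift
    # -v copies the shell value into the awk variable "expire".
    awk -v expire="$_expire" '
        # Keep a saved record only if its date is after the cutoff.
        function keep(d, ip) {
            if (d + 0 > expire + 0)
                print d, ip
        }
        # First record: nothing to compare against yet, so just save it.
        NR == 1 { pdate = $1; pip = $2; next }
        {
            # The saved record survives (unless expired) when it ends its
            # ip group, or when it is newer than the current record for
            # the same ip.
            if ($2 != pip || pdate + 0 > $1 + 0)
                keep(pdate, pip)
            # The current record becomes the saved one.
            pdate = $1; pip = $2
        }
        # At end of input the last saved record still needs a decision.
        END { if (NR > 0) keep(pdate, pip) }
    ' "$@"
}
```

For example, `expire_records "$expire_date" "$temp_master_db" > "$master_db_new"` would replace the whole loop, using the variable names from the quoted script.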