Date: Sat, 23 Aug 2008 15:16:33 -0700
From: Walt Pawley <walt@wump.org>
To: Matthew Seaman <m.seaman@infracaninophile.co.uk>
Cc: Oliver Fromme <olli@lurza.secnetix.de>, freebsd-questions@freebsd.org
Subject: Re: sed/awk, instead of Perl
Message-ID: <p06240812c4d6374bdf83@[10.0.0.10]>
In-Reply-To: <48AFD1ED.5070800@infracaninophile.co.uk>
References: <200808220759.m7M7xuh0047625@lurza.secnetix.de> <p0624080cc4d504c10465@[10.0.0.10]> <48AFD1ED.5070800@infracaninophile.co.uk>

At 10:01 AM +0100 8/23/08, Matthew Seaman wrote:
>Walt Pawley wrote:
>>
>> At the risk of beating this to death, I just happened to
>> stumble on a real world example of why one might want to use
>> Perl for sed-ly stuff.
>>
>> ... snip ...
>>
>> wump$ ls -l Desktop/klog
>> -rw-r--r-- 1 wump 1001 52753322 22 Aug 16:37 Desktop/klog
>> wump$ time sed "s/ .*//" Desktop/klog > kadr1
>>
>> real 0m10.800s
>> user 0m10.580s
>> sys 0m0.250s
>> wump$ time perl -pe 's/ .*//' Desktop/klog > kadr2
>>
>> real 0m0.975s
>> user 0m0.700s
>> sys 0m0.270s
>> wump$ cmp kadr1 kadr2
>> wump$
>>
>> Why disparity in execution speed? ...
>
>Careful now. Have you accounted for the effect of the klog file
>being cached in VM rather than having to be read afresh from disk?
>It makes a very big difference in how fast it is processed.

No, I hadn't done any such accounting. So I wrote a little script,
which you can surmise from the following output:

wump$ sh -v spdtst
time perl -pe 's/ .*//' Desktop/klog > /dev/null

real 0m0.961s
user 0m0.740s
sys 0m0.230s
time sed "s/ .*//" Desktop/klog > /dev/null

real 0m10.506s
user 0m10.270s
sys 0m0.250s
time awk '{print $1}' Desktop/klog > /dev/null

real 0m2.333s
user 0m2.140s
sys 0m0.180s
time sed "s/ .*//" Desktop/klog > /dev/null

real 0m10.489s
user 0m10.250s
sys 0m0.230s
time perl -pe 's/ .*//' Desktop/klog > /dev/null

real 0m0.799s
user 0m0.580s
sys 0m0.220s

>In order to get meaningful data for this sort of test you should
>do a dummy run or two of each command in fairly quick succession,
>and then repeat your test runs a number of times and look at the
>average and standard deviation of the execution times. ...

Yeah, Hoyle would like that. But for me, I think the results
are clear enough without all the messing with statistical
computations. 10 to 1 or better is good enough for me to think
there's some major difference.

That said, it would appear that caching can make a difference -
which is why I put the Perl invocation first ... so it would be
running without the benefit of caching. But I don't believe I
was entirely successful in that effort. The very first time I
ran this, which was also the very first time in a whole day
that the klog file had been accessed, the first Perl invocation
took about 2 seconds of real time and still only 0.7 seconds of
user time. I don't believe caching explains the execution speed
disparity.

It was mentioned that this function is made for awk, so I tried
that as well. It is also evidently not as quick as Perl at
doing the job. The time shown above is quite consistent with a
number of other runs I've tried with awk.

I suspect a real Perl internals maven could explain this. I
have some ideas but they're conjecture. Perhaps some effort to
improve execution efficiency in sed and awk would not be
wasted?
-- 
Walter M. Pawley <walt@wump.org>
Wump Research & Company
676 River Bend Road, Roseburg, OR 97470
541-672-8975
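
For anyone who does want the numbers Matthew describes (a dummy run, then repeated timings with their average and standard deviation), something along these lines might do. It is only a rough sketch: the function names mean_sd and bench are made up for illustration, and it assumes /usr/bin/time accepts the POSIX -p flag and prints a "real" line on stderr.

```sh
# mean_sd (hypothetical helper): read one number per line on stdin and
# print "mean sd", where sd is the sample standard deviation.
mean_sd() {
    awk '{ n++; s += $1; ss += $1 * $1 }
         END {
             if (n < 2) { print "need at least 2 samples" > "/dev/stderr"; exit 1 }
             m = s / n
             v = (ss - n * m * m) / (n - 1)   # sample variance
             if (v < 0) v = 0                 # guard against rounding error
             printf "%.3f %.3f\n", m, sqrt(v)
         }'
}

# bench (hypothetical helper): bench RUNS COMMAND [ARGS...]
# Does one untimed warm-up run so the input file is cached, then times
# RUNS further runs and reports mean and sd of the "real" time in seconds.
# Assumes /usr/bin/time -p writes "real N.NN" to stderr (POSIX format).
bench() {
    runs=$1
    shift
    "$@" > /dev/null 2>&1                     # dummy run, result discarded
    i=0
    while [ "$i" -lt "$runs" ]; do
        { /usr/bin/time -p "$@" > /dev/null; } 2>&1 |
            awk '$1 == "real" { print $2 }'
        i=$((i + 1))
    done | mean_sd
}

# Quick check of the statistics helper on known data:
printf '1\n2\n3\n4\n' | mean_sd               # prints "2.500 1.291"
```

Usage would then look like `bench 5 sed 's/ .*//' Desktop/klog` followed by the same for the perl and awk one-liners. Changing "real" to "user" in bench compares CPU time instead, which is the figure that caching should not affect.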