Date:      Tue, 3 Nov 2009 21:22:28 +0100
From:      Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net>
To:        freebsd-hackers@freebsd.org
Subject:   Issue with grep -i (on i386 only?)
Message-ID:  <200911032122.28905.mel.flynn+fbsd.hackers@mailing.thruhere.net>

--Boundary-00=_EEJ8Ksjl4Ao5+OM
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

Attached is a little test script for grep's -i performance. I tried a few
different machines. The 64-bit 7.2 machine I could steal doesn't seem to be
affected, and grep -i there outperforms pcregrep.
On i386 machines, grep -i is significantly slower:
i386, 7.2-STABLE of Sep 8, load averages: 0.00, 0.02, 0.00,
Mem: 336M Active, 442M Inact, 217M Wired, 38M Cache, 112M Buf, 198M Free
dev.cpu.0.freq: 2992 (Intel P-IV HTT enabled)
16Meg file result:
=>>> 16777216
    =>>> fgrep
        0.04 real         0.02 user         0.01 sys
        0.04 real         0.03 user         0.01 sys
    =>>> pcregrep
        0.21 real         0.19 user         0.02 sys
        0.21 real         0.20 user         0.00 sys
    =>>> grep
        0.04 real         0.02 user         0.01 sys << not -i
        3.64 real         3.61 user         0.01 sys << -i

i386, 8.0-RC1 FreeBSD 8.0-RC1 #15 r197337M, load averages: 1.61, 1.35, 1.12
Mem: 920M Active, 87M Inact, 215M Wired, 69M Cache, 112M Buf, 195M Free
dev.cpu.0.freq: 1733 (Intel dual core laptop)
16Meg file result:
=>>> 16777216
    =>>> fgrep
        0.04 real         0.02 user         0.01 sys
        0.05 real         0.04 user         0.00 sys
    =>>> pcregrep
        0.26 real         0.23 user         0.01 sys
        0.29 real         0.24 user         0.00 sys
    =>>> grep
        0.04 real         0.04 user         0.00 sys
        4.73 real         4.15 user         0.01 sys

amd64, 7.2-RELEASE-p4 #1 r198384M, load averages: 0.00, 0.00, 0.00
Mem: 115M Active, 182M Inact, 264M Wired, 101M Cache, 213M Buf, 1311M Free
CPU: Dual-Core AMD Opteron(tm) Processor 2210 (1800.08-MHz K8-class CPU)
64Meg file result:
=>>> 67108864
    =>>> fgrep
        0.18 real         0.13 user         0.04 sys
        0.19 real         0.17 user         0.02 sys
    =>>> pcregrep
        0.89 real         0.85 user         0.03 sys
        0.98 real         0.92 user         0.06 sys
    =>>> grep
        0.18 real         0.16 user         0.01 sys
        0.19 real         0.16 user         0.03 sys


So on the laptop I modified the test script to the version attached now.
While there is still a significant delay, the wall-clock time drops to less
than half when the expression is rewritten with the same meaning:
=>>> 16777216
    =>>> fgrep
        0.04 real         0.03 user         0.00 sys
        0.05 real         0.03 user         0.01 sys
        0.02 real         0.00 user         0.00 sys
    =>>> pcregrep
        0.26 real         0.21 user         0.02 sys
        0.26 real         0.22 user         0.02 sys
        0.44 real         0.35 user         0.01 sys
    =>>> grep
        0.04 real         0.04 user         0.00 sys
        4.45 real         4.15 user         0.01 sys
        2.00 real         1.81 user         0.00 sys <-- [fF][Oo][Oo]
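
The claim that the two expressions have the same meaning can be checked
directly. The following is only a minimal sketch: the sample lines and the
temporary file name are made up for illustration and are not part of the
attached test script.

```sh
#!/bin/sh
# Sketch: confirm that "grep -i foo" and the bracket form select the same
# lines. The sample data below is hypothetical.
sample=$(mktemp "${TMPDIR:-/tmp}/citest.XXXXXX") || exit 1
printf '%s\n' 'xx foo xx' 'xx FOO xx' 'xx fOo xx' 'xx bar xx' > "${sample}"

# Capture the lines matched by each form of the expression.
match_i=$(grep -i 'foo' "${sample}")
match_b=$(grep '[fF][oO][oO]' "${sample}")
rm -f "${sample}"

if [ "${match_i}" = "${match_b}" ]; then
	echo "same lines matched"
else
	echo "results differ"
fi
```

This only shows that both forms match the same input lines; it says nothing
about why one form is faster than the other.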

So it looks to me that, while there is a problem with case-insensitive
comparison, just rewriting the expression is an optimization grep could
perform.
Either way, with the new text tools being written (done?), is this problem
being attacked, not fixable due to specifications, or not considered an issue?
Are any PRs needed, or are there existing ones I missed? Patches to try?
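
As an illustration of the rewrite described above, here is a minimal sketch
of a wrapper that turns a literal pattern into its bracket form before
handing it to grep. The function name ci_pattern is hypothetical, and the
sketch assumes the pattern contains only literal characters (no regex
metacharacters). It is not how grep itself would implement the optimization.

```sh
#!/bin/sh
# Hypothetical helper: expand every letter of a literal pattern into a
# two-character bracket expression, e.g. foo -> [fF][oO][oO].
# Assumes the pattern has no regex metacharacters.
ci_pattern() {
	printf '%s\n' "$1" | awk '{
		out = ""
		for (i = 1; i <= length($0); i++) {
			c = substr($0, i, 1)
			lo = tolower(c)
			up = toupper(c)
			# Only characters that have distinct cases are expanded.
			out = out ((lo != up) ? "[" lo up "]" : c)
		}
		print out
	}'
}
```

Usage would then be along the lines of: grep "$(ci_pattern foo)" file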

[And it just occurred to me that bsdgrep is in ports]:
    =>>> bsdgrep
        0.93 real         0.74 user         0.00 sys
        4.80 real         4.33 user         0.02 sys
        4.97 real         4.34 user         0.01 sys

So with bsdgrep the rewritten expression does not help.
-- 
Mel

--Boundary-00=_EEJ8Ksjl4Ao5+OM
Content-Type: text/plain;
  charset="UTF-8";
  name="test.sh.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="test.sh.txt"

#!/bin/sh
# vim: ts=4 sw=4 noet tw=78 ai

PCREGREP=`which pcregrep`
BSDGREP=`which bsdgrep`
# Quote the expansions: with an empty, unquoted value "[ -n ]" is true.
[ -n "${PCREGREP}" ] && PCREGREP=`basename "${PCREGREP}"`
[ -n "${BSDGREP}" ] && BSDGREP=`basename "${BSDGREP}"`

me=`basename $0`
BYTES="1048576 2097152 4194304 8388608 16777216"
if [ ! -x /usr/bin/jot ]; then
	echo "Need jot"
	exit 1
fi
if [ ! -x /usr/bin/rs ]; then
	echo "Need rs"
	exit 1
fi

for b in ${BYTES}; do
	TMPFILE=`mktemp -t ${me}`
	if [ ! -f ${TMPFILE} ]; then
		echo Can\'t create tmp files in ${TMPDIR:="/tmp"}
		exit 2
	fi
	jot -r -c ${b} a z |rs -g 0 20 > ${TMPFILE}
	echo "=>>> ${b}"
	for prog in fgrep ${PCREGREP} ${BSDGREP} grep ; do
		echo "    =>>> ${prog}"
		/usr/bin/time ${prog} foo ${TMPFILE} >/dev/null
		/usr/bin/time ${prog} -i foo ${TMPFILE} >/dev/null
		/usr/bin/time ${prog} '[fF][Oo][Oo]' ${TMPFILE} >/dev/null
	done
	rm ${TMPFILE}
done


--Boundary-00=_EEJ8Ksjl4Ao5+OM--


