Date:      Thu, 12 Aug 2010 10:56:14 -0700
From:      Chip Camden <sterling@camdensoftware.com>
To:        freebsd-questions@FreeBSD.ORG
Subject:   Re: Grepping a list of words
Message-ID:  <20100812175614.GJ20504@libertas.local.camdensoftware.com>
In-Reply-To: <867hjv92r2.fsf@gmail.com>
References:  <20100812153535.61549.qmail@joyce.lan> <201008121644.o7CGiflh099466@lurza.secnetix.de> <867hjv92r2.fsf@gmail.com>

Quoth Anonymous on Thursday, 12 August 2010:
> Oliver Fromme <olli@lurza.secnetix.de> writes:
>
> > John Levine <johnl@iecc.com> wrote:
> >  > > > % egrep 'word1|word2|word3|...|wordn' filename.txt
> >  >
> >  > > Thanks for the replies. This suggestion won't do the job as the list of
> >  > > words is very long, maybe 50-60. This is why I asked how to place them all
> >  > > in a file. One reply dealt with using a file with egrep. I'll try that.
> >  >
> >  > Gee, 50 words, that's about a 300 character pattern, that's not a problem
> >  > for any shell or version of grep I know.
> >  >
> >  > But reading the words from a file is equivalent and as you note most
> >  > likely easier to do.
> >
> > The question is what is more efficient.  This might be
> > important if that kind of grep command is run very often
> > by a script, or if it's run on very large files.
> >
> > My guess is that one large regular expression is more
> > efficient than many small ones.  But I haven't done real
> > benchmarks to prove this.
>
> BTW, not using regular expressions is even more efficient, e.g.
>
>   $ fgrep -f /usr/share/dict/words /etc/group
>
> When using egrep(1) it takes considerably more time and memory.
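A minimal sketch of the three approaches discussed in the quoted text. It assumes a POSIX sh and a grep that supports -E, -F and -f (egrep and fgrep are the traditional names for grep -E and grep -F). The helper names one_regex, many_regexes and fixed_strings, and the sample files, are hypothetical.

```shell
#!/bin/sh
# Three ways to search FILE ($2) for any entry listed in WORDS ($1).
# Helper names and sample data are hypothetical, for illustration only.

# One combined extended regex: the file-based equivalent of
#   egrep 'word1|word2|...|wordn' filename.txt
one_regex() { grep -E -f "$1" "$2"; }

# One grep per word ("many small regexes"). Each match is tagged with
# its line number, then deduplicated and put back in input order.
many_regexes() {
    while IFS= read -r w; do
        grep -n -E -e "$w" "$2"
    done < "$1" | sort -t: -k1,1n -u | cut -d: -f2-
}

# No regex at all: each entry is a literal string, like fgrep -f.
fixed_strings() { grep -F -f "$1" "$2"; }

tmp=$(mktemp -d)
printf '%s\n' word1 word2 'a.c' > "$tmp/words.txt"
printf '%s\n' 'has word2' 'abc' 'a.c' 'nothing' > "$tmp/filename.txt"

one_regex "$tmp/words.txt" "$tmp/filename.txt"      # has word2, abc, a.c
many_regexes "$tmp/words.txt" "$tmp/filename.txt"   # same lines, same order
fixed_strings "$tmp/words.txt" "$tmp/filename.txt"  # has word2, a.c ('.' is literal)
rm -r "$tmp"
```

The first two functions select the same lines, and the third differs only where an entry contains a metacharacter. To test the guess that one large regex beats many small ones, wrap each call in time(1) on a large input. No timings are claimed here.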

Having written a regex engine myself, I can see why.  Though I'm sure
egrep is highly optimized, even the most optimized DFA table is going to take
more cycles to navigate than a simple string comparison.  Not to mention the
cycles to navigate than a simple string comparison.  Not to mention the
initial overhead of parsing the regex and building that table.
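The point about parsing and table-building overhead suggests a small practical step: pay for regex compilation only when the word list actually needs it. This is a sketch under that assumption. The helper names grep_mode and smart_grep are hypothetical, not standard utilities. Whether the fixed-string path is measurably faster depends on the grep implementation.

```shell
#!/bin/sh
# Use fixed-string matching unless the word list contains an ERE
# metacharacter. grep_mode and smart_grep are hypothetical helpers.

# Print "regex" if any line of $1 contains one of ] . [ * ^ $ \ + ? ( ) { } |
# and "fixed" otherwise. ']' comes first so it is literal inside the
# bracket expression; '[' is not followed by '.', '=' or ':'.
grep_mode() {
    if grep -q '[].[*^$\+?(){}|]' "$1"; then
        echo regex
    else
        echo fixed
    fi
}

# Search $2 for any entry of $1, building a regex only when required.
smart_grep() {
    if [ "$(grep_mode "$1")" = regex ]; then
        grep -E -f "$1" "$2"    # same as egrep -f
    else
        grep -F -f "$1" "$2"    # same as fgrep -f
    fi
}

# Hypothetical sample data standing in for a word list and /etc/group.
tmp=$(mktemp -d)
printf '%s\n' wheel operator > "$tmp/words.txt"
printf '%s\n' 'wheel:*:0:root' 'daemon:*:1:' 'operator:*:5:root' > "$tmp/group"
smart_grep "$tmp/words.txt" "$tmp/group"   # fixed-string path: wheel and operator lines
rm -r "$tmp"
```

A list like /usr/share/dict/words normally contains no metacharacters, so smart_grep would take the fgrep path for it.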

-- 
Sterling (Chip) Camden    | sterling@camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com        | http://chipsquips.com
