From: Chip Camden
To: freebsd-questions@FreeBSD.ORG
Date: Thu, 12 Aug 2010 10:56:14 -0700
Subject: Re: Grepping a list of words
Message-ID: <20100812175614.GJ20504@libertas.local.camdensoftware.com>
In-Reply-To: <867hjv92r2.fsf@gmail.com>
References: <20100812153535.61549.qmail@joyce.lan> <201008121644.o7CGiflh099466@lurza.secnetix.de> <867hjv92r2.fsf@gmail.com>

Quoth Anonymous on Thursday, 12 August 2010:
> Oliver Fromme writes:
>
> > John Levine wrote:
> > > > > % egrep 'word1|word2|word3|...|wordn' filename.txt
> > >
> > > > Thanks for the replies. This suggestion won't do the job as the list of
> > > > words is very long, maybe 50-60. This is why I asked how to place them all
> > > > in a file. One reply dealt with using a file with egrep. I'll try that.
> > >
> > > Gee, 50 words, that's about a 300 character pattern, that's not a problem
> > > for any shell or version of grep I know.
> > >
> > > But reading the words from a file is equivalent and as you note most
> > > likely easier to do.
> >
> > The question is what is more efficient. This might be
> > important if that kind of grep command is run very often
> > by a script, or if it's run on very large files.
> >
> > My guess is that one large regular expression is more
> > efficient than many small ones. But I haven't done real
> > benchmarks to prove this.
>
> BTW, not using regular expressions is even more efficient, e.g.
>
> $ fgrep -f /usr/share/dict/words /etc/group
>
> When using egrep(1) it takes considerably more time and memory.

Having written a regex engine myself, I can see why. Though I'm sure
egrep is highly optimized, even the most optimized DFA table is going to
take more cycles to navigate than a simple string comparison. Not to
mention the initial overhead of parsing the regex and building that
table.
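
For anyone who wants to try the two approaches side by side, here is a minimal sketch. The scratch files, the three words, and the sample lines are made up for illustration (they stand in for the real word list and /etc/group), and `grep -F` / `grep -E` are the POSIX spellings of fgrep and egrep. It only checks that both forms select the same lines; it makes no claim about timing.

```shell
# Hypothetical scratch files; names and contents are placeholders.
tmpdir=$(mktemp -d)
words="$tmpdir/words.txt"
input="$tmpdir/input.txt"

# One search word per line, which is the layout grep -f expects.
printf '%s\n' wheel daemon operator > "$words"

# A small sample file standing in for /etc/group.
printf '%s\n' 'wheel:*:0:root' 'daemon:*:1:' 'kmem:*:2:' 'operator:*:5:root' > "$input"

# Fixed-string search: every line of words.txt is taken literally.
fixed_out=$(grep -F -f "$words" "$input")

# The same words joined with '|' into one alternation for an extended regex.
# This is only equivalent while the words contain no regex metacharacters.
pattern=$(paste -s -d '|' "$words")
regex_out=$(grep -E "$pattern" "$input")

printf 'pattern: %s\n' "$pattern"
printf '%s\n' "$fixed_out"

rm -rf "$tmpdir"
```

To compare cost on real data, either grep command can be prefixed with `time` and pointed at a longer word list and a larger input file.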

-- 
Sterling (Chip) Camden    | sterling@camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com        | http://chipsquips.com