Date:      Thu, 29 Jul 1999 16:45:33 -0700
From:      John-Mark Gurney <gurney_j@efn.org>
To:        James Howard <howardjp@wam.umd.edu>
Cc:        Tim Vanderhoek <vanderh@ecf.utoronto.ca>, "Daniel C. Sobral" <dcs@newsguy.com>, freebsd-hackers@FreeBSD.ORG
Subject:   Re: replacing grep(1)
Message-ID:  <19990729164533.36798@hydrogen.fircrest.net>
In-Reply-To: <Pine.GSO.4.10.9907291856100.11776-100000@rac9.wam.umd.edu>; from James Howard on Thu, Jul 29, 1999 at 07:05:57PM -0400
References:  <19990729182229.E24296@mad> <Pine.GSO.4.10.9907291856100.11776-100000@rac9.wam.umd.edu>


--69ThD9tjz6Eq5Ldr
Content-Type: text/plain; charset=us-ascii

James Howard scribbled this message on Jul 29:
> On Thu, 29 Jul 1999, Tim Vanderhoek wrote:
> 
> > fgetln() does a complete copy of the line buffer whenever an
> > excessively long line is found.  On this point, it's hard to do better
> > without using mmap(), but mmap() has its own disadvantages.  My last
> > suggestion to James was to assume a worst case for long lines and mark
> > the worst worst case with an XXX "this is unfortunate".
> 
> <warning type="Anything said here wrong is my fault, not DES's">
> 
> DES tells me he has a new version (0.10) which mmap()s.  It supposedly
> cuts the run time down significantly; I do not have the numbers in front
> of me.  Unfortunately he has not posted this version yet, so I cannot
> download it and run it myself.  He also says that if mmap() fails, he
> drops back to stdio.  This should only happen in the NFS case, the > 2G
> case, etc.
> 
> </warning>
> 
> > [Never mind that it should be spending near 100% of its time in
> >  procline...that just means he's still got work to do... :-]
> 
> I'd rather see it spending 100% of its time in regexec(), then I can just
> blame Henry Spencer :)
> 
> Someone said there was new regex code out; is this true?  Can anyone with
> a copy test grep with it?
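A rough sketch of the mmap-with-stdio-fallback approach described above, assuming a POSIX system. DES's 0.10 code has not been posted, so everything here is hypothetical: struct lreader, lr_open(), lr_next() and lr_close() are invented names, and the fallback uses POSIX getline() rather than the BSD fgetln() the current grep uses. Lines handed out from the mapping point straight into the file's pages, so they are neither copied nor NUL-terminated; only the stdio path copies each line into a buffer.

```c
#define _POSIX_C_SOURCE 200809L	/* for getline() */

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical line reader: an mmap()ed file if possible, else stdio. */
struct lreader {
	char	*map;		/* file mapping, or NULL on the stdio path */
	size_t	 maplen;	/* length of the mapping */
	size_t	 pos;		/* offset of the next unread byte in map */
	FILE	*fp;		/* stdio fallback stream */
	char	*buf;		/* getline() buffer for the fallback */
	size_t	 bufcap;	/* allocated size of buf */
};

int
lr_open(struct lreader *lr, const char *path)
{
	struct stat st;
	void *p;
	int fd;

	assert(lr != NULL && path != NULL);
	memset(lr, 0, sizeof(*lr));
	if ((fd = open(path, O_RDONLY)) == -1)
		return -1;
	/* Only regular, non-empty files that fit in a size_t are mapped. */
	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0 &&
	    (uintmax_t)st.st_size <= SIZE_MAX) {
		p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE,
		    fd, 0);
		if (p != MAP_FAILED) {
			lr->map = p;
			lr->maplen = (size_t)st.st_size;
			close(fd);	/* the mapping outlives the descriptor */
			return 0;
		}
	}
	/* mmap() was not possible or failed: fall back to stdio. */
	if ((lr->fp = fdopen(fd, "r")) == NULL) {
		close(fd);
		return -1;
	}
	return 0;
}

/* Return the next line (without its '\n') and set *len, or NULL at EOF. */
const char *
lr_next(struct lreader *lr, size_t *len)
{
	const char *start, *nl;
	ssize_t n;

	if (lr->map != NULL) {
		if (lr->pos >= lr->maplen)
			return NULL;
		start = lr->map + lr->pos;
		nl = memchr(start, '\n', lr->maplen - lr->pos);
		*len = nl != NULL ? (size_t)(nl - start) : lr->maplen - lr->pos;
		lr->pos += *len + (nl != NULL ? 1 : 0);
		return start;	/* points into the mapping: no copy, no NUL */
	}
	if ((n = getline(&lr->buf, &lr->bufcap, lr->fp)) == -1)
		return NULL;
	if (n > 0 && lr->buf[n - 1] == '\n')
		n--;
	*len = (size_t)n;
	return lr->buf;		/* stdio path: the line was copied into buf */
}

void
lr_close(struct lreader *lr)
{
	if (lr->map != NULL)
		munmap(lr->map, lr->maplen);
	if (lr->fp != NULL)
		fclose(lr->fp);
	free(lr->buf);
	memset(lr, 0, sizeof(*lr));
}
```

The st_size > 0 test matters because mmap() of zero bytes fails, and the SIZE_MAX check stands in for the "> 2G" case on 32-bit systems; pipes, empty files, and any mmap() failure (such as the NFS case) end up on the stdio path. Because the mapped lines carry no NUL, whatever matches them has to be told the line length explicitly.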

OK, I just made a patch to eliminate the per-line copy that was happening
in procfile(), and it sped up a grep of a 5MB termcap from about 2.9 sec
down to 0.6 sec... and that includes time spent profiling the program.
GNU grep without profiling takes only 0.15 sec, so we ARE getting closer
to GNU grep...

It was VERY simple to do, and the patch is attached.  It passes the
REG_STARTEND flag to regexec(), which then takes the line's bounds from
pmatch.rm_so/rm_eo instead of needing a NUL-terminated copy of it.  All of
the code to use REG_STARTEND was already there; it just needed to be enabled.
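For readers who haven't used it: REG_STARTEND is an extension to POSIX regexec() (present in the BSD regex library used here, and also in glibc) that makes regexec() read the search bounds from pmatch[0].rm_so and pmatch[0].rm_eo instead of calling strlen() on the string. Below is a minimal sketch of the idea; slice_match() is a hypothetical helper, not code from grep-0.10, and unlike the patch it passes nmatch = 1 so the caller can see where the match landed. It deliberately avoids ^ and $, since how anchors behave at a non-zero rm_so may differ between implementations.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from grep-0.10): does the compiled pattern
 * match inside buf[so, eo)?  REG_STARTEND makes regexec() read the
 * bounds from pmatch[0] instead of using strlen(buf), so the slice needs
 * no copy and no terminating NUL.
 *
 * Returns 1 on a match, storing its offsets (relative to buf, not to
 * buf + so) in *mso and *meo; 0 on REG_NOMATCH; -1 on any other error.
 */
int
slice_match(const regex_t *re, const char *buf, size_t so, size_t eo,
    size_t *mso, size_t *meo)
{
	regmatch_t pm[1];
	int r;

	assert(re != NULL && buf != NULL && so <= eo);
	pm[0].rm_so = (regoff_t)so;	/* start searching here... */
	pm[0].rm_eo = (regoff_t)eo;	/* ...and stop here */
	r = regexec(re, buf, 1, pm, REG_STARTEND);
	if (r == REG_NOMATCH)
		return 0;
	if (r != 0)
		return -1;
	*mso = (size_t)pm[0].rm_so;
	*meo = (size_t)pm[0].rm_eo;
	return 1;
}
```

For example, with buf = "foo\nbar baz\nqux", the second line is the slice [4, 11).  A pattern of "qux" reports no match there even though "qux" appears later in buf, and a match of "ba[rz]" comes back as offsets 4 and 7, relative to buf itself.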

enjoy!

-- 
  John-Mark Gurney                              Voice: +1 541 684 8449
  Cu Networking					  P.O. Box 5693, 97405

  "The soul contains in itself the event that shall presently befall it.
  The event is only the actualizing of its thought." -- Ralph Waldo Emerson

--69ThD9tjz6Eq5Ldr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="grep.patch"

diff -u grep-0.10.orig/util.c grep-0.10/util.c
--- grep-0.10.orig/util.c	Thu Jul 29 05:00:15 1999
+++ grep-0.10/util.c	Thu Jul 29 16:38:06 1999
@@ -93,7 +93,6 @@
 	file_t *f;
 	str_t ln;
 	int c, t, z;
-	char *tmp;
 
 	if (fn == NULL) {
 		fn = "(standard input)";
@@ -119,13 +118,8 @@
 		initqueue();
 	for (c = 0; !(lflag && c);) {
 		ln.off = grep_tell(f);
-		if ((tmp = grep_fgetln(f, &ln.len)) == NULL)
+		if ((ln.dat = grep_fgetln(f, &ln.len)) == NULL)
 			break;
-		ln.dat = grep_malloc(ln.len + 1);
-		memcpy(ln.dat, tmp, ln.len);
-		ln.dat[ln.len] = 0;
-		if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
-			ln.dat[--ln.len] = 0;
 		ln.line_no++;
 
 		z = tail;
@@ -133,7 +127,6 @@
 			enqueue(&ln);
 			linesqueued++;
 		}
-		free(ln.dat);
 		c += t;
 	}
 	if (Bflag > 0)
@@ -174,7 +167,8 @@
 	pmatch.rm_so = 0;
 	pmatch.rm_eo = l->len;
 	for (c = i = 0; i < patterns; i++) {
-		r = regexec(&r_pattern[i], l->dat, 0, &pmatch, eflags);
+		r = regexec(&r_pattern[i], l->dat, 0, &pmatch,
+		    eflags | REG_STARTEND);
 		if (r == REG_NOMATCH && t == 0)
 			continue;
 		if (wflag && r == 0) {

--69ThD9tjz6Eq5Ldr--

