From: John-Mark Gurney
To: James Howard
Cc: Tim Vanderhoek, "Daniel C. Sobral", freebsd-hackers@FreeBSD.ORG
Date: Thu, 29 Jul 1999 16:45:33 -0700
Subject: Re: replacing grep(1)
Message-ID: <19990729164533.36798@hydrogen.fircrest.net>
In-Reply-To: ; from James Howard on Thu, Jul 29, 1999 at 07:05:57PM -0400
References: <19990729182229.E24296@mad>

James Howard scribbled this message on Jul 29:
> On Thu, 29 Jul 1999, Tim Vanderhoek wrote:
> 
> > fgetln() does a complete copy of the line buffer whenever an
> > excessively long line is found.  On this point, it's hard to do better
> > without using mmap(), but mmap() has its own disadvantages.  My last
> > suggestion to James was to assume a worst case for long lines and mark
> > the worst worst case with an XXX "this is unfortunate".
> > 
> 
> DES tells me he has a new version (0.10) which mmap()s.  It supposedly
> cuts the run time down significantly; I do not have the numbers in front
> of me.  Unfortunately he has not posted this version yet, so I cannot
> download it and run it myself.
> He also says that if mmap fails, he drops
> back to stdio.  This should only happen in the NFS case, the >2G case,
> etc.
> 
> > 
> > [Never mind that it should be spending near 100% of its time in
> > procline...that just means he's still got work to do... :-]
> 
> I'd rather see it spending 100% of its time in regexec(), then I can just
> blame Henry Spencer :)
> 
> Someone said there was new regex code out, is this true?  Can anyone with
> a copy test grep with it?

OK, I just made a patch to eliminate the copy that was happening in
procfile(), and it sped up a grep of a 5 MB termcap from about 2.9
seconds down to 0.6 seconds, and that figure includes the time spent
profiling the program.  GNU grep without profiling takes only 0.15
seconds, so we ARE getting closer to GNU grep.

It was VERY simple to do, and the patch is attached.  It uses the
REG_STARTEND option to do what the copy was trying to do: rather than
copying each line into a freshly allocated, NUL-terminated buffer, it
tells regexec() where the line starts and ends.  All of the code to use
REG_STARTEND was already there; it just needed to be enabled.

Enjoy!

-- 
John-Mark Gurney                              Voice: +1 541 684 8449
Cu Networking                                   P.O. Box 5693, 97405

"The soul contains in itself the event that shall presently befall it.
The event is only the actualizing of its thought."
-- Ralph Waldo Emerson

[Attachment: grep.patch]

diff -u grep-0.10.orig/util.c grep-0.10/util.c
--- grep-0.10.orig/util.c	Thu Jul 29 05:00:15 1999
+++ grep-0.10/util.c	Thu Jul 29 16:38:06 1999
@@ -93,7 +93,6 @@
 	file_t *f;
 	str_t ln;
 	int c, t, z;
-	char *tmp;
 
 	if (fn == NULL) {
 		fn = "(standard input)";
@@ -119,13 +118,8 @@
 	initqueue();
 	for (c = 0; !(lflag && c);) {
 		ln.off = grep_tell(f);
-		if ((tmp = grep_fgetln(f, &ln.len)) == NULL)
+		if ((ln.dat = grep_fgetln(f, &ln.len)) == NULL)
 			break;
-		ln.dat = grep_malloc(ln.len + 1);
-		memcpy(ln.dat, tmp, ln.len);
-		ln.dat[ln.len] = 0;
-		if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
-			ln.dat[--ln.len] = 0;
 		ln.line_no++;
 
 		z = tail;
@@ -133,7 +127,6 @@
 			enqueue(&ln);
 			linesqueued++;
 		}
-		free(ln.dat);
 		c += t;
 	}
 	if (Bflag > 0)
@@ -174,7 +167,8 @@
 	pmatch.rm_so = 0;
 	pmatch.rm_eo = l->len;
 	for (c = i = 0; i < patterns; i++) {
-		r = regexec(&r_pattern[i], l->dat, 0, &pmatch, eflags);
+		r = regexec(&r_pattern[i], l->dat, 0, &pmatch,
+		    eflags | REG_STARTEND);
 		if (r == REG_NOMATCH && t == 0)
 			continue;
 		if (wflag && r == 0) {