Date: Thu, 29 Jul 1999 16:45:33 -0700
From: John-Mark Gurney <gurney_j@efn.org>
To: James Howard <howardjp@wam.umd.edu>
Cc: Tim Vanderhoek <vanderh@ecf.utoronto.ca>, "Daniel C. Sobral" <dcs@newsguy.com>, freebsd-hackers@FreeBSD.ORG
Subject: Re: replacing grep(1)
Message-ID: <19990729164533.36798@hydrogen.fircrest.net>
In-Reply-To: <Pine.GSO.4.10.9907291856100.11776-100000@rac9.wam.umd.edu>; from James Howard on Thu, Jul 29, 1999 at 07:05:57PM -0400
References: <19990729182229.E24296@mad> <Pine.GSO.4.10.9907291856100.11776-100000@rac9.wam.umd.edu>
[-- Attachment #1 --]
James Howard scribbled this message on Jul 29:
> On Thu, 29 Jul 1999, Tim Vanderhoek wrote:
>
> > fgetln() does a complete copy of the line buffer whenever an
> > excessively long line is found. On this point, it's hard to do better
> > without using mmap(), but mmap() has its own disadvantages. My last
> > suggestion to James was to assume a worst case for long lines and mark
> > the worst worst case with an XXX "this is unfortunate".
>
> <warning type="Anything said here wrong is my fault, not DES's">
>
> DES tells me he has a new version (0.10) which mmap()s. It supposedly
> cuts the run time down significantly, but I do not have the numbers in
> front of me. Unfortunately, he has not posted this version yet, so I
> cannot download it and run it myself. He also says that if mmap() fails,
> he falls back to stdio. This should only happen in the NFS case, the
> > 2G case, etc.
>
> </warning>
>
> > [Never mind that it should be spending near 100% of its time in
> > procline...that just means he's still got work to do... :-]
>
> I'd rather see it spending 100% of its time in regexec(), then I can just
> blame Henry Spencer :)
>
> Someone said there was new regex code out, is this true? Can anyone with
> a copy test grep with it?
OK, I just made a patch to eliminate the copy that was happening in
procfile, and it sped up a grep of a 5 MB termcap from about 2.9 sec
down to 0.6 sec... this includes time spent profiling the program.
GNU grep w/o profiling only takes 0.15 sec, so we ARE getting closer to
GNU grep...
It was VERY simple to do, and the patch is attached. It uses the
REG_STARTEND option to regexec() to do what the copy was trying to do,
namely hand regexec() a bounded line (fgetln() lines are not
NUL-terminated). All of the code to use REG_STARTEND was already there,
it just needed to be enabled.
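
For anyone who has not used REG_STARTEND, here is a rough sketch of the
idea. It is not the grep-0.10 code: count_matching_lines() is a
hypothetical helper made up for illustration, and it assumes a libc whose
regexec() supports the REG_STARTEND extension (BSD libc and glibc do).
Each line is passed to regexec() in place, with pmatch.rm_so and
pmatch.rm_eo marking its bounds, so nothing is copied or NUL-terminated.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#ifndef REG_STARTEND
#error "this sketch needs the REG_STARTEND extension to regexec()"
#endif

/*
 * Count the lines in buf[0..len) that match re.  The buffer does not
 * have to be NUL-terminated and no line is copied.
 * Hypothetical helper for illustration; not part of grep-0.10.
 */
static int
count_matching_lines(const regex_t *re, const char *buf, size_t len)
{
	regmatch_t pm;
	size_t off = 0;
	int hits = 0;

	assert(re != NULL && buf != NULL);

	while (off < len) {
		const char *line = buf + off;
		const char *nl = memchr(line, '\n', len - off);
		size_t linelen = (nl != NULL) ? (size_t)(nl - line) : len - off;

		/*
		 * With REG_STARTEND, regexec() looks only at the bytes
		 * line[rm_so] .. line[rm_eo - 1] and never needs a NUL.
		 */
		pm.rm_so = 0;
		pm.rm_eo = (regoff_t)linelen;	/* newline stays outside */
		if (regexec(re, line, 1, &pm, REG_STARTEND) == 0)
			hits++;

		off += linelen + 1;		/* step over the newline */
	}
	return hits;
}
```

The sketch keeps the trailing newline outside the range, which matches
what the removed copy code did when it stripped it. A caller would
regcomp() a pattern once and then hand in whatever buffer it has, for
example one obtained from fgetln() or mmap().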
enjoy!
--
John-Mark Gurney Voice: +1 541 684 8449
Cu Networking P.O. Box 5693, 97405
"The soul contains in itself the event that shall presently befall it.
The event is only the actualizing of its thought." -- Ralph Waldo Emerson
[-- Attachment #2 --]
diff -u grep-0.10.orig/util.c grep-0.10/util.c
--- grep-0.10.orig/util.c Thu Jul 29 05:00:15 1999
+++ grep-0.10/util.c Thu Jul 29 16:38:06 1999
@@ -93,7 +93,6 @@
file_t *f;
str_t ln;
int c, t, z;
- char *tmp;
if (fn == NULL) {
fn = "(standard input)";
@@ -119,13 +118,8 @@
initqueue();
for (c = 0; !(lflag && c);) {
ln.off = grep_tell(f);
- if ((tmp = grep_fgetln(f, &ln.len)) == NULL)
+ if ((ln.dat = grep_fgetln(f, &ln.len)) == NULL)
break;
- ln.dat = grep_malloc(ln.len + 1);
- memcpy(ln.dat, tmp, ln.len);
- ln.dat[ln.len] = 0;
- if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
- ln.dat[--ln.len] = 0;
ln.line_no++;
z = tail;
@@ -133,7 +127,6 @@
enqueue(&ln);
linesqueued++;
}
- free(ln.dat);
c += t;
}
if (Bflag > 0)
@@ -174,7 +167,8 @@
pmatch.rm_so = 0;
pmatch.rm_eo = l->len;
for (c = i = 0; i < patterns; i++) {
- r = regexec(&r_pattern[i], l->dat, 0, &pmatch, eflags);
+ r = regexec(&r_pattern[i], l->dat, 0, &pmatch,
+ eflags | REG_STARTEND);
if (r == REG_NOMATCH && t == 0)
continue;
if (wflag && r == 0) {
