Date:      Tue, 8 Dec 2020 16:22:46 -0600
From:      Kyle Evans <kevans@freebsd.org>
Cc:        src-committers <src-committers@freebsd.org>, svn-src-all <svn-src-all@freebsd.org>,  svn-src-head <svn-src-head@freebsd.org>
Subject:   Re: svn commit: r368439 - head/share/mk
Message-ID:  <CACNAnaHiOw=U3BXcBvqa-cU0_nq4GGxGPh%2Bk=i5qV0GWoNAGsw@mail.gmail.com>
In-Reply-To: <202012081405.0B8E5PJM029095@repo.freebsd.org>
References:  <202012081405.0B8E5PJM029095@repo.freebsd.org>

On Tue, Dec 8, 2020 at 8:05 AM Kyle Evans <kevans@freebsd.org> wrote:
>
> Author: kevans
> Date: Tue Dec  8 14:05:25 2020
> New Revision: 368439
> URL: https://svnweb.freebsd.org/changeset/base/368439
>
> Log:
>   src.opts.mk: switch to bsdgrep as /usr/bin/grep
>
> [.. snip ...]
>
>   I have some WIP to make bsdgrep faster, but do not consider it a blocker
>   when compared to the pros of switching now (aforementioned bugs, licensing).
>
> [.. snip ...]

I was asked to collect some stats from that patch to speed up bsdgrep.
The patch isn't ready yet, but I decided to also do a (really, really)
rough comparison between gnugrep and bsdgrep to follow up on the speed
aspect and perhaps provide a baseline.

You can view the results of those comparisons (user time(1) output),
which felt 'representative enough' of the difference, here:
https://people.freebsd.org/~kevans/stable/grep-stats.txt

Some notes, to help with interpretation:
- This hardware is not great
- All runs did a recursive grep from the root of a non-active
base/head checkout, without -I, searching for the same pattern
(which is actually a plain literal)
- ${grep}-non == ${grep} -r 'closefrom' .
- ${grep}-n == ${grep} -nr 'closefrom' .
- ${grep}-c8 == ${grep} -rC8 'closefrom' .

The sampling was low enough quality that we can probably just discard
all of this, but I found the final two comparisons (gnugrep vs.
gnugrep -n vs. gnugrep -C8 and bsdgrep vs. bsdgrep -n vs. bsdgrep -C8)
interesting enough to share anyway. Here are the points I found
notable:

gnugrep sees a pretty significant difference from the baseline to
either of the other two modes. This was expected to some extent: both
-n and -C8 imply some level of line tracking when you take the
chunked search approach, since for -n you need to count lines even in
chunks that don't have any matches, and you might need to do the same
for -C8.
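
To make the -n cost concrete, here is a minimal C sketch of a chunked
literal search. It is a hypothetical illustration, not code from
bsdgrep or GNU grep: the names (struct chunk, find_literal,
count_newlines, search_chunks) are made up, the search is a naive
byte-by-byte literal match, and each chunk is assumed to end on a line
boundary so matches never straddle chunks. Without line numbering,
match-free chunks are skipped after a single search; with it, every
skipped chunk has to be scanned again just to count its newlines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input unit; assumed to end on a line boundary. */
struct chunk {
    const char *data;
    size_t len;
};

/* Count '\n' bytes in buf[0..len). */
static size_t count_newlines(const char *buf, size_t len)
{
    size_t n = 0;
    const char *p = buf, *end = buf + len;

    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        n++;
        p++;
    }
    return n;
}

/* Naive literal search: offset of the first occurrence of pat, or -1. */
static long find_literal(const char *buf, size_t len, const char *pat)
{
    size_t m = strlen(pat);

    for (size_t i = 0; m <= len && i <= len - m; i++)
        if (memcmp(buf + i, pat, m) == 0)
            return (long)i;
    return -1;
}

/*
 * Return 1 if pat occurs in any chunk, 0 otherwise. If want_lineno is
 * set, *lineno receives the 1-based line number of the first match,
 * which forces a newline count over every match-free chunk skipped.
 */
static int search_chunks(const struct chunk *c, size_t nchunks,
                         const char *pat, int want_lineno, size_t *lineno)
{
    size_t lines_before = 0; /* newlines seen in earlier chunks */

    assert(!want_lineno || lineno != NULL);
    for (size_t i = 0; i < nchunks; i++) {
        long off = find_literal(c[i].data, c[i].len, pat);

        if (off < 0) {
            /* No match here: only line numbering needs this extra pass. */
            if (want_lineno)
                lines_before += count_newlines(c[i].data, c[i].len);
            continue;
        }
        if (want_lineno)
            *lineno = lines_before +
                count_newlines(c[i].data, (size_t)off) + 1;
        return 1;
    }
    return 0;
}
```

For chunks "foo\nbar\n" and "baz\nclosefrom\nqux\n", searching for
closefrom with line numbering reports line 4, after counting the two
newlines in the first, match-free chunk.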

I think the much smaller difference between the gnugrep baseline and
-C8 indicates that they probably don't take the simple/slow approach
of counting every newline to know where the 8th line before a match
starts, but instead wait for a match and then backtrack.
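
The backtracking idea guessed at above can be sketched in C as
follows. This is a hypothetical illustration only, not GNU grep's
actual implementation: context_start is a made-up name, and it assumes
the bytes preceding the match are still available in buf. No lines are
counted until a match exists; the function then walks backwards over
at most before + 1 newlines to find where the leading context begins.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given a buffer and the offset of a match inside it, return the offset
 * of the first byte of the line that is `before` lines above the
 * matching line, clamped to the start of the buffer.
 */
static size_t context_start(const char *buf, size_t match_off, size_t before)
{
    size_t p = match_off;
    /* One newline to reach the start of the matching line, plus one per
       line of leading context. */
    size_t need = before + 1;

    assert(buf != NULL);
    while (p > 0) {
        if (buf[p - 1] == '\n' && --need == 0)
            return p;
        p--;
    }
    return 0; /* ran out of buffer: context starts at the top */
}
```

With buf = "a\nb\nc\nclosefrom\n" and the match at offset 6, a
before value of 0 gives 6 (the matching line itself), 2 gives 2 (the
start of the "b" line), and 8 clamps to 0. The cost is proportional to
the context actually printed rather than to the whole input.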

The surprising part of the bsdgrep comparison was the significant
slowdown when we're checking context. There is almost certainly room
for improvement there.

Thanks,

Kyle Evans


