Date:      Wed, 18 Jun 2008 13:25:34 -0400
From:      Coleman Kane <cokane@FreeBSD.org>
To:        Doug Barton <dougb@FreeBSD.org>
Cc:        Diomidis Spinellis <dds@aueb.gr>, pfgshield-freebsd@yahoo.com, Gabor Kovesdan <gabor@FreeBSD.org>, current@FreeBSD.org
Subject:   Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Message-ID:  <1213809934.68025.39.camel@localhost>
In-Reply-To: <48593673.8030607@FreeBSD.org>
References:  <188001.76281.qm@web32705.mail.mud.yahoo.com> <48593673.8030607@FreeBSD.org>

On Wed, 2008-06-18 at 09:23 -0700, Doug Barton wrote:
> pfgshield-freebsd@yahoo.com wrote:
> > On 2008-06-17, Gabor Kovesdan wrote:
> >>> egrep: empty (sub)expression
> >>>
> >> I've looked at this and I have a patch with a workaround:
> >> http://kovesdan.org/patches/grep.dougb.diff
> >
> > This general behavior should be considered a GNUism, the type of
> > thing that we want to avoid in scripts. Perhaps, if it's too
> > common, the workaround should be ifdef'd into a GNU compatible mode
> > or something.
>
> No, "here is a new bsd grep, but change all your scripts that have
> relied on legitimate behavior of grep in the past" is a non-starter. I
> haven't checked POSIX to see whether the particular construction I am
> using fits the spec or not, but I really don't care. We need to be
> feature-compatible with the GNU versions of these tools, both for
> legacy scripts and also for those users migrating over from linux.
>
> Doug
>

I think that this is the second time that I've seen this type of issue
come up (oh no! A GNU-ism sneaks in! Oh, the horror! The horror!).

Seriously, I really see this attitude as counter-productive to getting
wider acceptance of the project. I don't think FreeBSD's (or any BSD's)
mission statement ever involved "ridding the *NIX world of the decidedly
impure habits of the GNU system". I always thought it was more along the
lines of "let's make an OS that is better than the GNU system, and
free-er too".

The above statement, critical of GNU grep, does not offer any insight
into what "the right way" (read: not the GNU way) is. Such
"proclamations" are justifiably going to elicit responses along the
lines of Doug's.

Honestly, in this case you can either pick up BSD grep yourself and do
the work, or refrain from dropping in to chastise someone for mirroring
a "GNU-ism" behavior. How about coming up with a script that finds all
these "GNU-ism offenders" and corrects the GNU-isms into
POSIX-compliant behavior?

How about at least explaining what you think the proper way to do it
using our grep should be?

Here's my take on it:
  * We reject "||", "(|", and "|)" (via our regex lib) right now.
  * GNU grep seems to interpret these as an inclusive OR with '' (the
    empty string), which lets the OR match an empty string for that
    field, so the field may be absent and the pattern still matches.
  * We want to remove the empty disjunction with this patch?

The above seems to indicate that even the posted fix is not going to
operate the same as it does under GNU... From a logical point of view, a
disjunction that contains an empty alternative should match the empty
string, so in that sense the GNU grep behavior has a logical basis.

For example, take five strings { "aba", "ab", "abc", "ac", "an" }

The GNU pattern "^a(b|)c$" matches both "abc" and "ac", because the
empty string is explicitly one of the alternatives. The proposed
solution, however, would only match "abc". Logically, there's a basis
for the GNU method (it is what I was taught in my Automata course).
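
As a rough illustration of the example above, here is a minimal Python
sketch. Note the assumptions: Python's re module is neither GNU grep nor
our regex library, it merely happens to accept an empty alternative, so
this only shows the logic and not what either grep actually does. The
second pattern, with the empty alternative simply dropped, is also an
assumption about what the workaround amounts to.

```python
import re

# The five strings from the example above.
strings = ["aba", "ab", "abc", "ac", "an"]

# Pattern with an explicit empty alternative, as GNU grep appears to read it.
gnu_style = re.compile(r"^a(b|)c$")

# Assumed effect of the workaround: the empty alternative is dropped.
without_empty = re.compile(r"^a(b)c$")


def matching(pattern, candidates):
    """Return the candidates that the compiled pattern matches."""
    return [s for s in candidates if pattern.search(s)]


gnu_matches = matching(gnu_style, strings)
workaround_matches = matching(without_empty, strings)

print(gnu_matches)         # ['abc', 'ac']
print(workaround_matches)  # ['abc']
```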

So perhaps try explaining rather than shooting down. Just because GNU
does it doesn't mean it should be immediately tossed out (and *must* be
wrong). The people who develop Linux and the GNU tools generally are a
perfectly competent and intelligent bunch, even though their ideology
may differ. If that weren't the case, then we wouldn't be using their
grep in our base system in the first place.

If we want to remove the behavior from our system, we'll need to find
all cases of (a|b||c) and convert them into (a|b|c){0,1} (if the latter
is even possible on BSD grep). Only if the expression is starred, as in
(a|b||c)*, can the empty alternative safely be dropped, giving (a|b|c)*.
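
The rewrite can be sanity-checked with a small brute-force comparison.
This is only a sketch under stated assumptions: it uses Python's re
module rather than BSD or GNU grep, and the helper names (accepted,
same_language), the alphabet, and the length bound are all made up for
the illustration. It compares which short strings each pattern accepts
in full.

```python
import re
from itertools import product

ALPHABET = "abcd"  # 'd' is included so that some strings must be rejected
MAX_LEN = 3        # hypothetical bound; enough to exercise the patterns


def accepted(pattern):
    """Return the set of short test strings the pattern matches in full."""
    anchored = re.compile("^(?:" + pattern + ")$")
    words = (
        "".join(chars)
        for length in range(MAX_LEN + 1)
        for chars in product(ALPHABET, repeat=length)
    )
    return {word for word in words if anchored.search(word)}


def same_language(left, right):
    """True if both patterns accept exactly the same test strings."""
    return accepted(left) == accepted(right)


# Empty alternative versus the {0,1} rewrite.
print(same_language("(a|b||c)", "(a|b|c){0,1}"))
# Empty alternative versus simply dropping it.
print(same_language("(a|b||c)", "(a|b|c)"))
# Under a star, dropping the empty alternative.
print(same_language("(a|b||c)*", "(a|b|c)*"))
```

On these test strings the first and third comparisons should agree and
the second should not, since dropping the empty alternative without
{0,1} loses the match on the empty string.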

-- 
Coleman Kane




