Date: Wed, 18 Jun 2008 13:25:34 -0400
From: Coleman Kane <cokane@FreeBSD.org>
To: Doug Barton <dougb@FreeBSD.org>
Cc: Diomidis Spinellis <dds@aueb.gr>, pfgshield-freebsd@yahoo.com, Gabor Kovesdan <gabor@FreeBSD.org>, current@FreeBSD.org
Subject: Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Message-ID: <1213809934.68025.39.camel@localhost>
In-Reply-To: <48593673.8030607@FreeBSD.org>
References: <188001.76281.qm@web32705.mail.mud.yahoo.com> <48593673.8030607@FreeBSD.org>

On Wed, 2008-06-18 at 09:23 -0700, Doug Barton wrote:
> pfgshield-freebsd@yahoo.com wrote:
> > On 2008-06-17, Gabor Kovesdan wrote:
> >>> egrep: empty (sub)expression
> >>>
> >> I've looked at this and I have a patch with a workaround:
> >> http://kovesdan.org/patches/grep.dougb.diff
> >
> > This general behavior should be considered a GNUism, the type of
> > thing that we want to avoid in scripts. Perhaps, if it's too
> > common, the workaround should be ifdef'd into a GNU compatible mode
> > or something.
>
> No, "here is a new bsd grep, but change all your scripts that have
> relied on legitimate behavior of grep in the past" is a non-starter. I
> haven't checked POSIX to see whether the particular construction I am
> using fits the spec or not, but I really don't care. We need to be
> feature-compatible with the GNU versions of these tools, both for
> legacy scripts and also for those users migrating over from linux.
>
> Doug
>

I think that this is the second time that I've seen this type of issue
come up (oh no! A GNU-ism sneaks in! Oh, the horror! The horror!).

Seriously, I see this attitude as counter-productive to getting wider
acceptance of the project. I don't think FreeBSD's (or any BSD's)
mission statement ever involved "ridding the *NIX world of the decidedly
impure habits of the GNU system". I always thought it was more along
the lines of "let's make an OS that is better than the GNU system, and
free-er too".

The above statement, critical of GNU grep, does not offer any insight
into what "the right way" (read: not the GNU way) is. Such
"proclamations" are justifiably going to elicit responses along the
lines of Doug's. Honestly, in this case you can either pick up BSD grep
yourself and do the work, or refrain from dropping in to chastise
someone for mirroring a "GNU-ism" behavior.
How about coming up with a script that parses all these "GNU-ism
offenders" and corrects the GNU-isms into POSIX-compliant behavior?
How about at least explaining what you think the proper way to do it
using our grep should be?

Here's my take on it:
 * We reject "||", "(|", and "|)" (via our regex lib) right now.
 * GNU grep seems to interpret that as an inclusive-or with '' (an
   empty alternative), which lets the OR match an "empty string" for
   that field, so the field may be absent and the pattern still
   matches.
 * We want to remove the empty disjunction with this patch?

The above seems to indicate that even the posted fix is not going to
operate the same as it does under GNU...

From a logical point of view, a disjunction that contains "empty"
should match an empty string, so in that sense the GNU grep behavior
has a logical basis.

For example, take five strings { "aba", "ab", "abc", "ac", "an" }

The GNU pattern "^a(b|)c$" could match "abc" and "ac" because the
"empty string was explicitly matched". The solution proposed, however,
would only match "abc". Logically, there's a basis for performing the
GNU method (even in my Automata course).

So perhaps try explaining rather than shooting down. Just because GNU
does it doesn't mean it should be immediately tossed out (and *must* be
wrong). The people who develop Linux and the GNU tools are generally a
perfectly competent and intelligent bunch, even though their ideology
may differ. If that weren't the case, then we wouldn't be using their
grep in our base system in the first place.

If we want to remove the behavior from our system, we'll need to find
all cases of (a|b||c) and convert them into (a|b|c){0,1} (if the latter
is even possible on BSD grep). Only if the group is starred, as in
(a|b||c)*, can the empty alternative simply be dropped, giving
(a|b|c)*.
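
To make the example above concrete, here is a minimal sketch in Python.
It rests on an assumption: Python's re module stands in for the regex
engine, and it is neither GNU grep nor the BSD regex library, so it only
illustrates the logic of the empty alternative, not what either grep
does. The helper name matching() and the probe strings are made up for
illustration.

```python
import re

# The five strings from the example above.
candidates = ["aba", "ab", "abc", "ac", "an"]


def matching(pattern, strings):
    """Return the strings that the (anchored) pattern matches."""
    compiled = re.compile(pattern)
    return [s for s in strings if compiled.search(s)]


# Empty alternative kept: "b" may be present or absent.
gnu_matches = matching(r"^a(b|)c$", candidates)

# Empty alternative simply dropped: "b" becomes mandatory.
dropped_matches = matching(r"^a(b)c$", candidates)

# Empty alternative rewritten as an explicit {0,1} bound.
optional_matches = matching(r"^a(b){0,1}c$", candidates)

# The same comparison for (a|b||c) versus (a|b|c){0,1}.
probes = ["", "a", "b", "c", "ab", "d"]
with_empty = matching(r"^(a|b||c)$", probes)
with_bound = matching(r"^(a|b|c){0,1}$", probes)

print(gnu_matches)       # ['abc', 'ac']
print(dropped_matches)   # ['abc']
print(optional_matches)  # ['abc', 'ac']
print(with_empty == with_bound)  # True
```

Under this engine, dropping the empty alternative loses "ac", while the
{0,1} form keeps the same matches as the pattern with the empty
alternative.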

-- 
Coleman Kane