Date: Sat, 15 Apr 2017 18:18:08 +0200
From: Baptiste Daroussin
To: Kyle Evans
Cc: freebsd-hackers@freebsd.org, Pedro Giffuni, Ed Maste
Subject: Re: Replacing libgnuregex
Message-ID: <20170415161808.rqcq44qcfyrrrrdg@ivaldir.net>

On Sat, Apr 15, 2017 at 01:02:42AM -0500, Kyle Evans wrote:
> On Fri, Apr 14, 2017 at 1:55 PM, Kyle Evans wrote:
>
> > On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans wrote:
> >
> >>
> >> On the other hand, I think I could fairly easily implement most of these
> >> into libc/regex.
> >> Here's a summary of what this option entails adding to
> >> libc/regex, from what I've found:
> >>
> >> * Empty subexpressions(*)
> >> * Add missing quantifiers to BREs: \?, \+
> >> * Add branching to BREs: \|
> >> * Add backreferences (\1 through \9) to EREs
> >> * Add \w, \W, \s, and \S corresponding to [[:alnum:]], [^[:alnum:]],
> >> [[:space:]], and [^[:space:]] respectively
> >> * Add word boundaries and anchors:
> >> ** \b: word boundary
> >> ** \B: not word boundary
> >> ** \<: Start of word
> >> ** \>: End of word
> >> ** \`: Start of subject string
> >> ** \': End of subject string
> >>
> >> (*) I didn't actually find anything explicitly stating this as a GNU
> >> extension, but it's certainly not conformant to POSIX specifications to
> >> use, it gets used a tiny bit in some ports, and we implement a workaround
> >> in bsdgrep(1) for the simplest case of empty expressions ("") to match
> >> everything and produce zero length matches.
> >>
> >> The main benefit of this is not having to maintain a completely separate
> >> regex parser and the potential for inconsistencies that come along with it.
> >> The downside is that that would seem to promote expressions that are not
> >> strictly POSIX conformant. Is this a problem? Is this a problem worth
> >> worrying about?
> >>
> >>
> > FYI- A patch showing what the implementation for all of the above into
> > libc/regex looks like [1]. Some cleanup is still in order and the test set
> > is not exhaustive, but this should implement all of the GNU extensions and
> > it's at least functional.
> >
> > It will break some things (like one of the tests, for instance) that
> > relied on being able to escape an ordinary character (e.g. \b) and get an
> > ordinary character. This is specified as producing undefined behavior [2],
> > though, so I don't feel terrible about breaking it.
> >
> > If this seems desirable, I can work on cleaning it up and splitting it
> > into more consumable bites for FreeBSD's libc.
> >
> > Thanks,
> >
> > Kyle Evans
> >
> > [1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff
> > [2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03
> >
>
> An amended version of this patch can be found here:
> https://files.kyle-evans.net/freebsd/libc-gnuext-2.diff
>
> This one introduces a REG_POSIX flag for regcomp(3) that removes the GNU
> extension for a more POSIX conformant implementation along with an
> amendment to regex.3 to document said flag.
>
> Instead of removing the tests that don't fail like they should under GNU
> extensions, I've restored them and added a 'P' flag to specify REG_POSIX
> and marked the failing tests as such to clearly denote that they require a
> more strict implementation.
>
> Thanks,
>

Thanks for working on this.

Just to follow up on this: have you tested the results with the AT&T
testsuite for regex?
You can find it at least in the DragonFly source tree:
https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/abce74f49c2c19b069958a0b48de0a9987d14e35

It is also online somewhere, but I don't remember where :)

Another approach would be to import libtre + extension into our libc (as
was done on DragonFly; it was actually a FreeBSD project that stalled).

Best regards,
Bapt
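
For readers following the thread, here is a minimal sketch of how the
extensions quoted above can be spelled in strictly POSIX syntax, which is
the behaviour a stricter mode would presumably fall back to. It uses only
the standard regcomp(3)/regexec(3)/regfree(3) calls. The helper name
regex_matches and the pattern constants are hypothetical and are not taken
from the patch under discussion. The character classes are the ones the
summary lists for \w and \s, and the interval forms \{0,1\} and \{1,\} are
the usual BRE equivalents of \? and \+.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): returns 1 if `pattern`
 * matches anywhere in `subject`, 0 if it does not match, and -1 if the
 * pattern fails to compile. Pass REG_EXTENDED in `cflags` for an ERE,
 * or 0 for a BRE.
 */
int
regex_matches(const char *pattern, const char *subject, int cflags)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only success or failure is needed, not match offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, subject, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}

/* POSIX bracket expressions the summary gives for \w and \s. */
const char *const word_class = "[[:alnum:]]";
const char *const space_class = "[[:space:]]";

/* POSIX BRE intervals standing in for the \? and \+ extensions. */
const char *const bre_optional_b = "ab\\{0,1\\}c";	/* like ab\?c */
const char *const bre_one_or_more_b = "ab\\{1,\\}c";	/* like ab\+c */
```

For example, regex_matches(bre_one_or_more_b, "abbc", 0) should report a
match while the same pattern against "ac" should not, on any conformant
implementation, with or without the GNU extensions.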