Date: Sat, 15 Apr 2017 18:18:08 +0200
From: Baptiste Daroussin <bapt@FreeBSD.org>
To: Kyle Evans <kevans91@ksu.edu>
Cc: freebsd-hackers@freebsd.org, Pedro Giffuni <pfg@freebsd.org>, Ed Maste <emaste@freebsd.org>
Subject: Re: Replacing libgnuregex
Message-ID: <20170415161808.rqcq44qcfyrrrrdg@ivaldir.net>
In-Reply-To: <CACNAnaHRi4RH4Staf6ZT5%2B1_ZqSBAR6shOd2=nYt3K9_A5kKZQ@mail.gmail.com>
References: <CACNAnaEmBjWudEJwvRTSqyciOp7-oRbCEQ_e6qtGsap0oHQ4yw@mail.gmail.com> <CACNAnaGOLVKR7Y4uzhuS7EB5-UMb3tS9yKL4Srn8knThk0o1kg@mail.gmail.com> <CACNAnaHRi4RH4Staf6ZT5%2B1_ZqSBAR6shOd2=nYt3K9_A5kKZQ@mail.gmail.com>

On Sat, Apr 15, 2017 at 01:02:42AM -0500, Kyle Evans wrote:
> On Fri, Apr 14, 2017 at 1:55 PM, Kyle Evans <kevans91@ksu.edu> wrote:
>
> > On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans <kevans91@ksu.edu> wrote:
> >
> >>
> >> On the other hand, I think I could fairly easily implement most of these
> >> into libc/regex. Here's a summary of what this option entails adding to
> >> libc/regex, from what I've found:
> >>
> >> * Empty subexpressions(*)
> >> * Add missing quantifiers to BREs: \?, \+
> >> * Add branching to BREs: \|
> >> * Add backreferences (\1 through \9) to EREs
> >> * Add \w, \W, \s, and \S corresponding to [[:alnum:]], [^[:alnum:]],
> >> [[:space:]], and [^[:space:]] respectively
> >> * Add word boundaries and anchors:
> >> ** \b: word boundary
> >> ** \B: not word boundary
> >> ** \<: Start of word
> >> ** \>: End of word
> >> ** \`: Start of subject string
> >> ** \': End of subject string
> >>
> >> (*) I didn't actually find anything explicitly stating this as a GNU
> >> extension, but it's certainly not conformant to POSIX specifications to
> >> use, it gets used a tiny bit in some ports, and we implement a workaround
> >> in bsdgrep(1) for the simplest case of empty expressions ("") to match
> >> everything and produce zero length matches.
> >>
> >> The main benefit of this is not having to maintain a completely separate
> >> regex parser and the potential for inconsistencies that come along with it.
> >> The downside is that that would seem to promote expressions that are not
> >> strictly POSIX conformant. Is this a problem? Is this a problem worth
> >> worrying about?
> >>
> >>
> > FYI- A patch showing what the implementation for all of the above into
> > libc/regex looks like [1].
> > Some cleanup is still in order and the test set
> > is not exhaustive, but this should implement all of the GNU extensions and
> > it's at least functional.
> >
> > It will break some things (like one of the tests, for instance) that
> > relied on being able to escape an ordinary character (e.g. \b) and get an
> > ordinary character. This is specified as producing undefined behavior [2],
> > though, so I don't feel terrible about breaking it.
> >
> > If this seems desirable, I can work on cleaning it up and splitting it
> > into more consumable bites for FreeBSD's libc.
> >
> > Thanks,
> >
> > Kyle Evans
> >
> > [1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff
> > [2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03
> >
>
> An amended version of this patch can be found here:
> https://files.kyle-evans.net/freebsd/libc-gnuext-2.diff
>
> This one introduces a REG_POSIX flag for regcomp(3) that removes the GNU
> extension for a more POSIX conformant implementation along with an
> amendment to regex.3 to document said flag.
>
> Instead of removing the tests that don't fail like they should under GNU
> extensions, I've restored them and added a 'P' flag to specify REG_POSIX
> and marked the failing tests as such to clearly denote that they require a
> more strict implementation.
>
> Thanks,
>

Thanks for working on this

Just to follow up on this:
Have you tested the results with the AT&T testsuite for regex?
You can find it at least in the dragonfly source tree:
https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/abce74f49c2c19b069958a0b48de0a9987d14e35

Or online I don't remember where :)

another approach would be to import libtre + extension in our libc (like it
was done on dragonfly - it was actually a freebsd project that stalled)

Best regards,
Bapt
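
For anyone who wants to see which of the extensions listed in the quoted summary a given libc already accepts, here is a minimal sketch using only the standard regcomp(3)/regexec(3) interface. The names probe, probe_cases, and run_probes, as well as the sample patterns and subjects, are illustrative assumptions and are not taken from the patch. The REG_POSIX flag from the amended patch is deliberately not used, since it only exists with that patch applied. Results differ between libc implementations, so no expected output is given.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Compile `pattern` with the given regcomp(3) flags and run it on `subject`.
 * Returns 1 on a match, 0 on no match, -1 if the pattern fails to compile.
 */
int
probe(const char *pattern, int cflags, const char *subject)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, cflags) != 0)
		return (-1);
	rc = regexec(&re, subject, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}

/* Hypothetical sample cases, one per extension named in the summary. */
struct probe_case {
	const char *what;
	const char *pattern;
	int cflags;
	const char *subject;
};

const struct probe_case probe_cases[] = {
	{ "BRE \\?",            "ab\\?c",        0,            "ac" },
	{ "BRE \\+",            "ab\\+c",        0,            "abbc" },
	{ "BRE \\|",            "foo\\|bar",     0,            "bar" },
	{ "ERE backreference",  "(a)\\1",        REG_EXTENDED, "aa" },
	{ "\\w",                "\\w",           0,            "x" },
	{ "\\b",                "\\bword\\b",    0,            "a word here" },
	{ "\\< and \\>",        "\\<word\\>",    0,            "a word here" },
	{ "\\` and \\'",        "\\`abc\\'",     0,            "abc" },
	{ "empty subexpr",      "()",            REG_EXTENDED, "" },
};

/* Print one line per case to `out`; returns the number of cases probed. */
int
run_probes(FILE *out)
{
	size_t i, n = sizeof(probe_cases) / sizeof(probe_cases[0]);

	for (i = 0; i < n; i++) {
		int r = probe(probe_cases[i].pattern, probe_cases[i].cflags,
		    probe_cases[i].subject);
		fprintf(out, "%-20s %s\n", probe_cases[i].what,
		    r == 1 ? "match" : r == 0 ? "no match" : "compile error");
	}
	return ((int)n);
}
```

Calling run_probes(stdout) from a small main() prints one result line per extension. A "no match" or "compile error" only means the local regcomp(3) does not treat the escape as the GNU extension; since escaping an ordinary character is undefined behavior per [2], a "match" is not proof of GNU semantics either.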
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170415161808.rqcq44qcfyrrrrdg>