Date:      Sat, 15 Apr 2017 18:18:08 +0200
From:      Baptiste Daroussin <bapt@FreeBSD.org>
To:        Kyle Evans <kevans91@ksu.edu>
Cc:        freebsd-hackers@freebsd.org, Pedro Giffuni <pfg@freebsd.org>, Ed Maste <emaste@freebsd.org>
Subject:   Re: Replacing libgnuregex
Message-ID:  <20170415161808.rqcq44qcfyrrrrdg@ivaldir.net>
In-Reply-To: <CACNAnaHRi4RH4Staf6ZT5+1_ZqSBAR6shOd2=nYt3K9_A5kKZQ@mail.gmail.com>
References:  <CACNAnaEmBjWudEJwvRTSqyciOp7-oRbCEQ_e6qtGsap0oHQ4yw@mail.gmail.com> <CACNAnaGOLVKR7Y4uzhuS7EB5-UMb3tS9yKL4Srn8knThk0o1kg@mail.gmail.com> <CACNAnaHRi4RH4Staf6ZT5+1_ZqSBAR6shOd2=nYt3K9_A5kKZQ@mail.gmail.com>



On Sat, Apr 15, 2017 at 01:02:42AM -0500, Kyle Evans wrote:
> On Fri, Apr 14, 2017 at 1:55 PM, Kyle Evans <kevans91@ksu.edu> wrote:
>
> > On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans <kevans91@ksu.edu> wrote:
> >
> >>
> >> On the other hand, I think I could fairly easily implement most of these
> >> into libc/regex. Here's a summary of what this option entails adding to
> >> libc/regex, from what I've found:
> >>
> >> * Empty subexpressions(*)
> >> * Add missing quantifiers to BREs: \?, \+
> >> * Add branching to BREs: \|
> >> * Add backreferences (\1 through \9) to EREs
> >> * Add \w, \W, \s, and \S corresponding to [[:alnum:]], [^[:alnum:]],
> >> [[:space:]], and [^[:space:]] respectively
> >> * Add word boundaries and anchors:
> >> ** \b: word boundary
> >> ** \B: not word boundary
> >> ** \<: Start of word
> >> ** \>: End of word
> >> ** \`: Start of subject string
> >> ** \': End of subject string
> >>
> >> (*) I didn't actually find anything explicitly stating this as a GNU
> >> extension, but it's certainly not conformant to POSIX specifications to
> >> use, it gets used a tiny bit in some ports, and we implement a workaround
> >> in bsdgrep(1) for the simplest case of empty expressions ("") to match
> >> everything and produce zero length matches.
> >>
> >> The main benefit of this is not having to maintain a completely separate
> >> regex parser and the potential for inconsistencies that come along with it.
> >> The downside is that that would seem to promote expressions that are not
> >> strictly POSIX conformant. Is this a problem? Is this a problem worth
> >> worrying about?
> >>
> >>
> > FYI: here is a patch showing what implementing all of the above in
> > libc/regex looks like [1]. Some cleanup is still in order and the test set
> > is not exhaustive, but this should implement all of the GNU extensions and
> > it's at least functional.
> >
> > It will break some things (like one of the tests, for instance) that
> > relied on being able to escape an ordinary character (e.g. \b) and get an
> > ordinary character. This is specified as producing undefined behavior [2],
> > though, so I don't feel terrible about breaking it.
> >
> > If this seems desirable, I can work on cleaning it up and splitting it
> > into more consumable bites for FreeBSD's libc.
> >
> > Thanks,
> >
> > Kyle Evans
> >
> > [1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff
> > [2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03
> >
>
> An amended version of this patch can be found here:
> https://files.kyle-evans.net/freebsd/libc-gnuext-2.diff
>
> This one introduces a REG_POSIX flag for regcomp(3) that disables the GNU
> extensions for a more POSIX-conformant implementation, along with an
> amendment to regex.3 to document said flag.
>
> Instead of removing the tests that don't fail like they should under GNU
> extensions, I've restored them and added a 'P' flag to specify REG_POSIX
> and marked the failing tests as such to clearly denote that they require a
> more strict implementation.
>
> Thanks,
>

Thanks for working on this.

Just to follow up on this:

Have you tested the results with the AT&T testsuite for regex?

You can find it at least in the dragonfly source tree:
https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/abce74f49c2c19b069958a0b48de0a9987d14e35

Or online I don't remember where :)

Another approach would be to import libtre + extensions into our libc (as was
done on dragonfly - it was actually a freebsd project that stalled).

Best regards,
Bapt
