Date:      Sat, 15 Apr 2017 01:02:42 -0500
From:      Kyle Evans <kevans91@ksu.edu>
To:        <freebsd-hackers@freebsd.org>
Cc:        Pedro Giffuni <pfg@freebsd.org>, Ed Maste <emaste@freebsd.org>
Subject:   Re: Replacing libgnuregex
Message-ID:  <CACNAnaHRi4RH4Staf6ZT5+1_ZqSBAR6shOd2=nYt3K9_A5kKZQ@mail.gmail.com>
In-Reply-To: <CACNAnaGOLVKR7Y4uzhuS7EB5-UMb3tS9yKL4Srn8knThk0o1kg@mail.gmail.com>
References:  <CACNAnaEmBjWudEJwvRTSqyciOp7-oRbCEQ_e6qtGsap0oHQ4yw@mail.gmail.com> <CACNAnaGOLVKR7Y4uzhuS7EB5-UMb3tS9yKL4Srn8knThk0o1kg@mail.gmail.com>

On Fri, Apr 14, 2017 at 1:55 PM, Kyle Evans <kevans91@ksu.edu> wrote:

> On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans <kevans91@ksu.edu> wrote:
>
>>
>> On the other hand, I think I could fairly easily implement most of these
>> into libc/regex. Here's a summary of what this option entails adding to
>> libc/regex, from what I've found:
>>
>> * Empty subexpressions(*)
>> * Add missing quantifiers to BREs: \?, \+
>> * Add branching to BREs: \|
>> * Add backreferences (\1 through \9) to EREs
>> * Add \w, \W, \s, and \S corresponding to [[:alnum:]], [^[:alnum:]],
>> [[:space:]], and [^[:space:]] respectively
>> * Add word boundaries and anchors:
>> ** \b: word boundary
>> ** \B: not word boundary
>> ** \<: Start of word
>> ** \>: End of word
>> ** \`: Start of subject string
>> ** \': End of subject string
>>
>> (*) I didn't actually find anything explicitly stating this as a GNU
>> extension, but relying on it is certainly not POSIX-conformant. It gets
>> used a tiny bit in some ports, and we implement a workaround in bsdgrep(1)
>> for the simplest case, the empty expression (""), so that it matches
>> everything and produces zero-length matches.
>>
>> The main benefit of this is not having to maintain a completely separate
>> regex parser and the potential for inconsistencies that come along with it.
>> The downside is that it would seem to promote expressions that are not
>> strictly POSIX conformant. Is this a problem? If so, is it one worth
>> worrying about?
>>
>>
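
For anyone who wants to see which of the extensions listed above a given
libc already accepts, here is a minimal probe sketch. It uses only the
standard regcomp(3)/regexec(3)/regfree(3) calls; the helper names (probe,
report_extensions) and the sample patterns are made up for illustration and
are not taken from the patch. Each subject string is chosen so that it
matches only if the escape is treated as the extension rather than as a
literal character.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Try one pattern against one subject string.
 * Returns 1 on a match, 0 on no match, and -1 if regcomp(3) rejects the
 * pattern. cflags is passed to regcomp(3) unchanged (0 selects BREs).
 */
static int
probe(const char *pattern, int cflags, const char *subject)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && subject != NULL);
	if (regcomp(&re, pattern, cflags) != 0)
		return (-1);
	rc = regexec(&re, subject, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}

/* Illustrative cases only; one per extension in the list above. */
struct probe_case {
	const char *label;
	const char *pattern;
	int cflags;
	const char *subject;
};

static const struct probe_case cases[] = {
	{ "ERE empty subexpression ()",	"()",		REG_EXTENDED,	"x" },
	{ "BRE \\?",			"xa\\?b",	0,		"xb" },
	{ "BRE \\+",			"xa\\+b",	0,		"xaab" },
	{ "BRE \\|",			"a\\|b",	0,		"b" },
	{ "ERE backreference \\1",	"(a)\\1",	REG_EXTENDED,	"aa" },
	{ "\\w and \\W",		"\\w\\W",	REG_EXTENDED,	"a-" },
	{ "\\s and \\S",		"\\s\\S",	REG_EXTENDED,	" x" },
	{ "\\b",			"\\bfoo\\b",	REG_EXTENDED,	"a foo b" },
	{ "\\< and \\>",		"\\<foo\\>",	REG_EXTENDED,	"a foo b" },
	{ "\\` and \\'",		"\\`foo\\'",	REG_EXTENDED,	"foo" },
};

/* Print one line per case saying how the local libc treated it. */
static void
report_extensions(FILE *out)
{
	size_t i;
	int r;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		r = probe(cases[i].pattern, cases[i].cflags, cases[i].subject);
		fprintf(out, "%-28s %s\n", cases[i].label,
		    r < 0 ? "rejected by regcomp" :
		    r > 0 ? "matched" : "compiled, no match");
	}
}
```

Calling report_extensions(stdout) from a small main() gives a quick picture
of the local behavior. The results will differ between regex
implementations, which is rather the point of the question above.
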
> FYI: a patch showing what implementing all of the above in libc/regex
> looks like is at [1]. Some cleanup is still in order and the test set
> is not exhaustive, but this should implement all of the GNU extensions and
> it's at least functional.
>
> It will break some things (one of the tests, for instance) that relied
> on being able to escape an ordinary character (e.g. \b) and get that
> ordinary character back. This is specified as producing undefined
> behavior [2], though, so I don't feel terrible about breaking it.
>
> If this seems desirable, I can work on cleaning it up and splitting it
> into more consumable bites for FreeBSD's libc.
>
> Thanks,
>
> Kyle Evans
>
> [1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff
> [2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03
>

An amended version of this patch can be found here:
https://files.kyle-evans.net/freebsd/libc-gnuext-2.diff

This one introduces a REG_POSIX flag for regcomp(3) that disables the GNU
extensions for a more POSIX-conformant implementation, along with an
amendment to regex.3 to document the flag.
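
As a rough sketch of how a caller might opt in to the strict behavior: the
flag is simply OR'd into the cflags argument of regcomp(3). Only the
REG_POSIX name comes from the patch. The fallback definition of 0 and the
helper names (compile_strict, strict_match) are assumptions of this example,
there so that the same source still builds against a regex.h that lacks the
flag.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * REG_POSIX is the flag proposed in the patch; an unpatched regex.h does
 * not define it. Falling back to 0 (an assumption of this sketch) turns
 * the flag into a no-op there.
 */
#ifndef REG_POSIX
#define	REG_POSIX	0
#endif

/*
 * Compile pattern while asking for strict POSIX behavior.
 * Returns the regcomp(3) error code, 0 on success.
 */
static int
compile_strict(regex_t *re, const char *pattern, int cflags)
{

	assert(re != NULL && pattern != NULL);
	return (regcomp(re, pattern, cflags | REG_POSIX));
}

/* Returns 1 if subject matches, 0 if it does not, -1 on a compile error. */
static int
strict_match(const char *pattern, int cflags, const char *subject)
{
	regex_t re;
	int rc;

	if (compile_strict(&re, pattern, cflags) != 0)
		return (-1);
	rc = regexec(&re, subject, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}
```

With a patched libc, a pattern that depends on one of the GNU extensions
would be expected to behave differently through strict_match() than through
a plain regcomp(3) call. Without the patch, the two are identical.
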

Instead of removing the tests that don't fail like they should under GNU
extensions, I've restored them, added a 'P' flag to specify REG_POSIX, and
marked the failing tests with it to clearly denote that they require a
stricter implementation.

Thanks,

Kyle Evans


