Date: Fri, 14 Apr 2017 13:55:44 -0500
From: Kyle Evans <kevans91@ksu.edu>
To: <freebsd-hackers@freebsd.org>
Cc: Pedro Giffuni <pfg@freebsd.org>, Ed Maste <emaste@freebsd.org>
Subject: Re: Replacing libgnuregex
Message-ID: <CACNAnaGOLVKR7Y4uzhuS7EB5-UMb3tS9yKL4Srn8knThk0o1kg@mail.gmail.com>
In-Reply-To: <CACNAnaEmBjWudEJwvRTSqyciOp7-oRbCEQ_e6qtGsap0oHQ4yw@mail.gmail.com>
References: <CACNAnaEmBjWudEJwvRTSqyciOp7-oRbCEQ_e6qtGsap0oHQ4yw@mail.gmail.com>
On Tue, Apr 11, 2017 at 3:20 PM, Kyle Evans <kevans91@ksu.edu> wrote:
>
> On the other hand, I think I could fairly easily implement most of these
> into libc/regex. Here's a summary of what this option entails adding to
> libc/regex, from what I've found:
>
> * Empty subexpressions(*)
> * Add missing quantifiers to BREs: \?, \+
> * Add branching to BREs: \|
> * Add backreferences (\1 through \9) to EREs
> * Add \w, \W, \s, and \S corresponding to [[:alnum:]], [^[:alnum:]],
> [[:space:]], and [^[:space:]] respectively
> * Add word boundaries and anchors:
> ** \b: word boundary
> ** \B: not word boundary
> ** \<: Start of word
> ** \>: End of word
> ** \`: Start of subject string
> ** \': End of subject string
>
> (*) I didn't actually find anything explicitly stating this as a GNU
> extension, but it's certainly not conformant to POSIX specifications to
> use, it gets used a tiny bit in some ports, and we implement a workaround
> in bsdgrep(1) for the simplest case of empty expressions ("") to match
> everything and produce zero-length matches.
>
> The main benefit of this is not having to maintain a completely separate
> regex parser and the potential for inconsistencies that come along with it.
> The downside is that that would seem to promote expressions that are not
> strictly POSIX conformant. Is this a problem? Is this a problem worth
> worrying about?
>

FYI: here is a patch showing what implementing all of the above in libc/regex looks like [1]. Some cleanup is still in order and the test set is not exhaustive, but it should implement all of the GNU extensions listed above and it is at least functional. It will break some things (one of the existing tests, for instance) that relied on being able to escape an ordinary character (e.g. \b) and get that ordinary character back. POSIX specifies that case as undefined behavior [2], though, so I don't feel terrible about breaking it.
If this seems desirable, I can work on cleaning it up and splitting it into more consumable bites for FreeBSD's libc.

Thanks,

Kyle Evans

[1] http://files.kyle-evans.net/freebsd/libc-gnuext.diff
[2] http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html#tag_09_03_03