Date: Fri, 31 Jul 2020 08:41:28 -0500
From: Kyle Evans <kevans@freebsd.org>
To: Li-Wen Hsu <lwhsu@freebsd.org>
Cc: src-committers <src-committers@freebsd.org>,
    svn-src-all <svn-src-all@freebsd.org>,
    svn-src-head <svn-src-head@freebsd.org>,
    Ngie Cooper <ngie@freebsd.org>,
    Alan Somers <asomers@freebsd.org>
Subject: Re: svn commit: r363679 - in head: contrib/netbsd-tests/lib/libc/regex/data lib/libc/regex
Message-ID: <CACNAnaHENNX2uDZ2btHyTHFPnH5gSJCnDy0Qbz6Rtie1r%2BCxBw@mail.gmail.com>
In-Reply-To: <CAKBkRUygxHy%2B8Tqkoh0xY-yB93du=6ReB5ugXMF4H-KkVofaJg@mail.gmail.com>
References: <202007292321.06TNLuoq087451@repo.freebsd.org>
    <CAKBkRUy%2BTvK6L2iRaixyPB6-OQCkHLWgo5QLiRJV1Qx9c-Md_w@mail.gmail.com>
    <CACNAnaHxSZAD5qs8y9DxBxdNuipoB1%2BEy6vQaWjWoBfgMoav-Q@mail.gmail.com>
    <CACNAnaGm9s8XKOxO0-7=civDDmmzZL9kenuLyV79H62LJnujUg@mail.gmail.com>
    <CAKBkRUygxHy%2B8Tqkoh0xY-yB93du=6ReB5ugXMF4H-KkVofaJg@mail.gmail.com>
On Fri, Jul 31, 2020 at 8:39 AM Li-Wen Hsu <lwhsu@freebsd.org> wrote:
>
> On Fri, Jul 31, 2020 at 9:50 AM Kyle Evans <kevans@freebsd.org> wrote:
> >
> > On Thu, Jul 30, 2020 at 8:47 PM Kyle Evans <kevans@freebsd.org> wrote:
> > >
> > > On Wed, Jul 29, 2020 at 10:53 PM Li-Wen Hsu <lwhsu@freebsd.org> wrote:
> > > >
> > > > On Thu, Jul 30, 2020 at 7:22 AM Kyle Evans <kevans@freebsd.org> wrote:
> > > > >
> > > > > Author: kevans
> > > > > Date: Wed Jul 29 23:21:56 2020
> > > > > New Revision: 363679
> > > > > URL: https://svnweb.freebsd.org/changeset/base/363679
> > > > >
> > > > > Log:
> > > > >   regex(3): Interpret many escaped ordinary characters as EESCAPE
> > > > >
> > > > >   In IEEE 1003.1-2008 [1] and earlier revisions, BRE/ERE grammar allows for
> > > > >   any character to be escaped, but "ORD_CHAR preceded by an unescaped
> > > > >   <backslash> character [gives undefined results]".
> > > > >
> > > > >   Historically, we've interpreted an escaped ordinary character as the
> > > > >   ordinary character itself. This becomes problematic when some extensions
> > > > >   give special meanings to an otherwise ordinary character
> > > > >   (e.g. GNU's \b, \s, \w), meaning we may have two different valid
> > > > >   interpretations of the same sequence.
> > > > >
> > > > >   To make this easier to deal with and given that the standard calls this
> > > > >   undefined, we should throw an error (EESCAPE) if we run into this scenario
> > > > >   to ease transition into a state where some escaped ordinaries are blessed
> > > > >   with a special meaning -- it will either error out or have extended
> > > > >   behavior, rather than have two entirely different versions of undefined
> > > > >   behavior that leave the consumer of regex(3) guessing as to what behavior
> > > > >   will be used or leaving them with false impressions.
> > > > >
> > > > >   This change bumps the symbol version of regcomp to FBSD_1.6 and provides the
> > > > >   old escape semantics for legacy applications, just in case one has an older
> > > > >   application that would immediately turn into a pumpkin because of an
> > > > >   extraneous escape that's embedded or otherwise critical to its operation.
> > > > >
> > > > >   This is the final piece needed before enhancing libregex with GNU extensions
> > > > >   and flipping the switch on bsdgrep.
> > > > >
> > > > >   [1] http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/
> > > > >
> > > > >   PR: 229925 (exp-run, courtesy of antoine)
> > > > >   Differential Revision: https://reviews.freebsd.org/D10510
> > > > >
> > > > > Modified:
> > > > >   head/contrib/netbsd-tests/lib/libc/regex/data/meta.in
> > > > >   head/contrib/netbsd-tests/lib/libc/regex/data/subexp.in
> > > > >   head/lib/libc/regex/Symbol.map
> > > > >   head/lib/libc/regex/regcomp.c
> > > >
> > > > I think there are 3 test cases need to be modified after this change:
> > > >
> > > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/lib.googletest.gtest_main/googletest-port-test/main/
> > > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/usr.bin.diff/diff_test/side_by_side/
> > > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/usr.bin.sed/sed2_test/hex_subst/
> > > >
> > >
> > > CC'ing asomers@ and ngie@, because ISTR they have some googletest stock.
> > >
> > > Testing my libregex GNU extensions revealed that I'm really not ready
> > > to commit that just yet. We have two options here for googletest:
> > >
> > > 1. Disable it and create a PR to be fixed when my changes are done,
> > > hopefully by the end of the week, or
> > > 2. Fix the expressions in
> > > contrib/googletest/googletest/test/googletest-port-test.cc to be POSIX
> > > compliant and upstream that.
> > >
> > > #2 is generally a replacement of \w -> [[:alnum:]] and \W ->
> > > [^[:alnum:]] and maybe \s -> [[:space:]].
> > >
> >
> > Sorry, to be more precise: disable it meaning expect failure of that
> > specific test or something similar.
>
> I think there's no need to let a known issue generate lots of failure
> reports for more than 24 hours, I suggest let's go with 1) first. For
> 2), It's also good that both libregex and googletest can aware the
> difference between POSIX and GNU extensions, but I am not sure how
> upstream thinks about this. Still worth trying, though.
>

Sure, if you have time and no one objects, please proceed with #1 (no
time at the moment myself) and I'll get it fixed this weekend, even if
I have to hold back implementation of some of the GNU extensions to nab
the few that googletest's tests care about.

Thanks,

Kyle Evans
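
Editor's illustration of option #2 from the thread: a minimal sketch of
spelling the GNU shorthands as POSIX bracket expressions so that they
compile under a strict regcomp(3). The helper name ere_match() and the
POSIX_* macros are hypothetical and appear in neither googletest nor libc.
One assumption goes slightly beyond the thread: GNU's \w also matches an
underscore, so the sketch uses [[:alnum:]_] rather than the plain
[[:alnum:]] mentioned above.

```c
#include <regex.h>
#include <stddef.h>

/* Portable spellings of the GNU shorthands (names are hypothetical). */
#define POSIX_WORD	"[[:alnum:]_]"	/* stands in for \w */
#define POSIX_NONWORD	"[^[:alnum:]_]"	/* stands in for \W */
#define POSIX_SPACE	"[[:space:]]"	/* stands in for \s */

/*
 * Hypothetical helper: compile `pattern` as a POSIX ERE and search `text`.
 * Returns 1 on a match, 0 on no match, and -1 if regcomp(3) rejects the
 * pattern (which is what the commit log describes for an escaped ordinary
 * character on a libc with the stricter behaviour).
 */
int
ere_match(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}
```

For example, ere_match("^" POSIX_WORD "+$", "abc_123") reports a match,
while the same pattern against "abc 123" does not. Because the bracket
expressions are plain POSIX, the result should not depend on whether the
regex library implements the GNU extensions.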