From: Kyle Evans
Date: Fri, 31 Jul 2020 08:41:28 -0500
Subject: Re: svn commit: r363679 - in head: contrib/netbsd-tests/lib/libc/regex/data lib/libc/regex
To: Li-Wen Hsu
Cc: src-committers, svn-src-all, svn-src-head, Ngie Cooper, Alan Somers
References: <202007292321.06TNLuoq087451@repo.freebsd.org>

On Fri, Jul 31, 2020 at 8:39 AM Li-Wen Hsu wrote:
>
> On Fri, Jul 31, 2020 at 9:50 AM Kyle Evans wrote:
> >
> > On Thu, Jul 30, 2020 at 8:47 PM Kyle Evans wrote:
> > >
> > > On Wed, Jul 29, 2020 at 10:53 PM Li-Wen Hsu wrote:
> > > >
> > > > On Thu, Jul 30, 2020 at 7:22 AM Kyle Evans wrote:
> > > > >
> > > > > Author: kevans
> > > > > Date: Wed Jul 29 23:21:56 2020
> > > > > New Revision: 363679
> > > > > URL: https://svnweb.freebsd.org/changeset/base/363679
> > > > >
> > > > > Log:
> > > > >   regex(3): Interpret many escaped ordinary characters as EESCAPE
> > > > >
> > > > >   In IEEE 1003.1-2008 [1] and earlier revisions, BRE/ERE grammar allows for
> > > > >   any character to be escaped, but "ORD_CHAR preceded by an unescaped
> > > > >   <backslash> character [gives undefined results]".
> > > > >
> > > > >   Historically, we've interpreted an escaped ordinary character as the
> > > > >   ordinary character itself. This becomes problematic when some extensions
> > > > >   give special meanings to an otherwise ordinary character
> > > > >   (e.g. GNU's \b, \s, \w), meaning we may have two different valid
> > > > >   interpretations of the same sequence.
> > > > >
> > > > >   To make this easier to deal with and given that the standard calls this
> > > > >   undefined, we should throw an error (EESCAPE) if we run into this scenario
> > > > >   to ease transition into a state where some escaped ordinaries are blessed
> > > > >   with a special meaning -- it will either error out or have extended
> > > > >   behavior, rather than have two entirely different versions of undefined
> > > > >   behavior that leave the consumer of regex(3) guessing as to what behavior
> > > > >   will be used or leaving them with false impressions.
> > > > >
> > > > >   This change bumps the symbol version of regcomp to FBSD_1.6 and provides the
> > > > >   old escape semantics for legacy applications, just in case one has an older
> > > > >   application that would immediately turn into a pumpkin because of an
> > > > >   extraneous escape that's embedded or otherwise critical to its operation.
> > > > >
> > > > >   This is the final piece needed before enhancing libregex with GNU extensions
> > > > >   and flipping the switch on bsdgrep.
> > > > >
> > > > >   [1] http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/
> > > > >
> > > > >   PR: 229925 (exp-run, courtesy of antoine)
> > > > >   Differential Revision: https://reviews.freebsd.org/D10510
> > > > >
> > > > > Modified:
> > > > >   head/contrib/netbsd-tests/lib/libc/regex/data/meta.in
> > > > >   head/contrib/netbsd-tests/lib/libc/regex/data/subexp.in
> > > > >   head/lib/libc/regex/Symbol.map
> > > > >   head/lib/libc/regex/regcomp.c
> > > >
> > > > I think there are 3 test cases that need to be modified after this change:
> > > >
> > > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/lib.googletest.gtest_main/googletest-port-test/main/
> > > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/usr.bin.diff/diff_test/side_by_side/
> > > > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/usr.bin.sed/sed2_test/hex_subst/
> > > >
> > >
> > > CC'ing asomers@ and ngie@, because ISTR they have some googletest stock.
> > >
> > > Testing my libregex GNU extensions revealed that I'm really not ready
> > > to commit that just yet. We have two options here for googletest:
> > >
> > > 1. Disable it and create a PR to be fixed when my changes are done,
> > > hopefully by the end of the week, or
> > > 2. Fix the expressions in
> > > contrib/googletest/googletest/test/googletest-port-test.cc to be POSIX
> > > compliant and upstream that.
> > >
> > > #2 is generally a replacement of \w -> [[:alnum:]] and \W ->
> > > [^[:alnum:]] and maybe \s -> [[:space:]].
> > >
> >
> > Sorry, to be more precise: disable it meaning expect failure of that
> > specific test or something similar.
>
> I think there's no need to let a known issue generate lots of failure
> reports for more than 24 hours; I suggest we go with 1) first. For
> 2), it's also good that both libregex and googletest can be aware of
> the difference between POSIX and GNU extensions, but I am not sure
> what upstream thinks about this.
> Still worth trying, though.
>

Sure, if you have time and no one objects, please proceed with #1 (no
time at the moment myself) and I'll get it fixed this weekend, even if
I have to hold back implementation of some of the GNU extensions to
nab the few that googletest's tests care about.

Thanks,

Kyle Evans
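
For illustration only, the option #2 substitution quoted above (\w -> [[:alnum:]], \W -> [^[:alnum:]], \s -> [[:space:]]) could be sketched as a small pattern rewriter. This is a hypothetical helper, not anything from the commit or from googletest: the names `GNU_TO_POSIX`, `_bracket_end` and `gnu_to_posix` are invented here. It follows the mapping in the email literally, so note that GNU's \w also matches "_", which [[:alnum:]] does not. Bracket expressions are copied through untouched, since a backslash inside one is an ordinary character in POSIX and a class cannot be nested there by plain substitution.

```python
# Hypothetical sketch: rewrite a few GNU regex escapes as POSIX bracket
# expressions. The mapping is the one quoted in the email; GNU \w also
# matches "_", which this mapping drops.
GNU_TO_POSIX = {
    "w": "[[:alnum:]]",
    "W": "[^[:alnum:]]",
    "s": "[[:space:]]",
}


def _bracket_end(pattern: str, start: int) -> int:
    """Return the index just past the bracket expression opening at start.

    Returns len(pattern) if the bracket expression is unterminated.
    """
    i = start + 1
    if i < len(pattern) and pattern[i] == "^":
        i += 1
    if i < len(pattern) and pattern[i] == "]":
        i += 1  # a leading "]" is a literal member, not the terminator
    while i < len(pattern):
        if pattern[i] == "[" and i + 1 < len(pattern) and pattern[i + 1] in ":.=":
            # Skip [:class:], [.coll.] and [=equiv=] as a unit.
            close = pattern.find(pattern[i + 1] + "]", i + 2)
            if close == -1:
                return len(pattern)
            i = close + 2
        elif pattern[i] == "]":
            return i + 1
        else:
            i += 1
    return len(pattern)


def gnu_to_posix(pattern: str) -> str:
    """Replace \\w, \\W and \\s outside bracket expressions; keep the rest."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            # Copy the whole bracket expression verbatim.
            end = _bracket_end(pattern, i)
            out.append(pattern[i:end])
            i = end
        elif ch == "\\" and i + 1 < len(pattern):
            # Translate known escapes; pass any other escape through as-is.
            nxt = pattern[i + 1]
            out.append(GNU_TO_POSIX.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(gnu_to_posix(r"^\w+\s"))  # ^[[:alnum:]]+[[:space:]]
```

Escapes the table does not know about (for example `\.` or `\\`) are passed through unchanged, so anything that would still hit EESCAPE after r363679 stays visible for manual review rather than being silently altered.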