From owner-svn-src-all@freebsd.org Thu Jul 30 04:07:24 2020
References: <202007292321.06TNLuoq087451@repo.freebsd.org>
From: Kyle Evans
Date: Wed, 29 Jul 2020 23:07:12 -0500
Subject: Re: svn commit: r363679 - in head: contrib/netbsd-tests/lib/libc/regex/data lib/libc/regex
To: Li-Wen Hsu
Cc: Kyle Evans, src-committers,
    svn-src-all, svn-src-head

Sorry, on mobile, so doubling down on bad formatting by top-posting...

The sed/diff tests are easy to fix, will do those in about 8/9 hours.

The Google test failure is interesting: this expression has clearly been
wrong and getting the wrong results, so we've caught a legitimate issue
here. I think the best path forward for that one is to commit my libregex
extensions and link that baby up so that \w works.

Thanks,

Kyle Evans

On Wed, Jul 29, 2020, 22:53 Li-Wen Hsu wrote:

> On Thu, Jul 30, 2020 at 7:22 AM Kyle Evans wrote:
> >
> > Author: kevans
> > Date: Wed Jul 29 23:21:56 2020
> > New Revision: 363679
> > URL: https://svnweb.freebsd.org/changeset/base/363679
> >
> > Log:
> >   regex(3): Interpret many escaped ordinary characters as EESCAPE
> >
> >   In IEEE 1003.1-2008 [1] and earlier revisions, BRE/ERE grammar allows
> >   for any character to be escaped, but "ORD_CHAR preceded by an
> >   unescaped <backslash> character [gives undefined results]".
> >
> >   Historically, we've interpreted an escaped ordinary character as the
> >   ordinary character itself. This becomes problematic when some
> >   extensions give special meanings to an otherwise ordinary character
> >   (e.g. GNU's \b, \s, \w), meaning we may have two different valid
> >   interpretations of the same sequence.
> >
> >   To make this easier to deal with and given that the standard calls
> >   this undefined, we should throw an error (EESCAPE) if we run into
> >   this scenario to ease transition into a state where some escaped
> >   ordinaries are blessed with a special meaning -- it will either error
> >   out or have extended behavior, rather than have two entirely
> >   different versions of undefined behavior that leave the consumer of
> >   regex(3) guessing as to what behavior will be used or leaving them
> >   with false impressions.
> >
> >   This change bumps the symbol version of regcomp to FBSD_1.6 and
> >   provides the old escape semantics for legacy applications, just in
> >   case one has an older application that would immediately turn into a
> >   pumpkin because of an extraneous escape that's embedded or otherwise
> >   critical to its operation.
> >
> >   This is the final piece needed before enhancing libregex with GNU
> >   extensions and flipping the switch on bsdgrep.
> >
> >   [1] http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/
> >
> > PR: 229925 (exp-run, courtesy of antoine)
> > Differential Revision: https://reviews.freebsd.org/D10510
> >
> > Modified:
> >   head/contrib/netbsd-tests/lib/libc/regex/data/meta.in
> >   head/contrib/netbsd-tests/lib/libc/regex/data/subexp.in
> >   head/lib/libc/regex/Symbol.map
> >   head/lib/libc/regex/regcomp.c
>
> I think there are 3 test cases that need to be modified after this change:
>
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/lib.googletest.gtest_main/googletest-port-test/main/
>
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/usr.bin.diff/diff_test/side_by_side/
>
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/16011/testReport/junit/usr.bin.sed/sed2_test/hex_subst/
>
> Please help to check them, thanks!
>
> Li-Wen
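
For anyone wanting to see how a given libc treats an escaped ordinary character, here is a minimal sketch in C. It is not taken from the commit or its tests. The helpers try_compile and is_escape_error are hypothetical names made up for illustration, and only the standard regcomp(3), regerror(3) and regfree(3) calls are assumed.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of libc): compile `pattern` as a POSIX
 * basic regular expression. On success, "ok" is written to `msg`; on
 * failure, the regerror(3) text for the error is written instead.
 * Returns regcomp's return code (0 on success).
 */
static int
try_compile(const char *pattern, char *msg, size_t msglen)
{
	regex_t re;
	int rc;

	assert(msg != NULL && msglen > 0);

	rc = regcomp(&re, pattern, REG_NOSUB);
	if (rc == 0) {
		snprintf(msg, msglen, "ok");
		regfree(&re);
	} else {
		/* Ask the library for its own description of the error. */
		regerror(rc, &re, msg, msglen);
	}
	return (rc);
}

/* Hypothetical helper: true if `rc` is the escape error discussed above. */
static int
is_escape_error(int rc)
{
	return (rc == REG_EESCAPE);
}
```

Calling try_compile("a\\q", msg, sizeof(msg)) and passing the result to is_escape_error is one way to probe the behavior. Going by the commit log, a libc carrying this change should reject such a pattern with EESCAPE, while an older libc or another implementation may accept it and match a literal q. The same probe with "\\w" shows whether the GNU-style extension mentioned in the reply is available, and the answer depends on the platform.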