From: Stefan Esser
To: "Rodney W. Grimes", Warner Losh
Cc: "freebsd-arch@freebsd.org"
Subject: Re: FreeBSD awk behavior change proposal
Date: Fri, 9 Jul 2021 16:36:40 +0200
Message-ID: <621331d0-b7bb-0365-23f7-999dd7155c19@freebsd.org>
In-Reply-To: <202107091321.169DLTZY041684@gndrsh.dnsmgr.net>
References: <202107091321.169DLTZY041684@gndrsh.dnsmgr.net>
List-Archive: https://lists.freebsd.org/archives/freebsd-arch

On 09.07.21 at 15:21, Rodney W. Grimes wrote:
>> Greetings,
>>
>> I've posted https://reviews.freebsd.org/D31114 which eliminates the last
>> delta we have from upstream one-true-awk. This delta has basically been
>> rejected by upstream as being a really bad idea. Let me give some
>> background.
>>
>> In 2005, FreeBSD changed one-true-awk to honor the locale's collating order.
>> https://svnweb.freebsd.org/base/head/usr.bin/awk/b.c.diff?annotate=146322&pathrev=201988
>> This was billed as a temporary patch. It was also compatible with
>> the then-current behavior of gawk. That temporary patch has lasted 16
>> years now.
>>
>> However, IEEE Std 1003.1-2008 changed the behavior of ranges in regular
>> expressions outside of the "C" and "POSIX" locales to be undefined.
>>
>> Starting in 2011, gawk 4.0 stopped using the locale for the range
>> regular expressions and used the traditional behavior only. The
>> maintainer had grown weary of answering why '[A-Z]' would sometimes
>> match lower-case letters. The details are explained here:
>> https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
>>
>> To restore compatibility with other implementations of awk, revert this
>> patch. FreeBSD is the odd system out. It also has the nice side effect
>> of eliminating the last of our differences with upstream one-true-awk.
>>
>> I'd like to commit the change at least to -current. Ideally, I'd like to MFC
>> the change. I believe better compatibility with gawk and other awk
>> implementations justifies this change in behavior because the current
>> behavior is outside the mainstream enough to be considered a bug.
>>
>> I'd like to solicit input before I do this, however.
>
> My only concern on this is does anything in the ports system get
> tickled by this change, I know it's a pita, but maybe have an exp
> run done? I reviewed and accepted the differential, and by examination
> I do not see how this could cause an issue now, so Meh give it a long
> bake in -current and things should be ok.

A problem is possible in theory, but I do not see how the ports system
could be affected in practice. Ports are built in a C/POSIX locale on the
official builders, so a different locale and collating sequence on a
user's system could break a port, but should never be required to build
one.
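As background for the range behavior under discussion, here is a minimal
Python sketch of how '[A-Z]' can come to match lower-case letters. The
collating order aAbB...zZ is an assumption chosen for illustration, not the
table of any particular FreeBSD locale, and the function names are
hypothetical. It only models the two range semantics; it is not how awk
implements them.

```python
import string

# ASSUMPTION: a dictionary-style collating order a A b B ... z Z.
# Real locales differ; this table exists only to illustrate the effect.
COLLATION = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)
RANK = {ch: pos for pos, ch in enumerate(COLLATION)}


def in_range_codepoint(ch, lo, hi):
    """Traditional semantics: compare raw code points."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_range_collated(ch, lo, hi):
    """Locale-style semantics: compare positions in the collating order."""
    if ch not in RANK or lo not in RANK or hi not in RANK:
        return False
    return RANK[lo] <= RANK[ch] <= RANK[hi]


def select(chars, lo, hi, in_range):
    """Keep the characters of `chars` that fall inside the range lo-hi."""
    return "".join(c for c in chars if in_range(c, lo, hi))


if __name__ == "__main__":
    sample = "aAbBzZ"
    print(select(sample, "A", "Z", in_range_codepoint))  # ABZ
    print(select(sample, "A", "Z", in_range_collated))   # AbBzZ
```

Under the assumed order, every lower-case letter except 'a' sorts between
'A' and 'Z', which is the kind of surprise the gawk maintainer is said to
have grown tired of explaining.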
I have checked the port Makefiles for occurrences of LANG or LC_* outside
specific command invocations (e.g. to set the locale for a sort command).
These are the results:

- ${USE_LOCALE} is used in bsd.port.mk, but the only case where a locale
  other than C or en_US.UTF-8 is specified is shells/fd, which has
  USE_LOCALE=ja (i.e. does not specify an encoding).

- ${ELIXIR_LOCALE} is used to set LANG and LC_ALL for USES=elixir. But
  ELIXIR_LOCALE is only ever set to en_US.UTF-8, AFAICT.

- print/libpaper explicitly requests LANG=C LC_ALL=C for AWK.

- The only port that requests a locale that is not en_US.UTF-8,
  en_US.ISO8859-1, or C is textproc/te-hunspell, which uses
  LANG=te_IN.utf8 LC_ALL=te_IN.utf8 to execute wordlist2hunspell, but
  only for this single shell script, which does not invoke AWK and which
  internally uses LC_ALL=C for sort and uniq so that they do not depend
  on an externally set locale.

All other cases where LC_* or LANG are used in port Makefiles are in e.g.
EXTRACT_CMD, TEST_ENV or in patch files, but those enforce a C or
C.UTF-8 locale (or en_US.*) and thus are not affected by the proposed
change to AWK (besides, they often only set the locale for a TAR file
extraction).

If an exp-run is planned for other reasons, the modified AWK could be
thrown in as a low-risk modification. But after performing a grep for
LANG and LC_* on the Makefiles and patch files, I do not see any possible
effect on the ports system.
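A check of the kind described above could be sketched roughly as follows.
This is not the command that was actually run; the regular expressions, the
function names and the set of locales treated as harmless (C, POSIX,
C.UTF-8, en_US.*) are assumptions made for illustration. It only looks at
literal LANG= and LC_*= assignments and skips values given as make
variables such as ${ELIXIR_LOCALE}.

```python
import re
from pathlib import Path

# ASSUMPTION: locales considered harmless for the awk change.
SAFE = re.compile(r"^(C|POSIX|C\.UTF-8|en_US\..+)$")
# Literal assignments such as LANG=C or LC_ALL=te_IN.utf8.
ASSIGN = re.compile(r"""\b(LANG|LC_[A-Z]+)=([^\s"']+)""")


def risky_locale_settings(text):
    """Return (variable, value) pairs selecting a locale outside SAFE."""
    hits = []
    for var, value in ASSIGN.findall(text):
        if value.startswith("$"):
            # A make variable reference; it cannot be judged from this line.
            continue
        if not SAFE.match(value):
            hits.append((var, value))
    return hits


def scan_tree(root):
    """Yield (path, variable, value) for every Makefile* below root."""
    for path in Path(root).rglob("Makefile*"):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for var, value in risky_locale_settings(text):
            yield path, var, value
```

Calling scan_tree() on a ports checkout (the path is up to the caller) would
list candidates that still need a manual look, for example to see whether
the command run under that locale invokes AWK at all.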
Regards, STefan