Date: Tue, 10 Nov 2015 23:26:37 +0100
From: Baptiste Daroussin <bapt@FreeBSD.org>
To: arch@FreeBSD.org
Cc: ache@FreeBSD.org, marino@FreeBSD.org
Subject: Question about ASCII and nl_langinfo (locale work)
Message-ID: <20151110222636.GN10134@ivaldir.etoilebsd.net>

Hi all,

When merging the new collation, the locales have been reworked. ache@ raised a
good point about the C and POSIX locales and, by extension, the US-ASCII
locales: should we take the opportunity to change what they report?

First, a description of the situation:

The output of nl_langinfo() is not standardised: each OS can return whatever
encoding name it wants. While it is pretty obvious what should be returned for
regular encodings (iso-8859* or UTF-8), for the C and POSIX locales FreeBSD
used to return US-ASCII (and does so again as of today).

Lots of third-party applications (python, perl, tcl, etc.) try to figure out
the encoding by matching against a table of "known" outputs of nl_langinfo().

The thing is, not all of them are aware that FreeBSD uses US-ASCII. tcl, for
example, is not, which means tcl is not able to determine what encoding is
needed for the C and POSIX locales.

Linux returns ANSI_X3.4-1968 (also known as US-ASCII), and most applications
know what Linux returns.

That means we need to teach every upstream about US-ASCII all the time.

The proposals are:
- Do not change what we have always done.
- Change it to something that makes sense, such as "C" (this is what we tried
  with "POSIX", which was a very bad idea, but "C" seems to be commonly
  recognised by applications as ASCII).
- Report the same as Linux, which will simplify portability.
- Be obvious and report ASCII (also commonly recognised by applications).

The next question is: if we change the above, would it make sense to also
report ASCII for the ASCII locales?
- en_AU.US-ASCII
- en_CA.US-ASCII
- en_GB.US-ASCII
- en_NZ.US-ASCII
- en_US.US-ASCII
- en_ZA.US-ASCII

That would require some work. Should we make them return ASCII, or even
ANSI_X3.4-1968?

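To make the table-matching problem concrete, here is a minimal sketch in
Python of the kind of lookup such applications perform. It is not taken from
python, perl or tcl; the table contents and the names KNOWN_CODESETS,
LINUX_ONLY_TABLE, resolve_codeset and current_codeset are assumptions made up
for illustration. Only locale.nl_langinfo() and locale.CODESET are real
standard-library names.

```python
import locale

# Hypothetical table mapping nl_langinfo(CODESET) strings to codec names.
# It lists the spellings discussed in this mail; real applications keep
# their own (usually much longer) tables.
KNOWN_CODESETS = {
    "US-ASCII": "ascii",        # what FreeBSD reports for C/POSIX
    "ANSI_X3.4-1968": "ascii",  # what Linux reports for C/POSIX
    "ASCII": "ascii",           # proposed alternative
    "C": "ascii",               # proposed alternative
    "UTF-8": "utf-8",
    "ISO-8859-1": "iso-8859-1",
}

# Hypothetical table of an application that only learned the Linux name.
LINUX_ONLY_TABLE = {
    "ANSI_X3.4-1968": "ascii",
    "UTF-8": "utf-8",
    "ISO-8859-1": "iso-8859-1",
}


def resolve_codeset(name, table=KNOWN_CODESETS):
    """Return the codec name for a codeset string, or None if unknown."""
    # Compare case-insensitively, since spellings differ between systems.
    return table.get(name.strip().upper())


def current_codeset():
    """Return the codeset string the running system reports (Unix only)."""
    return locale.nl_langinfo(locale.CODESET)
```

With these assumed tables, resolve_codeset("US-ASCII") yields "ascii", while
resolve_codeset("US-ASCII", LINUX_ONLY_TABLE) yields None, which is the
situation described above for applications that do not know the FreeBSD name.
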
Please share your opinion here.

Best regards,
Bapt