Date:      Tue, 10 Nov 2015 23:26:37 +0100
From:      Baptiste Daroussin <bapt@FreeBSD.org>
To:        arch@FreeBSD.org
Cc:        ache@FreeBSD.org, marino@FreeBSD.org
Subject:   Question about ASCII and nl_langinfo (locale work)
Message-ID:  <20151110222636.GN10134@ivaldir.etoilebsd.net>

Hi all,

When merging the new collation support, the locales have been reworked.

ache@ raised a good point about the C and POSIX locales, and by extension the
US-ASCII locales: should we take this opportunity to change what they report as
their encoding?

First, a description of the situation: the output of nl_langinfo(CODESET) is not
standardised, so each OS can return whatever encoding name it wants. While it is
fairly obvious what should be returned for regular encodings (ISO-8859-* or
UTF-8), for the C and POSIX locales FreeBSD used to return US-ASCII (and, as of
today, does so again).
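For reference, here is a minimal Python sketch of how a program sees this value: it temporarily switches LC_CTYPE to a given locale and reads nl_langinfo(CODESET) through Python's locale module (Unix only). The helper name codeset_for is hypothetical, and the string it returns depends on the C library, e.g. US-ASCII on FreeBSD as described above.

```python
import locale


def codeset_for(locale_name: str) -> str:
    """Return what nl_langinfo(CODESET) reports for locale_name.

    LC_CTYPE is switched temporarily and restored afterwards.
    Unix only: locale.nl_langinfo does not exist on Windows.
    Raises locale.Error if locale_name is not installed.
    """
    previous = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # Restore whatever was active before, even if setlocale failed.
        locale.setlocale(locale.LC_CTYPE, previous)


if __name__ == "__main__":
    # C and POSIX are the locales under discussion; the output is
    # whatever the local C library chooses to report.
    for name in ("C", "POSIX"):
        print(f"{name}: {codeset_for(name)}")
```

Passing one of the en_*.US-ASCII names instead shows what those locales report; a name that is not installed raises locale.Error.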

Lots of third-party applications (Python, Perl, Tcl, etc.) try to figure out the
encoding by matching the output of nl_langinfo() against a table of "known"
values.
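A minimal Python sketch of the exact-match lookup described above. The table and the guess_encoding helper are hypothetical and are not copied from Python, Perl or Tcl; the table only knows the Linux spelling, so FreeBSD's "US-ASCII" falls through to the default.

```python
# Hypothetical table of the kind an application might embed: it maps the
# nl_langinfo(CODESET) spellings it has seen to its own encoding names.
# "US-ASCII" is deliberately missing, as in an application that only
# learned the Linux spelling.
KNOWN_CODESETS = {
    "ansi_x3.4-1968": "ascii",
    "ascii": "ascii",
    "utf-8": "utf-8",
    "iso-8859-1": "iso8859-1",
}


def guess_encoding(codeset: str, default: str = "unknown") -> str:
    """Exact (case-insensitive) match against KNOWN_CODESETS."""
    return KNOWN_CODESETS.get(codeset.strip().lower(), default)


print(guess_encoding("ANSI_X3.4-1968"))  # ascii
print(guess_encoding("US-ASCII"))        # unknown (FreeBSD's C locale)
```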

The problem is that not all of them are aware that FreeBSD uses US-ASCII. Tcl,
for example, is not, which means Tcl cannot determine which encoding to use for
the C and POSIX locales.

On Linux, they return ANSI_X3.4-1968 (also known as US-ASCII), and most
applications know what Linux returns.

That means we have to keep teaching every upstream project about US-ASCII.

The proposals are:
- Do not change what we have always done.
- Change it to something that makes sense, such as "C" (we tried "POSIX", which
  was a very bad idea, but "C" seems to be commonly recognised by applications
  as ASCII)
- Report the same as Linux (ANSI_X3.4-1968); that will simplify portability
- Be obvious and report "ASCII" (also commonly recognised by applications)
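To make the proposals concrete, here is a minimal Python sketch of an application that normalises aliases instead of matching exact strings. The names canonical_codeset and LOCALE_NAME_ALIASES are hypothetical and not part of any project mentioned here; the helper relies on Python's codec alias registry and special-cases "C" and "POSIX" because they are locale names rather than charset names.

```python
import codecs
from typing import Optional

# "C" and "POSIX" are locale names, not registered charset names, so
# Python's codec registry does not know them; map them explicitly.
LOCALE_NAME_ALIASES = {"c": "ascii", "posix": "ascii"}


def canonical_codeset(codeset: str) -> Optional[str]:
    """Map a nl_langinfo(CODESET) string to a canonical codec name.

    Uses Python's codec alias registry, which already treats US-ASCII,
    ANSI_X3.4-1968 and ASCII as the same codec. Returns None if unknown.
    """
    name = codeset.strip()
    alias = LOCALE_NAME_ALIASES.get(name.lower())
    if alias is not None:
        return alias
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# The current FreeBSD spelling followed by each proposed alternative.
for spelling in ("US-ASCII", "C", "ANSI_X3.4-1968", "ASCII"):
    print(spelling, "->", canonical_codeset(spelling))
```

An application written this way accepts every option in the list; the choice matters mainly for applications that only recognise a fixed set of spellings.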

The next question is: if we change the above, would it make sense to also change
what the ASCII locales report:
- en_AU.US-ASCII
- en_CA.US-ASCII
- en_GB.US-ASCII
- en_NZ.US-ASCII
- en_US.US-ASCII
- en_ZA.US-ASCII

That would require some work. Should we make them return ASCII, or even
ANSI_X3.4-1968?

Please share your opinions here.

Best regards,
Bapt
