From: Baptiste Daroussin
Date: Tue, 10 Nov 2015 23:26:37 +0100
To: arch@FreeBSD.org
Cc: ache@FreeBSD.org, marino@FreeBSD.org
Subject: Question about ASCII and nl_langinfo (locale work)
Message-ID: <20151110222636.GN10134@ivaldir.etoilebsd.net>

Hi all,

When the new collation was merged, the locales were reworked. ache@ raised
a good point about the C and POSIX locales and, by extension, the US-ASCII
locales: should we take the opportunity to change what they report?

First, a description of the situation:

What nl_langinfo() returns for the encoding is not standardised; each OS
can return whatever name it wants. While it is pretty obvious what should
be returned for regular encodings (iso-8859* or UTF-8), for the C and POSIX
locales FreeBSD used to return US-ASCII (and does so again as of today).

Lots of third-party applications (python, perl, tcl, etc.) try to figure
out the encoding by matching against a table of "known" outputs of
nl_langinfo().

The thing is, not all of them are aware that FreeBSD uses US-ASCII. For
example, tcl is not, which means tcl is not able to determine what encoding
is needed for the C and POSIX locales.

Linux returns ANSI_X3.4-1968 (also known as US-ASCII), and most
applications know what Linux returns.

That means we need to teach every upstream about US-ASCII, all the time.

The proposals are:
- Do not change what we have always done.
- Change it to something that makes sense: "C" (we tried "POSIX", which
  was a very bad idea, but "C" seems to be commonly recognised by
  applications as ASCII).
- Report the same as Linux, which will simplify portability.
- Be obvious and report ASCII (also commonly recognised by applications).

The next question: if we change the above, would it make sense to also
report ASCII for the ASCII locales:
- en_AU.US-ASCII
- en_CA.US-ASCII
- en_GB.US-ASCII
- en_NZ.US-ASCII
- en_US.US-ASCII
- en_ZA.US-ASCII

This would require some work; or should we make them return ASCII, or even
ANSI_X3.4-1968?

Please share your opinion here.

Best regards,
Bapt
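
To illustrate the table matching described above, here is a minimal Python sketch of how an application might map the name reported by nl_langinfo() to an encoding it knows. The names normalize_codeset, current_codeset and ASCII_ALIASES are hypothetical, and the alias table only lists the candidate names discussed in this message; it is not the actual table used by python, perl or tcl.

```python
import locale

# Hypothetical alias table: names an application might accept as plain ASCII.
# The entries are the candidate return values discussed in this thread; this
# is an assumption for illustration, not any real application's table.
ASCII_ALIASES = {"US-ASCII", "ANSI_X3.4-1968", "ASCII", "C"}


def normalize_codeset(name):
    """Map a codeset name reported by the OS to a canonical lowercase name.

    Returns None when the name is not in the table, which is the failure
    mode described for applications that do not know "US-ASCII".
    """
    upper = name.strip().upper()
    if upper in ASCII_ALIASES:
        return "ascii"
    if upper in ("UTF-8", "UTF8"):
        return "utf-8"
    if upper.startswith("ISO-8859-") or upper.startswith("ISO8859-"):
        # Keep only the part number after the last dash, e.g. "15".
        return "iso-8859-" + upper.rsplit("-", 1)[1]
    return None


def current_codeset():
    """Ask the C library for the codeset of the current locale (Unix only)."""
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    reported = current_codeset()
    print(reported, "->", normalize_codeset(reported))
```

An application whose table lacks one of these spellings gets None back for the C and POSIX locales, which is the situation described for tcl. Whichever name FreeBSD reports, only applications that already list that spelling would resolve it without an upstream change.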