From: Baptiste Daroussin
To: Xin Li
Cc: "freebsd-hackers@freebsd.org", theraven@freebsd.org, d@delphij.net
Date: Wed, 8 Mar 2017 09:40:47 +0100
Subject: Re: Why en_US.UTF-8 locale consider a < A?
Message-ID: <20170308084047.qc2j3vnrh5hycg32@ivaldir.net>
In-Reply-To: <062a0098-1975-6d2b-b017-f623e46ca20b@delphij.net>

On Wed, Mar 08, 2017 at 12:28:16AM -0800, Xin Li wrote:
> Hi,
>
> I recently noticed that when LANG and LC_CTYPE are set to en_US.UTF-8,
> the following file:
>
> %%%%%
> 1
> 2
> A
> a
> B
> b
> %%%%
>
> I got:
>
> $ LANG=C LC_CTYPE=C sort testcase
> 1
> 2
> A
> B
> a
> b
> $ LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 sort testcase
> 1
> 2
> a
> A
> b
> B
>
> Is this result correct?  It matches some Debian behavior but not macOS
> behavior.

Yes, the result is correct. macOS does not have Unicode collation; if you
want to match the macOS behaviour, you have to set LC_COLLATE=C.

Best regards,
Bapt
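
For anyone who wants to see the effect of LC_COLLATE=C outside of sort(1), here is a minimal sketch in Python. It is an illustration only, not how sort(1) is implemented: the helper name sort_with_c_collation is made up for this example, and it uses only the "C" locale, so it does not depend on en_US.UTF-8 being installed. It switches the collation category to "C", sorts the six lines from the test case with locale.strxfrm as the key, and restores the previous setting.

```python
import locale


def sort_with_c_collation(lines):
    """Sort strings using the collation rules of the "C" locale.

    Hypothetical helper for illustration; roughly what LC_COLLATE=C
    asks sort(1) to do, i.e. compare by character code.
    """
    # Calling setlocale() with only a category queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    locale.setlocale(locale.LC_COLLATE, "C")
    try:
        # strxfrm() maps each string to a key that compares according to
        # the active LC_COLLATE; in the "C" locale that is code order.
        return sorted(lines, key=locale.strxfrm)
    finally:
        # Put the collation setting back the way we found it.
        locale.setlocale(locale.LC_COLLATE, previous)


testcase = ["1", "2", "A", "a", "B", "b"]
result = sort_with_c_collation(testcase)
print(result)  # ['1', '2', 'A', 'B', 'a', 'b']
```

This is the same order as the LANG=C run quoted above: digits first, then all uppercase letters, then all lowercase letters. Note that LC_ALL, if set, takes precedence over LC_COLLATE, and LC_COLLATE in turn takes precedence over LANG.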