From: Brandon Allbery
Date: Wed, 21 Feb 2018 07:16:49 -0500
Subject: Re: Locale problem updating 10.3 to 11.1
To: Eivind Nicolay Evensen
Cc: freebsd-stable
In-Reply-To: <20180221120811.GA75251@klump.hjerdalen.lokalnett>
References: <20180218230251.GA60727@klump.hjerdalen.lokalnett>
 <20180219081129.GB62932@klump.hjerdalen.lokalnett>
 <20180220230822.GA72560@klump.hjerdalen.lokalnett>
 <20180221120811.GA75251@klump.hjerdalen.lokalnett>
List-Id: Production branch of FreeBSD source code

A locale mapping is basically a lookup table (with complications for things
like ß). A single-byte lookup table will be 256 entries, each holding one
or more (because of combining characters) Unicode codepoints representing
the mapping from the locale character set to the underlying common
character set (Unicode). (There may also be a reverse lookup table for
mapping Unicode codepoints to locale codepoints.) Without this, every
program would have to deal directly with every possible character set. With
it, code can use Unicode internally and let the locale system map to what
to display, or in the other direction from what it has read to the common
representation.
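
To make the lookup-table idea concrete, here is a minimal sketch in Python. It is an illustration only, not how FreeBSD's libc implements locales: ISO 8859-1 is assumed as the single-byte character set, Python's codec stands in for the locale data, and the function names are hypothetical.

```python
# Sketch only: ISO 8859-1 stands in for "the locale character set".
# build_table, build_reverse, to_unicode and from_unicode are hypothetical
# names for illustration, not part of any libc or FreeBSD API.

def build_table(encoding="iso-8859-1"):
    """Return a 256-entry list: byte value -> Unicode string (or None)."""
    table = []
    for byte in range(256):
        try:
            table.append(bytes([byte]).decode(encoding))
        except UnicodeDecodeError:
            # Some single-byte sets leave byte values unassigned.
            table.append(None)
    return table


def build_reverse(table):
    """Return the reverse lookup: Unicode string -> byte value."""
    return {text: byte for byte, text in enumerate(table) if text is not None}


def to_unicode(data, table):
    """Map locale bytes to the common representation via the table."""
    return "".join(
        table[b] if table[b] is not None else "\ufffd" for b in data
    )


def from_unicode(text, reverse):
    """Map back to locale bytes; raises KeyError for unmappable characters.

    Assumes every table entry is a single code point, which holds for
    ISO 8859-1 but not for every character set.
    """
    return bytes(reverse[ch] for ch in text)


table = build_table()
reverse = build_reverse(table)
print(to_unicode(b"Stra\xdfe", table))  # Straße
```

A program written this way only ever handles Unicode strings internally; the two tables are the only place that knows about the 8-bit character set.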

(Complications include things like: depending on encoding/locale details,
German lowercase ß will uppercase to either SS or ẞ. And that's one of the
simpler ones; for some locales, things can get *really* weird. Not to
mention fun stuff like Arabic having 4 representations of every character:
initial, medial, final, standalone.)

Locale handling is seriously *nasty*. Unicode is also pretty nasty... but it
mostly manages the superset of individual locale nastinesses in about as
logical a way as possible given that locales are fundamentally illogical:
very few of them were designed, most grew organically and without regard
for rules or logic. (Esperanto locales being an exception... but even
Esperanto has developed some organic extensions with actual usage. It's how
humans work.)

On Wed, Feb 21, 2018 at 7:08 AM, Eivind Nicolay Evensen <
eivinde@terraplane.org> wrote:

> On Wed, Feb 21, 2018 at 01:03:01AM -0500, Brandon Allbery wrote:
> > On Tue, Feb 20, 2018 at 6:08 PM, Eivind Nicolay Evensen <
> > eivinde@terraplane.org> wrote:
> >
> > > However, since it was mentioned in a note starting with
> > > "Add support for unicode collation" I most likely didn't even read it
> > > since I'll never touch unicode.
> > >
> >
> > If you ever use anything other than LANG=C, you *are* touching Unicode.
>
> Well, I don't see multibyte characters with 8859-1, and
> multibyte is what I don't tolerate. I didn't even know
> that unicode could be single-byte character only sets.
>
> --
> Eivind
>

-- 
brandon s allbery kf8nh                               sine nomine associates
allbery.b@gmail.com                                  ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net