Date: Wed, 20 Jul 2016 16:07:41 +0200
From: Baptiste Daroussin <bapt@FreeBSD.org>
To: Jonathan Anderson <jonathan@FreeBSD.org>
Cc: Tim Čas <darkuranium@gmail.com>, freebsd-current@freebsd.org
Subject: Re: UTF-8 by default?
Message-ID: <20160720140741.yi7vfgmmqtg6eprx@ivaldir.etoilebsd.net>
In-Reply-To: <B68D48ED-66CA-4E5B-8ED2-555B397AC73E@FreeBSD.org>
References: <CANd9X8f5wHvdwN_XZ2y0qsiydYyb=NKLXF0k65S0_TiuWHeGKA@mail.gmail.com> <B68D48ED-66CA-4E5B-8ED2-555B397AC73E@FreeBSD.org>

On Wed, Jul 20, 2016 at 10:47:45AM -0230, Jonathan Anderson wrote:
> On 20 Jul 2016, at 9:13, Tim Čas wrote:
>
> > So, without further ado:
> > 1) What are the reasons that UTF-8 isn't the default yet?
> > 2) Would it be possible to make this the default in 11.0? What about
> > 12.0?
> > 3) Assuming an effort is started towards making UTF-8 the default,
> > what changes would be required?
>
> At least according to one of my students (who makes more extensive use of
> i18n than I do), enabling UTF-8 by default is pretty straightforward:
>
> https://github.com/musec/freebsd/wiki/Common-setup#utf-8-support

The LC_COLLATE=C setting is not needed anymore with FreeBSD 11+.

>
> If there's anything missing there, I'd love to hear about it.
>

A lot of work was done during the 11.0 development cycle. The following
issues were fixed:

- /bin/sh was not able to handle UTF-8 (fixed by fixing the bug in libedit)
- no Unicode collation: fixed, but the code is still very fresh
- vi: there was a potential corruption when opening a file in a non-Unicode
  encoding in a Unicode environment; now it does not corrupt anything
  anymore, but it still says it is unhappy
- finger(1) has been fixed for multibyte names (I know no one cares about
  that one :))

On the list of still-known issues:

* important:
  - csh does not handle Unicode
  - regex in libc: it does not handle Unicode right (unless I have missed
    something) and needs to be either fixed or switched to libtre + custom
    patches (there was a Summer of Code project about it long ago, and
    DragonFly went that way)
  - Unicode support in our old groff is pretty bad; I plan to replace it
    with heirloom-doctools, which does handle Unicode properly (as far as
    I have tested, at least)
  - edit(1) does not handle multibyte
* medium (minor?):
  - login(1) does not handle Unicode properly
* minor:
  - lots of base tools (minor ones like nl and friends) are not multibyte
    aware in a lot of cases; merging the work done by Ingo Schwarze on
    those tools in OpenBSD would probably be useful, but I have no plan to
    do it
  - vi needs improvement in multi-encoding support; I haven't checked the
    latest upstream modifications to vi in that area

There might be more, but that is all that comes to mind right now.

Best regards,
Bapt
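
For readers unfamiliar with the term, a tool that is "not multibyte aware" usually treats every byte as one character, so any operation that counts or cuts characters can split a UTF-8 sequence in half. The sketch below illustrates that failure mode only; it is written in Python for brevity and is not taken from any FreeBSD tool. The function names truncate_bytes and truncate_chars are hypothetical, and the sample string is simply a name from this thread.

```python
# Illustration only: byte-oriented vs. multibyte-aware truncation.
# Function names are hypothetical, not from any FreeBSD source.

NAME = "Tim \u010cas"          # "Tim Čas": U+010C is 2 bytes in UTF-8
RAW = NAME.encode("utf-8")     # what a tool actually reads from a file


def truncate_bytes(data: bytes, width: int) -> bytes:
    """Byte-oriented cut: may stop in the middle of a multibyte sequence."""
    return data[:width]


def truncate_chars(data: bytes, width: int) -> bytes:
    """Multibyte-aware cut: decode, cut on code points, then re-encode."""
    return data.decode("utf-8")[:width].encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(len(RAW), "bytes,", len(NAME), "code points")
    print("byte cut valid UTF-8?", is_valid_utf8(truncate_bytes(RAW, 5)))
    print("char cut valid UTF-8?", is_valid_utf8(truncate_chars(RAW, 5)))
```

A C tool would typically get the same effect by walking the input with mbrtowc(3) under the user's locale instead of indexing bytes directly. Note that this sketch cuts on code points, which still ignores combining characters and display width.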