Date: Wed, 20 Jul 2016 16:07:41 +0200
From: Baptiste Daroussin <bapt@FreeBSD.org>
To: Jonathan Anderson <jonathan@FreeBSD.org>
Cc: Tim Čas <darkuranium@gmail.com>, freebsd-current@freebsd.org
Subject: Re: UTF-8 by default?
Message-ID: <20160720140741.yi7vfgmmqtg6eprx@ivaldir.etoilebsd.net>
In-Reply-To: <B68D48ED-66CA-4E5B-8ED2-555B397AC73E@FreeBSD.org>
References: <CANd9X8f5wHvdwN_XZ2y0qsiydYyb=NKLXF0k65S0_TiuWHeGKA@mail.gmail.com> <B68D48ED-66CA-4E5B-8ED2-555B397AC73E@FreeBSD.org>
On Wed, Jul 20, 2016 at 10:47:45AM -0230, Jonathan Anderson wrote:
> On 20 Jul 2016, at 9:13, Tim Čas wrote:
>
> > So, without further ado:
> > 1) What are the reasons that UTF-8 isn't the default yet?
> > 2) Would it be possible to make this the default in 11.0? What about
> > 12.0?
> > 3) Assuming an effort is started towards making UTF-8 the default,
> > what changes would be required?
>
> At least according to one of my students (who makes more extensive use of
> i18n than I do), enabling UTF-8 by default is pretty straightforward:
>
> https://github.com/musec/freebsd/wiki/Common-setup#utf-8-support

The LC_COLLATE=C setting is not needed anymore with FreeBSD 11+.

>
> If there's anything missing there, I'd love to hear about it.
>

A lot of work was done during the 11.0 development cycle. The following
issues were fixed:

- /bin/sh was not able to handle UTF-8 (fixed by fixing the bug in libedit)
- no Unicode collation: fixed, but the code is still very fresh
- vi: there was potential corruption when opening a file in a non-Unicode
  encoding in a Unicode environment; it no longer corrupts anything, but it
  still says it is unhappy
- finger(1) has been fixed for multibyte names (I know no one cares about
  that one :))

On the list of still-known issues:

* important:
  - csh does not handle Unicode
  - regex in libc: it does not handle Unicode correctly (unless I have
    missed something) and needs to be either fixed or replaced by libtre
    plus custom patches (there was a Summer of Code project about it long
    ago, and DragonFly went that way)
  - Unicode support in our old groff is pretty bad; I plan to replace it
    with heirloom-doctools, which does handle Unicode properly (as far as I
    have tested, at least)
  - edit(1) does not handle multibyte characters
* medium (minor?):
  - login(1) does not handle Unicode properly
* minor:
  - lots of base tools (minor ones like nl and friends) are not multibyte
    aware in a lot of cases; merging the work done by Ingo Schwarze on those
    tools in OpenBSD would probably be useful, but I have no plan to do it
  - vi needs improvement in its multi-encoding support; I haven't checked
    the latest upstream changes to vi in that area

There might be more, but that is all that comes to mind right now.

Best regards,
Bapt
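
As an illustration of what "not multibyte aware" means in the list of known issues above, here is a minimal Python sketch. It is not taken from any FreeBSD tool; the helper names (byte_width, char_width, truncate_bytes) are hypothetical and exist only for this example. It contrasts counting bytes with counting Unicode code points, and shows how cutting a UTF-8 string at a byte offset can split a character. Code points are still not the same thing as user-perceived characters (combining marks are ignored here).

```python
# Hypothetical helpers, for illustration only.

def byte_width(text: str) -> int:
    """Length as a byte-oriented tool sees it: the UTF-8 encoded size."""
    return len(text.encode("utf-8"))


def char_width(text: str) -> int:
    """Length in Unicode code points, as a multibyte-aware tool counts."""
    return len(text)


def truncate_bytes(text: str, limit: int) -> bytes:
    """Naive truncation at a byte offset; may cut a character in half."""
    return text.encode("utf-8")[:limit]


name = "Tim Čas"  # "Č" (U+010C) is encoded as two bytes in UTF-8

print(byte_width(name), char_width(name))

# Cutting after 5 bytes keeps only the first byte of "Č",
# so the result is no longer valid UTF-8.
cut = truncate_bytes(name, 5)
try:
    cut.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False
print(cut_is_valid)
```

A tool that pads, wraps, or truncates by byte count makes the same mistake as truncate_bytes here, which is roughly the class of problem a multibyte-aware rewrite has to address.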
