Date:      Wed, 20 Jul 2016 11:33:14 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        bapt@FreeBSD.org
Cc:        jonathan@FreeBSD.org, darkuranium@gmail.com, freebsd-current@freebsd.org
Subject:   Re: UTF-8 by default?
Message-ID:  <201607201833.u6KIXEpB054887@gw.catspoiler.org>
In-Reply-To: <20160720140741.yi7vfgmmqtg6eprx@ivaldir.etoilebsd.net>

On 20 Jul, Baptiste Daroussin wrote:
> On Wed, Jul 20, 2016 at 10:47:45AM -0230, Jonathan Anderson wrote:
>> On 20 Jul 2016, at 9:13, Tim Čas wrote:
>> 
>> > So, without further ado:
>> > 1) What are the reasons that UTF-8 isn't the default yet?
>> > 2) Would it be possible to make this the default in 11.0? What about
>> > 12.0?
>> > 3) Assuming an effort is started towards making UTF-8 the default,
>> > what changes would be required?
>> 
>> At least according to one of my students (who makes more extensive use of
>> i18n than I do), enabling UTF-8 by default is pretty straightforward:
>> 
>> https://github.com/musec/freebsd/wiki/Common-setup#utf-8-support
> 
> The LC_COLLATE=C setting is no longer needed with FreeBSD 11+.
>> 
>> If there's anything missing there, I'd love to hear about it.
>> 
> 
> A lot of work was done during 11.0 development. The following issues were
> fixed:
> 
> /bin/sh not able to handle UTF-8 (fixed by fixing the bug in libedit)
> no Unicode collation: fixed, but the code is still very fresh
> vi: there was potential corruption when opening a file in a non-Unicode
> encoding in a Unicode environment; it no longer corrupts anything, but it
> still says it is unhappy
> finger(1) has been fixed for multibyte names (I know no one cares about that
> one :))
> 
> On the list of still known issues:
> * important:
>   - csh does not handle Unicode
>   - regex in libc: it does not handle Unicode correctly (unless I have missed
>     something) and needs to be either fixed or replaced with libtre + custom
>     patches (there was a Summer of Code project about it long ago, and
>     DragonFly went that way)
>   - Unicode support in our old groff is pretty bad; I plan to replace it with
>     heirloom-doctools, which does handle Unicode properly (as far as I have
>     tested, at least)
>   - edit(1) does not handle multibyte characters
> 
> * medium (minor?)
>   - login(1) does not handle Unicode properly
> 
> * minor:
>   - lots of base tools (minor ones like nl and friends) are not multibyte
>     aware in many cases; merging the work done by Ingo Schwarze on those
>     tools in OpenBSD might be useful, but I have no plans to do it
>   - vi needs improvement in multi-encoding support; I haven't checked the
>     latest upstream vi changes in that area
> 
> There might be more, but that is all that comes to mind right now.

As I recall, Coverity pointed out problems with the multibyte support in
wc(1).
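
For context on why multibyte awareness matters in a tool like wc(1): in
UTF-8 the byte count (wc -c) and the character count (wc -m) diverge as
soon as non-ASCII text appears. The following is a minimal Python sketch of
that distinction only. It is not wc(1)'s code and does not reproduce
whatever Coverity flagged; the function name count_bytes_and_chars is made
up for this example, and it assumes well-formed UTF-8 input.

```python
def count_bytes_and_chars(data: bytes) -> tuple:
    """Return (byte_count, char_count) for UTF-8 encoded input.

    Hypothetical helper for illustration; assumes well-formed UTF-8 and
    performs no validation.
    """
    n_bytes = len(data)
    # Each UTF-8 encoded code point contains exactly one byte that is not
    # a continuation byte. Continuation bytes have the bit pattern 10xxxxxx.
    n_chars = sum(1 for b in data if (b & 0xC0) != 0x80)
    return n_bytes, n_chars


# "Č" (U+010C) occupies two bytes in UTF-8, so the two counts differ.
sample = "Tim Čas\n".encode("utf-8")
print(count_bytes_and_chars(sample))  # prints (9, 8)
```

A tool that walks the buffer one byte at a time and treats each byte as a
character would report 9 characters here instead of 8.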




