From owner-freebsd-current@freebsd.org  Wed Jul 20 18:33:26 2016
Message-Id: <201607201833.u6KIXEpB054887@gw.catspoiler.org>
Date: Wed, 20 Jul 2016 11:33:14 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: UTF-8 by default?
To: bapt@FreeBSD.org
cc: jonathan@FreeBSD.org, darkuranium@gmail.com, freebsd-current@freebsd.org
In-Reply-To: <20160720140741.yi7vfgmmqtg6eprx@ivaldir.etoilebsd.net>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=iso-8859-2
Content-Transfer-Encoding: 8BIT
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>

On 20 Jul, Baptiste Daroussin wrote:
> On Wed, Jul 20, 2016 at 10:47:45AM -0230, Jonathan Anderson wrote:
>> On 20 Jul 2016, at 9:13, Tim Čas wrote:
>> 
>> > So, without further ado:
>> > 1) What are the reasons that UTF-8 isn't the default yet?
>> > 2) Would it be possible to make this the default in 11.0? What about
>> > 12.0?
>> > 3) Assuming an effort is started towards making UTF-8 the default,
>> > what changes would be required?
>> 
>> At least according to one of my students (who makes more extensive use of
>> i18n than I do), enabling UTF-8 by default is pretty straightforward:
>> 
>> https://github.com/musec/freebsd/wiki/Common-setup#utf-8-support
> 
> The LC_COLLATE=C setting is no longer needed with FreeBSD 11+.
>> 
>> If there's anything missing there, I'd love to hear about it.
>> 
> 
> A lot of work was done during 11.0 development. The following issues were
> fixed:
> 
> /bin/sh was not able to handle UTF-8 (fixed by fixing the bug in libedit)
> no Unicode collation: fixed, but the code is still very fresh
> vi: there was potential corruption when opening a file in a non-Unicode
> encoding in a Unicode environment; it no longer corrupts anything, but it
> still says it is unhappy
> finger(1) has been fixed for multibyte names (I know no one cares about that
> one :))
> 
> On the list of still known issues:
> * important:
>   - csh does not handle unicode
>   - regex in libc: it does not handle Unicode correctly (unless I have
>     missed something) and needs to be either fixed or switched to libtre +
>     custom patches (there was a Summer of Code project about it long ago,
>     and DragonFly went that way)
>   - Unicode support in our old groff is pretty bad; I plan to replace it
>     with heirloom-doctools, which does handle Unicode properly (as far as I
>     have tested at least)
>   - edit(1) does not handle multibyte
> 
> * medium (minor?)
>   - login(1) does not handle unicode properly
> 
> * minor:
>   - lots of base tools (minor ones like nl and friends) are not multibyte
>     aware in a lot of cases; merging the work done by Ingo Schwarze on
>     those tools in OpenBSD might be useful, but I have no plan to do it
>   - vi needs improvement in multi-encoding support; I haven't checked the
>     latest upstream vi changes in that area
> 
> There might be more, but that is all that comes to mind right now

As I recall, Coverity pointed out problems with the multibyte support in
wc(1).
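
For anyone unsure what "multibyte aware" means for tools like wc(1) or
nl(1), here is a minimal Python sketch. It is only an illustration and is not
taken from any FreeBSD source; the function name count_bytes_and_chars is
made up for this example. It shows that the byte count and the character
count of the same input differ as soon as a non-ASCII character appears,
which is exactly where a byte-oriented tool goes wrong in a UTF-8 locale.

```python
from typing import Tuple


def count_bytes_and_chars(data: bytes, encoding: str = "utf-8") -> Tuple[int, int]:
    """Return (byte_count, character_count) for data.

    Hypothetical helper for illustration only. "Characters" here means
    Unicode code points, not grapheme clusters.
    """
    # Decode using the given encoding; undecodable input is replaced
    # with U+FFFD instead of raising an exception.
    text = data.decode(encoding, errors="replace")
    return len(data), len(text)


if __name__ == "__main__":
    # "Č" occupies two bytes in UTF-8 but is a single character.
    sample = "Tim Čas\n".encode("utf-8")
    print(count_bytes_and_chars(sample))
```

A tool that assumes one byte per character would report 9 characters for
the sample line instead of 8. In the ISO 8859-2 encoding the same text is 8
bytes and 8 characters, which is why single-byte locales hide this class of
bug.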