Message-Id: <201607201833.u6KIXEpB054887@gw.catspoiler.org>
Date: Wed, 20 Jul 2016 11:33:14 -0700 (PDT)
From: Don Lewis
Subject: Re: UTF-8 by default?
To: bapt@FreeBSD.org
cc: jonathan@FreeBSD.org, darkuranium@gmail.com, freebsd-current@freebsd.org
In-Reply-To: <20160720140741.yi7vfgmmqtg6eprx@ivaldir.etoilebsd.net>

On 20 Jul, Baptiste Daroussin wrote:
> On Wed, Jul 20, 2016 at 10:47:45AM -0230, Jonathan Anderson wrote:
>> On 20 Jul 2016, at 9:13, Tim Čas wrote:
>>
>> > So, without further ado:
>> > 1) What are the reasons that UTF-8 isn't the default yet?
>> > 2) Would it be possible to make this the default in 11.0? What about
>> > 12.0?
>> > 3) Assuming an effort is started towards making UTF-8 the default,
>> > what changes would be required?
>>
>> At least according to one of my students (who makes more extensive use of
>> i18n than I do), enabling UTF-8 by default is pretty straightforward:
>>
>> https://github.com/musec/freebsd/wiki/Common-setup#utf-8-support
>
> the LC_COLLATE=C is not needed anymore with FreeBSD 11+
>>
>> If there's anything missing there, I'd love to hear about it.
>>
>
> A lot of work has been done during the 11.0 development; the following
> issues were fixed:
>
> /bin/sh not able to handle utf-8 (fixed by fixing the bug in libedit)
> no unicode collation: fixed but still very fresh code
> vi: there was a potential corruption when opening a file in an encoding which is
> not unicode in a unicode env, now it does not corrupt anything anymore but still
> says it is unhappy
> finger(1) has been fixed for multibyte names (I know no one cares about that one
> :))
>
> On the list of still known issues:
> * important:
>   - csh does not handle unicode
>   - regex in libc: it does not handle unicode right (except if I have missed
>     something) and needs to be either fixed or switched to libtre + custom
>     patches (there was a summer of code about it long ago and dfly went that
>     way)
>   - unicode support in our old groff is pretty bad, I plan to replace it with
>     heirloom-doctools which does handle unicode properly (as far as I have
>     tested at least)
>   - edit(1) does not handle multibyte
>
> * medium (minor?):
>   - login(1) does not handle unicode properly
>
> * minor:
>   - lots of base tools (minor ones like nl and friends) are not multibyte
>     aware in a lot of cases; merging the work done by Ingo Schwarze on
>     those tools in OpenBSD would probably be useful, but I have no plan to
>     do it
>   - vi needs improvement in multi-encoding support; I haven't checked the
>     latest modifications in vi upstream about that
>
> There might be more, but that is all that comes to mind right now

wc(1) has problems with its multibyte support that were pointed out by
Coverity, as I recall.
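
For readers unfamiliar with what "multibyte aware" means for tools like nl
or wc, here is a minimal sketch in Python. It is only an illustration of the
byte-versus-character distinction, not the wc(1) code or the issue Coverity
flagged; the function names are hypothetical.

```python
# Illustrative sketch only: contrasts byte counting with character counting
# for UTF-8 input. The function names are made up for this example and do
# not correspond to anything in the FreeBSD base system.


def count_bytes(data: bytes) -> int:
    """Return the number of bytes, which is what a byte-oriented tool sees."""
    return len(data)


def count_chars(data: bytes, encoding: str = "utf-8") -> int:
    """Return the number of decoded code points.

    Invalid sequences are replaced with U+FFFD instead of raising an error.
    """
    return len(data.decode(encoding, errors="replace"))


sample = "Tim Čas".encode("utf-8")  # "Č" (U+010C) takes two bytes in UTF-8
print(count_bytes(sample), count_chars(sample))  # 8 7
```

A tool that is not multibyte aware effectively works with the first number
everywhere, which is why column widths and character counts come out wrong
for non-ASCII text in a UTF-8 locale.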