Date:      Sun, 25 Jan 2015 22:17:33 +0300
From:      Slawa Olhovchenkov <slw@zxy.spb.ru>
To:        Peter Jeremy <peter@rulingia.com>
Cc:        arch@FreeBSD.org, Baptiste Daroussin <bapt@FreeBSD.org>, Jordan Hubbard <jkh@ixsystems.com>
Subject:   Re: [RFC] Set the default locale to en_US.UTF-8
Message-ID:  <20150125191733.GS3698@zxy.spb.ru>
In-Reply-To: <20150125185951.GC23253@server.rulingia.com>
References:  <20150124143357.GI81001@ivaldir.etoilebsd.net> <20150125143243.GB76051@zxy.spb.ru> <7B1D8345-248B-4C44-9568-079BA29614C2@ixsystems.com> <20150125155000.GD76051@zxy.spb.ru> <20150125185951.GC23253@server.rulingia.com>

On Mon, Jan 26, 2015 at 05:59:51AM +1100, Peter Jeremy wrote:

> On 2015-Jan-25 18:50:00 +0300, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
> >On Sun, Jan 25, 2015 at 06:58:13AM -0800, Jordan Hubbard wrote:
> >> > On Jan 25, 2015, at 6:32 AM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
> >> > 
> >> > NO! Please, NOT!
> >> > Not every byte string is allowed in UTF-8; as a result, sed, grep,
> >> > vi, ed, etc. will fail unpredictably.
> 
> I switched to en_AU.UTF-8 about 5 years ago with relatively little pain
> (though I had very little non-ASCII text).
> 
> The downside of UTF-8 is that random non-ASCII bytestrings are unlikely to
> be valid UTF-8 and will therefore get rejected.  About the only time I get
> bitten by this is that my random password generator:
>   dd if=/dev/random bs=32 count=1 | tr -cd '!-~'
> will die with a "tr: Illegal byte sequence" and needs a "LC_ALL=C" to
> placate it.

Yes, now I remember -- tr is another such case.

> >I have used ru_RU.KOI8-R for years. Now I am trying ru_RU.UTF8 and have
> >hit some issues (on 10-STABLE). 9.x and OS may have different versions of
> >software and don't touch this.
> 
> Once you've started using any 8-bit locale, switching to UTF-8 (or any
> other 8-bit locale) will be a PITA because you need to re-encode everything.
> And, since it's very difficult to run with multiple locales, you need to
> do a complete sweep when you change locales.  If you are running into
> specific issues with incorrect handling of ru_RU.UTF8, that is a bug and
> you need to report it.

No, ru_RU.UTF8 itself is handled correctly (for valid UTF-8 files).
My trouble is with processing non-UTF-8 files (as in the tr
example).
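
One way to find such files ahead of time is to test whether their bytes
are valid UTF-8. A minimal sketch, assuming iconv(1) is installed;
is_utf8 is a hypothetical helper name, not something from this thread:

```shell
#!/bin/sh
# is_utf8: hypothetical helper. Reads stdin and succeeds only if the
# input is valid UTF-8. Converting UTF-8 to UTF-8 changes nothing, but
# iconv exits non-zero when it meets an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Plain ASCII is always valid UTF-8.
if printf 'hello\n' | is_utf8; then echo "valid"; else echo "invalid"; fi

# High bytes (0xC0-0xFF) with no continuation bytes, as found in text
# written in a single-byte encoding such as KOI8-R.
if printf '\360\322\311\327\305\324\n' | is_utf8; then echo "valid"; else echo "invalid"; fi
```

Files that fail this check are the ones a stricter UTF-8 locale may
refuse to process.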

> Note that we're talking about changing the default - you already override
> the default so it won't affect you.

This is a dangerous change -- you can lose data, or incorrectly process
files that were previously processed correctly (scripts using tr, etc.).
And this may come as a surprise to you.

> >This (changing from a single-byte to a multibyte locale) can be done
> >individually, after inspecting each system. It may be OK for a new
> >install, but not [automatically] on update/upgrade.
> 
> Either an existing system has already overridden the default locale, so
> changing the default will have no impact, or the treatment of non-ASCII
> data is currently undefined so changing the default is changing undefined
> behaviour to explicitly warning the user that they have problems with
> their data.

Currently the default locale is C. This locale accepts any byte string.
A UTF-8 locale may be stricter. This may break existing systems.
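
The difference shows up with a byte filter like the tr pipeline quoted
above. A minimal sketch, assuming a POSIX sh; printable_only is a
hypothetical helper name. Forcing LC_ALL=C for that one command makes tr
work on single bytes, so no input is rejected, whatever the default
locale is:

```shell
#!/bin/sh
# printable_only: hypothetical helper. Keeps only the printable ASCII
# characters '!' through '~' and deletes every other byte. LC_ALL=C is
# set for this tr invocation only, so the rest of the script keeps the
# user's locale.
printable_only() {
    LC_ALL=C tr -cd '!-~'
}

# Input containing bytes 0xFF and 0x80, which are not valid UTF-8.
# The filter drops them (and the newline) instead of failing.
printf 'a\377b\200c\n' | printable_only
echo
```

A script that depends on byte semantics can pin the locale this way
(for example "dd if=/dev/random bs=32 count=1 | printable_only"), but
that has to be done per script, by hand.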


