Date:      Sun, 25 Jan 2015 22:17:33 +0300
From:      Slawa Olhovchenkov <slw@zxy.spb.ru>
To:        Peter Jeremy <peter@rulingia.com>
Cc:        arch@FreeBSD.org, Baptiste Daroussin <bapt@FreeBSD.org>, Jordan Hubbard <jkh@ixsystems.com>
Subject:   Re: [RFC] Set the default locale to en_US.UTF-8
Message-ID:  <20150125191733.GS3698@zxy.spb.ru>
In-Reply-To: <20150125185951.GC23253@server.rulingia.com>
References:  <20150124143357.GI81001@ivaldir.etoilebsd.net> <20150125143243.GB76051@zxy.spb.ru> <7B1D8345-248B-4C44-9568-079BA29614C2@ixsystems.com> <20150125155000.GD76051@zxy.spb.ru> <20150125185951.GC23253@server.rulingia.com>


On Mon, Jan 26, 2015 at 05:59:51AM +1100, Peter Jeremy wrote:

> On 2015-Jan-25 18:50:00 +0300, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
> >On Sun, Jan 25, 2015 at 06:58:13AM -0800, Jordan Hubbard wrote:
> >> > On Jan 25, 2015, at 6:32 AM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
> >> > 
> >> > NO! Please, do NOT!
> >> > Not all byte strings are valid UTF-8; as a result, sed, grep, vi,
> >> > ed, etc. will fail unpredictably.
> 
> I switched to en_AU.UTF-8 about 5 years ago with relatively little pain
> (though I had very little non-ASCII text).
> 
> The downside of UTF-8 is that random non-ASCII byte strings are unlikely to
> be valid UTF-8 and will therefore get rejected.  About the only time I get
> bitten by this is that my random password generator:
>   dd if=/dev/random bs=32 count=1 | tr -cd '!-~'
> will die with "tr: Illegal byte sequence" and needs "LC_ALL=C" to
> placate it.

Yes, now I remember: tr is another such case.
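For reference, a minimal sketch of the workaround described above, assuming a POSIX sh and a tr that honours LC_ALL. Setting LC_ALL=C on tr alone (not on dd) makes it treat its input as single bytes, so invalid UTF-8 sequences cannot abort it. The variable name pw is only illustrative.

```shell
# Draw 32 random bytes and keep only printable ASCII ('!' through '~').
# LC_ALL=C applies to tr only: it reads the input as raw bytes, so byte
# sequences that are invalid UTF-8 cannot trigger "Illegal byte sequence".
pw=$(dd if=/dev/random bs=32 count=1 2>/dev/null | LC_ALL=C tr -cd '!-~')
printf '%s\n' "$pw"
```

The output length varies from run to run, because only the random bytes that happen to fall in the printable range survive.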

> >I have used ru_RU.KOI8-R for years. Now I am trying ru_RU.UTF8 and
> >have hit some issues (on 10-STABLE). 9.x and other OSes may have
> >different versions of the software and not be affected by this.
> 
> Once you've started using any 8-bit locale, switching to UTF-8 (or any
> other 8-bit locale) will be a PITA because you need to re-encode everything.
> And, since it's very difficult to run with multiple locales, you need to
> do a complete sweep when you change locales.  If you are running into
> specific issues with incorrect handling of ru_RU.UTF8, that is a bug and
> you need to report it.

No, ru_RU.UTF8 handles valid UTF-8 files correctly. My trouble is with
processing non-UTF-8 files (like the tr example).

> Note that we're talking about changing the default - you already override
> the default so it won't affect you.

This is a dangerous change: you can lose data, or incorrectly process
files that were previously processed correctly (scripts using tr, etc.).
And this may come as a surprise.

> >This (a change from a single-byte to a multi-byte locale) may be done
> >individually, after inspecting each system. It may be OK for new
> >installs, but not [automatically] for updates/upgrades.
> 
> Either an existing system has already overridden the default locale, so
> changing the default will have no impact, or the treatment of non-ASCII
> data is currently undefined so changing the default is changing undefined
> behaviour to explicitly warning the user that they have problems with
> their data.

Currently the default locale is C, which accepts any byte string. A
UTF-8 locale is stricter, so this change may break existing systems.
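One way to inspect a system before switching is to check whether existing files are valid UTF-8. A minimal sketch, assuming an iconv(1) that exits with a non-zero status when it meets an input sequence that is invalid in the source encoding (glibc iconv behaves this way); the function name is_utf8 is hypothetical. The sample bytes are "Privet" in KOI8-R, which the C locale passes through untouched but which are not valid UTF-8.

```shell
# Hypothetical helper: succeeds only if the file named by $1 is valid UTF-8.
# A UTF-8 -> UTF-8 conversion acts as a validator, because iconv fails on
# any byte sequence that is not legal UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Write the KOI8-R bytes for "Privet" (0xF0 0xD2 0xC9 0xD7 0xC5 0xD4).
tmp=$(mktemp)
printf '\360\322\311\327\305\324\n' > "$tmp"

if is_utf8 "$tmp"; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8: review before switching to a UTF-8 locale"
fi
rm -f "$tmp"
```

Running is_utf8 over a user's files gives a list of what would need re-encoding (or LC_ALL=C in scripts) before a UTF-8 default is safe for that system.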



Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150125191733.GS3698>