Date:      Sun, 2 Apr 2006 14:32:04 +0200
From:      "Tobias Svehagen" <tobias.svehagen@gmail.com>
To:        freebsd-current@freebsd.org, tjr@freebsd.org
Subject:   About gnu/93629 : GNU sort(1) tool dumps core within non-regular locale settings
Message-ID:  <ca834ac00604020532j534aa7e2l5251fdff96d26526@mail.gmail.com>

I saw that this issue was on the todo list for 6.1R so I decided to
take a look at it.

http://www.freebsd.org/cgi/query-pr.cgi?pr=93629

As the report says, you can reproduce the abort by doing the following:

setenv LANG uk_UA.KOI8-U
setenv LC_CTYPE ja_JP.UTF-8
/usr/bin/sort

This is quite a weird problem. It arises because sort tries to
handle the LC_TIME values in inittables_mb() as if they were UTF-8
encoded (LC_CTYPE is ja_JP.UTF-8). The LC_TIME values for uk_UA.KOI8-U
are not UTF-8 encoded; that locale uses NONE as its encoding. Normally
this wouldn't be a problem, since the multibyte routines handle plain
ASCII values <= 7f just fine. That's why sort works fine when LANG is
set to C, for example (Jan-Dec contain no bytes > 7f).

The thing about uk_UA.KOI8-U (and some others) is that it uses byte
values > 7f to represent the Ukrainian alphabet. For example, Jan in
uk_UA.KOI8-U's LC_TIME is d3 a6 de 00. When you parse that string as
UTF-8, d3 says that it starts a multibyte character of length 2, and
that one works fine (does not trigger the assertion). But then de also
says that it starts a multibyte character of length 2, and the string
ends before its second byte, so mbrtowc() returns -2 (see man
mbrtowc). That is what makes the assertion go off and abort.
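
To make the byte walk above concrete, here is a minimal sketch in C that scans a buffer as UTF-8 and reports where the first invalid or incomplete sequence starts. It is not the code from sort or from libc; utf8_seq_len() and first_bad_utf8_offset() are hypothetical helper names, and the check is simplified (it ignores overlong and surrogate encodings).

```c
#include <stddef.h>

/* Expected length of a UTF-8 sequence given its lead byte; 0 if the byte
 * cannot start a sequence (a continuation byte or an invalid lead).
 * Hypothetical helper, not part of sort(1) or libc. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                   /* plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  /* d3 and de land here */
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Walk n bytes of s as UTF-8. Returns the offset of the first sequence that
 * is invalid or cut short by the end of the buffer, or n if all bytes parse. */
static size_t first_bad_utf8_offset(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t len = utf8_seq_len(s[i]);
        if (len == 0 || len > n - i)
            return i;               /* invalid lead or incomplete sequence */
        for (size_t k = 1; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return i;           /* expected a continuation byte */
        i += len;
    }
    return n;
}
```

For the three bytes d3 a6 de, the pair d3 a6 is accepted and the scan stops at offset 2, where de promises a second byte that never arrives. That is the situation in which mbrtowc() reports an incomplete character.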

I'm not sure what the best way to solve this is, but I think
that something should be done to make sort not abort and dump core.
One solution is of course to make sort check that LC_CTYPE and LC_TIME
are the same (or C), but maybe some people want to have it that way
(although I don't see why).
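
As a rough illustration of the check suggested above, the sketch below compares two locale names. It is an assumption-laden sketch, not a proposed patch: the function names are hypothetical, it compares only the codeset part after the dot (a slightly looser test than "the same"), and it does not handle modifiers such as "@euro".

```c
#include <string.h>

/* Return the codeset part of a locale name ("uk_UA.KOI8-U" -> "KOI8-U"),
 * or "" when the name has none (e.g. "C"). Hypothetical helper. */
static const char *locale_codeset(const char *name)
{
    const char *dot = strchr(name, '.');
    return dot ? dot + 1 : "";
}

/* Nonzero for the C/POSIX locale, whose month names are pure ASCII. */
static int is_c_locale(const char *name)
{
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

/* Nonzero if month names from lc_time can be decoded under lc_ctype.
 * Assumption: equal codeset strings mean equal encodings. */
static int time_ctype_compatible(const char *lc_ctype, const char *lc_time)
{
    if (is_c_locale(lc_time))
        return 1;
    return strcmp(locale_codeset(lc_ctype), locale_codeset(lc_time)) == 0;
}
```

In a real program the two names could be obtained with setlocale(LC_CTYPE, NULL) and setlocale(LC_TIME, NULL). With the settings from the report, time_ctype_compatible("ja_JP.UTF-8", "uk_UA.KOI8-U") returns 0, so sort could fall back to the C month names or skip the multibyte table instead of hitting the assertion.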

Do you have any ideas on how this can be solved in a nice way, or do
you think that the fix "set LC_CTYPE and LC_TIME to the same value" is
enough?

/Tobias Svehagen


