From: "Tobias Svehagen"
Date: Sun, 2 Apr 2006 14:32:04 +0200
To: freebsd-current@freebsd.org, tjr@freebsd.org
Subject: About gnu/93629 : GNU sort(1) tool dumps core within non-regular locale settings

I saw that this issue was on the todo list for 6.1R, so I decided to take a look at it.
http://www.freebsd.org/cgi/query-pr.cgi?pr=93629

As it says in the report, you can reproduce the abort by doing the following:

    setenv LANG uk_UA.KOI8-U
    setenv LC_CTYPE ja_JP.UTF-8
    /usr/bin/sort

This is quite a weird problem. It lies in that sort tries to handle the LC_TIME values in inittables_mb() thinking that they are UTF-8 encoded. The LC_TIME values for uk_UA.KOI8-U do not use UTF-8 encoding; they use NONE as encoding. Normally this wouldn't be a problem, since the multibyte routines handle plain ASCII values <= 7f just fine, and that's why sort works fine when LANG is set to C, for example (since Jan-Dec contain no bytes > 7f).

The thing about uk_UA.KOI8-U (and some others) is that it uses byte values > 7f to represent the Ukrainian alphabet. For example, Jan in uk_UA.KOI8-U's LC_TIME is d3 a6 de 00. When you parse that string as UTF-8, d3 says that it starts a multibyte character of length 2, and that one works fine (does not trigger the assertion). But then de also says that it starts a multibyte character of length 2, and nothing follows it except the terminating 00. That makes mbrtowc() return -2 (see man mbrtowc), and that's what makes the assertion go off and abort.

I don't know what the best way to solve this is, but I think that something should be done to make sort not abort and dump core. One solution is of course to make sort check that LC_CTYPE and LC_TIME are the same (or C), but maybe some people want to have it that way (although I don't see why).

Do you have any ideas on how this can be solved in a nice way, or do you think that the fix "set LC_CTYPE and LC_TIME to the same value" is enough?

/Tobias Svehagen
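
To illustrate the byte-level reasoning in the message above, here is a minimal sketch in C of how a UTF-8 decoder would walk the KOI8-U month name d3 a6 de. It is not the code in sort or in libc: utf8_seq_len(), utf8_step(), demo_jan() and the SEQ_* constants are hypothetical names made up for this example, with SEQ_INCOMPLETE standing in for mbrtowc()'s (size_t)-2 result and SEQ_INVALID for (size_t)-1. The checks are simplified (overlong and surrogate forms in 3- and 4-byte sequences are not rejected).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for mbrtowc()'s (size_t)-1 and (size_t)-2 results. */
enum { SEQ_INVALID = -1, SEQ_INCOMPLETE = -2 };

/* Length of the UTF-8 sequence announced by a lead byte,
 * or 0 if the byte cannot start a sequence. */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                   /* plain ASCII */
    if (lead >= 0xc2 && lead <= 0xdf) return 2;  /* 110xxxxx */
    if (lead >= 0xe0 && lead <= 0xef) return 3;  /* 1110xxxx */
    if (lead >= 0xf0 && lead <= 0xf4) return 4;  /* 11110xxx */
    return 0;                                    /* continuation or invalid lead */
}

/* Try to decode one character from s[0..n-1].
 * Returns the number of bytes consumed, SEQ_INCOMPLETE if the input
 * ends before the sequence does, or SEQ_INVALID on a malformed sequence. */
int utf8_step(const unsigned char *s, size_t n)
{
    int len, i;

    if (n == 0) return SEQ_INCOMPLETE;
    len = utf8_seq_len(s[0]);
    if (len == 0) return SEQ_INVALID;
    for (i = 1; i < len; i++) {
        if ((size_t)i >= n) return SEQ_INCOMPLETE;      /* ran out of input */
        if ((s[i] & 0xc0) != 0x80) return SEQ_INVALID;  /* not 10xxxxxx */
    }
    return len;
}

/* Walk the uk_UA.KOI8-U "Jan" bytes from the message as if they were UTF-8. */
void demo_jan(void)
{
    const unsigned char jan[] = { 0xd3, 0xa6, 0xde };  /* without the final 00 */

    assert(utf8_step(jan, 3) == 2);                   /* d3 a6 looks like one 2-byte char */
    assert(utf8_step(jan + 2, 1) == SEQ_INCOMPLETE);  /* de announces 2 bytes, only 1 left */
}
```

Note that in this sketch the result for the trailing de depends on how many bytes are handed to the decoder: with only the remaining string byte it is "incomplete", while including the terminating 00 in the length would make it "invalid" instead, since 00 is not a continuation byte.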