Date: Tue, 23 Jun 2020 14:42:57 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 247494] sort(1) order affected by LC_CTYPE
Message-ID: <bug-247494-227-dOAHDvuAzn@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-247494-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-247494-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247494

--- Comment #2 from Conrad Meyer <cem@freebsd.org> ---
I think the lengths printed in the bad example are correct; that is a measure
of wchar_t's, whereas in LC_CTYPE=C, the length is in bytes. So it seems like
it is a comparison problem.

I think we invoke wstrcoll() -> bwscoll() in the latter case. bwscoll() seems
to be broken for short strings:

        if (len1 <= offset)
                return ((len2 <= offset) ? 0 : -1);

E.g.,

$ (echo a耳 ; echo a脳 ; echo a耳) | LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --debug
...
; offset=1
; k1=<a耳>(2), k2=<a脳>(2); offset=1; s1=<a耳>, s2=<a脳>; cmp1=-256
; offset=1
; k1=<a脳>(2), k2=<a耳>(2); offset=1; s1=<a脳>, s2=<a耳>; cmp1=256
; offset=1
; k1=<a耳>(2), k2=<a耳>(2); offset=1; s1=<a耳>, s2=<a耳>; cmp1=0
a耳
a耳
a脳

The result is correct, because length (2) > offset (1).

I don't know if 'offset' here is wrong, or if bwscoll() is wrong. It seems like
maybe it only invokes bwscoll() on strings it thinks are identical from a radix
perspective. So perhaps the problem is some combination of wcstr and byte_sort
in radixsort.

In --mergesort mode, the result and comparisons are correct:

$ (echo 耳 ; echo 脳 ; echo 耳) | LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --mergesort --debug
Memory to be used for sorting: 17100230656
Using collate rules of C locale
Byte sort is used
sort_method=mergesort
; k1=<耳>(1), k2=<脳>(1); s1=<耳>, s2=<脳>; cmp1=-256
; k1=<脳>(1), k2=<耳>(1); s1=<脳>, s2=<耳>; cmp1=256
; k1=<耳>(1), k2=<耳>(1); s1=<耳>, s2=<耳>; cmp1=0
耳
耳
脳

Something is broken in radixsort.

-- 
You are receiving this mail because:
You are the assignee for the bug.
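
For readers following the short-string hypothesis above, here is a minimal sketch of how that guard could behave. It is not the FreeBSD bwscoll() source: the name sketch_coll, the second guard, and the wcscmp() tail are assumptions made for illustration, and only the first `if` is taken from the comment. It also assumes that one-character keys can reach the comparison with offset=1, which the comment does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical, simplified stand-in for the comparison discussed above.
 * Only the first guard is quoted from the comment; everything else is
 * an assumption for illustration.
 */
int
sketch_coll(const wchar_t *s1, const wchar_t *s2, size_t offset)
{
	size_t len1, len2;

	assert(s1 != NULL && s2 != NULL);
	len1 = wcslen(s1);
	len2 = wcslen(s2);

	/* Guard quoted in the comment: strings no longer than offset. */
	if (len1 <= offset)
		return ((len2 <= offset) ? 0 : -1);
	/* Assumed symmetric case: only s2 is no longer than offset. */
	if (len2 <= offset)
		return (1);

	/* Assumed tail: compare the wide characters from offset onward. */
	return (wcscmp(s1 + offset, s2 + offset));
}
```

Under these assumptions, sketch_coll(L"\u8033", L"\u8133", 1) returns 0: with one wchar_t each and offset=1, 耳 and 脳 are reported as equal without being examined. The two-character keys a耳 and a脳 get past the guard and compare by their second character.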
