Date: Tue, 23 Jun 2020 14:42:57 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 247494] sort(1) order affected by LC_CTYPE
Message-ID: <bug-247494-227-dOAHDvuAzn@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-247494-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-247494-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247494

--- Comment #2 from Conrad Meyer <cem@freebsd.org> ---
I think the lengths printed in the bad example are correct; that is a measure
of wchar_t's, whereas in LC_CTYPE=C, the length is in bytes. So it seems like
it is a comparison problem.

I think we invoke wstrcoll() -> bwscoll() in the latter case. bwscoll() seems
to be broken for short strings:

    if (len1 <= offset)
            return ((len2 <= offset) ? 0 : -1);

E.g.,

$ (echo a耳 ; echo a脳 ; echo a耳) | LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --debug
...
; offset=1
; k1=<a耳>(2), k2=<a脳>(2); offset=1; s1=<a耳>, s2=<a脳>; cmp1=-256
; offset=1
; k1=<a脳>(2), k2=<a耳>(2); offset=1; s1=<a脳>, s2=<a耳>; cmp1=256
; offset=1
; k1=<a耳>(2), k2=<a耳>(2); offset=1; s1=<a耳>, s2=<a耳>; cmp1=0
a耳
a耳
a脳

The result is correct, because length (2) > offset (1).

I don't know if 'offset' here is wrong, or if bwscoll() is wrong. It seems
like maybe it only invokes bwscoll() on strings it thinks are identical from a
radix perspective. So perhaps the problem is some combination of wcstr and
byte_sort in radixsort.

In --mergesort mode, the result and comparisons are correct:

$ (echo 耳 ; echo 脳 ; echo 耳) | LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --mergesort --debug
Memory to be used for sorting: 17100230656
Using collate rules of C locale
Byte sort is used
sort_method=mergesort
; k1=<耳>(1), k2=<脳>(1); s1=<耳>, s2=<脳>; cmp1=-256
; k1=<脳>(1), k2=<耳>(1); s1=<脳>, s2=<耳>; cmp1=256
; k1=<耳>(1), k2=<耳>(1); s1=<耳>, s2=<耳>; cmp1=0
耳
耳
脳

Something is broken in radixsort.
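
To illustrate why the short-string check quoted above matters, here is a minimal sketch in C. It is not the sort(1) source: cmp_from_offset(), run_examples() and the four test arrays are hypothetical names made up for this illustration, and the only line taken from the comment is the `len1 <= offset` guard. The sketch assumes lengths are counted in wchar_t units and that `offset` is the number of leading units the caller considers already compared.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical stand-in for an offset-aware comparison (an assumption,
 * not the FreeBSD implementation). The first `offset` wchar_t units of
 * each string are skipped. Returns -1, 0 or 1.
 */
static int
cmp_from_offset(const wchar_t *s1, size_t len1,
    const wchar_t *s2, size_t len2, size_t offset)
{
	/* The guard quoted in the comment: a string with nothing past
	 * `offset` is treated as empty, so two such strings are "equal". */
	if (len1 <= offset)
		return ((len2 <= offset) ? 0 : -1);
	if (len2 <= offset)
		return (1);

	/* Compare the remaining units one by one. */
	size_t n = (len1 < len2) ? len1 : len2;
	for (size_t i = offset; i < n; i++) {
		if (s1[i] != s2[i])
			return ((s1[i] < s2[i]) ? -1 : 1);
	}

	/* Common part is equal: the shorter string sorts first. */
	if (len1 == len2)
		return (0);
	return ((len1 < len2) ? -1 : 1);
}

/* U+8033 and U+8133, the two characters used in the report. */
static const wchar_t ear[] = { 0x8033, 0 };
static const wchar_t brain[] = { 0x8133, 0 };
static const wchar_t a_ear[] = { L'a', 0x8033, 0 };
static const wchar_t a_brain[] = { L'a', 0x8133, 0 };

/* Exercises the cases discussed in the comment. */
void
run_examples(void)
{
	/* Length 1, offset 0: the characters are actually compared. */
	assert(cmp_from_offset(ear, 1, brain, 1, 0) == -1);
	/* Length 1, offset 1: the guard fires and reports equality. */
	assert(cmp_from_offset(ear, 1, brain, 1, 1) == 0);
	/* Length 2, offset 1: the guard does not fire. */
	assert(cmp_from_offset(a_ear, 2, a_brain, 2, 1) == -1);
}
```

Under these assumptions, a one-character key compared with offset=1 is reported as equal to any other one-character key, while the two-character keys from the example above still compare correctly. Whether the real caller passes an offset that is too large, or the guard itself is at fault, is exactly the open question in the comment.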
-- 
You are receiving this mail because:
You are the assignee for the bug.