Date: Tue, 23 Jun 2020 15:10:38 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 247494] sort(1) order affected by LC_CTYPE
Message-ID: <bug-247494-227-MnnJMahCSf@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-247494-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-247494-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247494

--- Comment #4 from Conrad Meyer <cem@freebsd.org> ---
Ok, so radix sort only goes byte-at-a-time; we can't allocate memory for all
wchar_t space (4 GB).

Here are the wchar_t representations of the two characters:

    echo 耳脳 | iconv -f utf-8 -t ucs-4 | hd
    00000000  00 00 80 33 00 00 81 33  |...3...3....|
              ^ first     ^ second

It incorrectly looks at the least significant byte of the wchar_t, observes
that 33 == 33 and invokes collate to attempt to differentiate the two strings.
But using radixsort's level is wrong for bwscoll, which expects an offset in
wchar_t. Since radixsort has only processed 1/4 of a wchar_t, this is a bogus
offset.

I'm not sure how our radixsort is supposed to work, honestly. It seems pretty
broken, even for ASCII. It should be able to bucket multiple keys that share a
character per level, but it doesn't; it falls back on comparison in that case.
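
The byte-versus-character mismatch described above can be illustrated with a small sketch. This is not the sort(1) source; it is a Python illustration that assumes a 4-byte wchar_t holding the code point, laid out big-endian as in the hex dump. The names low_byte, ucs4_key and WCHAR_SIZE are hypothetical.

```python
WCHAR_SIZE = 4  # assumption: 4-byte wchar_t, as in the hex dump above


def low_byte(ch):
    """Least significant byte of a character's code point."""
    return ord(ch) & 0xFF


def ucs4_key(s):
    """Big-endian UCS-4 bytes of a string, one 4-byte unit per character."""
    return s.encode("utf-32-be")


# The two characters from the report: U+8033 and U+8133.
first, second = "\u8033", "\u8133"

# Looking only at the low byte, the two characters appear equal (0x33 each).
same_low_byte = low_byte(first) == low_byte(second)

# Walking the full 4-byte keys most significant byte first does tell them apart.
key_a, key_b = ucs4_key(first), ucs4_key(second)
byte_level = next(i for i, (x, y) in enumerate(zip(key_a, key_b)) if x != y)

# A byte level is not a character offset; it must be divided by the
# character width before being handed to a routine that indexes characters.
char_offset = byte_level // WCHAR_SIZE

print(same_low_byte, byte_level, char_offset)
```

Here the keys first differ at byte level 2, which still lies inside character 0. Passing the byte level straight through as a character offset would start the comparison two characters too late, which is the kind of bogus offset the comment describes.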
