Date:      Tue, 23 Jun 2020 15:10:38 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 247494] sort(1) order affected by LC_CTYPE
Message-ID:  <bug-247494-227-MnnJMahCSf@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-247494-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-247494-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247494

--- Comment #4 from Conrad Meyer <cem@freebsd.org> ---
Ok, so radix sort only goes byte-at-a-time; we can't allocate memory for all
wchar_t space (4 GB).  Here are the wchar_t representations of the two
characters:

echo 耳脳 | iconv -f utf-8 -t ucs-4 | hd
00000000  00 00 80 33 00 00 81 33                           |...3...3....|
          ^ first     ^ second
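
The same byte layout can be reproduced without iconv.  This is only a small
Python sketch, assuming big-endian UCS-4 (the byte order shown in the hd output
above); the variable names are illustrative and do not come from sort(1):

```python
# The two characters from the report: U+8033 and U+8133.
first, second = "\u8033", "\u8133"

# Big-endian UTF-32 is byte-for-byte the same as big-endian UCS-4
# for these code points.
first_bytes = first.encode("utf-32-be")
second_bytes = second.encode("utf-32-be")

print(first_bytes.hex(" "))   # 00 00 80 33
print(second_bytes.hex(" "))  # 00 00 81 33
```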

It incorrectly looks at the least significant byte of the wchar_t, observes
that 33 == 33 and invokes collate to attempt to differentiate the two strings.
But using radixsort's level is wrong for bwscoll, which expects an offset in
wchar_t.  Since radixsort has only processed 1/4 of a wchar_t, this is a bogus
offset.
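
To make the unit mismatch concrete, here is a hedged Python sketch.  It is not
the sort(1) code: WCHAR_SIZE, lsb() and wchar_offset() are hypothetical names,
and a 4-byte wchar_t is assumed.  It shows that looking only at the low byte
makes the two keys appear equal, and that a count of bytes processed is not yet
a whole wchar_t offset:

```python
WCHAR_SIZE = 4  # assumption: wchar_t is 4 bytes wide


def lsb(ch: str) -> int:
    """Least significant byte of a character's code point."""
    return ord(ch) & 0xFF


def wchar_offset(bytes_processed: int) -> int:
    """Number of whole wchar_t units covered by a byte count."""
    return bytes_processed // WCHAR_SIZE


a, b = "\u8033", "\u8133"

print(hex(lsb(a)), hex(lsb(b)))  # 0x33 0x33: the low bytes match
print(ord(a) < ord(b))           # True: the full code points differ
print(wchar_offset(1))           # 0: one byte in, no whole wchar_t consumed
```

Under these assumptions, treating a byte-level counter of 1 as a wchar_t offset
would point one full character too far.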

I'm not sure how our radixsort is supposed to work, honestly.  It seems pretty
broken, even for ASCII.  It should be able to bucket multiple keys that share a
character per level, but it doesn't; it falls back on comparison in that case.
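
For reference, this is roughly what bucketing per level looks like in a
textbook MSD radix sort.  It is a minimal Python sketch over byte strings, not
FreeBSD's implementation; msd_radix_sort() is a hypothetical name, and the
result is raw byte order, not locale collation order:

```python
def msd_radix_sort(keys, level=0):
    """Sort byte strings by bucketing on the byte at index `level`."""
    if len(keys) <= 1:
        return list(keys)

    finished = []                        # keys with no byte at this level
    buckets = [[] for _ in range(256)]   # one bucket per possible byte value
    for key in keys:
        if len(key) <= level:
            finished.append(key)
        else:
            buckets[key[level]].append(key)

    # Keys that ended here share the whole prefix, so they sort first.
    result = finished
    for bucket in buckets:
        # Keys that share this byte are bucketed again at the next level
        # instead of being handed to a comparison sort.
        result.extend(msd_radix_sort(bucket, level + 1))
    return result


words = [b"banana", b"band", b"ban", b"apple", b"bandana"]
print(msd_radix_sort(words))
```

With big-endian UCS-4 keys, such as the two characters above, this byte order
coincides with code point order, because the most significant byte of each
character is examined first.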



