From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 247494] sort(1) order affected by LC_CTYPE
Date: Tue, 23 Jun 2020 16:15:05 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247494

--- Comment #8 from Conrad Meyer ---
With this second patch:

(echo 耳 ; echo 脳 ; echo 耳) | LC_CTYPE=ja_JP.UTF-8 LC_COLLATE=C LANG=C sort --radixsort --debug
Using collate rules of C locale
Byte sort is used
sort_method=radixsort
; k1=<耳>(1), k2=<脳>(1); s1=<耳>, s2=<脳>; cmp1=-256
; k1=<脳>(1), k2=<耳>(1); s1=<脳>, s2=<耳>; cmp1=256
; k1=<耳>(1), k2=<耳>(1); s1=<耳>, s2=<耳>; cmp1=0
耳
耳
脳

Which seems correct.  In C mode:

(echo 耳 ; echo 脳 ; echo 耳) | LC_CTYPE=C LC_COLLATE=C LANG=C sort --radixsort --debug
Using collate rules of C locale
Byte sort is used
sort_method=radixsort
; offset=1
; k1=<耳>(3), k2=<脳>(3); offset=1; s1=<耳>, s2=<脳>; cmp1=-4
; offset=1
; k1=<脳>(3), k2=<耳>(3); offset=1; s1=<脳>, s2=<耳>; cmp1=4
; offset=1
; k1=<耳>(3), k2=<耳>(3); offset=1; s1=<耳>, s2=<耳>; cmp1=0
耳
耳
脳

@@ -317,6 +339,7 @@ free_sort_level(struct sort_level *sl)
 static void
 run_sort_level_next(struct sort_level *sl)
 {
+	const size_t wcfact = (MB_CUR_MAX == 1) ? 1 : sizeof(wchar_t);
 	struct sort_level *slc;
 	size_t i, sln, tosort_num;
 
@@ -333,8 +360,16 @@ run_sort_level_next(struct sort_level *sl)
 		sort_left_dec(1);
 		goto end;
 	case (2):
+		/*
+		 * Radixsort only processes a single byte at a time.  In wchar
+		 * mode, this can be a subset of the length of a character.
+		 * list_coll_offset() offset is in units of wchar, not bytes.
+		 * So to calculate the offset, we must divide by
+		 * sizeof(wchar_t) and round down to the index of the first
+		 * character this level references.
+		 */
 		if (list_coll_offset(&(sl->tosort[0]), &(sl->tosort[1]),
-		    sl->level) > 0) {
+		    sl->level / wcfact) > 0) {
 			sl->sorted[sl->start_position++] = sl->tosort[1];
 			sl->sorted[sl->start_position] = sl->tosort[0];
 		} else {
@@ -348,7 +383,13 @@ run_sort_level_next(struct sort_level *sl)
 		if (TINY_NODE(sl) || (sl->level > 15)) {
 			listcoll_t func;
 
-			func = get_list_call_func(sl->level);
+			/*
+			 * Collate comparison offset is in units of
+			 * character-width, so we must divide the level (bytes)
+			 * by operating character width (wchar_t or char).  See
+			 * longer comment above.
+			 */
+			func = get_list_call_func(sl->level / wcfact);
 
 			sl->leaves = sl->tosort;
 			sl->leaves_num = sl->tosort_num;
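
For readers following the arithmetic, here is a minimal standalone sketch of the two ideas in this comment: converting a radix sort level (counted in bytes) into a comparison offset (counted in characters), and why the two runs report cmp1 values of -256 and -4. It is not code from sort(1). The helper names level_to_char_offset, first_diff_bytes and first_diff_wide are hypothetical, the code points are held in uint32_t rather than wchar_t to keep the example portable, and the assumption that the debug value is the difference at the first mismatching element is only inferred from the output shown in this comment.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of sort(1): convert a radix sort level
 * (a byte index into the key) into a character offset.  char_width is 1
 * in single-byte mode and sizeof(wchar_t) in wide-character mode, which
 * mirrors wcfact in the patch.  Integer division rounds down to the
 * first character that the level touches.
 */
static size_t
level_to_char_offset(size_t level, size_t char_width)
{
	return (level / char_width);
}

/*
 * Hypothetical helper: difference at the first mismatching byte of two
 * equal-length buffers, or 0 if they are identical.
 */
static int
first_diff_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (a[i] != b[i])
			return ((int)a[i] - (int)b[i]);
	}
	return (0);
}

/*
 * Hypothetical helper: the same comparison on whole code points, as a
 * stand-in for a wide-character comparison.
 */
static long
first_diff_wide(const uint32_t *a, const uint32_t *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (a[i] != b[i])
			return ((long)a[i] - (long)b[i]);
	}
	return (0);
}
```

Assuming a 4-byte wchar_t, levels 0 through 3 map to character offset 0 and levels 4 through 7 map to offset 1, while in single-byte mode the level is used unchanged. The two test characters are U+8033 and U+8133, encoded in UTF-8 as E8 80 B3 and E8 84 B3. Comparing code points gives 0x8033 - 0x8133 = -256, and comparing bytes gives 0x80 - 0x84 = -4 at the second byte, which is consistent with the cmp1 values in the two debug runs.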