From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 255840] __get_locale() is inefficient
Date: Thu, 13 May 2021 13:25:20 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255840

            Bug ID: 255840
           Summary: __get_locale() is inefficient
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: markj@FreeBSD.org

In libc we have:

    /**
     * Returns the current locale for this thread, or the global locale if none is
     * set.  The caller does not have to free the locale.  The return value from
     * this call is not guaranteed to remain valid after the locale changes.  As
     * such, this should only be called within libc functions.
     */
    static inline locale_t __get_locale(void)
    {

            if (!__has_thread_locale) {
                    return (&__xlocale_global_locale);
            }
            return (__thread_locale ? __thread_locale : &__xlocale_global_locale);
    }

Here, __has_thread_locale and __xlocale_global_locale are global variables.
In the common case, !__has_thread_locale is true, so the function should
return the global locale without looking at anything else.  __thread_locale
is a thread-local variable.  This function is called any time MB_CUR_MAX is
loaded, which may happen frequently (see PR 255551 for example).

On main, __get_locale() compiles to this:

   0x000000080115e300 <+0>:     push   %rbp
   0x000000080115e301 <+1>:     mov    %rsp,%rbp
   0x000000080115e304 <+4>:     push   %rbx
   0x000000080115e305 <+5>:     push   %rax
   0x000000080115e306 <+6>:     mov    0x113fbb(%rip),%rbx        # 0x8012722c8
   0x000000080115e30d <+13>:    data16 lea 0x113fa3(%rip),%rdi        # 0x8012722b8
   0x000000080115e315 <+21>:    data16 data16 rex.W call 0x8012654b0 <__tls_get_addr@plt>
   0x000000080115e31d <+29>:    mov    (%rax),%rax
   0x000000080115e320 <+32>:    test   %rax,%rax
   0x000000080115e323 <+35>:    mov    0x113e6e(%rip),%rcx        # 0x801272198
   0x000000080115e32a <+42>:    cmove  %rcx,%rax
   0x000000080115e32e <+46>:    cmpl   $0x0,(%rbx)
   0x000000080115e331 <+49>:    cmove  %rcx,%rax
   0x000000080115e335 <+53>:    mov    0x18(%rax),%rax
   0x000000080115e339 <+57>:    mov    0x70(%rax),%eax
   0x000000080115e33c <+60>:    add    $0x8,%rsp
   0x000000080115e340 <+64>:    pop    %rbx
   0x000000080115e341 <+65>:    pop    %rbp
   0x000000080115e342 <+66>:    ret

In particular, the address of __thread_locale is obtained even if it isn't
going to be used because no threads have set a per-thread locale using
uselocale(3).  But to obtain this address we have to call into rtld (the
__tls_get_addr call at <+21>), and the call has a significant cost: a program
which performs the comparison MB_CUR_MAX == 1 a total of 500,000,000 times
runs in about 2.7s on my workstation.  With libc modified to split the test
of __thread_locale into a separate function, the runtime is reduced to 1.0s.

I'm not quite sure why clang compiles __get_locale() this way.  I presume
it's to avoid branches, but it's quite suboptimal.