Date: Thu, 22 Feb 2024 01:23:08 +0000 From: Jessica Clarke <jrtc27@freebsd.org> To: Konstantin Belousov <kostikbel@gmail.com> Cc: "src-committers@freebsd.org" <src-committers@freebsd.org>, "dev-commits-src-all@freebsd.org" <dev-commits-src-all@freebsd.org>, "dev-commits-src-main@freebsd.org" <dev-commits-src-main@freebsd.org> Subject: Re: git: 8271d9b99a3b - main - libsys: remove usage of pthread_once and _once_stub Message-ID: <DD27C896-F134-4BC7-97F8-0A13C0F00114@freebsd.org> In-Reply-To: <ZdafOihWozCdmcTx@kib.kiev.ua> References: <202402210029.41L0TOH5000231@gitrepo.freebsd.org> <964A29A2-4C51-4037-8EBE-320008D48AE0@freebsd.org> <ZdYGEGyT3949IZRC@kib.kiev.ua> <4715B319-B7DE-4D06-9F27-00CFE5AF89A7@freebsd.org> <ZdafOihWozCdmcTx@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On 22 Feb 2024, at 01:11, Konstantin Belousov <kostikbel@gmail.com> = wrote: >=20 > On Wed, Feb 21, 2024 at 05:23:10PM +0000, Jessica Clarke wrote: >> On 21 Feb 2024, at 14:17, Konstantin Belousov <kostikbel@gmail.com> = wrote: >>>=20 >>> On Wed, Feb 21, 2024 at 12:51:04AM +0000, Jessica Clarke wrote: >>>> On 21 Feb 2024, at 00:29, Konstantin Belousov <kib@FreeBSD.org> = wrote: >>>>>=20 >>>>> The branch main has been updated by kib: >>>>>=20 >>>>> URL: = https://cgit.FreeBSD.org/src/commit/?id=3D8271d9b99a3b98c662ee9a6257a14428= 4b7e1728 >>>>>=20 >>>>> commit 8271d9b99a3b98c662ee9a6257a144284b7e1728 >>>>> Author: Konstantin Belousov <kib@FreeBSD.org> >>>>> AuthorDate: 2024-02-20 14:45:29 +0000 >>>>> Commit: Konstantin Belousov <kib@FreeBSD.org> >>>>> CommitDate: 2024-02-21 00:26:11 +0000 >>>>>=20 >>>>> libsys: remove usage of pthread_once and _once_stub >>>>>=20 >>>>> that existed in auxv.c, use simple bool gate instead. This leaves = a >>>>> small window if two threads try to call _elf_aux_info(3) = simultaneously. >>>>> The situation is safe because auxv parsing is really idempotent. = The >>>>> parsed data is the same, and we store atomic types (int/long/ptr) = so >>>>> double-init does not matter. >>>>=20 >>>> You still need to load acquire and store release aux_once though, >>>> otherwise you can see aux_once as true yet read the pre-initialised >>>> data. In practice that=E2=80=99s surely very hard to hit, but the = code as >>>> written is now wrong. Also, idempotence should probably be made >>>> unnecessary by using 0/1/2 state for uninitialised/initialising/ >>>> initialised, as it=E2=80=99s still technically UB from a C AM = perspective due >>>> to not being data race free if two threads initialise at the same = time. >>>> Better to just do the correct thing rather than risk things going = wrong. >>>=20 >>> There is too much to handle 'in process' state for loosing thread, I = need >>> the whole libthr machinery. >>=20 >> What do you need libthr for? In pseudo-C: >>=20 >> x =3D load_acquire(&aux_once) >> if (__predict_true(x =3D=3D 2)) >> return; >> if (x =3D=3D 1 || !compare_exchange_strong_acquire(&aux_once, &x, 1)) = { >> while (x !=3D 2) { >> yield(); >> x =3D load_acquire(&aux_once) >> } >> return; >> } >> /* initialise as before */ >> store_release(&aux_once, 2); >>=20 >> I believe that=E2=80=99s all you need. Or compare exchange 0 to 1 as = the >> initial operation; makes the source code shorter at the expense of a >> more expensive fast path: >>=20 >> x =3D 0; >> if (__predict_true(!compare_exchange_strong_acquire(&aux_once, &x, = 1)) { >> while (__predict_false(x !=3D 2)) { >> yield(); >> x =3D load_acquire(&aux_once) >> } >> return; >> } >> /* initialise as before */ >> store_release(&aux_once, 2); >>=20 >> I probably have bugs in the above, but you get the gist. > The bug in the fragment above is with the yield(). If low-priority = thread > enters the '1' (in progress) block, and then is preempted by = high-priority > thread also entering init_auxv(), the process would never make a = progress. >=20 > This is why I said that we need libthr (or umtx), to use real locking = and > move the waiting thread off cpu. In kernel, yield can be used in = similar > situations because we can bump the priority, although it is tricky. Yes, priority inversion is an issue here. But this is (without all the C++ abstraction) how libcxxrt implements __cxa_guard_acquire, so if it=E2=80=99s good enough for C++ constructors for static storage = duration objects declared at local scope, surely it=E2=80=99s also good enough = for aux_once? And if it=E2=80=99s not good enough for aux_once then libcxxrt = should be deemed broken... One could easily adapt the above to use UMTX_OP_WAIT/WAKE though. Jess >>> I added the fences, thanks for noting. >>=20 >> Thanks. >>=20 >> Jess >>=20 >>> WRT being UB from pure C, we already have much more assumptions = about >>> atomicity.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?DD27C896-F134-4BC7-97F8-0A13C0F00114>