Date: Mon, 29 Jan 2024 16:47:57 +0100
From: Guido Falsi <mad@madpilot.net>
To: Warner Losh <imp@bsdimp.com>, Nathan Reilly-list <lists@nreilly.com>
Cc: emulation@freebsd.org, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, freebsd-pkg@freebsd.org
Subject: Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Message-ID: <990427ae-0491-463e-92c7-c74700deb6fa@madpilot.net>
In-Reply-To: <e434f5a7-5730-498e-b54d-b01310f95f7a@madpilot.net>
References: <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net> <CANCZdfr-0w2EMa=_hFT3p4gFSDO-P1Yf8Vb-1eLiwRVomo1Jfg@mail.gmail.com> <a1845758-3535-4aa0-9274-d3b13dd3801b@madpilot.net> <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net> <D2DD631F-8AED-48B7-8FB3-86F93BA707F2@nreilly.com> <CANCZdfqELPcaCj-d%2BLj_qocR6gMiHp1RL1Y92myq=TnR-W6Y1w@mail.gmail.com> <e434f5a7-5730-498e-b54d-b01310f95f7a@madpilot.net>

On 29/01/24 09:26, Guido Falsi wrote:
> On 29/01/24 02:10, Warner Losh wrote:
>> On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list <lists@nreilly.com> wrote:
>>
>>> On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
>>> On 28/01/24 22:34, Guido Falsi wrote:
>>>> On 28/01/24 22:23, Warner Losh wrote:
>>>>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net> wrote:
>>>>>
>>>>> On 28/01/24 15:15, Guido Falsi wrote:
>>>>> [snip]
>>>>> > Creating repository in /tmp/packages: 0%
>>>>> >
>>>>>
>>>>> BTW, forgot to mention last time this worked without issue
>>>>> was around 20th December.
>>>>>
>>>>> I think this is a bsd-user issue. There is a race somewhere in
>>>>> that code that causes the hangs. I'd love a reproducible test
>>>>> case that is somewhat smaller than python... there are bigger
>>>>> races with the newer stuff and I've not had the time to chase it
>>>>> there either. 😞
>>>>
>>>> First of all thanks for your feedback. It encourages me having
>>>> someone else with better knowledge about this confirm that a race
>>>> condition is actually a possible cause!
>>>>
>>>> Strange this has not been happening up to mid December.
>>>>
>>>> My main and fully reproducible use case is actually mostly with
>>>> pkg.
>>>>
>>>> At the end of the run poudriere runs `pkg repo` to create the
>>>> meta files and sign the repo. It forks itself (ncpus + 2 I guess,
>>>> even forcing it to 1 worker I see three processes), and then
>>>> locks up, with all the processes stopping using CPU (ps output is
>>>> in my message).
>>>>
>>>> I guess this can be reproduced with any poudriere repo with at
>>>> least more than ncpus packages in it. It can also be reproduced
>>>> using `poudriere pkgclean -u <etc>`
>>>>
>>>> If that does not work I'm not sure how to reproduce it in other
>>>> ways, but I can try writing some code mocking what pkg seems to
>>>> be doing, not an expert at such things, though.
>>>
>>> In case it helps further narrow down things, it looks like the
>>> lockup is happening somewhere around here:
>>>
>>> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>>>
>>> and/or in the pkg_create_repo_worker() function here:
>>>
>>> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>>>
>>> (I'm trying to spare you the time needed to find the actual code
>>> being executed, I guess you would have identified this in a few
>>> minutes yourself, but I'm trying to make myself useful)
>>
>> There appears to be a GitHub issue for poudriere with this, but
>> seems to be looking in another direction.
>>
>> https://github.com/freebsd/poudriere/issues/1009
>>
>
> This one looks quite similar.
>
> In my case the ports/pkg are aligned between host and jail, in fact I
> have built them from the exact same git checkout.
>
> I noticed pkg head has been converted to using pthreads instead of fork,
> maybe that could help. I will make time to perform some testing.

Thanks for pointing me here, it looks like this was "it": with this
issue fixed, poudriere uses the native pkg-static and sidesteps the
problem.

Unfortunately there ARE qemu races and lockups that prevent the arm64
pkg-static binary from being correctly emulated by qemu-user-static.
Such conditions also cause sporadic failures in some ports being built.

I filed a PR with a fix for that issue:

https://github.com/freebsd/poudriere/pull/1115

-- 
Guido Falsi <mad@madpilot.net>
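
The thread asks for a reproducible test case smaller than python or pkg. As a starting point only, here is a minimal sketch of the general pattern the quoted messages describe: a parent forks several workers, hands them work, and waits for results. It is not pkg's code. The names run_pool, num_workers and timeout are hypothetical, plain pipes stand in for whatever IPC pkg actually uses, and squaring a number stands in for the per-package work. Whether it triggers the qemu-user-static hang at all is untested.

```python
import os
import select
import signal
import struct


def run_pool(num_workers, items, timeout=30.0):
    """Fork num_workers children that square small integers.

    Hypothetical stand-in for a fork-based worker pool; NOT pkg's code.
    Intended for small inputs only: all work is queued before any result
    is read, so a very large item list could fill the pipes.
    """
    items = list(items)
    work_r, work_w = os.pipe()   # parent -> workers
    res_r, res_w = os.pipe()     # workers -> parent
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: drop the ends it does not use, then serve requests
            # until the parent closes the work pipe (read returns EOF).
            os.close(work_w)
            os.close(res_r)
            try:
                while True:
                    data = os.read(work_r, 4)
                    if len(data) < 4:
                        break
                    (n,) = struct.unpack("<I", data)
                    os.write(res_w, struct.pack("<II", n, n * n))
            finally:
                os._exit(0)
        pids.append(pid)

    # Parent: keep only the write end for work and the read end for results.
    os.close(work_r)
    os.close(res_w)
    for n in items:
        os.write(work_w, struct.pack("<I", n))
    os.close(work_w)

    results = {}
    received = 0
    try:
        while received < len(items):
            # A timeout here turns a silent lockup into a visible error.
            ready, _, _ = select.select([res_r], [], [], timeout)
            if not ready:
                raise TimeoutError("workers stopped responding")
            data = os.read(res_r, 8)
            if len(data) < 8:
                raise RuntimeError("workers exited early")
            n, square = struct.unpack("<II", data)
            results[n] = square
            received += 1
        return results
    except TimeoutError:
        for pid in pids:
            os.kill(pid, signal.SIGKILL)
        raise
    finally:
        os.close(res_r)
        for pid in pids:
            os.waitpid(pid, 0)


if __name__ == "__main__":
    print(run_pool(3, range(20)))
```

Running this in a loop inside the emulated jail, with num_workers set above the number of CPUs, would be one way to check whether the pattern alone is enough to hang. The select() timeout is there so that a lockup shows up as a TimeoutError instead of an idle process.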