Date: Mon, 29 Jan 2024 16:47:57 +0100
Subject: Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
From: Guido Falsi
To: Warner Losh, Nathan Reilly-list
Cc: emulation@freebsd.org, freebsd-arm@freebsd.org, freebsd-pkg@freebsd.org
Message-ID: <990427ae-0491-463e-92c7-c74700deb6fa@madpilot.net>
List-Id: Development of Emulators of other operating systems
List-Archive: https://lists.freebsd.org/archives/freebsd-emulation

On 29/01/24 09:26, Guido Falsi wrote:
> On 29/01/24 02:10, Warner Losh wrote:
>>
>> On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list wrote:
>>
>>>     On 29 Jan 2024, at 8:43 am, Guido Falsi wrote:
>>>     On 28/01/24 22:34, Guido Falsi wrote:
>>>>     On 28/01/24 22:23, Warner Losh wrote:
>>>>>     On Sun, Jan 28, 2024, 12:38 PM Guido Falsi wrote:
>>>>>
>>>>>         On 28/01/24 15:15, Guido Falsi wrote:
>>>>>         [snip]
>>>>>          > Creating repository in /tmp/packages:   0%
>>>>>          >
>>>>>
>>>>>         BTW, forgot to mention last time this worked without issue
>>>>>         was around 20th December.
>>>>>
>>>>>
>>>>>     I think this is a bsd-user issue. There is a race somewhere in
>>>>>     that code that causes the hangs. I'd love a reproducible test
>>>>>     case that is somewhat smaller than python... there are bigger
>>>>>     races with the newer stuff and I've not had the time to chase it
>>>>>     there either. 😞
>>>>     First of all thanks for your feedback. It encourages me having
>>>>     someone else with better knowledge about this confirm that a race
>>>>     condition is actually a possible cause!
>>>>     Strange this has not been happening up to mid December.
>>>>     My main and fully reproducible use case is actually mostly with
>>>>     pkg. At the end of the run poudriere runs `pkg repo` to create the
>>>>     meta files and sign the repo. It forks itself (ncpus + 2 I guess,
>>>>     even forcing it to 1 worker I see three processes), and then
>>>>     locks up, with all the processes stopping using CPU (ps output is
>>>>     in my message).
>>>>     I guess this can be reproduced with any poudriere repo with at
>>>>     least more than ncpus packages in it. It can also be reproduced
>>>>     using `poudriere pkgclean -u `
>>>>     If that does not work I'm not sure how to reproduce it in other
>>>>     ways, but I can try writing some code mocking what pkg seems to
>>>>     be doing, not an expert at such things, though.
>>>
>>>     In case it helps further narrow down things, it looks like the
>>>     lockup is happening somewhere around here:
>>>
>>>     https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>>>
>>>     and/or in the pkg_create_repo_worker() function here:
>>>
>>>     https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>>>
>>>     (I'm trying to spare you the time needed to find the actual code
>>>     being executed, I guess you would have identified this in a few
>>>     minutes yourself, but I'm trying to make myself useful)
>>
>>     There appears to be a GitHub issue for poudriere with this, but it
>>     seems to be looking in another direction.
>>
>>     https://github.com/freebsd/poudriere/issues/1009
>>
>
> This one looks quite similar.
>
> In my case the ports/pkg are aligned between host and jail, in fact I
> have built them from the exact same git checkout.
>
> I noticed pkg head has been converted to using pthreads instead of fork,
> maybe that could help. I will make time to perform some testing.

Thanks for pointing me here. It looks like this was "it": once this
issue is fixed, the native pkg-static is used, which sidesteps the
problem.

Unfortunately there ARE qemu races and lockups that prevent the arm64
pkg-static binary from being correctly emulated by qemu-user-static.
Such conditions also cause sporadic failures in some ports being built.

I filed a PR with a fix for the poudriere issue:

https://github.com/freebsd/poudriere/pull/1115

--
Guido Falsi