Message-ID: <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net>
Date: Sun, 28 Jan 2024 22:43:27 +0100
Subject: Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
From: Guido Falsi <mad@madpilot.net>
To: Warner Losh
Cc: emulation@freebsd.org, "freebsd-arm@freebsd.org"
References: <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

On 28/01/24 22:34, Guido Falsi wrote:
> On 28/01/24 22:23, Warner Losh wrote:
>>
>>
>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi wrote:
>>
>>     On 28/01/24 15:15, Guido Falsi wrote:
>>      > Hi all, again,
>>      >
>>      > I have some more findings about this, I'm top posting
>>      > because the old message is not really that much relevant anymore.
>>      >
>>      > I'm now running a machine with head (commit
>>      > b32d49cfbaa0437d08e65e7cd7c82c5951b1a852 Jan 25th), poudriere installed
>>      > in it, machine is amd64, with an arm64 jail, 14.0-RELEASE, installed
>>      > from official distribution binaries (https download method), with cross
>>      > tools.
>>      >
>>      > To make sure everything is aligned I rebuild everything: updated head,
>>      > rebuild cross tools in the jail, recompiled all ports for the host
>>      > architecture and force reinstalled them, especially qemu-user-static,
>>      > cleaned up all packages for the arm64 jail.
>>      >
>>      > If I missed something important please point it out.
>>      >
>>      > I have made some more tests and I'm getting python failures in poudriere
>>      > like the one described below from time to time (don't have hard stats
>>      > but feels like 50% chance). If I get past that it usually is able to
>>      > build all the not many packages, but locks up at:
>>      >
>>      > Creating repository in /tmp/packages:   0%
>>      >
>>
>>     BTW, forgot to mention last time this worked without issue was around
>>     20th December.
>>
>>
>> I think this is a bsd-user issue. There is a race somewhere in that
>> code that causes the hangs. I'd love a reproducible test case that is
>> somewhat smaller than python... there are bigger races with the newer
>> stuff and I've not had the time to chase it there either. 😞
>
> First of all thanks for your feedback. It encourages me having someone
> else with better knowledge about this confirm that a race condition is
> actually a possible cause!
>
> Strange this has not been happening up to mid December.
>
> My main and fully reproducible use case is actually mostly with pkg.
>
> at the end of the run poudriere runs `pkg repo` to create the meta files
> and sign the repo. It forks itself (ncpus + 2 I guess, even forcing it
> to 1 worker I see three processes), and then locks up, with all the
> processes stopping using CPU (ps output is in my message)
>
> I guess this can be reproduced with any poudriere repo with at least
> more than ncpus packages in it. can also be reproduced using `poudriere
> pkgclean -u `
>
> If that does not work I'm not sure how to reproduce it in other ways,
> but I can try writing some code mocking what pkg seems to be doing, not
> an expert at such things, though.
>

In case it helps narrow things down further, it looks like the lockup
is happening somewhere around here:

https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778

and/or in the pkg_create_repo_worker() function here:

https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341

(I'm trying to spare you the time needed to find the actual code being
executed. I guess you would have identified this in a few minutes
yourself, but I'm trying to make myself useful.)

-- 
Guido Falsi