From nobody Sun Jan 28 14:15:56 2024 X-Original-To: emulation@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TND516Pfcz58fvL; Sun, 28 Jan 2024 14:16:37 +0000 (UTC) (envelope-from mad@madpilot.net) Received: from mail.madpilot.net (vogon.madpilot.net [159.69.1.99]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4TND506gXyz57yS; Sun, 28 Jan 2024 14:16:36 +0000 (UTC) (envelope-from mad@madpilot.net) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=madpilot.net header.s=bjowvop61wgh header.b="D XbesPe"; dmarc=pass (policy=quarantine) header.from=madpilot.net; spf=pass (mx1.freebsd.org: domain of mad@madpilot.net designates 159.69.1.99 as permitted sender) smtp.mailfrom=mad@madpilot.net Received: from mail (mail [IPv6:fd5c:5351:d272::3]) by mail.madpilot.net (Postfix) with ESMTP id 4TND4H1qHzz6fL4; Sun, 28 Jan 2024 15:15:59 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=madpilot.net; h= content-transfer-encoding:content-type:content-type:in-reply-to :content-language:references:from:from:subject:subject:date:date :message-id:received; s=bjowvop61wgh; t=1706451357; x= 1708265758; bh=6i1JPJVn0NDGonSa+fopcBkHZYgpc/ajneNCiXAiIjk=; b=D XbesPeSSwO9WpJtpUhfh7Uwl71osnuTg1fa87GMlZ2BVqTdWwwYlrNfrm0rcCLyT /qmNcYTRAYWt75ICrCuE/WUG5yd0vSdlL4b6x3QEX9KHYmsBJAjqJytizEqy0rJL Pmni2H2U7JvFTBWZPwgK8H23/gucIs4QimrqOfMh6mk9b0YUZ2AJg3d6dcVMlVw4 Er4qV3JfZ0PcLL6O9OB9ZC6xjHNi7a4yikMA+Zux4z7bT5+Vkp7DyRPPv7uWgHLH JnGEgtEQJKV/34pmiKU+7pjnhgKAq06+X0rdEE0J4ueumlZx20DnxQEt28PhGyK3 Pam9ruAL8hUcIvWZFEo1w== Received: from mail.madpilot.net ([IPv6:fd5c:5351:d272::3]) by mail (mail.madpilot.net [IPv6:fd5c:5351:d272::3]) (amavisd-new, port 10026) with ESMTP id c0TspYulRFUS; Sun, 28 Jan 2024 15:15:57 
+0100 (CET) Message-ID: <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> Date: Sun, 28 Jan 2024 15:15:56 +0100 Subject: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling)) From: Guido Falsi To: emulation@FreeBSD.org, "freebsd-arm@freebsd.org" References: <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> Content-Language: en-US Autocrypt: addr=mad@madpilot.net; keydata= xsBNBE+G+l0BCADi/WBQ0aRJfnE7LBPsM0G3m/m3Yx7OPu4iYFvS84xawmRHtCNjWIntsxuX fptkmEo3Rsw816WUrek8dxoUAYdHd+EcpBcnnDzfDH5LW/TZ4gbrFezrHPdRp7wdxi23GN80 qPwHEwXuF0X4Wy5V0OO8B6VT/nA0ADYnBDhXS52HGIJ/GCUjgqJn+phDTdCFLvrSFdmgx4Wl c0W5Z1p5cmDF9l8L/hc959AeyNf7I9dXnjekGM9gVv7UDUYzCifR3U8T0fnfdMmS8NeI9NC+ wuREpRO4lKOkTnj9TtQJRiptlhcHQiAlG1cFqs7EQo57Tqq6cxD1FycZJLuC32bGbgalABEB AAHNHkd1aWRvIEZhbHNpIDxtYWRAbWFkcGlsb3QubmV0PsLAeAQTAQIAIgUCT4b6XQIbAwYL CQgHAwIGFQgCCQoLBBYCAwECHgECF4AACgkQGuaGDlbL0pOWigf/YVTVf3+ZRnzeGP7CjGV1 Wrrxzjc8h8W64NZasV0XLHGFjl5MYwtm9jJ9gbL8Ubtqstey7lYpjOk2fG6YDhY5eptWCpR6 1QqYrioukhCfKbodSk6PnIZcx719nJVK2P7ihdFEN78TavpBwqIf9hGEcKkMpbRFQv1mYvXD hKVwQGY+8bkH/a/pAWmIyD4qMfKCMurH5DexxEt5SYWu5BB5hd/DWyZ0wuZ+F79KMPzLBPJW 5cpdLNbrvenSqFZGJEGhtTp7GFJJr6lTy8VLBArxmFHiY5jGyR45eZEGDcz86FfGgvPnnpi7 aNCc/ROdF7fnZYPh8uZGGjQbd4EYK4xMzc7BTQRTEHtBARAAoWGsNx6g90r8gcNKaiPpJBiK y8ztV2FyV5LsT0OgQBW3vIxt/odtsxVNNjpyS/BNZCyzLAsFc1WrGBzhYsmPN9SGB5/5YTvk zf5YViU5VAsZlj/MRWCZrWtpic4c0A7N4csOYReNtk/q8YB4PIFsZ9A+kTuoZhnu5t5PdfBA 74+SVwKu84+PZk9wDEY1LbFVT8vM42oKsmoswlIhwJ2xuJI/gbk+cMUe0yiRpNjo4Svw4RB8 4B6uFwdRr/PtS7xi2Zqoof5AaQT9YSBpGpKJOe/Qk5MP4PF6Fqq+go89n77Y2kJkwcHaLoD/ GJ+ZDASIiMRe1y54FHOQ1RCTGGpnJLXdKuGhwv3J21pU8HNlq0ASNQMMQmYAwtUWzjmp/KEy I1qkcmjafcxb8TmiaoK8SQN1Zf96fc/sIrZN6Z5oOCEyyCQ0prH/PTA2jlRkKQ487PTGk2JS KU5VuS57Nlk2DrnvjWp57aV9eFAhpnrrJPuGmFz83/Pc8gC0t7N7i7VVHYRcC5naxYB2UoI1 OUkyxpT/HvQFXXVZ3/KmdXMzrx191AggCPWIwUAP+VcaURSYpeDk6/ZVAOVOe1ChqcJisCD7 wK20/OOvJ2AtkWreGu1CZ9zSx7nK/VYdLr34GxQ4bT1G+9rBQNnFSNbX2TJ431Mdo1GCjDeR 
K4CtSnrNKYkAEQEAAcLAXwQYAQgACQUCUxB7QQIbDAAKCRAa5oYOVsvSkw3nCADhsKRf+rAR ULTpOh5HoLam62ZJZAyCkNqqu/rke5uj5AaaDY/h7BNhBDiDqhhZLTeofGpVVaErPsWN+tX5 0fypsIt9KAhy90GFrtrIZlWuyK4wsoZvDfp9yaRk+lIM58dw/Rcfxn670JaPTFSRPECVn/uL qBhJSkbYlY212YT9fxVUTJe6wIvDLQrQEjrQD/h1FMhfcLhAqsndltRd6DPvTKeMd/6VAxn0 hkoBKhEy5LkWjM9CHppu+bBkQ91/kj2uJQSXO8euonwHHS3c+6N2i2H7I0emcHGu07wuRB2t Dnw/RLBxohffdPZT2kbxuG7lhVHzwVDw5DRwSw8GkOdy In-Reply-To: <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spamd-Bar: - X-Spamd-Result: default: False [-2.00 / 15.00]; MISSING_MIME_VERSION(2.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-1.000]; DMARC_POLICY_ALLOW(-0.50)[madpilot.net,quarantine]; R_DKIM_ALLOW(-0.20)[madpilot.net:s=bjowvop61wgh]; R_SPF_ALLOW(-0.20)[+mx]; MIME_GOOD(-0.10)[text/plain]; TO_DN_EQ_ADDR_SOME(0.00)[]; MISSING_XM_UA(0.00)[]; ASN(0.00)[asn:24940, ipnet:159.69.0.0/16, country:DE]; MIME_TRACE(0.00)[0:+]; DKIM_TRACE(0.00)[madpilot.net:+]; MID_RHS_MATCH_FROM(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; ARC_NA(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_LAST(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MLMMJ_DEST(0.00)[emulation@FreeBSD.org,freebsd-arm@freebsd.org]; SUBJECT_HAS_QUESTION(0.00)[] X-Rspamd-Queue-Id: 4TND506gXyz57yS List-Id: Development of Emulators of other operating systems List-Archive: https://lists.freebsd.org/archives/freebsd-emulation List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-emulation@freebsd.org Hi all, again, I have some more findings about this, I'm top posting because the old message is not really that much relevant anymore. 
I'm now running a machine with head (commit
b32d49cfbaa0437d08e65e7cd7c82c5951b1a852, Jan 25th) with poudriere
installed on it. The machine is amd64, with an arm64 jail running
14.0-RELEASE, installed from the official distribution binaries (https
download method), with cross tools.

To make sure everything is aligned I rebuilt everything: updated head,
rebuilt the cross tools in the jail, recompiled all ports for the host
architecture and force-reinstalled them (especially qemu-user-static),
and cleaned out all packages for the arm64 jail.

If I missed something important please point it out.

I have made some more tests and I'm getting python failures in poudriere
like the one described below from time to time (I don't have hard stats,
but it feels like a 50% chance). If I get past that, it is usually able
to build all of the few packages, but then locks up at:

Creating repository in /tmp/packages: 0%

with nCPUs processes like this:

> ps -ax | grep -i pkg
91287 1 I+J 0:00.02 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91288 1 I+J 0:00.02 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91289 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91290 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91291 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91292 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91293 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91294 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91295 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91296 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91297 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91298 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91299 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91300 1 I+J 0:00.00 /usr/local/bin/qemu-aarch64-static /.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key

And this has hit me 100% of the time up to now.

It looks like pkg is spawning ncpu processes. I'm looking at reducing
them, just in case this can sidestep the race/lockup.

My suspicion is that there is some race in qemu-user-static, or in the
APIs it is using, that is triggered by pkg-repo.

How can I investigate this? I'm able to reproduce it 100% of the time.

BTW these are the pkgs I'm building at present:

dns/unbound
net-mgmt/vmutils
net/kea
sysutils/htop
sysutils/node_exporter
sysutils/tmux

(vmutils and node_exporter are go packages and are being skipped since
go fails, but I keep them in the list, since I can grab binaries from
the official repos; htop I'm going to drop in the near future)

Thanks in advance, any help appreciated, especially any suggestions for
where to look and what to investigate to understand whether this is a
local problem or some issue with base/qemu.

On 24/01/24 22:10, Guido Falsi wrote:
> Hi,
>
> I recently see a strange failure with python 3.9 in poudriere, it was
> not happening a few weeks ago.
>
> I'm building in poudriere on a head machine running amd64, with a
> poudriere jail for arm64, via qemu-user-static. The jail is running 14.0.
>
> I'm not sure what is going on.
> > It fails in the packaging phase with a bunch of errors like: > > =========================================================================== > =================================================== > ===== env: 'PKG_NOTES=build_timestamp ports_top_git_hash > ports_top_checkout_unclean port_git_hash port_checkout_unclean built_by' > 'PKG_NOTE_build_timestamp=2024-01-24T17:07:52+0000' > 'PKG_NOTE_ports_top_git_hash=0816fdcb6ce8' > 'PKG_NOTE_ports_top_checkout_unclean=no' > 'PKG_NOTE_port_git_hash=0816fdcb6ce8' > 'PKG_NOTE_port_checkout_unclean=no' > 'PKG_NOTE_built_by=poudriere-git-3.4.1' NO_DEPENDS=yes USER=root UID=0 > GID=0 > ===>  Building packages for python39-3.9.18 > ===>   Building python39-3.9.18 > pkg-static: Unable to access file > /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/imaplib.cpython-39.opt-2.pyc:No such file or directory > pkg-static: Unable to access file > /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/imghdr.cpython-39.opt-2.pyc:No such file or directory > pkg-static: Unable to access file > /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/imp.cpython-39.opt-2.pyc:No such file or directory > pkg-static: Unable to access file > /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/inspect.cpython-39.opt-2.pyc:No such file or directory > pkg-static: Unable to access file > /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/io.cpython-39.opt-2.pyc:No such file or directory > pkg-static: Unable to access file > /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/ipaddress.cpython-39.opt-2.pyc:No such file or directory > pkg-static: Unable to access file > /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/mailbox.cpython-39.opt-2.pyc:No such file or directory > pkg-static: Unable to access file > 
/wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/mailcap.cpython-39.opt-2.pyc:No such file or directory > pkg-static: Unable to access file > /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/mimetypes.cpython-39.opt-2.pyc:No such file or directory > > > > (it's all about 'opt-2.pyc' files) > > > What could have changed? Maybe I'm doing something wrong? Maybe I'm > hitting some qemu-user-static issue on head? > > > Any help appreciated. > > > (full log available if needed) > -- Guido Falsi From nobody Sun Jan 28 19:37:55 2024 X-Original-To: emulation@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TNMD66hvMz599Lk; Sun, 28 Jan 2024 19:38:14 +0000 (UTC) (envelope-from mad@madpilot.net) Received: from mail.madpilot.net (vogon.madpilot.net [IPv6:2a01:4f8:1c1c:11e5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4TNMD510bKz4rZF; Sun, 28 Jan 2024 19:38:13 +0000 (UTC) (envelope-from mad@madpilot.net) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=madpilot.net header.s=bjowvop61wgh header.b="n 1iPibM"; dmarc=pass (policy=quarantine) header.from=madpilot.net; spf=pass (mx1.freebsd.org: domain of mad@madpilot.net designates 2a01:4f8:1c1c:11e5::1 as permitted sender) smtp.mailfrom=mad@madpilot.net Received: from mail (mail [IPv6:fd5c:5351:d272::3]) by mail.madpilot.net (Postfix) with ESMTP id 4TNMCq424Nz6fM8; Sun, 28 Jan 2024 20:37:59 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=madpilot.net; h= content-transfer-encoding:content-type:content-type:in-reply-to :content-language:references:from:from:subject:subject:date:date :message-id:received; s=bjowvop61wgh; t=1706470676; x= 1708285077; 
bh=Xg1HXIDh8bkgtOsNbq8i5Uo7BJtwweocewvHUJa5g10=; b=n 1iPibM+r2SOk4CRlMq6BAqhY3EqkIZkQFEre3A+gtr8ur87Jr+53PVjnEINQE610 m49t5j9VDgW5vauZboL1kQOCcJD4plS/9+hum56NFbQgh8kZCqU5yQGQIQ+HZsMw mR50e0VBAVvbc0p6UIAGW16i7uY9OaB0rhHGfq1lpmvFSxa/xrLOpMGYWDpjAs3O zt3cH2XPFMlB+Bn+MZZUM8gdGyCAmke3REvM7zwaSZ16HW8jf/VsDKc7bF2j3Xmh aJW1DYgFDK/FZyZnMpgENBxzm47wQ4wlT/ip7H/xVbYYvTHrDHwDiJ6UXyylIOYt TSlL6IQQef0Oz0e1wz+OA== Received: from mail.madpilot.net ([IPv6:fd5c:5351:d272::3]) by mail (mail.madpilot.net [IPv6:fd5c:5351:d272::3]) (amavisd-new, port 10026) with ESMTP id HqIZkalL-iU1; Sun, 28 Jan 2024 20:37:56 +0100 (CET) Message-ID: <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net> Date: Sun, 28 Jan 2024 20:37:55 +0100 Subject: Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling)) From: Guido Falsi To: emulation@FreeBSD.org, "freebsd-arm@freebsd.org" References: <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> Content-Language: en-US Autocrypt: addr=mad@madpilot.net; keydata= xsBNBE+G+l0BCADi/WBQ0aRJfnE7LBPsM0G3m/m3Yx7OPu4iYFvS84xawmRHtCNjWIntsxuX fptkmEo3Rsw816WUrek8dxoUAYdHd+EcpBcnnDzfDH5LW/TZ4gbrFezrHPdRp7wdxi23GN80 qPwHEwXuF0X4Wy5V0OO8B6VT/nA0ADYnBDhXS52HGIJ/GCUjgqJn+phDTdCFLvrSFdmgx4Wl c0W5Z1p5cmDF9l8L/hc959AeyNf7I9dXnjekGM9gVv7UDUYzCifR3U8T0fnfdMmS8NeI9NC+ wuREpRO4lKOkTnj9TtQJRiptlhcHQiAlG1cFqs7EQo57Tqq6cxD1FycZJLuC32bGbgalABEB AAHNHkd1aWRvIEZhbHNpIDxtYWRAbWFkcGlsb3QubmV0PsLAeAQTAQIAIgUCT4b6XQIbAwYL CQgHAwIGFQgCCQoLBBYCAwECHgECF4AACgkQGuaGDlbL0pOWigf/YVTVf3+ZRnzeGP7CjGV1 Wrrxzjc8h8W64NZasV0XLHGFjl5MYwtm9jJ9gbL8Ubtqstey7lYpjOk2fG6YDhY5eptWCpR6 1QqYrioukhCfKbodSk6PnIZcx719nJVK2P7ihdFEN78TavpBwqIf9hGEcKkMpbRFQv1mYvXD hKVwQGY+8bkH/a/pAWmIyD4qMfKCMurH5DexxEt5SYWu5BB5hd/DWyZ0wuZ+F79KMPzLBPJW 5cpdLNbrvenSqFZGJEGhtTp7GFJJr6lTy8VLBArxmFHiY5jGyR45eZEGDcz86FfGgvPnnpi7 aNCc/ROdF7fnZYPh8uZGGjQbd4EYK4xMzc7BTQRTEHtBARAAoWGsNx6g90r8gcNKaiPpJBiK 
y8ztV2FyV5LsT0OgQBW3vIxt/odtsxVNNjpyS/BNZCyzLAsFc1WrGBzhYsmPN9SGB5/5YTvk zf5YViU5VAsZlj/MRWCZrWtpic4c0A7N4csOYReNtk/q8YB4PIFsZ9A+kTuoZhnu5t5PdfBA 74+SVwKu84+PZk9wDEY1LbFVT8vM42oKsmoswlIhwJ2xuJI/gbk+cMUe0yiRpNjo4Svw4RB8 4B6uFwdRr/PtS7xi2Zqoof5AaQT9YSBpGpKJOe/Qk5MP4PF6Fqq+go89n77Y2kJkwcHaLoD/ GJ+ZDASIiMRe1y54FHOQ1RCTGGpnJLXdKuGhwv3J21pU8HNlq0ASNQMMQmYAwtUWzjmp/KEy I1qkcmjafcxb8TmiaoK8SQN1Zf96fc/sIrZN6Z5oOCEyyCQ0prH/PTA2jlRkKQ487PTGk2JS KU5VuS57Nlk2DrnvjWp57aV9eFAhpnrrJPuGmFz83/Pc8gC0t7N7i7VVHYRcC5naxYB2UoI1 OUkyxpT/HvQFXXVZ3/KmdXMzrx191AggCPWIwUAP+VcaURSYpeDk6/ZVAOVOe1ChqcJisCD7 wK20/OOvJ2AtkWreGu1CZ9zSx7nK/VYdLr34GxQ4bT1G+9rBQNnFSNbX2TJ431Mdo1GCjDeR K4CtSnrNKYkAEQEAAcLAXwQYAQgACQUCUxB7QQIbDAAKCRAa5oYOVsvSkw3nCADhsKRf+rAR ULTpOh5HoLam62ZJZAyCkNqqu/rke5uj5AaaDY/h7BNhBDiDqhhZLTeofGpVVaErPsWN+tX5 0fypsIt9KAhy90GFrtrIZlWuyK4wsoZvDfp9yaRk+lIM58dw/Rcfxn670JaPTFSRPECVn/uL qBhJSkbYlY212YT9fxVUTJe6wIvDLQrQEjrQD/h1FMhfcLhAqsndltRd6DPvTKeMd/6VAxn0 hkoBKhEy5LkWjM9CHppu+bBkQ91/kj2uJQSXO8euonwHHS3c+6N2i2H7I0emcHGu07wuRB2t Dnw/RLBxohffdPZT2kbxuG7lhVHzwVDw5DRwSw8GkOdy In-Reply-To: <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spamd-Bar: - X-Spamd-Result: default: False [-2.00 / 15.00]; MISSING_MIME_VERSION(2.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-1.000]; DMARC_POLICY_ALLOW(-0.50)[madpilot.net,quarantine]; R_DKIM_ALLOW(-0.20)[madpilot.net:s=bjowvop61wgh]; R_SPF_ALLOW(-0.20)[+mx:c]; MIME_GOOD(-0.10)[text/plain]; ARC_NA(0.00)[]; MISSING_XM_UA(0.00)[]; ASN(0.00)[asn:24940, ipnet:2a01:4f8::/32, country:DE]; MIME_TRACE(0.00)[0:+]; SUBJECT_HAS_QUESTION(0.00)[]; MID_RHS_MATCH_FROM(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_EQ_ADDR_SOME(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_LAST(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MLMMJ_DEST(0.00)[emulation@FreeBSD.org,freebsd-arm@freebsd.org]; 
 DKIM_TRACE(0.00)[madpilot.net:+]
X-Rspamd-Queue-Id: 4TNMD510bKz4rZF
List-Id: Development of Emulators of other operating systems
List-Archive: https://lists.freebsd.org/archives/freebsd-emulation
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
Sender: owner-freebsd-emulation@freebsd.org

On 28/01/24 15:15, Guido Falsi wrote:
> Hi all, again,
>
> I have some more findings about this, I'm top posting because the old
> message is not really that much relevant anymore.
>
> I'm now running a machine with head (commit
> b32d49cfbaa0437d08e65e7cd7c82c5951b1a852 Jan 25th), poudriere installed
> in it, machine is amd64, with an arm64 jail, 14.0-RELEASE, installed
> from official distribution binaries (https download method), with cross
> tools.
>
> To make sure everything is aligned I rebuild everything: updated head,
> rebuild cross tools in the jail, recompiled all ports for the host
> architecture and force reinstalled them, especially qemu-user-static,
> cleaned up all packages for the arm64 jail.
>
> If I missed something important please point it out.
>
> I have made some more tests and I'm getting python failures in poudriere
> like the one described below from time to time (don't have hard stats
> but feels like 50% chance). If I get past that it usually is able to
> build all the not many packages, but locks up at:
>
> Creating repository in /tmp/packages:   0%
>

BTW, I forgot to mention: the last time this worked without issue was
around 20th December.
-- Guido Falsi From nobody Sun Jan 28 21:00:47 2024 X-Original-To: emulation@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TNP3M5r4Nz59Hkc for ; Sun, 28 Jan 2024 21:00:47 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4TNP3M3C74z4D5q for ; Sun, 28 Jan 2024 21:00:47 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1706475647; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=NQwFZ60Jybu3G9vjTF5cM+SO0ZV0oQi//ygZQUKqoq8=; b=tjuF+Vc4X6Z6fDondPaHu8lwvU6Th8ESjiY3nfC++xgrqtqPCIcEO4IjGaMK+4t2UzszKk GOJ/csdvgZ2EgIJX9q0d58OcDM3DcP/6D2aVrL+E25nUwTB4bwIEBhdrq+d+/dgBW4ddwy MLaphJPpK/HgRpJpDhKbrnbTblsuQitcezcf8PwW9IDaCnrsuU/U8UOVBppLCsrBVw7U1X KgP6CyT7FJl82hylF6knG1YmIe17OcvtKGnymAO30vHzL8rKYFVgqNOCY5QlWxPzqeEsmC YomI7Jyz8pEf0RcDpfnR4DObRYsjLcEJGRbODt/k8/mcWChmqyA5cRdQC/7opQ== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1706475647; a=rsa-sha256; cv=none; b=wZ1R/b295WHOeT9bDtMsHFIBz0swal07iTSCvyDicxEFa/zkxyh2Dv+7EG2HdyTiBcnHiY JCxIBprtbkUVY3LwthlGNTB0vlR48DkK7Zg3XvRJlZ9I7xYK7G5BsDhm4y/JUx7T3oIzji rW+8YXq34FcJm+itSZkgtpT5EiTnu0wHiRIllIqj3mtr6JNiPs+idCJqX2dBFcQuy1i4fQ YBV4n3b5UDEN0R1lzXeiZBxbjrFIciFgiw/X+mEnh6c4yTzQtXi+99+KywrcSbpjSyPDWL 
bl59EFaaboP9baMQyQTfU9iVhDpm2OKhZee0IxlwsWrCCkLmxWnUkf7cr1MkYw== Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2610:1c1:1:606c::50:1d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4TNP3M2JWnzNYk for ; Sun, 28 Jan 2024 21:00:47 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from kenobi.freebsd.org ([127.0.1.5]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id 40SL0l3k077152 for ; Sun, 28 Jan 2024 21:00:47 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Received: (from bugzilla@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id 40SL0l6p077151 for emulation@FreeBSD.org; Sun, 28 Jan 2024 21:00:47 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Message-Id: <202401282100.40SL0l6p077151@kenobi.freebsd.org> X-Authentication-Warning: kenobi.freebsd.org: bugzilla set sender to bugzilla-noreply@FreeBSD.org using -f From: bugzilla-noreply@FreeBSD.org To: emulation@FreeBSD.org Subject: Problem reports for emulation@FreeBSD.org that need special attention Date: Sun, 28 Jan 2024 21:00:47 +0000 List-Id: Development of Emulators of other operating systems List-Archive: https://lists.freebsd.org/archives/freebsd-emulation List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-emulation@freebsd.org MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="17064756473.02eCBba1.69338" Content-Transfer-Encoding: 7bit --17064756473.02eCBba1.69338 Date: Sun, 28 Jan 2024 21:00:47 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" To view an individual PR, use: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id). The following is a listing of current problems submitted by FreeBSD users, which need special attention. 
These represent problem reports covering all versions including
experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
Open        |    264835 | Linux: USB-IP tool fails to run: setsockopt - IP_
Open        |    274330 | emulators/linux_base-c7: update package message
New         |    269934 | emulators/qemu-user-static does not support capab
Open        |    219913 | emulators/virtualbox-ose-kmod: if the MAXCPU opti

4 problems total for which you should take action.
--17064756473.02eCBba1.69338
Date: Sun, 28 Jan 2024 21:00:47 +0000
MIME-Version: 1.0
Content-Type: text/html; charset="UTF-8"

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
Open        |    264835 | Linux: USB-IP tool fails to run: setsockopt - IP_
Open        |    274330 | emulators/linux_base-c7: update package message
New         |    269934 | emulators/qemu-user-static does not support capab
Open        |    219913 | emulators/virtualbox-ose-kmod: if the MAXCPU opti

4 problems total for which you should take action.
--17064756473.02eCBba1.69338-- From nobody Sun Jan 28 21:34:34 2024 X-Original-To: emulation@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TNPpX2HJsz59LHm; Sun, 28 Jan 2024 21:34:44 +0000 (UTC) (envelope-from mad@madpilot.net) Received: from mail.madpilot.net (vogon.madpilot.net [IPv6:2a01:4f8:1c1c:11e5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4TNPpX0vmCz4LYh; Sun, 28 Jan 2024 21:34:44 +0000 (UTC) (envelope-from mad@madpilot.net) Authentication-Results: mx1.freebsd.org; none Received: from mail (mail [IPv6:fd5c:5351:d272::3]) by mail.madpilot.net (Postfix) with ESMTP id 4TNPpR0PjBz6g9X; Sun, 28 Jan 2024 22:34:39 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=madpilot.net; h= content-transfer-encoding:content-type:content-type:in-reply-to :from:from:references:content-language:subject:subject:date:date :message-id:received; s=bjowvop61wgh; t=1706477675; x= 1708292076; bh=5Cqr57rf3mXR1Ah5VgWPVeynOHP4RKyAAEqEesVgvHs=; b=p 93OMI3Ynz9j0QkJvl/ek+ZAwTFnbBM4bntfkaagPdfC/MfhEqI1YLrlCYR+EWnPX 4tVHU2Ybshp09BDkr4xYybyER8Mhy6Bd7o1/+YNPNqkUJiXKRtzKQRYrsZ4Q9227 ElsNywITZ7sT6R74c+3LcdqIkChVzU7nRXg8BMBqMwgiGPVhwQn8qlhZg/irRpOE 5nkTlLN0sJr5n5eHJhv0XnY9tPTnZPb8UU1LjSswocw+CfOQm0dklo0h3qEe3+Dc NHvqfCxw2DdFJ23taAU3lOZlpJvqo+K7LjE5zUw3+SrMWLxZAyWSjMUU8NggwLgS c9erE3OwH/MBtLxRk6CLA== Received: from mail.madpilot.net ([IPv6:fd5c:5351:d272::3]) by mail (mail.madpilot.net [IPv6:fd5c:5351:d272::3]) (amavisd-new, port 10026) with ESMTP id ScflYP6VeZzW; Sun, 28 Jan 2024 22:34:35 +0100 (CET) Message-ID: Date: Sun, 28 Jan 2024 22:34:34 +0100 Subject: Re: qemu-user-static aarch64 lockup/race? 
(was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling)) Content-Language: en-US To: Warner Losh Cc: emulation@freebsd.org, "freebsd-arm@freebsd.org" References: <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net> From: Guido Falsi Autocrypt: addr=mad@madpilot.net; keydata= xsBNBE+G+l0BCADi/WBQ0aRJfnE7LBPsM0G3m/m3Yx7OPu4iYFvS84xawmRHtCNjWIntsxuX fptkmEo3Rsw816WUrek8dxoUAYdHd+EcpBcnnDzfDH5LW/TZ4gbrFezrHPdRp7wdxi23GN80 qPwHEwXuF0X4Wy5V0OO8B6VT/nA0ADYnBDhXS52HGIJ/GCUjgqJn+phDTdCFLvrSFdmgx4Wl c0W5Z1p5cmDF9l8L/hc959AeyNf7I9dXnjekGM9gVv7UDUYzCifR3U8T0fnfdMmS8NeI9NC+ wuREpRO4lKOkTnj9TtQJRiptlhcHQiAlG1cFqs7EQo57Tqq6cxD1FycZJLuC32bGbgalABEB AAHNHkd1aWRvIEZhbHNpIDxtYWRAbWFkcGlsb3QubmV0PsLAeAQTAQIAIgUCT4b6XQIbAwYL CQgHAwIGFQgCCQoLBBYCAwECHgECF4AACgkQGuaGDlbL0pOWigf/YVTVf3+ZRnzeGP7CjGV1 Wrrxzjc8h8W64NZasV0XLHGFjl5MYwtm9jJ9gbL8Ubtqstey7lYpjOk2fG6YDhY5eptWCpR6 1QqYrioukhCfKbodSk6PnIZcx719nJVK2P7ihdFEN78TavpBwqIf9hGEcKkMpbRFQv1mYvXD hKVwQGY+8bkH/a/pAWmIyD4qMfKCMurH5DexxEt5SYWu5BB5hd/DWyZ0wuZ+F79KMPzLBPJW 5cpdLNbrvenSqFZGJEGhtTp7GFJJr6lTy8VLBArxmFHiY5jGyR45eZEGDcz86FfGgvPnnpi7 aNCc/ROdF7fnZYPh8uZGGjQbd4EYK4xMzc7BTQRTEHtBARAAoWGsNx6g90r8gcNKaiPpJBiK y8ztV2FyV5LsT0OgQBW3vIxt/odtsxVNNjpyS/BNZCyzLAsFc1WrGBzhYsmPN9SGB5/5YTvk zf5YViU5VAsZlj/MRWCZrWtpic4c0A7N4csOYReNtk/q8YB4PIFsZ9A+kTuoZhnu5t5PdfBA 74+SVwKu84+PZk9wDEY1LbFVT8vM42oKsmoswlIhwJ2xuJI/gbk+cMUe0yiRpNjo4Svw4RB8 4B6uFwdRr/PtS7xi2Zqoof5AaQT9YSBpGpKJOe/Qk5MP4PF6Fqq+go89n77Y2kJkwcHaLoD/ GJ+ZDASIiMRe1y54FHOQ1RCTGGpnJLXdKuGhwv3J21pU8HNlq0ASNQMMQmYAwtUWzjmp/KEy I1qkcmjafcxb8TmiaoK8SQN1Zf96fc/sIrZN6Z5oOCEyyCQ0prH/PTA2jlRkKQ487PTGk2JS KU5VuS57Nlk2DrnvjWp57aV9eFAhpnrrJPuGmFz83/Pc8gC0t7N7i7VVHYRcC5naxYB2UoI1 OUkyxpT/HvQFXXVZ3/KmdXMzrx191AggCPWIwUAP+VcaURSYpeDk6/ZVAOVOe1ChqcJisCD7 wK20/OOvJ2AtkWreGu1CZ9zSx7nK/VYdLr34GxQ4bT1G+9rBQNnFSNbX2TJ431Mdo1GCjDeR 
K4CtSnrNKYkAEQEAAcLAXwQYAQgACQUCUxB7QQIbDAAKCRAa5oYOVsvSkw3nCADhsKRf+rAR ULTpOh5HoLam62ZJZAyCkNqqu/rke5uj5AaaDY/h7BNhBDiDqhhZLTeofGpVVaErPsWN+tX5 0fypsIt9KAhy90GFrtrIZlWuyK4wsoZvDfp9yaRk+lIM58dw/Rcfxn670JaPTFSRPECVn/uL qBhJSkbYlY212YT9fxVUTJe6wIvDLQrQEjrQD/h1FMhfcLhAqsndltRd6DPvTKeMd/6VAxn0 hkoBKhEy5LkWjM9CHppu+bBkQ91/kj2uJQSXO8euonwHHS3c+6N2i2H7I0emcHGu07wuRB2t Dnw/RLBxohffdPZT2kbxuG7lhVHzwVDw5DRwSw8GkOdy In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 4TNPpX0vmCz4LYh X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:24940, ipnet:2a01:4f8::/32, country:DE] List-Id: Development of Emulators of other operating systems List-Archive: https://lists.freebsd.org/archives/freebsd-emulation List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-emulation@freebsd.org On 28/01/24 22:23, Warner Losh wrote: > > > On Sun, Jan 28, 2024, 12:38 PM Guido Falsi > wrote: > > On 28/01/24 15:15, Guido Falsi wrote: > > Hi all, again, > > > > I have some more findings about this, I'm top posting because the > old > > message is not really that much relevant anymore. > > > > I'm now running a machine with head (commit > > b32d49cfbaa0437d08e65e7cd7c82c5951b1a852 Jan 25th), poudriere > installed > > in it, machine is amd64, with an arm64 jail, 14.0-RELEASE, installed > > from official distribution binaries (https download method), with > cross > > tools. > > > > To make sure everything is aligned I rebuild everything: updated > head, > > rebuild cross tools in the jail, recompiled all ports for the host > > architecture and force reinstalled them, especially > qemu-user-static, > > cleaned up all packages for the arm64 jail. > > > > If I missed something important please point it out. 
> >
> > I have made some more tests and I'm getting python failures in
> poudriere
> > like the one described below from time to time (don't have hard
> stats
> > but feels like 50% chance). If I get past that it usually is able to
> > build all the not many packages, but locks up at:
> >
> > Creating repository in /tmp/packages:   0%
> >
>
> BTW, forgot to mention last time this worked without issue was around
> 20th December.
>
>
> I think this is a bsd-user issue. There is a race somewhere in that code
> that causes the hangs. I'd love a reproducible test case that is
> somewhat smaller than python... there are bigger races with the newer
> stuff and I've not had the time to chase it there either. 😞

First of all, thanks for your feedback. It is encouraging to have
someone with better knowledge of this confirm that a race condition is
actually a possible cause! It's strange that this was not happening
until mid December.

My main, fully reproducible use case is actually mostly with pkg. At the
end of the run poudriere runs `pkg repo` to create the meta files and
sign the repo. It forks itself (ncpus + 2 times, I guess; even when
forcing it to 1 worker I see three processes) and then locks up, with
all the processes no longer using any CPU (the ps output is in my
earlier message).

I guess this can be reproduced with any poudriere repo with more than
ncpus packages in it. It can also be reproduced using
`poudriere pkgclean -u `

If that does not work I'm not sure how else to reproduce it, but I can
try writing some code mocking what pkg seems to be doing; I'm not an
expert at such things, though.
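
A first attempt at mocking the fork-and-wait pattern could be as small as the shell sketch below. It rests on assumptions that have not been checked against the pkg sources: that the `pkg repo` workers are plain forked children (the identical command lines in the ps output suggest fork without exec) and that the parent simply waits for them. The function name `spawn_and_wait` is made up for illustration, and whether this triggers the hang at all is untested. It is meant to be run inside the arm64 jail, so that /bin/sh and every child it forks run under qemu-aarch64-static.

```shell
#!/bin/sh
# Hypothetical stress sketch, not what pkg actually does: fork N trivial
# child processes, wait for each one, and report how many exited cleanly.

spawn_and_wait() {
    sw_count=$1      # number of workers to fork
    sw_reaped=0      # workers that exited with status 0
    sw_pids=""
    sw_i=0
    while [ "$sw_i" -lt "$sw_count" ]; do
        # Backgrounded subshell: a fork without an exec.
        ( : "worker $sw_i"; exit 0 ) &
        sw_pids="$sw_pids $!"
        sw_i=$((sw_i + 1))
    done
    # A hang in the fork/wait path would show up as this loop never ending.
    for sw_pid in $sw_pids; do
        if wait "$sw_pid"; then
            sw_reaped=$((sw_reaped + 1))
        fi
    done
    echo "$sw_reaped"
}

# Default to ncpu + 2 workers (the count guessed above); an explicit
# count can be given as the first argument instead.
ncpu=$(sysctl -n hw.ncpu 2>/dev/null || echo 1)
want=${1:-$((ncpu + 2))}
got=$(spawn_and_wait "$want")
echo "reaped $got of $want workers"
```

Since a race may need many attempts to show up, the script would probably have to be run in a loop, and a run that never prints its final line would be the interesting result.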
-- Guido Falsi From nobody Sun Jan 28 21:43:27 2024 X-Original-To: emulation@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TNQ0j2JxRz59M6N; Sun, 28 Jan 2024 21:43:33 +0000 (UTC) (envelope-from mad@madpilot.net) Received: from mail.madpilot.net (vogon.madpilot.net [IPv6:2a01:4f8:1c1c:11e5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4TNQ0g0glfz4Mk4; Sun, 28 Jan 2024 21:43:31 +0000 (UTC) (envelope-from mad@madpilot.net) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=madpilot.net header.s=bjowvop61wgh header.b="l 9lpGVB"; dmarc=pass (policy=quarantine) header.from=madpilot.net; spf=pass (mx1.freebsd.org: domain of mad@madpilot.net designates 2a01:4f8:1c1c:11e5::1 as permitted sender) smtp.mailfrom=mad@madpilot.net Received: from mail (mail [IPv6:fd5c:5351:d272::3]) by mail.madpilot.net (Postfix) with ESMTP id 4TNQ0f1s1Qz6g9X; Sun, 28 Jan 2024 22:43:30 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=madpilot.net; h= content-transfer-encoding:content-type:content-type:in-reply-to :references:from:from:content-language:subject:subject:date:date :message-id:received; s=bjowvop61wgh; t=1706478208; x= 1708292609; bh=vBI2Qy+rw34UDlVm+kKtnXMPb2StiOt3EGZlI6jndA8=; b=l 9lpGVBdvkFiAS0HNeJJC1rjRQPZ9HpZw7lagS82BTuZwLnOYmjGBz6/84+wn2imT A3a+cSa2eIgg5HdGMKEYKcAznuHraopDlSlIMbuhWXCP0a6jxotmTngNIW4+8qmP s2esVAiVAcIQRDeH0jrOdFKCVen44Uv66RHHxYJP/HJRfZWOyJH9ixN9DbcJ4BWt bXZXPlMCTEnP1S5PhPSpT817x7VUKNWLnoBve2nTKta8evRoZOuDMJGpvLdPQ+mf GDms+gcJIj68veSD2001kEIkmI18weRqCsHPLBioU5RO5FMMRRTpFAETw4XkvBRk XEk+YK6wGnxCavaLmRT9w== Received: from mail.madpilot.net ([IPv6:fd5c:5351:d272::3]) by mail (mail.madpilot.net [IPv6:fd5c:5351:d272::3]) (amavisd-new, port 10026) with ESMTP id 
TLYOBTSnW0Yw; Sun, 28 Jan 2024 22:43:28 +0100 (CET) Message-ID: <5ef2ab66-25ef-45f1-aa5a-4b614eab2f40@madpilot.net> Date: Sun, 28 Jan 2024 22:43:27 +0100 Subject: Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling)) Content-Language: en-US From: Guido Falsi To: Warner Losh Cc: emulation@freebsd.org, "freebsd-arm@freebsd.org" References: <6a33726b-eb6f-418e-9fbd-6d0b9b4bfaa8@madpilot.net> <0fc7f929-6e5b-4a33-97d2-8a9c0c07d524@madpilot.net> <79a5eb0f-d04e-4c1a-9d8a-185e1fb4e4a2@madpilot.net> Autocrypt: addr=mad@madpilot.net; keydata= xsBNBE+G+l0BCADi/WBQ0aRJfnE7LBPsM0G3m/m3Yx7OPu4iYFvS84xawmRHtCNjWIntsxuX fptkmEo3Rsw816WUrek8dxoUAYdHd+EcpBcnnDzfDH5LW/TZ4gbrFezrHPdRp7wdxi23GN80 qPwHEwXuF0X4Wy5V0OO8B6VT/nA0ADYnBDhXS52HGIJ/GCUjgqJn+phDTdCFLvrSFdmgx4Wl c0W5Z1p5cmDF9l8L/hc959AeyNf7I9dXnjekGM9gVv7UDUYzCifR3U8T0fnfdMmS8NeI9NC+ wuREpRO4lKOkTnj9TtQJRiptlhcHQiAlG1cFqs7EQo57Tqq6cxD1FycZJLuC32bGbgalABEB AAHNHkd1aWRvIEZhbHNpIDxtYWRAbWFkcGlsb3QubmV0PsLAeAQTAQIAIgUCT4b6XQIbAwYL CQgHAwIGFQgCCQoLBBYCAwECHgECF4AACgkQGuaGDlbL0pOWigf/YVTVf3+ZRnzeGP7CjGV1 Wrrxzjc8h8W64NZasV0XLHGFjl5MYwtm9jJ9gbL8Ubtqstey7lYpjOk2fG6YDhY5eptWCpR6 1QqYrioukhCfKbodSk6PnIZcx719nJVK2P7ihdFEN78TavpBwqIf9hGEcKkMpbRFQv1mYvXD hKVwQGY+8bkH/a/pAWmIyD4qMfKCMurH5DexxEt5SYWu5BB5hd/DWyZ0wuZ+F79KMPzLBPJW 5cpdLNbrvenSqFZGJEGhtTp7GFJJr6lTy8VLBArxmFHiY5jGyR45eZEGDcz86FfGgvPnnpi7 aNCc/ROdF7fnZYPh8uZGGjQbd4EYK4xMzc7BTQRTEHtBARAAoWGsNx6g90r8gcNKaiPpJBiK y8ztV2FyV5LsT0OgQBW3vIxt/odtsxVNNjpyS/BNZCyzLAsFc1WrGBzhYsmPN9SGB5/5YTvk zf5YViU5VAsZlj/MRWCZrWtpic4c0A7N4csOYReNtk/q8YB4PIFsZ9A+kTuoZhnu5t5PdfBA 74+SVwKu84+PZk9wDEY1LbFVT8vM42oKsmoswlIhwJ2xuJI/gbk+cMUe0yiRpNjo4Svw4RB8 4B6uFwdRr/PtS7xi2Zqoof5AaQT9YSBpGpKJOe/Qk5MP4PF6Fqq+go89n77Y2kJkwcHaLoD/ GJ+ZDASIiMRe1y54FHOQ1RCTGGpnJLXdKuGhwv3J21pU8HNlq0ASNQMMQmYAwtUWzjmp/KEy I1qkcmjafcxb8TmiaoK8SQN1Zf96fc/sIrZN6Z5oOCEyyCQ0prH/PTA2jlRkKQ487PTGk2JS KU5VuS57Nlk2DrnvjWp57aV9eFAhpnrrJPuGmFz83/Pc8gC0t7N7i7VVHYRcC5naxYB2UoI1 
OUkyxpT/HvQFXXVZ3/KmdXMzrx191AggCPWIwUAP+VcaURSYpeDk6/ZVAOVOe1ChqcJisCD7 wK20/OOvJ2AtkWreGu1CZ9zSx7nK/VYdLr34GxQ4bT1G+9rBQNnFSNbX2TJ431Mdo1GCjDeR K4CtSnrNKYkAEQEAAcLAXwQYAQgACQUCUxB7QQIbDAAKCRAa5oYOVsvSkw3nCADhsKRf+rAR ULTpOh5HoLam62ZJZAyCkNqqu/rke5uj5AaaDY/h7BNhBDiDqhhZLTeofGpVVaErPsWN+tX5 0fypsIt9KAhy90GFrtrIZlWuyK4wsoZvDfp9yaRk+lIM58dw/Rcfxn670JaPTFSRPECVn/uL qBhJSkbYlY212YT9fxVUTJe6wIvDLQrQEjrQD/h1FMhfcLhAqsndltRd6DPvTKeMd/6VAxn0 hkoBKhEy5LkWjM9CHppu+bBkQ91/kj2uJQSXO8euonwHHS3c+6N2i2H7I0emcHGu07wuRB2t Dnw/RLBxohffdPZT2kbxuG7lhVHzwVDw5DRwSw8GkOdy In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spamd-Bar: - X-Spamd-Result: default: False [-1.99 / 15.00]; MISSING_MIME_VERSION(2.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.99)[-0.995]; DMARC_POLICY_ALLOW(-0.50)[madpilot.net,quarantine]; R_DKIM_ALLOW(-0.20)[madpilot.net:s=bjowvop61wgh]; R_SPF_ALLOW(-0.20)[+mx]; MIME_GOOD(-0.10)[text/plain]; ASN(0.00)[asn:24940, ipnet:2a01:4f8::/32, country:DE]; TO_DN_EQ_ADDR_SOME(0.00)[]; MISSING_XM_UA(0.00)[]; TO_DN_SOME(0.00)[]; MIME_TRACE(0.00)[0:+]; ARC_NA(0.00)[]; DKIM_TRACE(0.00)[madpilot.net:+]; MID_RHS_MATCH_FROM(0.00)[]; MLMMJ_DEST(0.00)[emulation@freebsd.org,freebsd-arm@freebsd.org]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_LAST(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; SUBJECT_HAS_QUESTION(0.00)[] X-Rspamd-Queue-Id: 4TNQ0g0glfz4Mk4 List-Id: Development of Emulators of other operating systems List-Archive: https://lists.freebsd.org/archives/freebsd-emulation List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-emulation@freebsd.org On 28/01/24 22:34, Guido Falsi wrote: > On 28/01/24 22:23, Warner Losh wrote: >> >> >> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi > > wrote: >> >>     On 28/01/24 15:15, Guido Falsi wrote: >>      > Hi all, again, >>      > >>      > I have some more 
findings about this, I'm top posting because the >>     old >>      > message is not really that much relevant anymore. >>      > >>      > I'm now running a machine with head (commit >>      > b32d49cfbaa0437d08e65e7cd7c82c5951b1a852 Jan 25th), poudriere >>     installed >>      > in it, machine is amd64, with an arm64 jail, 14.0-RELEASE, >> installed >>      > from official distribution binaries (https download method), with >>     cross >>      > tools. >>      > >>      > To make sure everything is aligned I rebuild everything: updated >>     head, >>      > rebuild cross tools in the jail, recompiled all ports for the host >>      > architecture and force reinstalled them, especially >>     qemu-user-static, >>      > cleaned up all packages for the arm64 jail. >>      > >>      > If I missed something important please point it out. >>      > >>      > I have made some more tests and I'm getting python failures in >>     poudriere >>      > like the one described below from time to time (don't have hard >>     stats >>      > but feels like 50% chance). If I get past that it usually is >> able to >>      > build all the not many packages, but locks up at: >>      > >>      > Creating repository in /tmp/packages:   0% >>      > >> >>     BTW, forgot to mention last time this worked without issue was around >>     20th December. >> >> >> I think this is a bsd-user issue. There is a race somewhere in that >> code that causes the hangs. I'd love a reproducible test case that is >> somewhat smaller than python... there are bigger races with the newer >> stuff and I've not had the time to chase it there either. 😞 > > First of all thanks for your feedback. It encourages me having someone > else with better knowledge about this confirm that a race condition is > actually a possible cause! > > Strange this has not been happening up to mid December. > > My main and fully reproducible use case is actually mostly with pkg. 
>
> At the end of the run poudriere runs `pkg repo` to create the meta files
> and sign the repo. It forks itself (ncpus + 2 I guess; even when forcing it
> to 1 worker I see three processes), and then locks up, with all the
> processes no longer using any CPU (ps output is in my message).
>
> I guess this can be reproduced with any poudriere repo with more than
> ncpus packages in it. It can also be reproduced using `poudriere
> pkgclean -u `
>
> If that does not work I'm not sure how to reproduce it in other ways,
> but I can try writing some code mocking what pkg seems to be doing; I'm
> not an expert at such things, though.
>

In case it helps narrow things down further, it looks like the lockup is
happening somewhere around here:

https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778

and/or in the pkg_create_repo_worker() function here:

https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341

(I'm trying to spare you the time needed to find the actual code being
executed. I guess you would have identified this in a few minutes yourself,
but I'm trying to make myself useful.)

-- 
Guido Falsi