Subject: Re: FYI: main (bad9fa56620e based): some unexpected SIGSEGV's are tied to interrupted system calls (cortex-a57/a72 fail, cortex-a53/cortex-a7 work)
From: Mark Millard
Date: Tue, 9 Mar 2021 21:11:22 -0800
Cc: Konstantin Belousov
To: freebsd-arm, freebsd-current

On 2021-Mar-9, at 19:17, Mark Millard wrote:

[My only testing context for this has been main, not 13.0.
But it might be a 13.0 worry.]

> Using the quickest so-far-reliably-failing example from
> another thread, I used truss to see what happens, here filtered
> down to the two processes that appear to be involved and only
> near the failure. (The overall truss output is huge from the
> prior poudriere bulk related activity.) Also, truss was
> watching from aarch64 but the failing code is armv7.
>
> 83630 100199: #340(0x1,0xffffd18c,0xffffd17c) = 0 (0x0)
> 83630 100199: #416(0x14,0xffffd1b4,0xffffd19c) = 0 (0x0)
> 83630 100199: #7(0xffffffff,0xffffd178,0x1,0x0) = 0 (0x0)
> 83731 100161: #240(0xffffd5f0,0xffffd5f0) = 0 (0x0)
> 83731 100161: #1(0x0)
> 83731 100161: process exit, rval = 0
> 83630 100199: SIGNAL 20 (SIGCHLD) code=CLD_EXITED pid=83731 uid=0 status=0
> 83630 100199: #341(0xffffd17c) ERR#4 'Interrupted system call'
> 83630 100199: SIGNAL 11 (SIGSEGV) code=SEGV_MAPERR trapno=36 addr=0xffffffe1
> 83630 100199: process killed, signal = 11 (core dumped)
>
> As a reminder of the lldb backtrace of the sh.core
> and the like:
>
> (lldb) bt
> * thread #1, name = 'sh', stop reason = signal SIGSEGV
>   * frame #0: 0xffffe190
>     frame #1: 0x00031aa8 sh`waitcmdloop(job=0x00064230) at jobs.c:608:11
>     frame #2: 0x00031a24 sh`waitcmd(argc=, argv=) at jobs.c:554:13
>     frame #3: 0x00028f54 sh`evalcommand(cmd=0x400ad0e4, flags=, backcmd=0x00000000) at eval.c:1107:16
>     frame #4: 0x00027800 sh`evaltree(n=0x400ad0e4, flags=) at eval.c:289:4
>     frame #5: 0x000344d0 sh`cmdloop(top=1) at main.c:221:4
>     frame #6: 0x000342f4 sh`main(argc=, argv=) at main.c:168:3
>     frame #7: 0x0002480c sh`__start(argc=8, argv=, env=, ps_strings=, obj=0x400b4004, cleanup=0x40081aa0) at crt1_c.c:92:7
> (lldb) up
> frame #1: 0x00031aa8 sh`waitcmdloop(job=0x00064230) at jobs.c:608:11
>    605  break;
>    606  }
>    607  }
> -> 608  } while (dowait(DOWAIT_BLOCK | DOWAIT_SIG, (struct job *)NULL) != -1);
>    609
>    610  sig = pendingsig_waitcmd;
>    611  pendingsig_waitcmd = 0;
>
> (lldb) disass
> sh`waitcmdloop:
>     0x31a54 <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
>     0x31a58 <+4>: add r11, sp, #28
>     0x31a5c <+8>: sub sp, sp, #4
>     0x31a60 <+12>: movw r6, #0x3ea0
>     0x31a64 <+16>: movw r7, #0x3e9c
>     0x31a68 <+20>: movw r9, #0x4040
>     0x31a6c <+24>: movw r8, #0x3ea4
>     0x31a70 <+28>: mov r4, r0
>     0x31a74 <+32>: movt r6, #0x6
>     0x31a78 <+36>: movt r7, #0x6
>     0x31a7c <+40>: movt r9, #0x6
>     0x31a80 <+44>: mov r10, #0
>     0x31a84 <+48>: movt r8, #0x6
>     0x31a88 <+52>: cmp r4, #0
>     0x31a8c <+56>: beq 0x31ab4 ; <+96> at jobs.c:590:37
>     0x31a90 <+60>: ldrb r0, [r4, #0x18]
>     0x31a94 <+64>: cmp r0, #2
>     0x31a98 <+68>: beq 0x31b84 ; <+304> [inlined] getjobstatus at jobs.c:575
>     0x31a9c <+72>: mov r0, #3
>     0x31aa0 <+76>: mov r1, #0
>     0x31aa4 <+80>: bl 0x32bcc ; dowait at jobs.c:1142
> ->  0x31aa8 <+84>: cmn r0, #1
>
>
> For reference, the local context around the
> SIGSEGV looks like (all lines in the range
> selected):
>
> . . .
> 83833 102738: fcntl(2,F_DUPFD_CLOEXEC,0xa) = 10 (0xa)
> 83833 102738: openat(AT_FDCWD,"/dev/null",O_WRONLY|O_CREAT|O_TRUNC,0666) = 3 (0x3)
> 83833 102738: dup2(3,2) = 2 (0x2)
> 83833 102738: close(3) = 0 (0x0)
> 83833 102738: unlink("./.data.json.SYR1bCaL") = 0 (0x0)
> 83833 102738: dup2(10,2) = 2 (0x2)
> 83833 102738: close(10) = 0 (0x0)
> 83833 102738: exit(0x0)
> 83833 102738: process exit, rval = 0
> 77872 100638: wait4(-1,{ EXITED,val=0 },0x0,0x0) = 83833 (0x14779)
> 77872 100638: fcntl(0,F_DUPFD_CLOEXEC,0xa) = 10 (0xa)
> 77872 100638: openat(AT_FDCWD,"/var/run/poudriere/lock-poudriere-shared-json_top.pid",O_RDONLY,00) = 3 (0x3)
> 77872 100638: dup2(3,0) = 0 (0x0)
> 77872 100638: close(3) = 0 (0x0)
> 77872 100638: fcntl(2,F_DUPFD_CLOEXEC,0xa) = 11 (0xb)
> 77872 100638: openat(AT_FDCWD,"/dev/null",O_WRONLY|O_CREAT|O_TRUNC,0666) = 3 (0x3)
> 77872 100638: dup2(3,2) = 2 (0x2)
> 77872 100638: close(3) = 0 (0x0)
> 77872 100638: lseek(0,0x0,SEEK_CUR) = 0 (0x0)
> 77872 100638: read(0,"77563",1024) = 5 (0x5)
> 77872 100638: read(0,0xffffffffb9e8,1024) = 0 (0x0)
> 77872 100638: dup2(10,0) = 0 (0x0)
> 77872 100638: close(10) = 0 (0x0)
> 77872 100638: dup2(11,2) = 2 (0x2)
> 77872 100638: close(11) = 0 (0x0)
> 77872 100638: fcntl(2,F_DUPFD_CLOEXEC,0xa) = 10 (0xa)
> 77872 100638: openat(AT_FDCWD,"/dev/null",O_WRONLY|O_CREAT|O_TRUNC,0666) = 3 (0x3)
> 77872 100638: dup2(3,2) = 2 (0x2)
> 77872 100638: close(3) = 0 (0x0)
> 77872 100638: rmdir("/var/run/poudriere/lock-poudriere-shared-json_top") = 0 (0x0)
> 77872 100638: dup2(10,2) = 2 (0x2)
> 77872 100638: close(10) = 0 (0x0)
> 77872 100638: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
> 77872 100638: fcntl(2,F_DUPFD_CLOEXEC,0xa) = 10 (0xa)
> 77872 100638: openat(AT_FDCWD,"/dev/null",O_WRONLY|O_CREAT|O_TRUNC,0666) = 3 (0x3)
> 77872 100638: dup2(3,2) = 2 (0x2)
> 77872 100638: close(3) = 0 (0x0)
> 77872 100638: sigaction(SIGINFO,{ 0x239c30 SA_RESTART ss_t },{ SIG_DFL 0x0 ss_t }) = 0 (0x0)
> 83731 100161: #240(0xffffd5f0,0xffffd5f0) = 0 (0x0)
> -- UNKNOWN FreeBSD32 SYSCALL 1 --
> 83731 100161: #1(0x0)
> 83731 100161: process exit, rval = 0
> 83630 100199: SIGNAL 20 (SIGCHLD) code=CLD_EXITED pid=83731 uid=0 status=0
> 83630 100199: #341(0xffffd17c) ERR#4 'Interrupted system call'
> 83630 100199: SIGNAL 11 (SIGSEGV) code=SEGV_MAPERR trapno=36 addr=0xffffffe1
> 83630 100199: process killed, signal = 11 (core dumped)
> 83316 100123: #7(0xffffffff,0xffffca58,0x0,0x0) = 83630 (0x146ae)
> -- UNKNOWN FreeBSD32 SYSCALL 477 --
> 83316 100123: #477(0x0,0x7000,0x3,0xc001002,0xffffffff,0x40401428,0x0,0x0) = 1077833728 (0x403e7000)
> -- UNKNOWN FreeBSD32 SYSCALL 552 --
> 83316 100123: #552(0xffffff9c,0xffffc504,0xffffc908,0x0) ERR#2 'No such file or directory'
> -- UNKNOWN FreeBSD32 SYSCALL 552 --
> 83316 100123: #552(0xffffff9c,0xffffc504,0xffffc908,0x0) ERR#2 'No such file or directory'
> -- UNKNOWN FreeBSD32 SYSCALL 552 --
> 83316 100123: #552(0xffffff9c,0xffffc504,0xffffc908,0x0) ERR#2 'No such file or directory'
> -- UNKNOWN FreeBSD32 SYSCALL 552 --
> 83316 100123: #552(0xffffff9c,0xffffc504,0xffffc908,0x0) ERR#2 'No such file or directory'
> -- UNKNOWN FreeBSD32 SYSCALL 477 --
> 83316 100123: #477(0x0,0x1000,0x3,0xc001002,0xffffffff,0x40401428,0x0,0x0) = 1077862400 (0x403ee000)
> -- UNKNOWN FreeBSD32 SYSCALL 4 --
> 83316 100123: #4(0x2,0x403ee000,0x21) = 33 (0x21)
> -- UNKNOWN FreeBSD32 SYSCALL 477 --
> 83316 100123: #477(0x0,0x1000,0x3,0xc001002,0xffffffff,0x40401428,0x0,0x0) = 1077866496 (0x403ef000)
> -- UNKNOWN FreeBSD32 SYSCALL 477 --
> 83316 100123: #477(0x0,0x1000,0x3,0xc001002,0xffffffff,0x40401428,0x0,0x0) = 1077870592 (0x403f0000)
> -- UNKNOWN FreeBSD32 SYSCALL 477 --
> 83316 100123: #477(0x0,0x1000,0x3,0xc001002,0xffffffff,0x40401428,0x0,0x0) = 1077874688 (0x403f1000)
> -- UNKNOWN FreeBSD32 SYSCALL 4 --
> 83316 100123: #4(0x1,0x403ef000,0x2e) = 46 (0x2e)
> -- UNKNOWN FreeBSD32 SYSCALL 542 --
> 83316 100123: #542(0xffffcd54,0x0) = 0 (0x0)
> -- UNKNOWN FreeBSD32 SYSCALL 2 --
> 83842 100199:
> 83316 100123: #2() = 83842 (0x14782)
> -- UNKNOWN FreeBSD32 SYSCALL 6 --
> -- UNKNOWN FreeBSD32 SYSCALL 6 --
> 83316 100123: #6(0x7) = 0 (0x0)
> 83842 100199: #6(0x5) = 0 (0x0)
> . . .

It turns out that the failure happens on the processors with
out-of-order execution and the like (cortex-a57, cortex-a72) but
not on the strictly in-order cortex-a53, at least for as much
testing as I've done. So it looks like some form of
synchronization is missing that in-order-only execution does not
need. (If it holds true, this would be the 2nd time I've run into
such for FreeBSD aarch64. The prior example was fixed a fair time
ago.)

The testing status . . .

Problem replicated using the following contexts to attempt the
textproc/itstool build, targeting armv7 (cortex-a7):

cortex-a72  aarch64  MACCHIATObin Double Shot
cortex-a57  aarch64  OverDrive 1000

(No successful builds for the above 2, all stopping in
configure the same way.)

No problem using the following to build textproc/itstool,
targeting armv7:

cortex-a53  aarch64  Rock64          (armv7 on aarch64 case)
cortex-a7   armv7    OrangePi+ 2ed   (native armv7 case)

A full poudriere bulk that builds about 200 ports, targeting
the cortex-a7 on the slower cortex-a53, will take days. So
further evidence that the cortex-a53 does not get the problem
will take a while.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
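
Editorial note on reading the numbered entries: truss printed raw
numbers for the armv7 processes (the "UNKNOWN FreeBSD32 SYSCALL"
lines), so the trace has to be decoded by hand. The sketch below
maps only the numbers that occur in the trace to names. The names
are an assumption based on FreeBSD's syscalls.master and should be
checked against the tree under test; the helper name
fbsd32_syscall_name is made up for this note and is not part of
truss or FreeBSD.

```c
#include <assert.h>   /* only needed for checks like those in the note below */
#include <stddef.h>
#include <string.h>   /* only needed for checks like those in the note below */

/* One row of the lookup table: a syscall number and its name. */
struct trace_syscall {
    int number;
    const char *name;
};

/*
 * Numbers that appear in the truss output, with the names they are
 * assumed to have in syscalls.master (verify against your source tree).
 */
static const struct trace_syscall trace_syscalls[] = {
    {   1, "exit" },
    {   2, "fork" },
    {   4, "write" },
    {   6, "close" },
    {   7, "wait4" },
    { 240, "nanosleep" },
    { 340, "sigprocmask" },
    { 341, "sigsuspend" },
    { 416, "sigaction" },
    { 477, "mmap" },
    { 542, "pipe2" },
    { 552, "fstatat" },
};

/*
 * Hypothetical helper: return the assumed name for a syscall number
 * seen in the trace, or NULL when the number is not in the table.
 */
const char *
fbsd32_syscall_name(int number)
{
    size_t i;
    size_t n = sizeof(trace_syscalls) / sizeof(trace_syscalls[0]);

    for (i = 0; i < n; i++) {
        if (trace_syscalls[i].number == number)
            return trace_syscalls[i].name;
    }
    return NULL;
}
```

If that mapping holds, the lines just before the fault in pid 83630
read as sigprocmask (#340, first argument 0x1), sigaction (#416,
first argument 0x14, i.e. signal 20, SIGCHLD), wait4 (#7, options
0x1) and then sigsuspend (#341). sigsuspend returns -1 with EINTR
whenever it returns after a handler has run, so the ERR#4 on #341
would be its ordinary return after the SIGCHLD delivery, with the
SIGSEGV arriving right after it.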