From owner-freebsd-current@freebsd.org Thu Jan 9 03:51:30 2020 Return-Path: Delivered-To: freebsd-current@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 32CE0226052 for ; Thu, 9 Jan 2020 03:51:30 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic307-8.consmr.mail.gq1.yahoo.com (sonic307-8.consmr.mail.gq1.yahoo.com [98.137.64.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 47tXGm62lrz4Xpf for ; Thu, 9 Jan 2020 03:51:28 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: 2I971LsVM1mGJw.aM5rfDnl8ZsxovralUuQBhCp3Wta6RJ.W277fq63voY0hiki ZumSlXFfwOe5LKLbyj8hNeFJMEFocqqBNKuVnvNuI_lqG3iNVVkwdXbfs9eVbluIYH69YTRaLu8T eDx_dZ8l2oerNXWzW7uves9TzP7i6iAuFvt1tri36J7cSVHrZvhkgy5UjI6BZoVf08DJ7ucZgaN7 gAv58dTF5mA7eeHzaBUEUyLf45qrC.D14iZBk6oZRdGRCU7ot6_Hz4WL926SIPpsir8dbq2qTSsc WnnXQGtE7wzTRN4BiBgmTAf3NCnhGz1rC1AL9eNJK8W0ux4NqUJVZeHiXeS5ORDq0scV5fml9tWN DHTcOmUIVV6Zd6Eh8Htm1aRhyONY8va4Jr2j8mFbHCMx3Eu065FaPuQjf0YPWJ7ze_1tscDvCYGt 5jajIlJrZlW1qr1vM7zNRU0Os5U49IrhFG.sQN8ZzpAxdlOXmt9VJDk7g5zjP.xdQFWtB0klnY5V oE7ig3R56wxcdcLCDy7zZj7cinCnA7pxF7L6s6MS_FWGyvBKZdBq2K44OF1c7zWJo.OFMgScCT7q pUmW9_nplsaN1MMQpWP3DwIJ_UaB.PlakJWGb8tOBGpbchoSI4Onl8sGZ3rUrllh_tNSBagJ2SwQ jzwdX2Byc4u8kijnzFhY20qctLw2HcYN5HBy1F96_GVT98VefzYzBYiyjBBMDpteWZDp82P.QNYS qWVJtpUAbZ0L.Ocor9YozPqITh9yy_CtFF2sl8JOrEcA2MQmEWI6.Psw7416sLHb4SDJ3HqSidCL NoBjzlQinSIZ4VEKroaQY5RIeCD166h3g0E5.32v5G5l5DNlmETTi2n4BXC8.McqdPXcdAt_zaWG G_Q7NM0QyLNclN2IUeGpyQYQQrDrqSzP2EuQP0_VrySO1s1pYG_1SPa1kAH3hbjAIupi8yaxw_O8 XgPvEqnTrMD75zNk.DlezMOJ23lLV0ipK_Z0lF26xbqQn3w1JqO0jalC07IkdvM.lBxNck1.qmTD brPbUQBgesLQMfAjeXsRDnyIUJG.gWQoLEpyRKW54dc.fXvxtZUKPAeAICR0NgLo.4qBfzU1wP3c d2T7ND.By4DX6OzO07rbyRF4fX9Eby.ZFPiFyq54BXYFgMN3pXlH.SpOF16MIBUwmrP.a1y2llVj YYazepNH7Oi9TBU.VGDngxqNdZzPQhvABWBK6IQVnRrYDsXxHkGtEVkF4E__Pp_yw02EV2aOC7ZS UDkntsAMifr1rkzEvlKXOhwpZVSLK4DhS7aK1puoFMmbrV2pYsYrWwFxM3C41_orNQ9ScH_3X6Bn RK097qkplW9E9QghdIU_MYRmdJq7xpPCrPQQcypRvDlzZ4oPyGqwCsUWc_VMUzmJ_M2ktui8doyI kmdHw08W.Sqx3vYSxGctiTtC.ea96ESwU Received: from sonic.gate.mail.ne1.yahoo.com by sonic307.consmr.mail.gq1.yahoo.com with HTTP; Thu, 9 Jan 2020 03:51:27 +0000 Received: by smtp417.mail.gq1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 866b663d1496ec5d5754fa58afb05d25; Thu, 09 Jan 2020 03:51:21 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3608.40.2.2.4\)) Subject: Re: head -356426 based 32-bit powerpc (2 sock PowerMac G4): kyua sys/vm/mlock_test:mlock__copy_on_write_vnode seems stuck From: Mark Millard In-Reply-To: <67C0401B-7135-4B79-912F-6AD2CBD60B6B@yahoo.com> Date: Wed, 8 Jan 2020 19:51:21 -0800 Cc: FreeBSD Current Content-Transfer-Encoding: quoted-printable Message-Id: <8F556550-A91F-4F47-8182-CF07F35B3EC4@yahoo.com> References: <67C0401B-7135-4B79-912F-6AD2CBD60B6B@yahoo.com> To: FreeBSD PowerPC ML X-Mailer: Apple Mail (2.3608.40.2.2.4) X-Rspamd-Queue-Id: 47tXGm62lrz4Xpf X-Spamd-Bar: - X-Spamd-Result: default: False [-1.34 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-0.41)[-0.408,0]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; FREEMAIL_FROM(0.00)[yahoo.com]; MIME_GOOD(-0.10)[text/plain]; MV_CASE(0.50)[]; NEURAL_HAM_LONG(-0.43)[-0.434,0]; RCVD_COUNT_TWO(0.00)[2]; IP_SCORE_FREEMAIL(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2]; RCVD_IN_DNSWL_NONE(0.00)[32.64.137.98.list.dnswl.org : 127.0.5.0]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; RCVD_TLS_LAST(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; IP_SCORE(0.00)[ip: (6.35), ipnet: 98.137.64.0/21(0.87), asn: 36647(0.70), country: US(-0.05)]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jan 2020 03:51:30 -0000 [Even the artifacts debug-style kernel gets the problem. More evidence of context extracted and supplied.] On 2020-Jan-8, at 18:09, Mark Millard wrote: > [I got kernel backtrace information, included.] >=20 > On 2020-Jan-8, at 15:12, Mark Millard wrote: >=20 >> I've attempted a head -356426 based kyua run on an >> old 2-socket PowerMac G4. The buildworld and >> buildkernel were non-debug. The context has personal >> patches, mostly for dealing with PowerMac issues. >>=20 >> It has had over 180 CPU minutes running for: >>=20 >> sys/vm/mlock_test:mlock__copy_on_write_vnode -> =20 >>=20 >> Normal seems to be way under 0.1 sec on the >> other platforms I've made runs on recently. >>=20 >> Hopefully kyua will time out and continue the >> testing at some point. >>=20 >> The 2 socket (2 cores each) G5 powerpc64 context >> did not have this problem. Nor did the armv7 or >> aarch64 examples (CortexA7, CortexA53, CortexA57, >> and CortexA72). >=20 >=20 > I finally gave up on it after 240 CPU minutes but > could not kill/stop the stuck process. So I > sync'd the file system and tried a "shutdown -r > now" and forced the power off after it got stuck > (no reboot happened). >=20 > After power-up I tried: >=20 > # kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test > sys/vm/mlock_test:mlock__copy_on_write_anon -> passed [0.017s] > sys/vm/mlock_test:mlock__copy_on_write_vnode -> =20 >=20 > and it got stuck again. >=20 > I'll note that ps -auxd shows the likes of: >=20 > root 1120 0.0 0.4 11512 8772 0 I+ 16:38 0:00.62 | | = `-- kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test > root 1124 100.0 0.1 4640 2332 - Rs 16:38 2:57.43 | | = `-- /usr/tests/sys/vm/mlock_test -vunprivileged-user=3Dtests = -r/tmp/kyua.B2pXx8/3/result.atf mlock__copy_on_write_vnode > root 1125 0.0 0.0 4640 620 - TXL 16:38 0:00.00 | | = `-- /usr/tests/sys/vm/mlock_test -vunprivileged-user=3Dtests = -r/tmp/kyua.B2pXx8/3/result.atf mlock__copy_on_write_vnode >=20 >=20 > I got a couple of backtraces from the kernel > via the ddb> prompt : >=20 > pid 1125 was in thread_suspend_switch called via ptracestop. >=20 > I've a couple of examples of pid 1124 (the CPU time > taker): (manually typed from screen images) >=20 > 0xdc9e0520: at mi_switch+0x17c > 0xdc9e0540: at critical_exit_preempt+0x7c > 0xdc9e0560: at powerpc_interrupt+0x1c4 > 0xdc9e0590: at kernel EXI trap by __syncicache+0x5c: > srr1=3D 0x209032 > r1=3D 0xdc9e0650 > cr=3D 0x8822fc22 > xer=3D 0 > ctr=3D 0 > frame=3D0xdc9e0598 > 0xdc9e0650: at 0x5ed67ec > 0xdc9e0660: at moea_sync_icache+0x118 > Note: =46rom here on down is common with the > other example backtrace: > 0xdc9e0690: at pmap_sync_icache+0x98 > 0xdc9e06b0: at vm_sync_icache+0x2c > 0xdc9e06c0: at proc_rwmem+0x13c > 0xdc9e0710: at kern_ptrace+0x76c > 0xdc9e0830: at sys_ptrace+0x12c > 0xdc9e0960: at trap+0xae8 > 0xdc9e0a10: at powerpc_interrupt+0x1ec > 0xdc9e0a40: at use SC trap by 0x4191ea48: > srr1=3D 0x209032 > r1=3D 0xffffc6d0 > cr=3D 0x28000200 > xer=3D 0 > ctr=3D 0x4191ea40 > frame=3D0xdc9e0a48 >=20 > The non-common part of the other backtrace > is: >=20 > 0xdc9e04a0: at intr_event_handle+0xd4 > 0xdc9e04e0: at powerpc_dispatch_intr+0xe0 > 0xdc9e0520: at openpic_dispatch+0x90 > 0xdc9e0540: at powerpc_interrupt+0x128 > 0xdc9e0570: at kernel EXI trap by moea_pvo_find_va: > srr1=3D 0xf032 > r1=3D 0xdc9e0630 > cr=3D 0x4822fc22 > xer=3D 0 > ctr=3D 0 > frame=3D0xdc9e0578 > 0xdc9e0630: at 0x41b76ffc > 0xdc9e0660: at moea_sync_icache+0x100 >=20 > Showing a lock chain showed just one line: >=20 > thread 100151 (pid 1124, mlock_test) running on CPU 0 >=20 > The pcpu output for cpuid 0 metioned: critnest 2 "mlock_test" > when I tried it. >=20 > (After that I did something that locked up the machine, > probably my fault.) >=20 >=20 > It does not look like I can complete a normal kyua > run for a 2-socket 32-bit powerpc. May be someone > else can for some multi-socket 32-bit powerpc to > see if this repeats. >=20 > Single-socket/single-core might prove interesting > as well. Maybe I can try such. It fails on the single-processor/single-core G4 PowerMac that I tested as well. I then installed the kernel*.txz from: https://artifact.ci.freebsd.org/snapshot/head/r356426/powerpc/powerpc/ and rebooted with it via boot -v (to avoid the memory modified-after-free for debug kernels). (I'd forgotten about debug stopping and reporting such.) It still fails: # kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test sys/vm/mlock_test:mlock__copy_on_write_anon -> passed [0.029s] sys/vm/mlock_test:mlock__copy_on_write_vnode -> =20 The debug kernel does not report anything for it. I gather that the test's internal description is: /* * Use ptrace(2) to trigger a copy-on-write fault of a read-only text = page. */ ATF_TC_WITHOUT_HEAD(mlock__copy_on_write_vnode); ATF_TC_BODY(mlock__copy_on_write_vnode, tc) { void *addr; int len; len =3D getpagesize(); addr =3D (void *)((uintptr_t)test_wired_copy_on_write & ~(len - = 1)); test_wired_copy_on_write(addr, len); } where test_wired_copy_on_write is: static void test_wired_copy_on_write(void *addr, size_t len) { int status, val; pid_t pid; pid =3D fork(); if (pid =3D=3D -1) atf_tc_fail("fork() failed: %s", strerror(errno)); if (pid =3D=3D 0) { if (mlock(addr, len) !=3D 0) _exit(1); if (ptrace(PT_TRACE_ME, 0, NULL, 0) !=3D 0) _exit(2); if (raise(SIGSTOP) !=3D 0) _exit(3); if (munlock(addr, len) !=3D 0) _exit(4); _exit(0); } ATF_REQUIRE(waitpid(pid, &status, 0) =3D=3D pid); ATF_REQUIRE_MSG(!WIFEXITED(status), "child exited with status %d", WEXITSTATUS(status)); ATF_REQUIRE(WIFSTOPPED(status)); ATF_REQUIRE(WSTOPSIG(status) =3D=3D SIGSTOP); errno =3D 0; val =3D ptrace(PT_READ_D, pid, addr, 0); ATF_REQUIRE(errno =3D=3D 0); ATF_REQUIRE(ptrace(PT_WRITE_D, pid, addr, val) =3D=3D 0); ATF_REQUIRE(ptrace(PT_CONTINUE, pid, (caddr_t)1, 0) =3D=3D 0); ATF_REQUIRE(waitpid(pid, &status, 0) =3D=3D pid); ATF_REQUIRE(WIFEXITED(status)); ATF_REQUIRE_MSG(WEXITSTATUS(status) =3D=3D 0, "child exited with status %d", WSTOPSIG(status)); } I give my evidence below, but the ptrace(PT_WRITE_D, . . .) looks to be the one involved in the backtraces: It looks like proc_rwmem is using the vm_sync_icache in: /* * If we are writing, then we request vm_fault() to create a = private * copy of each page. Since these copies will not be writeable = by the * process, we must explicity request that they be dirtied. */ writing =3D uio->uio_rw =3D=3D UIO_WRITE; reqprot =3D writing ? VM_PROT_COPY | VM_PROT_READ : = VM_PROT_READ; fault_flags =3D writing ? VM_FAULT_DIRTY : VM_FAULT_NORMAL; . . . /* Make the I-cache coherent for breakpoints. */ if (writing && error =3D=3D 0) { vm_map_lock_read(map); if (vm_map_check_protection(map, pageno, pageno = + PAGE_SIZE, VM_PROT_EXECUTE)) vm_sync_icache(map, uva, len); vm_map_unlock_read(map); } This appears to mean that it got there via kern_ptrace's: case PT_WRITE_I: case PT_WRITE_D: td2->td_dbgflags |=3D TDB_USERWR; PROC_UNLOCK(p); error =3D 0; if (proc_writemem(td, p, (off_t)(uintptr_t)addr, &data, sizeof(int)) !=3D sizeof(int)) error =3D ENOMEM; else CTR3(KTR_PTRACE, "PT_WRITE: pid %d: %p <=3D = %#x", p->p_pid, addr, data); PROC_LOCK(p); break; (via proc_writemem). So the test's ptrace involved in the backtraces seems to be: ATF_REQUIRE(ptrace(PT_WRITE_D, pid, addr, val) =3D=3D 0); (Not that I've a clue what to do with that information at this point.) =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)