Date: Wed, 8 Jan 2020 19:51:21 -0800 From: Mark Millard <marklmi@yahoo.com> To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org> Cc: FreeBSD Current <freebsd-current@freebsd.org> Subject: Re: head -356426 based 32-bit powerpc (2 sock PowerMac G4): kyua sys/vm/mlock_test:mlock__copy_on_write_vnode seems stuck Message-ID: <8F556550-A91F-4F47-8182-CF07F35B3EC4@yahoo.com> In-Reply-To: <67C0401B-7135-4B79-912F-6AD2CBD60B6B@yahoo.com> References: <FB2E3099-C59B-47CE-9524-6A1C5D477E82@yahoo.com> <67C0401B-7135-4B79-912F-6AD2CBD60B6B@yahoo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
[Even the artifacts debug-style kernel gets the problem. More evidence of context extracted and supplied.] On 2020-Jan-8, at 18:09, Mark Millard <marklmi at yahoo.com> wrote: > [I got kernel backtrace information, included.] >=20 > On 2020-Jan-8, at 15:12, Mark Millard <marklmi at yahoo.com> wrote: >=20 >> I've attempted a head -356426 based kyua run on an >> old 2-socket PowerMac G4. The buildworld and >> buildkernel were non-debug. The context has personal >> patches, mostly for dealing with PowerMac issues. >>=20 >> It has had over 180 CPU minutes running for: >>=20 >> sys/vm/mlock_test:mlock__copy_on_write_vnode -> =20 >>=20 >> Normal seems to be way under 0.1 sec on the >> other platforms I've made runs on recently. >>=20 >> Hopefully kyua will time out and continue the >> testing at some point. >>=20 >> The 2 socket (2 cores each) G5 powerpc64 context >> did not have this problem. Nor did the armv7 or >> aarch64 examples (CortexA7, CortexA53, CortexA57, >> and CortexA72). >=20 >=20 > I finally gave up on it after 240 CPU minutes but > could not kill/stop the stuck process. So I > sync'd the file system and tried a "shutdown -r > now" and forced the power off after it got stuck > (no reboot happened). >=20 > After power-up I tried: >=20 > # kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test > sys/vm/mlock_test:mlock__copy_on_write_anon -> passed [0.017s] > sys/vm/mlock_test:mlock__copy_on_write_vnode -> =20 >=20 > and it got stuck again. >=20 > I'll note that ps -auxd shows the likes of: >=20 > root 1120 0.0 0.4 11512 8772 0 I+ 16:38 0:00.62 | | = `-- kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test > root 1124 100.0 0.1 4640 2332 - Rs 16:38 2:57.43 | | = `-- /usr/tests/sys/vm/mlock_test -vunprivileged-user=3Dtests = -r/tmp/kyua.B2pXx8/3/result.atf mlock__copy_on_write_vnode > root 1125 0.0 0.0 4640 620 - TXL 16:38 0:00.00 | | = `-- /usr/tests/sys/vm/mlock_test -vunprivileged-user=3Dtests = -r/tmp/kyua.B2pXx8/3/result.atf mlock__copy_on_write_vnode >=20 >=20 > I got a couple of backtraces from the kernel > via the ddb> prompt : >=20 > pid 1125 was in thread_suspend_switch called via ptracestop. >=20 > I've a couple of examples of pid 1124 (the CPU time > taker): (manually typed from screen images) >=20 > 0xdc9e0520: at mi_switch+0x17c > 0xdc9e0540: at critical_exit_preempt+0x7c > 0xdc9e0560: at powerpc_interrupt+0x1c4 > 0xdc9e0590: at kernel EXI trap by __syncicache+0x5c: > srr1=3D 0x209032 > r1=3D 0xdc9e0650 > cr=3D 0x8822fc22 > xer=3D 0 > ctr=3D 0 > frame=3D0xdc9e0598 > 0xdc9e0650: at 0x5ed67ec > 0xdc9e0660: at moea_sync_icache+0x118 > Note: =46rom here on down is common with the > other example backtrace: > 0xdc9e0690: at pmap_sync_icache+0x98 > 0xdc9e06b0: at vm_sync_icache+0x2c > 0xdc9e06c0: at proc_rwmem+0x13c > 0xdc9e0710: at kern_ptrace+0x76c > 0xdc9e0830: at sys_ptrace+0x12c > 0xdc9e0960: at trap+0xae8 > 0xdc9e0a10: at powerpc_interrupt+0x1ec > 0xdc9e0a40: at use SC trap by 0x4191ea48: > srr1=3D 0x209032 > r1=3D 0xffffc6d0 > cr=3D 0x28000200 > xer=3D 0 > ctr=3D 0x4191ea40 > frame=3D0xdc9e0a48 >=20 > The non-common part of the other backtrace > is: >=20 > 0xdc9e04a0: at intr_event_handle+0xd4 > 0xdc9e04e0: at powerpc_dispatch_intr+0xe0 > 0xdc9e0520: at openpic_dispatch+0x90 > 0xdc9e0540: at powerpc_interrupt+0x128 > 0xdc9e0570: at kernel EXI trap by moea_pvo_find_va: > srr1=3D 0xf032 > r1=3D 0xdc9e0630 > cr=3D 0x4822fc22 > xer=3D 0 > ctr=3D 0 > frame=3D0xdc9e0578 > 0xdc9e0630: at 0x41b76ffc > 0xdc9e0660: at moea_sync_icache+0x100 >=20 > Showing a lock chain showed just one line: >=20 > thread 100151 (pid 1124, mlock_test) running on CPU 0 >=20 > The pcpu output for cpuid 0 metioned: critnest 2 "mlock_test" > when I tried it. >=20 > (After that I did something that locked up the machine, > probably my fault.) >=20 >=20 > It does not look like I can complete a normal kyua > run for a 2-socket 32-bit powerpc. May be someone > else can for some multi-socket 32-bit powerpc to > see if this repeats. >=20 > Single-socket/single-core might prove interesting > as well. Maybe I can try such. It fails on the single-processor/single-core G4 PowerMac that I tested as well. I then installed the kernel*.txz from: https://artifact.ci.freebsd.org/snapshot/head/r356426/powerpc/powerpc/ and rebooted with it via boot -v (to avoid the memory modified-after-free for debug kernels). (I'd forgotten about debug stopping and reporting such.) It still fails: # kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test sys/vm/mlock_test:mlock__copy_on_write_anon -> passed [0.029s] sys/vm/mlock_test:mlock__copy_on_write_vnode -> =20 The debug kernel does not report anything for it. I gather that the test's internal description is: /* * Use ptrace(2) to trigger a copy-on-write fault of a read-only text = page. */ ATF_TC_WITHOUT_HEAD(mlock__copy_on_write_vnode); ATF_TC_BODY(mlock__copy_on_write_vnode, tc) { void *addr; int len; len =3D getpagesize(); addr =3D (void *)((uintptr_t)test_wired_copy_on_write & ~(len - = 1)); test_wired_copy_on_write(addr, len); } where test_wired_copy_on_write is: static void test_wired_copy_on_write(void *addr, size_t len) { int status, val; pid_t pid; pid =3D fork(); if (pid =3D=3D -1) atf_tc_fail("fork() failed: %s", strerror(errno)); if (pid =3D=3D 0) { if (mlock(addr, len) !=3D 0) _exit(1); if (ptrace(PT_TRACE_ME, 0, NULL, 0) !=3D 0) _exit(2); if (raise(SIGSTOP) !=3D 0) _exit(3); if (munlock(addr, len) !=3D 0) _exit(4); _exit(0); } ATF_REQUIRE(waitpid(pid, &status, 0) =3D=3D pid); ATF_REQUIRE_MSG(!WIFEXITED(status), "child exited with status %d", WEXITSTATUS(status)); ATF_REQUIRE(WIFSTOPPED(status)); ATF_REQUIRE(WSTOPSIG(status) =3D=3D SIGSTOP); errno =3D 0; val =3D ptrace(PT_READ_D, pid, addr, 0); ATF_REQUIRE(errno =3D=3D 0); ATF_REQUIRE(ptrace(PT_WRITE_D, pid, addr, val) =3D=3D 0); ATF_REQUIRE(ptrace(PT_CONTINUE, pid, (caddr_t)1, 0) =3D=3D 0); ATF_REQUIRE(waitpid(pid, &status, 0) =3D=3D pid); ATF_REQUIRE(WIFEXITED(status)); ATF_REQUIRE_MSG(WEXITSTATUS(status) =3D=3D 0, "child exited with status %d", WSTOPSIG(status)); } I give my evidence below, but the ptrace(PT_WRITE_D, . . .) looks to be the one involved in the backtraces: It looks like proc_rwmem is using the vm_sync_icache in: /* * If we are writing, then we request vm_fault() to create a = private * copy of each page. Since these copies will not be writeable = by the * process, we must explicity request that they be dirtied. */ writing =3D uio->uio_rw =3D=3D UIO_WRITE; reqprot =3D writing ? VM_PROT_COPY | VM_PROT_READ : = VM_PROT_READ; fault_flags =3D writing ? VM_FAULT_DIRTY : VM_FAULT_NORMAL; . . . /* Make the I-cache coherent for breakpoints. */ if (writing && error =3D=3D 0) { vm_map_lock_read(map); if (vm_map_check_protection(map, pageno, pageno = + PAGE_SIZE, VM_PROT_EXECUTE)) vm_sync_icache(map, uva, len); vm_map_unlock_read(map); } This appears to mean that it got there via kern_ptrace's: case PT_WRITE_I: case PT_WRITE_D: td2->td_dbgflags |=3D TDB_USERWR; PROC_UNLOCK(p); error =3D 0; if (proc_writemem(td, p, (off_t)(uintptr_t)addr, &data, sizeof(int)) !=3D sizeof(int)) error =3D ENOMEM; else CTR3(KTR_PTRACE, "PT_WRITE: pid %d: %p <=3D = %#x", p->p_pid, addr, data); PROC_LOCK(p); break; (via proc_writemem). So the test's ptrace involved in the backtraces seems to be: ATF_REQUIRE(ptrace(PT_WRITE_D, pid, addr, val) =3D=3D 0); (Not that I've a clue what to do with that information at this point.) =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?8F556550-A91F-4F47-8182-CF07F35B3EC4>