Date: Sun, 18 Feb 2018 13:25:17 +0100 (CET) From: =?ISO-8859-1?Q?Trond_Endrest=F8l?= <Trond.Endrestol@fagskolen.gjovik.no> To: FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org> Subject: Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork Message-ID: <alpine.BSF.2.21.1802181310590.24158@mail.fig.ol.no> In-Reply-To: <DA76F62D-3373-47CA-AD95-DE9BA580772B@yahoo.com> References: <DA76F62D-3373-47CA-AD95-DE9BA580772B@yahoo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 17 Feb 2018 17:39-0800, Mark Millard wrote: > This is for FreeBSD running under Hyper-V on a Windows 10 Pro machine. > The FreeBSD "disk" bindings are to SSDs, not the insides of NTFS files. > 29 logical processors assigned to FreeBSD (on a 32-thread Ryzen > Threadripper 1950X). No other Hyper-V use. > > This happened during: > > # ~/sys_build_scripts.amd64-host/make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host.sh check-old DESTDIR=/usr/obj/DESTDIRs/clang-powerpc64-installworld_altbinutils > Script started, output file is /root/sys_typescripts/typescript_make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host-2018-02-17:15:56:20 > >>> Checking for old files > > > (Hand typed from a picture of a window's content > at slighly different times, expect typos:) > > KDB: enter: panic > [thread pid 42170 tid 100752 ] > Stopped at kdb_enter+0x3b: movq $0,kdb_why > db> call doadump > Dumping 4825 out of 110559 MB: ... (omitted) ... > Dump complete > = 0 > > > (The "pid 42170" identification as the process reporting the > issue does not seem to appear in the core.txt.0 file.) > > > # ls -lTdt /var/crash/* > -rw-r--r-- 1 root wheel 100792 Feb 17 16:09:18 2018 /var/crash/core.txt.0 > lrwxr-xr-x 1 root wheel 8 Feb 17 16:09:08 2018 /var/crash/vmcore.last -> vmcore.0 > lrwxr-xr-x 1 root wheel 6 Feb 17 16:09:08 2018 /var/crash/info.last -> info.0 > -rw------- 1 root wheel 5060296704 Feb 17 16:09:08 2018 /var/crash/vmcore.0 > -rw------- 1 root wheel 392 Feb 17 16:08:59 2018 /var/crash/info.0 > -rw-r--r-- 1 root wheel 2 Feb 17 16:08:59 2018 /var/crash/bounds > -rw-r--r-- 1 root wheel 5 Nov 22 04:34:36 2017 /var/crash/minfree > > >From /var/crash/core.txt.0 : > > Unread portion of the kernel message buffer: > spin lock 0xffffffff81b2dcc0 (sched lock 5) held by 0xfffff8011d936560 (tid 100691) too long > panic: spin lock held too long > cpuid = 5 > time = 1518911834 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f10094d0 > vpanic() at vpanic+0x18d/frame 0xfffffe00f1009530 > panic() at panic+0x43/frame 0xfffffe00f1009590 > _mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x71/frame 0xfffffe00f10095a0 > thread_lock_flags_() at thread_lock_flags_+0xdb/frame 0xfffffe00f1009610 > statclock_cnt() at statclock_cnt+0xdc/frame 0xfffffe00f1009650 > handleevents() at handleevents+0x113/frame 0xfffffe00f10096a0 > timercb() at timercb+0xa9/frame 0xfffffe00f10096f0 > lapic_handle_timer() at lapic_handle_timer+0xa7/frame 0xfffffe00f1009730 > timerint_u() at timerint_u+0x96/frame 0xfffffe00f1009810 > thread_lock_flags_() at thread_lock_flags_+0xc1/frame 0xfffffe00f1009880 > fork1() at fork1+0x1b9f/frame 0xfffffe00f1009930 > sys_vfork() at sys_vfork+0x4c/frame 0xfffffe00f1009980 > amd64_syscall() at amd64_syscall+0xa48/frame 0xfffffe00f1009ab0 > fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffcfc0 > KDB: enter: panic > > __curthread () at ./machine/pcpu.h:230 > 230 __asm("movq %%gs:%1,%0" : "=r" (td) > (kgdb) #0 __curthread () at ./machine/pcpu.h:230 > #1 doadump (textdump=-2122191464) at /usr/src/sys/kern/kern_shutdown.c:347 > #2 0xffffffff8040b42c in db_fncall_generic (addr=<optimized out>, > rv=<optimized out>, nargs=<optimized out>, args=<optimized out>) > at /usr/src/sys/ddb/db_command.c:609 > #3 db_fncall (dummy1=<optimized out>, dummy2=<optimized out>, > dummy3=<optimized out>, dummy4=<optimized out>) > at /usr/src/sys/ddb/db_command.c:657 > #4 0xffffffff8040af79 in db_command (last_cmdp=<optimized out>, > cmd_table=<optimized out>, dopager=<optimized out>) > at /usr/src/sys/ddb/db_command.c:481 > #5 0xffffffff8040acf4 in db_command_loop () > at /usr/src/sys/ddb/db_command.c:534 > #6 0xffffffff8040df9f in db_trap (type=<optimized out>, code=<optimized out>) > at /usr/src/sys/ddb/db_main.c:250 > #7 0xffffffff80b370e3 in kdb_trap (type=3, code=-61456, tf=<optimized out>) > at /usr/src/sys/kern/subr_kdb.c:697 > #8 0xffffffff80fa2c5c in trap (frame=0xfffffe00f1009400) > at /usr/src/sys/amd64/amd64/trap.c:547 > #9 <signal handler called> > #10 kdb_enter (why=0xffffffff811f280b "panic", msg=<optimized out>) > at /usr/src/sys/kern/subr_kdb.c:479 > #11 0xffffffff80aef17a in vpanic (fmt=<optimized out>, ap=0xfffffe00f1009570) > at /usr/src/sys/kern/kern_shutdown.c:801 > #12 0xffffffff80aeefc3 in panic (fmt=0x0) > at /usr/src/sys/kern/kern_shutdown.c:739 > #13 0xffffffff80acfa31 in _mtx_lock_indefinite_check (m=<optimized out>, > ldap=<optimized out>) at /usr/src/sys/kern/kern_mutex.c:1224 > #14 0xffffffff80acfb9b in thread_lock_flags_ (td=0xfffff818719d1000, > opts=<optimized out>, file=<optimized out>, line=<optimized out>) > at /usr/src/sys/kern/kern_mutex.c:913 > #15 0xffffffff80a89d6c in statclock_cnt (cnt=1, usermode=<optimized out>) > at /usr/src/sys/kern/kern_clock.c:768 > #16 0xffffffff810d0003 in handleevents (now=43230207690178, fake=0) > at /usr/src/sys/kern/kern_clocksource.c:196 > #17 0xffffffff810d0709 in timercb (et=0xffffffff81c528e8 <lapic_et>, > arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:353 > #18 0xffffffff8110dad7 in lapic_handle_timer (frame=0xfffffe00f1009740) > at /usr/src/sys/x86/x86/local_apic.c:1305 > #19 0xffffffff80f849d0 in timerint_u () > at /usr/src/sys/amd64/amd64/apic_vector.S:132 > #20 0xfffffe00f1009828 in ?? () > #21 0x000000000000b6b1 in ?? () > #22 0x0000000000002000 in ?? () > #23 0x00000000ffffdfff in ?? () > #24 0x00c11c08e43e7fd5 in ?? () > #25 0x00000000000003e8 in ?? () > #26 0x00000000fffff1eb in ?? () > #27 0xfffffe00f1009828 in ?? () > #28 0xfffffe00f1009810 in ?? () > #29 0x00000000800e6d01 in ?? () > #30 0x0000000000000064 in ?? () > #31 0xfffff8011d936560 in ?? () > #32 0xfffffe00f1009828 in ?? () > #33 0xffffffff81771014 in mtx_delay () > #34 0x0000000000000000 in ?? () > (kgdb) > > . . . > UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND > . . . > 0 1110 1102 0 33 0 12024 3076 ttyin D+ - 0:00.00 [sh] > 0 1120 1044 0 20 0 18572 7936 select Ds - 0:00.00 [sshd] > 1001 1123 1120 0 20 0 18936 8044 select D - 0:00.00 [sshd] > 1001 1124 1123 0 34 0 12120 3196 wait Ds - 0:00.00 [sh] > 0 1134 1124 0 22 0 12060 3148 wait D - 0:00.00 [su] > 0 1135 1134 0 20 0 12312 3244 wait D - 0:00.00 [sh] > 0 42072 1135 0 25 0 11464 3060 wait D+ - 0:00.00 [sh] > 0 42072 1135 0 25 0 11464 3060 wait D+ - 0:00.00 [sh] > 0 42075 42072 0 20 0 10928 2480 select D+ - 0:00.00 [script] > 0 42076 42075 0 52 0 10160 1396 wait Ds+ - 0:00.00 [make] > 0 42108 42076 0 52 0 12236 3224 wait D+ - 0:00.00 [make] > 0 42168 42108 0 52 0 11496 3068 wait D+ - 0:00.00 [sh] > 0 42169 42168 0 52 0 12608 3568 pipewr D+ - 0:00.00 [make] > 0 42170 42168 0 72 0 10956 2284 - R+ - 0:00.00 [xargs] > 0 42171 42168 0 35 0 11500 3064 piperd D+ - 0:00.00 [sh] > 0 46769 42170 0 72 0 10956 2284 - ?VL+ - 0:00.00 [xargs] > . . . > > > > > Context details: > > # uname -apKU > FreeBSD FBSDFSSD 12.0-CURRENT FreeBSD 12.0-CURRENT r329465M amd64 amd64 1200058 1200058 > > # svnlite status /usr/src | sort > ? /usr/src/sys/amd64/conf/GENERIC-DBG > ? /usr/src/sys/amd64/conf/GENERIC-NODBG > ? /usr/src/sys/arm/conf/GENERIC-DBG > ? /usr/src/sys/arm/conf/GENERIC-NODBG > ? /usr/src/sys/arm64/conf/GENERIC-DBG > ? /usr/src/sys/arm64/conf/GENERIC-NODBG > ? /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG > ? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG > ? /usr/src/sys/powerpc/conf/GENERICvtsc-DBG > ? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG > M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp > M /usr/src/contrib/llvm/tools/lld/ELF/Arch/PPC64.cpp > M /usr/src/crypto/openssl/crypto/armcap.c > M /usr/src/lib/libkvm/kvm_powerpc.c > M /usr/src/lib/libkvm/kvm_private.c > M /usr/src/stand/defs.mk > M /usr/src/stand/powerpc/boot1.chrp/Makefile > M /usr/src/stand/powerpc/kboot/Makefile > M /usr/src/sys/arm64/arm64/identcpu.c > M /usr/src/sys/conf/kmod.mk > M /usr/src/sys/conf/ldscript.powerpc > M /usr/src/sys/kern/subr_pcpu.c > M /usr/src/sys/powerpc/aim/mmu_oea64.c > M /usr/src/sys/powerpc/ofw/ofw_machdep.c > M /usr/src/sys/powerpc/powerpc/interrupt.c > M /usr/src/sys/powerpc/powerpc/mp_machdep.c > M /usr/src/sys/powerpc/powerpc/trap.c > M /usr/src/sys/vm/uma_core.c > M /usr/src/usr.bin/top/machine.c > > The GENERIC* files include GENERIC and then set explicit > debug status choices of mine. Most of the rest is tied to > historical powerpc and powerpc64 investigations. I also > have top report the maximum swap-in-use figure that it > sees during its run. > > # svnlite diff /usr/src/sys/vm/uma_core.c > Index: /usr/src/sys/vm/uma_core.c > =================================================================== > --- /usr/src/sys/vm/uma_core.c (revision 329465) > +++ /usr/src/sys/vm/uma_core.c (working copy) > @@ -3428,7 +3428,7 @@ > void > uma_reclaim_wakeup(void) > { > - > +printf("limit %lX, total %lX, needed %d\n", uma_kmem_limit, uma_kmem_total, uma_reclaim_needed); > if (atomic_fetchadd_int(&uma_reclaim_needed, 1) == 0) > wakeup(uma_reclaim); > } > > > > Side note: It took the automatic fsck and 2 manual fsck > runs to get back to a clean status (before I could get to > multi-user). The same thing happened to me on a VirtualBox VM running base/head r329515. This VM previously ran r329101 and I can easily revert to this version where the problem doesn't exist. See https://ximalas.info/~trond/base-head-r329515/ for core.txt.0 and info.0. Crash dump of roughly 1 GiB is available on request. I achieved an uptime of roughly 26 minutes before the panic set in. I was in the process of upgrading print/texinfo using portmaster when the crash occurred. xargs was the running process. Unread portion of the kernel message buffer: spin lock 0xffffffff80e76e80 (sched lock 0) held by 0xfffff800c58d4000 (tid 100746) too long panic: spin lock held too long cpuid = 0 time = 1518955069 KDB: stack backtrace: db_trace_self_wrapper() at 0xffffffff804d281b = db_trace_self_wrapper+0x2b/frame 0xfffffe004b204610 vpanic() at 0xffffffff807106cd = vpanic+0x18d/frame 0xfffffe004b204670 panic() at 0xffffffff80710533 = panic+0x43/frame 0xfffffe004b2046d0 _mtx_lock_indefinite_check() at 0xffffffff806f0fa1 = _mtx_lock_indefinite_check+0x71/frame 0xfffffe004b2046e0 thread_lock_flags_() at 0xffffffff806f110b = thread_lock_flags_+0xdb/frame 0xfffffe004b204750 statclock_cnt() at 0xffffffff806ab0dc = statclock_cnt+0xdc/frame 0xfffffe004b204790 handleevents() at 0xffffffff80a72d73 = handleevents+0x113/frame 0xfffffe004b2047e0 timercb() at 0xffffffff80a73479 = timercb+0xa9/frame 0xfffffe004b204830 lapic_handle_timer() at 0xffffffff80ad6237 = lapic_handle_timer+0xa7/frame 0xfffffe004b204870 timerint_u() at 0xffffffff80a1f080 = timerint_u+0x96/frame 0xfffffe004b204950 thread_lock_flags_() at 0xffffffff806f10f1 = thread_lock_flags_+0xc1/frame 0xfffffe004b2049c0 fork1() at 0xffffffff806d124f = fork1+0x1b9f/frame 0xfffffe004b204a70 sys_vfork() at 0xffffffff806d17ec = sys_vfork+0x4c/frame 0xfffffe004b204ac0 amd64_syscall() at 0xffffffff80a3d6c8 = amd64_syscall+0xa48/frame 0xfffffe004b204bf0 fast_syscall_common() at 0xffffffff80a1e03d = fast_syscall_common+0x101/frame 0x7fffffffe000 Uptime: 26m5s Dumping 1003 out of 6123 MB:..2%..12%..21%..31%..42%..52%..61%..71%..82%..91% __curthread () at ./machine/pcpu.h:230 230 ./machine/pcpu.h: No such file or directory. (kgdb) #0 __curthread () at ./machine/pcpu.h:230 #1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:347 #2 0xffffffff80710200 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416 #3 0xffffffff8071072d in vpanic (fmt=<optimized out>, ap=0xfffffe004b2046b0) at /usr/src/sys/kern/kern_shutdown.c:812 #4 0xffffffff80710533 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:739 #5 0xffffffff806f0fa1 in _mtx_lock_indefinite_check (m=<optimized out>, ldap=<optimized out>) at /usr/src/sys/kern/kern_mutex.c:1224 #6 0xffffffff806f110b in thread_lock_flags_ (td=0xfffff8006205d000, opts=<optimized out>, file=<optimized out>, line=<optimized out>) at /usr/src/sys/kern/kern_mutex.c:913 #7 0xffffffff806ab0dc in statclock_cnt (cnt=1, usermode=<optimized out>) at /usr/src/sys/kern/kern_clock.c:768 #8 0xffffffff80a72d73 in handleevents (now=6721246917305, fake=0) at /usr/src/sys/kern/kern_clocksource.c:196 #9 0xffffffff80a73479 in timercb (et=0xffffffff80f80ad8 <lapic_et>, arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:353 #10 0xffffffff80ad6237 in lapic_handle_timer (frame=0xfffffe004b204880) at /usr/src/sys/x86/x86/local_apic.c:1305 #11 0xffffffff80a1f080 in timerint_u () at /usr/src/sys/amd64/amd64/apic_vector.S:132 #12 0xfffffe004b204968 in ?? () #13 0x0000000000001b8d in ?? () #14 0x0000000000000800 in ?? () #15 0x00000000fffff7ff in ?? () #16 0x0000000080000034 in ?? () #17 0x00000000000001f4 in ?? () #18 0x00000000fffff8eb in ?? () #19 0xfffffe004b204968 in ?? () #20 0xfffffe004b204950 in ?? () #21 0x000000008000ba2f in ?? () #22 0x0000000000000032 in ?? () #23 0xfffff800c58d4000 in ?? () #24 0xfffffe004b204968 in ?? () #25 0xffffffff80c41014 in mtx_delay () #26 0x0000000000000000 in ?? () Looking at the commit logs, I suspect r329419, r329420, r329422, and/or r320449 being possible candidates. I'll try revert my source tree to r329418, and take it from there. -- Trond.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?alpine.BSF.2.21.1802181310590.24158>