Date: Sat, 05 Feb 2022 17:30:14 +0000 From: bugzilla-noreply@freebsd.org To: virtualization@FreeBSD.org Subject: [Bug 261198] bhyve host panics with: spin lock 0xffffffff81eac800 (callout) helpanic: spin lock held too long Message-ID: <bug-261198-27103-Xp7EeRjKgg@https.bugs.freebsd.org/bugzilla/> In-Reply-To: <bug-261198-27103@https.bugs.freebsd.org/bugzilla/> References: <bug-261198-27103@https.bugs.freebsd.org/bugzilla/>
next in thread | previous in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D261198 Mark Johnston <markj@FreeBSD.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|New |Open CC| |markj@FreeBSD.org --- Comment #5 from Mark Johnston <markj@FreeBSD.org> --- I hit what appears to be the same thing. The system was running a bunch of CPU-intensive FreeBSD guests and a buildworld on the host, somewhat oversubscribed. One of the per-CPU callout spinlocks was held for too long: spin lock 0xffffffff81ead8c0 (callout) held by 0xfffffe00806e21e0 (tid 1002= 03) too long panic: spin lock held too long=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20 cpuid =3D 17=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 time =3D 1644078265=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 KDB: stack backtrace:=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01c0318= 4e0=20=20 vpanic() at vpanic+0x17f/frame 0xfffffe01c0318530=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 panic() at panic+0x43/frame 0xfffffe01c0318590=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 _mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x74/frame 0xfffffe01c03185b0=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 _mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe01c0318= 620=20=20 __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xd5/frame 0xfffffe01c0318= 660=20=20 _callout_stop_safe() at _callout_stop_safe+0x145/frame 0xfffffe01c03186d0= =20=20=20=20=20=20=20 sleepq_remove_thread() at sleepq_remove_thread+0x1ad/frame 0xfffffe01c03186= f0=20=20=20 sleepq_resume_thread() at sleepq_resume_thread+0xaa/frame 0xfffffe01c031873= 0=20=20=20=20 wakeup_one() at wakeup_one+0x1f/frame 0xfffffe01c0318750=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 vcpu_notify_event() at vcpu_notify_event+0x4e/frame 0xfffffe01c0318780 If I look at the mutex state, though, the lock is unheld: (kgdb) frame 5 #5 0xffffffff80be46d4 in _mtx_lock_indefinite_check (m=3Dm@entry=3D0xffffffff81ead8c0 <cc_cpu+8000>, ldap=3Dldap@entry=3D0xfffffe01c03185e0) at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:1256 1256 panic("spin lock held too long"); (kgdb) p td $1 =3D (struct thread *) 0xfffffe00806e21e0 (kgdb) p m $2 =3D (struct mtx *) 0xffffffff81ead8c0 <cc_cpu+8000> (kgdb) p *m $3 =3D {lock_object =3D {lo_name =3D 0xffffffff8130481b "callout", lo_flags= =3D 196608, lo_data =3D 0, lo_witness =3D 0x0}, mtx_lock =3D 0} (kgdb) p td->td_tid $4 =3D 100203 (kgdb) tid 100203 (kgdb) bt #0 cpustop_handler () at /usr/home/markj/src/freebsd/sys/x86/x86/mp_x86.c:= 1490 #1 0xffffffff810ab739 in ipi_nmi_handler () at /usr/home/markj/src/freebsd/sys/x86/x86/mp_x86.c:1447 #2 0xffffffff810e4dd6 in trap (frame=3D0xfffffe006baf5f30) at /usr/home/markj/src/freebsd/sys/amd64/amd64/trap.c:240 #3 <signal handler called> #4 rdtsc () at /usr/home/markj/src/freebsd/sys/amd64/include/cpufunc.h:355 #5 delay_tsc (n=3D1) at /usr/home/markj/src/freebsd/sys/x86/x86/delay.c:68 #6 cpu_lock_delay () at /usr/home/markj/src/freebsd/sys/x86/x86/delay.c:140 #7 0xffffffff80be46d9 in _mtx_lock_indefinite_check (m=3D0x22000ee4, m@entry=3D0xfffffe006baf70c0, ldap=3D0x9927a00000000, ldap@entry=3D0xffffffff82d07e70) at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:1243 #8 0xffffffff80be490a in thread_lock_flags_ (td=3D0xfffffe00806e21e0, opts= =3D0, file=3D<optimized out>, line=3D<optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:929 #9 0xffffffff80c27226 in softclock_thread (arg=3Darg@entry=3D0xffffffff81e= ad8c0 <cc_cpu+8000>) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:840 #10 0xffffffff80bbfc40 in fork_exit (callout=3D0xffffffff80c271e0 <softclock_thread>, arg=3D0xffffffff81ead8c0 <cc_cpu+8000>, frame=3D0xffffffff82d07f40) at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:1102 #11 <signal handler called> #12 0x35f3ad8ceff1b8b6 in ?? () So this softclock thread is spinning for its own thread lock, which is: (kgdb) p/x *m@entry $5 =3D {lock_object =3D {lo_name =3D 0xfffffe006baf7d40, lo_flags =3D 0x300= 00, lo_data =3D 0x0, lo_witness =3D 0x0}, mtx_lock =3D 0xfffffe0237c51c80} and the owning thread is the thread that panicked: (kgdb) p ((struct thread *)$5.mtx_lock)->td_tid $6 =3D 161889 Hmm. I somewhat doubt this is bhyve specific, bhyve just exercises the cal= lout subsystem heavily. This is probably a result of commit 74cf7cae4d2238ae6d1c949b2bbd077e1ab33634 somehow. --=20 You are receiving this mail because: You are the assignee for the bug.=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-261198-27103-Xp7EeRjKgg>