Date: Thu, 11 May 2017 13:01:25 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Olivier Cinquin <olivier.cinquin@uci.edu>
Cc: freebsd-stable@freebsd.org
Subject: Re: Kernel panic in smp rendezvous
Message-ID: <20170511100125.GW1622@kib.kiev.ua>
In-Reply-To: <EC4BA9D6-A5B2-45D3-8DE9-A98FEE20694F@uci.edu>
References: <EC4BA9D6-A5B2-45D3-8DE9-A98FEE20694F@uci.edu>
On Wed, May 10, 2017 at 03:29:39PM -0700, Olivier Cinquin wrote:
> Hi,
> I'm getting the following kernel panics on recent 11-STABLE (r317883):
>
> spin lock 0xffffffff81df43d0 (smp rendezvous) held by 0xfffff8019c7a7000 (tid 100845) too long
> timeout stopping cpus
> panic: spin lock held too long
> cpuid = 12
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe3e64caf2a0
> vpanic() at vpanic+0x186/frame 0xfffffe3e64caf320
> panic() at panic+0x43/frame 0xfffffe3e64caf380
> _mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x311/frame 0xfffffe3e64caf3f0
> smp_targeted_tlb_shootdown() at smp_targeted_tlb_shootdown+0x101/frame 0xfffffe3e64caf470
> smp_masked_invlpg_range() at smp_masked_invlpg_range+0x50/frame 0xfffffe3e64caf4a0
> pmap_invalidate_range() at pmap_invalidate_range+0x196/frame 0xfffffe3e64caf4e0
> vm_thread_stack_dispose() at vm_thread_stack_dispose+0x2f/frame 0xfffffe3e64caf530
> thread_free() at thread_free+0x39/frame 0xfffffe3e64caf550
> thread_reap() at thread_reap+0x10e/frame 0xfffffe3e64caf570
> proc_reap() at proc_reap+0x6bd/frame 0xfffffe3e64caf5b0
> proc_to_reap() at proc_to_reap+0x48c/frame 0xfffffe3e64caf600
> kern_wait6() at kern_wait6+0x49e/frame 0xfffffe3e64caf6b0
> sys_wait4() at sys_wait4+0x78/frame 0xfffffe3e64caf8a0
> amd64_syscall() at amd64_syscall+0x6c4/frame 0xfffffe3e64cafa30
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe3e64cafa30
>
> The panics occur sporadically and seemingly randomly. They occur on
> multiple machines with the same configuration, and on older revisions of
> 11-STABLE as well as r317883 (I cannot easily bisect this down to a
> specific commit given that it can take days or sometimes weeks for the
> panic to occur). Note that my kernel is compiled with "options IPSEC", but
> that does not seem to be relevant.
>
> My understanding of the overall problem is that an IPI is sent to all cores
> by smp_rendezvous_cpus, but that some of the cores do not respond to it (or
> at least do not respond to it correctly and in time). More specifically, I
> can find 61 threads that seem to be blocked in cpustop_handler on an atomic
> operation to indicate that they have stopped in response to the IPI. This
> operation is CPU_SET_ATOMIC(cpu, &stopped_cpus); CPU_SET_ATOMIC boils down
> to a call to "atomic_set_long" (I am not sure where that is itself defined
> for amd64). Either there is a line numbering problem, or this suggests that
> perhaps there is a deadlock at this step (unless there's some sort of
> livelock and CPU_SET_ATOMIC is the place where the threads spend most of
> their time).
>
> Looking at the thread backtraces as a whole, I can see the one that
> triggered the panic, the 61 that are in cpustop_handler, and a lot of
> sleeping threads. I guess each core should have an active thread, and this
> is an architecture with 64 cores, so that leaves 64 - (1 + 61) = 2 cores
> that are unaccounted for. Interestingly, for 2 different panics for which I
> have a vmcore file these numbers are the same. For these two panics, the
> IDs of the cores that are neither in cpustop_handler nor calling the smp
> rendezvous are not the same (#18 and #19 in one instance, #44 and #45 in
> the other instance), but in both instances the IDs are consecutive; perhaps
> that's relevant.
>
> I've uploaded a full set of backtraces at
> https://gist.github.com/cinquin/d63cdf9de01b0b8033c47128868f2d38 and a
> dmesg at https://gist.github.com/cinquin/17c0cf6ac7832fd1b683488d08d3e69b
> (in short, the machine is a 4-processor Opteron 6274 system). I'm pasting
> below an example backtrace of the threads in cpustop_handler.
>
> Any suggestions as to what the problem might be and how to solve it? I've
> saved the core and can provide further information.
>
> Thanks
> Olivier Cinquin
>
> A total of 61 threads are along the lines of:
>
> #0 cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1275
> #1 0xffffffff8105a6f5 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1231
> #2 0xffffffff80ef740a in trap (frame=0xfffffe3e687f7f30) at /usr/src/sys/amd64/amd64/trap.c:185
> #3 0xffffffff80edbd03 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:510
> ...
> (kgdb) frame 0
> #0 cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1275
> 1275         CPU_SET_ATOMIC(cpu, &stopped_cpus);
> (kgdb) list
> 1270         cpu = PCPU_GET(cpuid);
> 1271
> 1272         savectx(&stoppcbs[cpu]);
> 1273
> 1274         /* Indicate that we are stopped */
> 1275         CPU_SET_ATOMIC(cpu, &stopped_cpus);
> 1276
> 1277         /* Wait for restart */
> 1278         while (!CPU_ISSET(cpu, &started_cpus))
> 1279             ia32_pause();
> (kgdb) p stopped_cpus
> $1 = {__bits = 0xffffffff81df4368}
> (kgdb) p started_cpus
> $2 = {__bits = 0xffffffff81df4418}

It would be interesting to see only the backtraces for the threads which
are on CPUs, i.e. all nmi_stop_handler() threads and the others.

From your description, it looks like there are two problems, both of
which might have the same cause.

The initial thread broadcasts the TLB shootdown IPI by calling
smp_targeted_tlb_shootdown(). The function locks the smp_ipi_mtx
spinlock. After the IPI is sent, the function loops waiting for the
other cores to acknowledge the IPI. This is the
	while (*p_cpudone != generation)
		ia32_pause();
loop, which reads from the other CPUs' PCPU areas.

Then another thread decides that it should execute a shootdown as well,
and tries to lock the same smp_ipi_mtx spinlock. It times out waiting
for the initial thread, and panics. Panic(9) sends a broadcast NMI to
all cores except self to stop the machine. This is what you see in the
backtraces of all but one thread. If you do not see the NMI handler
entrance for the initial thread, then that indicates the second problem;
either way, showing just the backtraces for each CPU would clear up the
possible confusion.
The question of why the shootdown times out is open anyway. You should
examine pc_smp_tlb_done in each PCPU area to see whether the CPUs
properly reacted to the IPI and it is the initiator which does not see
the update, or whether it is some CPU which did not receive the IPI.

This might be a CPU erratum. I remember much earlier AMD CPUs having an
issue with a tight read loop; it could be that this is it. Did you try
updating the BIOS? Also install sysutils/devcpu-data and upgrade the
microcode on the processors online.

I skimmed over the AMD document 48063 Rev. 3.24 "Revision Guide for AMD
Family 15h Models 00h-0Fh Processors"; nothing outstanding came out
except possibly Erratum 619:
	619 Non-Posted Reads May Block Write Dependent on Probe Responses
	The northbridge may stall indefinitely on non-posted reads when a
	posted write becomes dependent on probe responses.

You might also try to change the loop I cited above to some variant like
	while (*p_cpudone != generation) {
		ia32_pause();
		mfence();
	}
and report whether any of the magic helped.