From owner-freebsd-stable@freebsd.org Fri May 19 06:03:02 2017
From: Olivier Cinquin <ocinquin@uci.edu>
To: Konstantin Belousov
Cc: freebsd-stable@freebsd.org
Subject: Re: Kernel panic in smp rendezvous
Date: Thu, 18 May 2017 23:02:01 -0700
Message-Id: <0C9874B5-0F3A-48A7-B6F4-39854D06257A@uci.edu>
In-Reply-To: <20170511100125.GW1622@kib.kiev.ua>
References: <20170511100125.GW1622@kib.kiev.ua>

> On May 11, 2017, at 3:01 AM, Konstantin Belousov wrote:
>
> On Wed, May 10, 2017 at 03:29:39PM -0700, Olivier Cinquin wrote:
>> Hi,
>> I'm getting the following kernel panics on recent 11-STABLE (r317883):
>>
>> spin lock 0xffffffff81df43d0 (smp rendezvous) held by 0xfffff8019c7a7000 (tid 100845) too long
>> timeout stopping cpus
>> panic: spin lock held too long
>> cpuid = 12
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe3e64caf2a0
>> vpanic() at vpanic+0x186/frame 0xfffffe3e64caf320
>> panic() at panic+0x43/frame 0xfffffe3e64caf380
>> _mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x311/frame 0xfffffe3e64caf3f0
>> smp_targeted_tlb_shootdown() at smp_targeted_tlb_shootdown+0x101/frame 0xfffffe3e64caf470
>> smp_masked_invlpg_range() at smp_masked_invlpg_range+0x50/frame 0xfffffe3e64caf4a0
>> pmap_invalidate_range() at pmap_invalidate_range+0x196/frame 0xfffffe3e64caf4e0
>> vm_thread_stack_dispose() at vm_thread_stack_dispose+0x2f/frame 0xfffffe3e64caf530
>> thread_free() at thread_free+0x39/frame 0xfffffe3e64caf550
>> thread_reap() at thread_reap+0x10e/frame 0xfffffe3e64caf570
>> proc_reap() at proc_reap+0x6bd/frame 0xfffffe3e64caf5b0
>> proc_to_reap() at proc_to_reap+0x48c/frame 0xfffffe3e64caf600
>> kern_wait6() at kern_wait6+0x49e/frame 0xfffffe3e64caf6b0
>> sys_wait4() at sys_wait4+0x78/frame 0xfffffe3e64caf8a0
>> amd64_syscall() at amd64_syscall+0x6c4/frame 0xfffffe3e64cafa30
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe3e64cafa30
>>
>>
>> The panics occur sporadically and seemingly randomly.
>> They occur on multiple machines with the same configuration, and on
>> older revisions of 11-STABLE as well as r317883 (I cannot easily bisect
>> this down to a specific commit given that it can take days or sometimes
>> weeks for the panic to occur). Note that my kernel is compiled with
>> "options IPSEC", but that does not seem to be relevant.
>>
>> My understanding of the overall problem is that an IPI is sent to all
>> cores by smp_rendezvous_cpus, but that some of the cores do not respond
>> to it (or at least do not respond to it correctly and in time). More
>> specifically, I can find 61 threads that seem to be blocked in
>> cpustop_handler on an atomic operation to indicate that they have
>> stopped in response to the IPI. This operation is
>> CPU_SET_ATOMIC(cpu, &stopped_cpus); CPU_SET_ATOMIC boils down to a call
>> to "atomic_set_long" (not sure where that is itself defined for amd64).
>> Either there is a line numbering problem, or this suggests that perhaps
>> there is a deadlock at this step (unless there's some sort of livelock
>> and CPU_SET_ATOMIC is the place where the threads spend most of their
>> time).
>>
>> Looking at the thread backtraces as a whole, I can see the one that
>> triggered the panic, the 61 that are in cpustop_handler, and a lot of
>> sleeping threads. I guess each core should have an active thread, and
>> this is an architecture with 64 cores, so that leaves 64 - (1 + 61) = 2
>> cores that are unaccounted for. Interestingly, for 2 different panics
>> for which I have a vmcore file these numbers are the same. For these two
>> panics, the IDs of the cores that are neither in cpustop_handler nor
>> calling the smp rendezvous are not the same (#18 and #19 in one
>> instance, #44 and #45 in the other instance), but in both instances the
>> IDs are consecutive; perhaps that's relevant.
>>
>> I've uploaded a full set of backtraces at
>> https://gist.github.com/cinquin/d63cdf9de01b0b8033c47128868f2d38 and a
>> dmesg at https://gist.github.com/cinquin/17c0cf6ac7832fd1b683488d08d3e69b
>> (in short, the machine is a 4-processor Opteron 6274 system). I'm
>> pasting below an example backtrace of the threads in cpustop_handler.
>>
>> Any suggestions as to what the problem might be and how to solve it?
>> I've saved the core and can provide further information.
>>
>> Thanks
>> Olivier Cinquin
>>
>> A total of 61 threads have backtraces along the lines of:
>>
>> #0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1275
>> #1  0xffffffff8105a6f5 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1231
>> #2  0xffffffff80ef740a in trap (frame=0xfffffe3e687f7f30) at /usr/src/sys/amd64/amd64/trap.c:185
>> #3  0xffffffff80edbd03 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:510
>> ...
>> (kgdb) frame 0
>> #0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1275
>> 1275		CPU_SET_ATOMIC(cpu, &stopped_cpus);
>> (kgdb) list
>> 1270		cpu = PCPU_GET(cpuid);
>> 1271
>> 1272		savectx(&stoppcbs[cpu]);
>> 1273
>> 1274		/* Indicate that we are stopped */
>> 1275		CPU_SET_ATOMIC(cpu, &stopped_cpus);
>> 1276
>> 1277		/* Wait for restart */
>> 1278		while (!CPU_ISSET(cpu, &started_cpus))
>> 1279			ia32_pause();
>> (kgdb) p stopped_cpus
>> $1 = {__bits = 0xffffffff81df4368}
>> (kgdb) p started_cpus
>> $2 = {__bits = 0xffffffff81df4418}
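To answer my own parenthetical above about where atomic_set_long lives: on
amd64 it should come from the machine atomic.h header and reduce to a single
locked OR into one word of the cpuset. The snippet below is only a rough
user-space paraphrase of that expansion (the names ending in _sketch and the
256-CPU set size are made up for illustration, not taken from the tree), but
it shows how little work the "blocked" line 1275 actually does:

/*
 * Rough sketch of what CPU_SET_ATOMIC(cpu, &stopped_cpus) expands to on
 * amd64: CPU_SET_ATOMIC -> BIT_SET_ATOMIC (bitset word index plus mask)
 * -> atomic_set_long, i.e. a "lock orq" of a one-bit mask into memory.
 */
#include <limits.h>

#define	SKETCH_CPU_SETSIZE	256	/* assumed; the kernel uses CPU_SETSIZE */
#define	BITS_PER_WORD		(sizeof(long) * CHAR_BIT)

/* Stand-in for cpuset_t: an array of long words holding one bit per CPU. */
struct cpuset_sketch {
	volatile unsigned long __bits[SKETCH_CPU_SETSIZE / BITS_PER_WORD];
};

/* What amd64 atomic_set_long boils down to: a locked OR into memory. */
static inline void
atomic_set_long_sketch(volatile unsigned long *p, unsigned long v)
{
	__asm__ __volatile__("lock; orq %1,%0" : "+m" (*p) : "er" (v) : "cc");
}

/* CPU_SET_ATOMIC(cpu, set): atomically set bit 'cpu' in the right word. */
static inline void
cpu_set_atomic_sketch(int cpu, struct cpuset_sketch *set)
{
	atomic_set_long_sketch(&set->__bits[cpu / BITS_PER_WORD],
	    1UL << (cpu % BITS_PER_WORD));
}

If that is accurate, a thread cannot really deadlock there; either the saved
program counter is just past the interesting point, or the locked OR itself
is stalling (for example, waiting on the coherence fabric), which would fit
a hardware explanation.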
> It would be interesting to see only the backtraces for threads which are
> on CPUs, i.e. all nmi_stop_handler()-threads and others.
>
> From your description, it looks like there are two problems, both of
> which might have the same cause. The initial thread broadcasts the TLB
> shootdown IPI, calling smp_targeted_tlb_shootdown(). The function locks
> the smp_ipi_mtx spinlock. After the IPI is sent, the function loops
> waiting for the other cores to acknowledge the IPI. That is the
> 	while (*p_cpudone != generation)
> 		ia32_pause();
> loop, which reads from the other CPUs' PCPU areas.
>
> Then, another thread decides that it should execute a shootdown as well,
> and tries to lock the same smp_ipi_mtx spinlock. It times out waiting for
> the initial thread, and panics. panic(9) sends a broadcast NMI to all
> cores except self to stop the machine. This is what you see in the
> backtrace for all except one thread.
>
> If you do not see an NMI handler entry for the initial thread, then this
> indicates the second problem; either way, showing just the backtraces for
> each CPU would clear up the possible confusion.
>
> The question of why the shootdown times out is open anyway. You should
> examine pc_smp_tlb_done in each PCPU area to see whether the CPUs
> properly reacted to the IPI and it is the initiator which does not see
> the update, or whether it is some CPU which did not receive the IPI.
>
> This might be a CPU erratum. I remember much earlier AMD CPUs having an
> issue with tight read loops; it could be that this is it. Did you try to
> update the BIOS? Also install sysutils/devcpu-data and upgrade the
> microcode on the processor online. I skimmed over the AMD document 48063
> Rev. 3.24, "Revision Guide for AMD Family 15h Models 00h-0Fh Processors";
> nothing outstanding came out except possibly Erratum 619:
>
> 619 Non-Posted Reads May Block Write Dependent on Probe Responses
> The northbridge may stall indefinitely on non-posted reads when a posted
> write becomes dependent on probe responses.
>
> You might also try to change the loop I cited above to some variant like
> 	while (*p_cpudone != generation) {
> 		ia32_pause();
> 		mfence();
> 	}
> and report whether any of the magic helped.

So here's a report on the problem and its probable solution, in case others
find it useful (although it pertains more to my specific machines than to
FreeBSD).

I had tried manually updating the processor microcode after booting, using
the updated cpucontrol that was committed last year and microcode downloaded
from https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode.
That does not prevent the panic.

But your email and the contents of the AMD document you cited prompted me to
update the BIOS (Supermicro / AMI; going from 3.0 to 3.5, if I recall
correctly). That seems to have done the trick: I've had a few machines
running without hiccups for about a week, with a workload that would more
likely than not have triggered the problem by now if it still existed.

In case that information can be of use somehow, adding "mfence()" as
suggested above does not fix the problem in the presence of the older BIOS.

Many thanks for your thoughtful response!
Olivier
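
P.S. In case it helps someone who runs into this later: below is a rough,
self-contained sketch of the generation handshake described above, with the
suggested mfence() folded into the wait loop. It is an illustration under
assumptions rather than the actual sys/x86/x86/mp_x86.c code; generation,
p_cpudone, pc_smp_tlb_done, ia32_pause() and mfence() are the names used in
this thread, while everything ending in _sketch (and SKETCH_MAXCPU) is made
up for the example.

#include <stdint.h>

/* Stand-in for the relevant slice of struct pcpu. */
struct pcpu_sketch {
	volatile uint32_t pc_smp_tlb_done;	/* ack word the initiator polls */
};

#define	SKETCH_MAXCPU	64
static struct pcpu_sketch pcpu_sketch[SKETCH_MAXCPU];
static volatile uint32_t smp_tlb_generation;	/* bumped once per shootdown */

static inline void ia32_pause(void) { __asm__ __volatile__("pause"); }
static inline void mfence(void) { __asm__ __volatile__("mfence" ::: "memory"); }

/*
 * Initiator side, roughly what smp_targeted_tlb_shootdown() does per the
 * description above: take smp_ipi_mtx (elided), bump the generation, send
 * the IPI (elided), then spin on every target CPU's ack word.  The mfence()
 * is the suggested variant; it did not help with the old BIOS.
 */
static void
wait_for_acks_sketch(int ncpus)
{
	uint32_t generation;
	int cpu;

	generation = ++smp_tlb_generation;
	/* ... IPI broadcast to the target CPUs elided ... */
	for (cpu = 0; cpu < ncpus; cpu++) {	/* skipping self also elided */
		volatile uint32_t *p_cpudone = &pcpu_sketch[cpu].pc_smp_tlb_done;

		while (*p_cpudone != generation) {
			ia32_pause();
			mfence();
		}
	}
}

/*
 * Target side (IPI handler): perform the invalidation, then acknowledge by
 * publishing the generation in this CPU's PCPU area.  In a crash dump, if
 * every CPU's pc_smp_tlb_done already equals the current generation yet the
 * initiator still spins, the initiator is not seeing the stores; if some
 * CPU's value lags behind, that CPU never handled the IPI.
 */
static void
ipi_handler_sketch(int cpu)
{
	/* ... TLB invalidation elided ... */
	pcpu_sketch[cpu].pc_smp_tlb_done = smp_tlb_generation;
}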