From owner-freebsd-current@FreeBSD.ORG  Tue Jan 17 23:03:56 2012
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 25B5E1065670;
	Tue, 17 Jan 2012 23:03:56 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 06CCA8FC2A;
	Tue, 17 Jan 2012 23:03:53 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id BAA16248;
	Wed, 18 Jan 2012 01:03:51 +0200 (EET) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1RnI4F-0005lc-9I; Wed, 18 Jan 2012 01:03:51 +0200
Message-ID: <4F15FE55.9020201@FreeBSD.org>
Date: Wed, 18 Jan 2012 01:03:49 +0200
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:9.0) Gecko/20111222 Thunderbird/9.0
MIME-Version: 1.0
To: mdf@FreeBSD.org, Gleb Smirnoff <glebius@FreeBSD.org>
References: <20120117110242.GD12760@glebius.int.ru>
	<CAMBSHm_iuFwV5Hm7ArzzQbyf1mKgjmC=FGkhqGMdYgCzuJcZZg@mail.gmail.com>
In-Reply-To: <CAMBSHm_iuFwV5Hm7ArzzQbyf1mKgjmC=FGkhqGMdYgCzuJcZZg@mail.gmail.com>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: current@FreeBSD.org
Subject: Re: new panic in cpu_reset() with WITNESS
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 17 Jan 2012 23:03:56 -0000

on 17/01/2012 17:34 mdf@FreeBSD.org said the following:
> 2012/1/17 Gleb Smirnoff <glebius@freebsd.org>:
>>  New panic has been introduced somewhere between
>> r229851 and r229932, that happens on shutdown if
>> kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.
[**]
>> Uptime: 1h0m17s
>> Rebooting...
>> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500
>> cpuid = 0
>> KDB: enter: panic
>> [ thread pid 1 tid 100001 ]
>> Stopped at      kdb_enter+0x3b: movq    $0,0x514d32(%rip)
>> db>
>> db> bt
>> Tracing pid 1 tid 100001 td 0xfffffe0001d5e000
>> kdb_enter() at kdb_enter+0x3b
>> panic() at panic+0x1c7
>> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
>> cnputs() at cnputs+0x7a
>> putchar() at putchar+0x11f
>> kvprintf() at kvprintf+0x83
>> vprintf() at vprintf+0x85
>> printf() at printf+0x67
>> witness_checkorder() at witness_checkorder+0x773
>> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
>> uart_cnputc() at uart_cnputc+0x3e
>> cnputc() at cnputc+0x4c
>> cnputs() at cnputs+0x26
>> putchar() at putchar+0x11f
>> kvprintf() at kvprintf+0x83
>> vprintf() at vprintf+0x85
>> printf() at printf+0x67
>> cpu_reset() at cpu_reset+0x81
>> kern_reboot() at kern_reboot+0x3a5
>> --More--^M        ^Msys_reboot() at sys_reboot+0x42
>> amd64_syscall() at amd64_syscall+0x39e
>> Xfast_syscall() at Xfast_syscall+0xf7
>> --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffffffd6d8, rbp = 0x49 ---
>> db>
>> db> show locks
>> exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff809bc560) locked @ /usr/src/head/sys/kern/kern_module.c:101
>> exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 (0xffffffff80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542
>> db>
>>
>> So the problem is that we are holding smp rendezvous mutex during the cpu_reset().
>> No mutexes should be obtained after it. However, since cpu_reset() does priting
>> we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in
>> the subr_witness.c as mutex to obtain before smp rendezvous, this triggers
>> yet another printf from witness, that finally panics due to recursing on
>> cnputs_mtx.
> 
> At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't
> seem possible to fit it into the heirarchy in any sane way, since a
> print can come from basically anywhere.

This is the case for the FreeBSD tree too (at least in head).

> If anyone has a better fix, that'd be great, but I haven't been able
> to think of one.


Please note that while the panic happened on cnputs_mtx[*] (because of the
recursion), it seems to have been provoked by uart_hwmtx actually.

The root cause of the panic at hand is that smp_ipi_mtx is supposed to be a
"terminal" lock, i.e. no other lock is supposed to be acquired while smp_ipi_mtx
is held.  This is the case for all the normal code, but the rule is violated for
cpu_reset() in the SMP case, because smp_ipi_mtx is acquired in shutdown_reset()
and is held till the end.

Thus, while normally smp_ipi_mtx and cnputs_mtx/uart_hwmtx should be completely
independent, they have become dependent in this reset path.

I think that normally this problem is hidden by people not using WITNESS or by
them using WITNESS_SKIPSPIN.

One obvious solution is to change the relative order of smp_ipi_mtx (aka "smp
rendezvous") and uart_hwmtx in order_lists of subr_witness.c, which predefine
some order that is expected between the locks.

Another solution is to look into the reason why smp_ipi_mtx is acquired in
shutdown_reset().  This call has been added to avoid [what used to be] a
deadlock between the stop_cpus() call with interrupts disabled on a shutdown CPU
and smp_rendezvous or some tlb ipi call on another CPU.  As such it should be
sufficient to hold smp_ipi_mtx around the stop_cpus() call.
In fact, I would argue that stop_cpus() must hold smp_ipi_mtx for the above
reason.  That is, all inter-CPU operations should be initiated using the same lock.

[*]
And there is, of course, a different issue (pointed out many times by Bruce
Evans) of cnputs_mtx and now msg_lock ("msgbuf" lock) being inadequate for their
jobs as this example demonstrates.  Multiplied by the problem of printf being
used where db_printf would have been a better choice.
Since printf (or db_printf) can be called truly in any situation, the locking
must handle recursion, must handle being "stuck" on another CPU, etc.  In
general, these functions should strive to be lockless, but that's probably not
really practical for the normal situations in SMP environment.  Still the
special situations should be detected and handled.

[**] I would expect this panic/problem to be there for a long time already
(definitely before r229851).  It's surprising to me that it seems to be a recent
phenomenon.

-- 
Andriy Gapon