From owner-freebsd-current@FreeBSD.ORG Tue Jan 17 23:03:56 2012 Return-Path: Delivered-To: current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 25B5E1065670; Tue, 17 Jan 2012 23:03:56 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 06CCA8FC2A; Tue, 17 Jan 2012 23:03:53 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id BAA16248; Wed, 18 Jan 2012 01:03:51 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1RnI4F-0005lc-9I; Wed, 18 Jan 2012 01:03:51 +0200 Message-ID: <4F15FE55.9020201@FreeBSD.org> Date: Wed, 18 Jan 2012 01:03:49 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20111222 Thunderbird/9.0 MIME-Version: 1.0 To: mdf@FreeBSD.org, Gleb Smirnoff References: <20120117110242.GD12760@glebius.int.ru> In-Reply-To: X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: current@FreeBSD.org Subject: Re: new panic in cpu_reset() with WITNESS X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 Jan 2012 23:03:56 -0000 on 17/01/2012 17:34 mdf@FreeBSD.org said the following: > 2012/1/17 Gleb Smirnoff : >> New panic has been introduced somewhere between >> r229851 and r229932, that happens on shutdown if >> kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. [**] >> Uptime: 1h0m17s >> Rebooting... >> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 >> cpuid = 0 >> KDB: enter: panic >> [ thread pid 1 tid 100001 ] >> Stopped at kdb_enter+0x3b: movq $0,0x514d32(%rip) >> db> >> db> bt >> Tracing pid 1 tid 100001 td 0xfffffe0001d5e000 >> kdb_enter() at kdb_enter+0x3b >> panic() at panic+0x1c7 >> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f >> cnputs() at cnputs+0x7a >> putchar() at putchar+0x11f >> kvprintf() at kvprintf+0x83 >> vprintf() at vprintf+0x85 >> printf() at printf+0x67 >> witness_checkorder() at witness_checkorder+0x773 >> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 >> uart_cnputc() at uart_cnputc+0x3e >> cnputc() at cnputc+0x4c >> cnputs() at cnputs+0x26 >> putchar() at putchar+0x11f >> kvprintf() at kvprintf+0x83 >> vprintf() at vprintf+0x85 >> printf() at printf+0x67 >> cpu_reset() at cpu_reset+0x81 >> kern_reboot() at kern_reboot+0x3a5 >> --More--^M ^Msys_reboot() at sys_reboot+0x42 >> amd64_syscall() at amd64_syscall+0x39e >> Xfast_syscall() at Xfast_syscall+0xf7 >> --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffffffd6d8, rbp = 0x49 --- >> db> >> db> show locks >> exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff809bc560) locked @ /usr/src/head/sys/kern/kern_module.c:101 >> exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 (0xffffffff80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542 >> db> >> >> So the problem is that we are holding smp rendezvous mutex during the cpu_reset(). >> No mutexes should be obtained after it. However, since cpu_reset() does priting >> we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in >> the subr_witness.c as mutex to obtain before smp rendezvous, this triggers >> yet another printf from witness, that finally panics due to recursing on >> cnputs_mtx. > > At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't > seem possible to fit it into the heirarchy in any sane way, since a > print can come from basically anywhere. This is the case for the FreeBSD tree too (at least in head). > If anyone has a better fix, that'd be great, but I haven't been able > to think of one. Please note that while the panic happened on cnputs_mtx[*] (because of the recursion), it seems to have been provoked by uart_hwmtx actually. The root cause of the panic at hand is that smp_ipi_mtx is supposed to be a "terminal" lock, i.e. no other lock is supposed to be acquired while smp_ipi_mtx is held. This is the case for all the normal code, but the rule is violated for cpu_reset() in the SMP case, because smp_ipi_mtx is acquired in shutdown_reset() and is held till the end. Thus, while normally smp_ipi_mtx and cnputs_mtx/uart_hwmtx should be completely independent, they have become dependent in this reset path. I think that normally this problem is hidden by people not using WITNESS or by them using WITNESS_SKIPSPIN. One obvious solution is to change the relative order of smp_ipi_mtx (aka "smp rendezvous") and uart_hwmtx in order_lists of subr_witness.c, which predefine some order that is expected between the locks. Another solution is to look into the reason why smp_ipi_mtx is acquired in shutdown_reset(). This call has been added to avoid [what used to be] a deadlock between the stop_cpus() call with interrupts disabled on a shutdown CPU and smp_rendezvous or some tlb ipi call on another CPU. As such it should be sufficient to hold smp_ipi_mtx around the stop_cpus() call. In fact, I would argue that stop_cpus() must hold smp_ipi_mtx for the above reason. That is, all inter-CPU operations should be initiated using the same lock. [*] And there is, of course, a different issue (pointed out many times by Bruce Evans) of cnputs_mtx and now msg_lock ("msgbuf" lock) being inadequate for their jobs as this example demonstrates. Multiplied by the problem of printf being used where db_printf would have been a better choice. Since printf (or db_printf) can be called truly in any situation, the locking must handle recursion, must handle being "stuck" on another CPU, etc. In general, these functions should strive to be lockless, but that's probably not really practical for the normal situations in SMP environment. Still the special situations should be detected and handled. [**] I would expect this panic/problem to be there for a long time already (definitely before r229851). It's surprising to me that it seems to be a recent phenomenon. -- Andriy Gapon