From owner-freebsd-current@FreeBSD.ORG Thu May 17 11:39:45 2012 Return-Path: Delivered-To: current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 53CAD106564A; Thu, 17 May 2012 11:39:45 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 195A28FC08; Thu, 17 May 2012 11:39:43 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA10166; Thu, 17 May 2012 14:39:36 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4FB4E377.8050805@FreeBSD.org> Date: Thu, 17 May 2012 14:39:35 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120503 Thunderbird/12.0.1 MIME-Version: 1.0 To: Gleb Smirnoff , current@FreeBSD.org References: <4F1D18A5.8010006@FreeBSD.org> <20120123130743.GI16676@glebius.int.ru> <4F1D6830.60602@FreeBSD.org> <20120123162410.GN16676@glebius.int.ru> <20120123162606.GO16676@FreeBSD.org> <4F1D8E2B.30800@FreeBSD.org> <20120123164659.GQ16676@glebius.int.ru> <4F1D9128.3030501@FreeBSD.org> <20120124115424.GX16676@glebius.int.ru> <4F1E9F5F.8080209@FreeBSD.org> <20120124123245.GZ16676@glebius.int.ru> <4F2079B3.80305@FreeBSD.org> In-Reply-To: <4F2079B3.80305@FreeBSD.org> X-Enigmail-Version: 1.5pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: hackers@FreeBSD.org Subject: Re: new panic in cpu_reset() with WITNESS X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 May 2012 11:39:45 -0000 on 25/01/2012 23:52 Andriy Gapon said the following: > on 24/01/2012 14:32 Gleb Smirnoff said the following: >> Yes, now: >> >> Rebooting... >> lock order reversal: >> 1st 0xffffffff80937140 smp rendezvous (smp rendezvous) @ /usr/src/head/sys/kern/kern_shutdown.c:542 >> 2nd 0xfffffe0001f5d838 uart_hwmtx (uart_hwmtx) @ /usr/src/head/sys/dev/uart/uart_cpu.h:92 >> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 > > OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no new > details... > > It's still intriguing to me why the LOR *doesn't* happen [*] with > stop_scheduler_on_panic=0. But I don't see a productive way to pursue this > investigation further. Salve Glebius! After your recent nudging I took yet another look at this issue and it seems that I have some findings. For those who might get interested here is a convenience reference to the whole thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307 A short summary. Prerequisites: an SMP x86 system, its kernel is configured with WITNESS && !WITNESS_SKIPSPIN (this is important) and a system uses serial console via uart. Then, if stop_scheduler_on_panic is set to zero the system can be rebooted without a problem. On the other hand, if stop_scheduler_on_panic is enabled, then the system first runs into a LOR when calling printf() in cpu_reset() and then it runs into a panic when printf is recursively called from witness(9) to report the LOR. The panic happens because of the recursion on cnputs_mtx, which should ensure that cnputs() output is not intermingled and which is not flagged to support recursion. There are two things about this report that greatly confused and puzzled me: 1. stop_scheduler_on_panic variable is used _only_ in panic(9). So how could it be possible that changing its value affects behavior of the system when panic(9) is not called?! 2. The LOR in question happens between "smp rendezvous" (smp_ipi_mtx) and "uart_hwmtx" (sc_hwmtx_s in uart core) spin locks. The order of these locks is actually predefined in witness order_lists[] such that uart_hwmtx must come before smp_ipi_mtx. But in the reboot path we first take smp_ipi_mtx in shutdown_reset(), then we call cpu_reset, then it calls printf and from there we get to uart_hwmtx for serial console output. So the order between these spinlocks is always violated in the x86 SMP reboot path. How come witness(9) doesn't _always_ detect this LOR? How come it didn't detect this LOR before any "stop scheduler" commits?! [Spoiler alert :-)] Turns out that the two puzzles above are closely related. Let's first list all the things that change when stop_scheduler_on_panic is enabled and a panic happens: - other CPUs are stopped (forced to spin) - interrupts on current CPU are disabled - by virtue of the above the current thread should be the only thread running (unless it executes a voluntary switch) - all locks are "busted", they are completely ignored / bypassed - by virtue of the above no lock invariants and witness checks are performed, so no lock order checking, no recursion checking, etc So, what I observe is this: when stop_scheduler_on_panic is disabled, the LOR is actually detected as it should be. witness(9) works properly here. Once the LOR is detected witness(9) wants to report it using printf(9). That's where we run into the cnputs_mtx recursion panic. It's all exactly as with stop_scheduler_on_panic enabled. Then panic(9) wants to report the panic using printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags() detects locks recursion again (this is independent of witness(9)) and calls panic(9) again. Then panic(9) wants to report the panic using printf(9)... I assume that when the stack is exhausted we run into a double fault and dblfault_handler wants to print something again... Likely we eventually run into a triple fault which resets the machine. Although, I recall some old reports of machines hanging during reboot in a place suspiciously close to where the described ordeal happens... But if the machine doesn't hang we get a full appearance of the reset successfully happening (modulo some last messages missing). With stop_scheduler_on_panic enabled and all the locks being ignored we, of course, do not run into any secondary lock recursions and resulting panics. So the system is able to at least panic "gracefully" (still we should have just reported the LOR gracefully, no panic is required). Some obvious conclusions: - the "smp rendezvous" and "uart_hwmtx" LOR is real and it is the true cause of the problem; it should be fixed one way or other - either by correcting witness order_lists[] or by avoiding the LOR in shutdown_reset/cpu_reset; - because witness(9) uses printf(9) to report problems, it is very fragile to use witness with any locks that can be acquired under printf(9) - stop_scheduler_on_panic just uncovers the true bug There are probably other conclusions that can be made. I welcome any suggestions on how to fix the problems discovered. -- Andriy Gapon