From owner-freebsd-hackers@FreeBSD.ORG Thu May 17 14:07:40 2012 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 446D91065670; Thu, 17 May 2012 14:07:40 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 266D78FC15; Thu, 17 May 2012 14:07:38 +0000 (UTC) Received: by lbon10 with SMTP id n10so1846079lbo.13 for ; Thu, 17 May 2012 07:07:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=FVoHhCR4eLsHfe96Y2UqH2+D56DRqpTAsa8JIpwIPGM=; b=03+fJiwLCRGYfprZtXPDiXZ9MGQmAardCi16C9/Zlyk7iicOGCohj4DeBbK+OWrlZp R0mvxjSDLTD1gYEU89CHa2gpXJjClIfVIrucipj9T/8y+5Sg4/GDb1hKDO+D6NDzN5bN Lz9sw8gFMJ4tFFAedN86Chrh7z2dubPVYG5Qu6AG1cVpFWA1JMevoMrgUITX7kqODc0b JD9zwl+bxTHZAyNy17phW8VNMM73AVwInrlDxyNDw7K7f85wfEb7GSil3Z6JyCthvm8s W4ZqiqDyDlw34SdJrNEBIGyKVDRYO7jLEzZS6O9FFtPbcccLkIGL41zXlVN+EsbBkIKd E17g== MIME-Version: 1.0 Received: by 10.152.135.105 with SMTP id pr9mr7242041lab.37.1337263657727; Thu, 17 May 2012 07:07:37 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.112.27.65 with HTTP; Thu, 17 May 2012 07:07:37 -0700 (PDT) In-Reply-To: <4FB4E377.8050805@FreeBSD.org> References: <4F1D18A5.8010006@FreeBSD.org> <20120123130743.GI16676@glebius.int.ru> <4F1D6830.60602@FreeBSD.org> <20120123162410.GN16676@glebius.int.ru> <20120123162606.GO16676@FreeBSD.org> <4F1D8E2B.30800@FreeBSD.org> <20120123164659.GQ16676@glebius.int.ru> <4F1D9128.3030501@FreeBSD.org> <20120124115424.GX16676@glebius.int.ru> <4F1E9F5F.8080209@FreeBSD.org> <20120124123245.GZ16676@glebius.int.ru> <4F2079B3.80305@FreeBSD.org> <4FB4E377.8050805@FreeBSD.org> Date: Thu, 17 May 2012 15:07:37 +0100 X-Google-Sender-Auth: xeySoIxqn6F6IL_1txyq-7se1RA Message-ID: From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: hackers@freebsd.org, Gleb Smirnoff , current@freebsd.org Subject: Re: new panic in cpu_reset() with WITNESS X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 May 2012 14:07:40 -0000 2012/5/17, Andriy Gapon : > on 25/01/2012 23:52 Andriy Gapon said the following: >> on 24/01/2012 14:32 Gleb Smirnoff said the following: >>> Yes, now: >>> >>> Rebooting... >>> lock order reversal: >>> 1st 0xffffffff80937140 smp rendezvous (smp rendezvous) @ >>> /usr/src/head/sys/kern/kern_shutdown.c:542 >>> 2nd 0xfffffe0001f5d838 uart_hwmtx (uart_hwmtx) @ >>> /usr/src/head/sys/dev/uart/uart_cpu.h:92 >>> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ >>> /usr/src/head/sys/kern/kern_cons.c:500 >> >> OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no >> new >> details... >> >> It's still intriguing to me why the LOR *doesn't* happen [*] with >> stop_scheduler_on_panic=0. But I don't see a productive way to pursue >> this >> investigation further. > > Salve Glebius! > After your recent nudging I took yet another look at this issue and it seems > that > I have some findings. > > For those who might get interested here is a convenience reference to the > whole > thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307 > > A short summary. > Prerequisites: an SMP x86 system, its kernel is configured with WITNESS && > !WITNESS_SKIPSPIN (this is important) and a system uses serial console via > uart. > Then, if stop_scheduler_on_panic is set to zero the system can be rebooted > without > a problem. On the other hand, if stop_scheduler_on_panic is enabled, then > the > system first runs into a LOR when calling printf() in cpu_reset() and then > it runs > into a panic when printf is recursively called from witness(9) to report the > LOR. > The panic happens because of the recursion on cnputs_mtx, which should > ensure > that cnputs() output is not intermingled and which is not flagged to > support > recursion. > > There are two things about this report that greatly confused and puzzled > me: > 1. stop_scheduler_on_panic variable is used _only_ in panic(9). So how > could it > be possible that changing its value affects behavior of the system when > panic(9) > is not called?! > > 2. The LOR in question happens between "smp rendezvous" (smp_ipi_mtx) and > "uart_hwmtx" (sc_hwmtx_s in uart core) spin locks. The order of these locks > is > actually predefined in witness order_lists[] such that uart_hwmtx must come > before > smp_ipi_mtx. But in the reboot path we first take smp_ipi_mtx in > shutdown_reset(), then we call cpu_reset, then it calls printf and from > there we > get to uart_hwmtx for serial console output. So the order between these > spinlocks > is always violated in the x86 SMP reboot path. > How come witness(9) doesn't _always_ detect this LOR? > How come it didn't detect this LOR before any "stop scheduler" commits?! > > [Spoiler alert :-)] > > Turns out that the two puzzles above are closely related. > Let's first list all the things that change when stop_scheduler_on_panic is > enabled and a panic happens: > - other CPUs are stopped (forced to spin) > - interrupts on current CPU are disabled > - by virtue of the above the current thread should be the only thread > running > (unless it executes a voluntary switch) > - all locks are "busted", they are completely ignored / bypassed > - by virtue of the above no lock invariants and witness checks are > performed, so > no lock order checking, no recursion checking, etc > > So, what I observe is this: when stop_scheduler_on_panic is disabled, the > LOR is > actually detected as it should be. witness(9) works properly here. Once > the LOR > is detected witness(9) wants to report it using printf(9). That's where we > run > into the cnputs_mtx recursion panic. It's all exactly as with > stop_scheduler_on_panic enabled. Then panic(9) wants to report the panic > using > printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags() > detects > locks recursion again (this is independent of witness(9)) and calls > panic(9) > again. Then panic(9) wants to report the panic using printf(9)... > I assume that when the stack is exhausted we run into a double fault and > dblfault_handler wants to print something again... Likely we eventually run > into > a triple fault which resets the machine. Although, I recall some old > reports of > machines hanging during reboot in a place suspiciously close to where the > described ordeal happens... > But if the machine doesn't hang we get a full appearance of the reset > successfully > happening (modulo some last messages missing). > > With stop_scheduler_on_panic enabled and all the locks being ignored we, of > course, do not run into any secondary lock recursions and resulting panics. > So > the system is able to at least panic "gracefully" (still we should have > just > reported the LOR gracefully, no panic is required). > > Some obvious conclusions: > - the "smp rendezvous" and "uart_hwmtx" LOR is real and it is the true cause > of > the problem; it should be fixed one way or other - either by correcting > witness > order_lists[] or by avoiding the LOR in shutdown_reset/cpu_reset; > - because witness(9) uses printf(9) to report problems, it is very fragile > to use > witness with any locks that can be acquired under printf(9) > - stop_scheduler_on_panic just uncovers the true bug Thanks for working on this. My thoughts on the matter: - We should not printf() in cpu_reset() which is supposed to be a very low level primitives, usable by any context. Acquiring console spinlocks there is not really a wise choice. - I'd like to see witness being based on an alternative approach, but we may find something that doesn't wrap up (thus a bufring ala KTR is not appropriate) and can work also in SMP context (thus a lockfree algorithm is not appropriate). I don't have any good solution for this on the top of my idea, but of course, relying on a path acquiring locks for a lock protector is not a great idea. The only other thing we may do is marking locks involved in the printf() path with NO_WITNESS, but that of course won't solve the deadlock problem itself. - On a related note, I once had a patch that was having BSP recovering from deadlock in the reset code and panic/print informations, under DIAGNOSTIC. I may find it again and post here. Definitively, I think that we need to make cpu_reset() locks agnostic and if you agree I can make a patch for that. Attilio -- Peace can only be achieved by understanding - A. Einstein