Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 17 May 2012 15:07:37 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        hackers@freebsd.org, Gleb Smirnoff <glebius@freebsd.org>, current@freebsd.org
Subject:   Re: new panic in cpu_reset() with WITNESS
Message-ID:  <CAJ-FndAbKDJiCSz%2Bbvag=sSQABNsMOg62hVVo%2BnN6XEnBtPjkQ@mail.gmail.com>
In-Reply-To: <4FB4E377.8050805@FreeBSD.org>
References:  <4F1D18A5.8010006@FreeBSD.org> <20120123130743.GI16676@glebius.int.ru> <4F1D6830.60602@FreeBSD.org> <20120123162410.GN16676@glebius.int.ru> <20120123162606.GO16676@FreeBSD.org> <4F1D8E2B.30800@FreeBSD.org> <20120123164659.GQ16676@glebius.int.ru> <4F1D9128.3030501@FreeBSD.org> <20120124115424.GX16676@glebius.int.ru> <4F1E9F5F.8080209@FreeBSD.org> <20120124123245.GZ16676@glebius.int.ru> <4F2079B3.80305@FreeBSD.org> <4FB4E377.8050805@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
2012/5/17, Andriy Gapon <avg@freebsd.org>:
> on 25/01/2012 23:52 Andriy Gapon said the following:
>> on 24/01/2012 14:32 Gleb Smirnoff said the following:
>>> Yes, now:
>>>
>>> Rebooting...
>>> lock order reversal:
>>>  1st 0xffffffff80937140 smp rendezvous (smp rendezvous) @
>>> /usr/src/head/sys/kern/kern_shutdown.c:542
>>>  2nd 0xfffffe0001f5d838 uart_hwmtx (uart_hwmtx) @
>>> /usr/src/head/sys/dev/uart/uart_cpu.h:92
>>> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @
>>> /usr/src/head/sys/kern/kern_cons.c:500
>>
>> OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no
>> new
>> details...
>>
>> It's still intriguing to me why the LOR *doesn't* happen [*] with
>> stop_scheduler_on_panic=0.  But I don't see a productive way to pursue
>> this
>> investigation further.
>
> Salve Glebius!
> After your recent nudging I took yet another look at this issue and it seems
> that
> I have some findings.
>
> For those who might get interested here is a convenience reference to the
> whole
> thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307
>
> A short summary.
> Prerequisites: an SMP x86 system, its kernel is configured with WITNESS &&
> !WITNESS_SKIPSPIN (this is important) and a system uses serial console via
> uart.
> Then, if stop_scheduler_on_panic is set to zero the system can be rebooted
> without
> a problem.  On the other hand, if stop_scheduler_on_panic is enabled, then
> the
> system first runs into a LOR when calling printf() in cpu_reset() and then
> it runs
> into a panic when printf is recursively called from witness(9) to report the
> LOR.
>  The panic happens because of the recursion on cnputs_mtx, which should
> ensure
> that cnputs() output is not intermingled and which is not flagged to
> support
> recursion.
>
> There are two things about this report that greatly confused and puzzled
> me:
> 1. stop_scheduler_on_panic variable is used _only_ in panic(9).  So how
> could it
> be possible that changing its value affects behavior of the system when
> panic(9)
> is not called?!
>
> 2. The LOR in question happens between "smp rendezvous" (smp_ipi_mtx) and
> "uart_hwmtx" (sc_hwmtx_s in uart core) spin locks.  The order of these locks
> is
> actually predefined in witness order_lists[] such that uart_hwmtx must come
> before
> smp_ipi_mtx.  But in the reboot path we first take smp_ipi_mtx in
> shutdown_reset(), then we call cpu_reset, then it calls printf and from
> there we
> get to uart_hwmtx for serial console output.  So the order between these
> spinlocks
> is always violated in the x86 SMP reboot path.
> How come witness(9) doesn't _always_ detect this LOR?
> How come it didn't detect this LOR before any "stop scheduler" commits?!
>
> [Spoiler alert :-)]
>
> Turns out that the two puzzles above are closely related.
> Let's first list all the things that change when stop_scheduler_on_panic is
> enabled and a panic happens:
> - other CPUs are stopped (forced to spin)
> - interrupts on current CPU are disabled
> - by virtue of the above the current thread should be the only thread
> running
> (unless it executes a voluntary switch)
> - all locks are "busted", they are completely ignored / bypassed
> - by virtue of the above no lock invariants and witness checks are
> performed, so
> no lock order checking, no recursion checking, etc
>
> So, what I observe is this: when stop_scheduler_on_panic is disabled, the
> LOR is
> actually detected as it should be.  witness(9) works properly here.  Once
> the LOR
> is detected witness(9) wants to report it using printf(9).  That's where we
> run
> into the cnputs_mtx recursion panic.  It's all exactly as with
> stop_scheduler_on_panic enabled.  Then panic(9) wants to report the panic
> using
> printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags()
> detects
> locks recursion again (this is independent of witness(9)) and calls
> panic(9)
> again.  Then panic(9) wants to report the panic using printf(9)...
> I assume that when the stack is exhausted we run into a double fault and
> dblfault_handler wants to print something again...  Likely we eventually run
> into
> a triple fault which resets the machine.  Although, I recall some old
> reports of
> machines hanging during reboot in a place suspiciously close to where the
> described ordeal happens...
> But if the machine doesn't hang we get a full appearance of the reset
> successfully
> happening (modulo some last messages missing).
>
> With stop_scheduler_on_panic enabled and all the locks being ignored we, of
> course, do not run into any secondary lock recursions and resulting panics.
> So
> the system is able to at least panic "gracefully" (still we should have
> just
> reported the LOR gracefully, no panic is required).
>
> Some obvious conclusions:
> - the "smp rendezvous" and "uart_hwmtx" LOR is real and it is the true cause
> of
> the problem; it should be fixed one way or other - either by correcting
> witness
> order_lists[] or by avoiding the LOR in shutdown_reset/cpu_reset;
> - because witness(9) uses printf(9) to report problems, it is very fragile
> to use
> witness with any locks that can be acquired under printf(9)
> - stop_scheduler_on_panic just uncovers the true bug

Thanks for working on this.

My thoughts on the matter:
- We should not printf() in cpu_reset() which is supposed to be a very
low level primitives, usable by any context. Acquiring console
spinlocks there is not really a wise choice.
- I'd like to see witness being based on an alternative approach, but
we may find something that doesn't wrap up (thus a bufring ala KTR is
not appropriate) and can work also in SMP context (thus a lockfree
algorithm is not appropriate). I don't have any good solution for this
on the top of my idea, but of course, relying on a path acquiring
locks for a lock protector is not a great idea. The only other thing
we may do is marking locks involved in the printf() path with
NO_WITNESS, but that of course won't solve the deadlock problem
itself.
- On a related note, I once had a patch that was having BSP recovering
from deadlock in the reset code and panic/print informations, under
DIAGNOSTIC. I may find it again and post here.

Definitively, I think that we need to make cpu_reset() locks agnostic
and if you agree I can make a patch for that.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndAbKDJiCSz%2Bbvag=sSQABNsMOg62hVVo%2BnN6XEnBtPjkQ>