Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 22 Mar 2018 05:31:18 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Warner Losh <imp@freebsd.org>, src-committers@freebsd.org,  svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r331298 - head/sys/dev/syscons
Message-ID:  <20180322050628.U5175@besplex.bde.org>
In-Reply-To: <20180322024846.S4293@besplex.bde.org>
References:  <201803211447.w2LElDcK091988@repo.freebsd.org> <20180322024846.S4293@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 22 Mar 2018, Bruce Evans wrote:

> On Wed, 21 Mar 2018, Warner Losh wrote:
>
>> Log:
>>  Unlock giant when calling shutdown_nice()
> ...
> This breaks the driver.  Giant is syscons' driver lock, and also the
> interrupt handler lock for at least the atkbd keyboard driver, so vt
> sometimes holds the lock for.

[That should have been "... lock too".]

> ...
> Actual testing shows that doesn't cause a panic, but it also doesn't
> actually unlock for shutdown_nice(), since the lock is acquired twice
> and only released once.  syscons has much the same extra lock sprinkling
> for event handling as vt:
>
> - intr_event_execute_handlers() acquires Giant and calls atkbdintr()
> - atkbdintr() calls sckbdevent()
> - sckbdevent() unnecessarily acquires Giant again
> - the buggy unlocking drops Giant just once
> - shutdown_nice() is called with Giant held
> - the buggy unlocking fails to re-acquire Giant
> - sckbdevent() releases Giant, leaving it not held
> - sckbdevent() returns
> - atkbdintr() returns
> - intr_event_execute_handlers() releases Giant.  This should panic, but
>  it apparently blocks for long enough for init to shut down first.
>
> When I trace the last step, I get a panic which might be either from the
> different timing or just a bug in kdb.

Testing with a kernel with other bugs fixed shows that the invariants
violation really is detected, causing a panic instead of a nice shutdown
for Ctrl-Alt-Del, but recursive panics pile up and printf() in -current
is too broken to print anything.  A better kernel prints:

XX ppppppppppppppppppppppppppkernel trap 12 with interrupts disabled
XX panic: kernel trap doesn't have ucred
XX cpuid = 32
XX time = 1521655530
XX KDB: enter: panic
XX panic: kernel trap doesn't have ucred
XX cpuid = 32
XX time = 1521655530
XX KDB: enter: panic
XX panic: kernel trap doesn't have ucred
XX cpuid = 32
XX time = 1521655530
XX KDB: enter: panic
XX panic: kernel trap doesn't have ucrepanic: kernel trap doesn't have ucred
XX cpuid = 32
XX time = 1521655530
XX KDB: enter: panic

where some messages are lost and some are duplicated (there are 8 CPUs).

I got control by putting a breakpoint after stop_cpus_hard() in vpanic().
(A breakpoint at panic() crashes, probably for multiple CPUs hitting it,
though this is supposed to be fixed in the test version.)  The backtrace
is then:

XX Breakpoint at   vpanic+0x4a:    popl    %ecx
XX db> t
XX Tracing pid 11 tid 100042 td 0xd6a61360
XX vpanic(c0916c91,d6685ab4,d6685ab4,c098f050,c098f040,...) at vpanic+0x4a/frame 0xd6685a8c
XX kassert_panic(c0916c91,c093b956,c0930a24,c09066cf,557) at kassert_panic+0x49/frame 0xd6685aa8
XX witness_unlock(c098f040,8,c09066c6,557) at witness_unlock+0xe7/frame 0xd6685af0
XX __mtx_unlock_flags(c098f050,0,c09066c6,557) at __mtx_unlock_flags+0x65/frame 0xd6685b14
XX intr_event_execute_handlers(c8f466b0) at intr_event_execute_handlers+0xed/frame 0xd6685b40
XX ithread_execute_handlers(c8f466b0,0,80202,d6a61360,c8f466b0,...) at ithread_execute_handlers+0x21/frame 0xd6685b54
XX ithread_loop(d6cda250,d6685ba8,0,d6cda250,c06a2c56,...) at ithread_loop+0x5f/frame 0xd6685b74
XX fork_exit(c06a2c56,d6cda250,d6685ba8) at fork_exit+0x83/frame 0xd6685b94
XX fork_trampoline() at fork_trampoline+0x8

The stack trace is messed up are mis-decoded despite attempts to avoid this
(use i386, don't use clang, and turn off auto-inlining...).   panic() is not
shown, and args passed in registers are not shown.  There are just enough
args to find the panic message.  It is as expected:

XX db> x/s 0xd6685ab4
XX 0xd6685ab4:     V\271\223\300$\012\223\300\317f\220\300W\005
XX db> x/s 0xc0916c91
XX __func__.16466+0x1c41:  lock (%s) %s not locked @ %s:%d

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180322050628.U5175>