Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 15 Dec 2012 21:16:48 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Mark Johnston <markjdb@gmail.com>
Cc:        freebsd-bugs@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject:   Re: kern/174440: [patch][syscons] panic: mutex Giant not owned at kern/tty_ttydisc.c:1089
Message-ID:  <20121215200341.J1851@besplex.bde.org>
In-Reply-To: <201212141805.qBEI5qJZ026284@red.freebsd.org>
References:  <201212141805.qBEI5qJZ026284@red.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 14 Dec 2012, Mark Johnston wrote:

>> Description:
> Last night the kernel on my laptop panicked, apparently while printing a LOR warning. The cause was a failed mutex assert:
>
> (kgdb) bt
> #0  doadump (textdump=1) at /usr/home/mark/src/freebsd/sys/kern/kern_shutdown.c:263
> #1  0xffffffff8058c533 in kern_reboot (howto=260) at /usr/home/mark/src/freebsd/sys/kern/kern_shutdown.c:446
> #2  0xffffffff8058ca32 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/home/mark/src/freebsd/sys/kern/kern_shutdown.c:753
> #3  0xffffffff8058cb47 in panic (fmt=<value optimized out>) at /usr/home/mark/src/freebsd/sys/kern/kern_shutdown.c:682
> #4  0xffffffff805786f4 in __mtx_assert (c=<value optimized out>, what=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/home/mark/src/freebsd/sys/kern/kern_mutex.c:788
> #5  0xffffffff805f6ec1 in ttydisc_rint_bypass (tp=0xfffffe000901b800, buf=0xffffff81b1249410, len=7) at /usr/home/mark/src/freebsd/sys/kern/tty_ttydisc.c:1089

The way that this is supposed to work is that syscons is Giant-locked (it
still uses D_NEEDSGIANT), so its keyboard interrupt handler holds Giant
when it calls ttydisc_rint_bypass().  Nothing else in it should make this
call.  The tty lock for syscons is just Giant.

The compiler also apparently optimized the call chain beyond recognition.
Your patch shows ttydisc_rint_simple(), not ttydisc_rint_bypass() being
called, but there is no trace of the former in the call chain.

> #6  0xffffffff80423b9f in sc_respond (scp=<value optimized out>, p=<value optimized out>, count=<value optimized out>, wakeup=0) at /usr/home/mark/src/freebsd/sys/dev/syscons/syscons.c:3798
> #7  0xffffffff807acda1 in teken_state_2 (t=0xffffffff80eafe40, c=99) at /usr/home/mark/src/freebsd/sys/teken/teken.c:137
> #8  0xffffffff807abf83 in teken_input_char (t=0xffffffff80eafe40, c=0) at /usr/home/mark/src/freebsd/sys/teken/teken.c:224
> #9  0xffffffff807ac47a in teken_input (t=0xffffffff80eafe40, buf=0x0, len=0) at /usr/home/mark/src/freebsd/sys/teken/teken.c:254
> #10 0xffffffff80854388 in scteken_puts (scp=0xffffffff80c7d820, buf=0xffffff81b124956f "c&#65533;\"&#65533;\200&#65533;&#65533;&#65533;&#65533;&#65533;t&#65533;\200&#65533;&#65533;&#65533;&#65533;&#65533;\225$&#65533;\201&#65533;&#65533;&#65533;&#65533;;T\200&#65533;&#65533;&#65533;&#65533;\020\226$&#65533;\201&#65533;&#65533;&#65533;\001", len=<value optimized out>, kernel=<value optimized out>)
>    at /usr/home/mark/src/freebsd/sys/dev/syscons/scterm-teken.c:181

Looks like a bug in teken.  It shouldn't be doing input in an output routine.

> #11 0xffffffff80424c0f in sc_puts (scp=0xffffffff80c7d820, buf=0xffffff81b124956f "c&#65533;\"&#65533;\200&#65533;&#65533;&#65533;&#65533;&#65533;t&#65533;\200&#65533;&#65533;&#65533;&#65533;&#65533;\225$&#65533;\201&#65533;&#65533;&#65533;&#65533;;T\200&#65533;&#65533;&#65533;&#65533;\020\226$&#65533;\201&#65533;&#65533;&#65533;\001", len=1, kernel=1)

sc_puts() is fragile too.  It should probably use the same Giant locking as
syscons input.  I don't remember seeing any, and this panic shows that it
doesn't in some cases.

syscons does sprinkle some bogus spinlocks in its output routines.  2
different ones, that sometimes help, but sometimes give deadlock by
getting in each other's and their own way.

>    at /usr/home/mark/src/freebsd/sys/dev/syscons/syscons.c:2663
> #12 0xffffffff8042894e in sc_cnputc (cd=<value optimized out>, c=99) at /usr/home/mark/src/freebsd/sys/dev/syscons/syscons.c:1711
> #13 0xffffffff80543bdc in cnputc (c=99) at /usr/home/mark/src/freebsd/sys/kern/kern_cons.c:476
> #14 0xffffffff80543fc6 in cnputs (p=0xffffff81b1249610 "c") at /usr/home/mark/src/freebsd/sys/kern/kern_cons.c:505

This is low-console output.  It is a gross error for low-level console output
to call up to a high-level tty-level routine like tty_rint_bypass().  The
panic on the unlocked Giant is just detecting this accidentally.

> #15 0xffffffff805cad8b in vprintf (fmt=<value optimized out>, ap=<value optimized out>) at /usr/home/mark/src/freebsd/sys/kern/subr_prf.c:396
> #16 0xffffffff805cae17 in printf (fmt=<value optimized out>) at /usr/home/mark/src/freebsd/sys/kern/subr_prf.c:362
> #17 0xffffffff803067a1 in db_putc (c=99) at /usr/home/mark/src/freebsd/sys/ddb/db_output.c:156

The error is grosser if the low-level output is for ddb.  Low-level console
output must not use any normal locks, in part because it may deadlock.
Deadlock is most obviously fatal if it is within ddb.  Some console
drivers have hacks in their low-level i/o routines, to use normal locks
except when within ddb.  Syscons used to be better in this area.  Now it
does very little, and avoiding the lock Giant lock in the low-level console
output would just help detect the gross error of calling up to
tty_rint_bypass() from within ddb.

> #18 0xffffffff805ca22a in kvprintf (fmt=0xffffffff808a6710 "", func=0xffffffff803068e0 <db_putchar>, arg=0xffffff81b1249910, radix=16, ap=0x0) at /usr/home/mark/src/freebsd/sys/kern/subr_prf.c:818
> #19 0xffffffff80306436 in db_printf (fmt=<value optimized out>) at /usr/home/mark/src/freebsd/sys/ddb/db_output.c:340
> #20 0xffffffff80308b61 in db_printsym (off=-2138971481, strategy=2) at /usr/home/mark/src/freebsd/sys/ddb/db_sym.c:478
> #21 0xffffffff8081d0a9 in db_print_stack_entry (name=<value optimized out>, narg=<value optimized out>, argnp=<value optimized out>, argp=<value optimized out>, callpc=18446744071570580135, frame=0xffffff81b124aab0)
>    at /usr/home/mark/src/freebsd/sys/amd64/amd64/db_trace.c:254
> #22 0xffffffff8081d528 in db_backtrace (td=0xfffffe013a802000, tf=0x0, frame=<value optimized out>, pc=18446744071570580135, count=1004) at /usr/home/mark/src/freebsd/sys/amd64/amd64/db_trace.c:348
> #23 0xffffffff803060ba in db_trace_self_wrapper () at /usr/home/mark/src/freebsd/sys/ddb/db_main.c:248
> #24 0xffffffff805c6577 in kdb_backtrace () at /usr/home/mark/src/freebsd/sys/kern/subr_kdb.c:372

Oops, we're not really in ddb, but are using ddb's stack trace function.
This is not an abuse, but the function is clearly not ready for this
use, since it calls db_printf() instead of printf(), and only hacks
in it allow it to work at all.  kdb_backtrace doesn't enter ddb, so
the low-level console output is more fragile than for calls from within
ddb.  It should be equally robust for all cases, but isn't.  Deadlock
in witness would be almost as bad as in ddb.

> #25 0xffffffff805dcd8c in _witness_debugger (cond=0, msg=0xffffffff80914920 "witness_checkorder") at /usr/home/mark/src/freebsd/sys/kern/subr_witness.c:2884
> #26 0xffffffff805de7dd in witness_checkorder (lock=0xfffffe015713fa48, flags=9, file=0xffffffff8091c628 "/usr/home/mark/src/freebsd/sys/kern/vfs_subr.c", line=2161, interlock=0x0)
>    at /usr/home/mark/src/freebsd/sys/kern/subr_witness.c:1345

Presumably this is only a LOR, so shouldn't be fatal.

> #27 0xffffffff80570eb5 in __lockmgr_args (lk=0xfffffe015713fa48, flags=524544, ilk=0xfffffe015713fa78, wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>,
>    file=0xffffffff8091c628 "/usr/home/mark/src/freebsd/sys/kern/vfs_subr.c", line=2161) at /usr/home/mark/src/freebsd/sys/kern/kern_lock.c:720

How useful to optimize out the messages, so that we can't see if it is a LOR.

> #28 0xffffffff807dd5ec in ffs_lock (ap=0xffffff81b1249f50) at lockmgr.h:97

The context for the LOR is top-level, so there shouldn't be any problems
using low-level console output to report it.

> #29 0xffffffff80883c58 in VOP_LOCK1_APV (vop=0xffffffff80c3cbe0, a=0xffffff81b1249f50) at vnode_if.c:1990
> #30 0xffffffff8063ddea in _vn_lock (vp=0xfffffe015713f9b0, flags=524288, file=0xffffffff8091c628 "/usr/home/mark/src/freebsd/sys/kern/vfs_subr.c", line=2161) at vnode_if.h:859
> #31 0xffffffff80632953 in vget (vp=0xfffffe015713f9b0, flags=524544, td=0xfffffe013a802000) at /usr/home/mark/src/freebsd/sys/kern/vfs_subr.c:2161
> #32 0xffffffff806222b5 in vfs_hash_get (mp=0xfffffe0009f75cc0, hash=5928856, flags=<value optimized out>, td=0xfffffe013a802000, vpp=0xffffff81b124a198, fn=0, arg=0x0) at /usr/home/mark/src/freebsd/sys/kern/vfs_hash.c:81
> #33 0xffffffff807d7728 in ffs_vgetf (mp=0xfffffe0009f75cc0, ino=5928856, flags=524288, vpp=0xffffff81b124a198, ffs_flags=1) at /usr/home/mark/src/freebsd/sys/ufs/ffs/ffs_vfsops.c:1656
> #34 0xffffffff807d5d2c in softdep_sync_buf (vp=0xfffffe01a9c319b0, bp=0xffffff816d6b3108, waitfor=1) at /usr/home/mark/src/freebsd/sys/ufs/ffs/ffs_softdep.c:12664
> #35 0xffffffff807dd987 in ffs_syncvnode (vp=0xfffffe01a9c319b0, waitfor=1, flags=0) at /usr/home/mark/src/freebsd/sys/ufs/ffs/ffs_vnops.c:269
> #36 0xffffffff807b7e19 in ffs_truncate (vp=0xfffffe01a9c319b0, length=2048, flags=2176, cred=0xfffffe0009f20200) at /usr/home/mark/src/freebsd/sys/ufs/ffs/ffs_inode.c:338
> #37 0xffffffff807e51f9 in ufs_direnter (dvp=<value optimized out>, tvp=0xfffffe015713f9b0, dirp=0xffffff81b124a5e0, cnp=<value optimized out>, newdirbp=0x0, isrename=0)
>    at /usr/home/mark/src/freebsd/sys/ufs/ufs/ufs_lookup.c:1137
> #38 0xffffffff807ea432 in ufs_makeinode (mode=<value optimized out>, dvp=0xfffffe01a9c319b0, vpp=0xffffff81b124a800, cnp=0xffffff81b124a828) at /usr/home/mark/src/freebsd/sys/ufs/ufs/ufs_vnops.c:2724
> #39 0xffffffff807ea80f in ufs_symlink (ap=0xffffff81b124a920) at /usr/home/mark/src/freebsd/sys/ufs/ufs/ufs_vnops.c:2139
> #40 0xffffffff80883d90 in VOP_SYMLINK_APV (vop=0xffffffff80c3d5e0, a=0xffffff81b124a920) at vnode_if.c:1673
> #41 0xffffffff806366c1 in kern_symlinkat (td=0xfffffe013a802000, path1=<value optimized out>, fd=-100, path2=<value optimized out>, segflg=UIO_USERSPACE) at vnode_if.h:723
> #42 0xffffffff80834079 in amd64_syscall (td=0xfffffe013a802000, traced=0) at subr_syscall.c:135
> #43 0xffffffff8081e2a7 in Xfast_syscall () at /usr/home/mark/src/freebsd/sys/amd64/amd64/exception.S:387
> #44 0x000000080088602a in ?? ()
>> How-To-Repeat:
> No idea. It only happened once, and I can't reproduce it.
>> Fix:
> I've attached a patch which should fix the issue.
>
> I'm confused as to how teken_state_2() was entered in the first place. Looking at teken_state.h, it seems that it's entered from state 1 with a '[', but I don't see that character in the WITNESS output.

I don't like teken, and used it in syscons compatibility mode when I debugged
deadlock in syscons output.  This might have prevented teken doing input.
It's probably rare, as well as a bug for the low-level case, for it to do
input during output.

> Regardless, it seems that ttydisc_rint_simple() expects to be called with the tty lock held, so the fix should be to add the appropriate locking to syscons.c:sc_respond().
>
> Patch attached with submission follows:
>
> diff --git a/sys/dev/syscons/syscons.c b/sys/dev/syscons/syscons.c
> index beb0c67..0f6a2c4 100644
> --- a/sys/dev/syscons/syscons.c
> +++ b/sys/dev/syscons/syscons.c
> @@ -3795,11 +3795,13 @@ sc_respond(scr_stat *scp, const u_char *p, int count, int wakeup)
>     tp = SC_DEV(scp->sc, scp->sc->cur_scp->index);
>     if (!tty_opened_ns(tp))
> 	return;
> +    tty_lock(tp);
>     ttydisc_rint_simple(tp, p, count);
>     if (wakeup) {
> 	/* XXX: we can't always call ttydisc_rint_done() here! */
> 	ttydisc_rint_done(tp);
>     }
> +    tty_unlock(tp);
> }

No, this would break the warning.  You could add an assert that the lock
is held here instead, but that would be no different.  It would be
more interesting to sprinkle assertions in callers to find the one that
called here.  The stack trace should show this, but too many callers
are optimized away.  There are many other calls to tty_rint*() in syscons,
and no obvious reason why broken callers would be limited to this one.
The one on sc_respond() does seem more fragile as designed than most
others, since sc_respond() is not called from syscons' normal keyboard
handler sckbdevent().  It seems to be designed for use only by external
callers which shouldn't and don't know about syscons' Giant locking.
They also apparentlly should but don't know that it cannot be called in
low-level console context.  There are only 4 references to Giant in all
of syscons:
- D_NEEDSGIANT.  This makes upper layers aqcuire Giant for syscons.  I
   wonder what this does now that there is a tty lock that should do the
   same.
- in the tty alloc call, to make the lock Giant and not a per-device one
- in sckbdevent(), to acquire Giant.  I think this is needed because not
   all keyboard interrupt handlers are non-MPSAFE, so some of them will
   legitimately call here without Giant held.  This complexity makes it
   hard to check that there are no callers from invalid contexts like
   low-level console i/o.  scgetc() uses a flag to tell it not to do
   harmful this in the low-level case.  I think it still does harmful
   things.
- in sckbdevent(), to release Giant.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121215200341.J1851>