Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 22 Mar 2018 04:53:22 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Warner Losh <imp@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r331298 - head/sys/dev/syscons
Message-ID:  <20180322024846.S4293@besplex.bde.org>
In-Reply-To: <201803211447.w2LElDcK091988@repo.freebsd.org>
References:  <201803211447.w2LElDcK091988@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 21 Mar 2018, Warner Losh wrote:

> Log:
>  Unlock giant when calling shutdown_nice()

This breaks the driver.  Giant is syscons' driver lock, and also the
interrupt handler lock for at least the atkbd keyboard driver, so vt
sometimes holds the lock for.

vt has to sprinkle lots of Giant locking and unlocking where the
corresponding locking is automatic for syscons, but this seems to be
nonsense in its event handler.  vt has to acquire Giant when calling
the keyboard driver, but event handling is a call from the keyboard
driver, usually from the interrupt handler with a suitable lock held.
For atkbd, the suitable lock is Giant -- see atkbd_intr().  (All keyboard
drivers that use the kbd(4) for the top level use Giant looking since
toe top level specifies D_NEEDGIANT in its cdevsw.  I think that is all
keyboard drivers.)  So where vt_kbdevent() sprinkles Giant, that has no
effect since Giant is held by the caller.  So vt_kbdevent() calls
vt_processkey() with Giant held despite it not acquiring Giant explicitly;
vt_processkey() calls vt_machine_kbdevent() and that calls various
shutdown functions, all with Giant held.

> Modified: head/sys/dev/syscons/syscons.c
> ==============================================================================
> --- head/sys/dev/syscons/syscons.c	Wed Mar 21 14:47:08 2018	(r331297)
> +++ head/sys/dev/syscons/syscons.c	Wed Mar 21 14:47:12 2018	(r331298)
> @@ -3858,22 +3858,28 @@ next_code:
>
> 	    case RBT:
> #ifndef SC_DISABLE_REBOOT
> -		if (enable_reboot && !(flags & SCGETC_CN))
> +		if (enable_reboot && !(flags & SCGETC_CN)) {
> +			mtx_unlock(&Giant);
> 			shutdown_nice(0);
> +		}
> #endif
> 		break;
>
> 	    case HALT:
> #ifndef SC_DISABLE_REBOOT
> -		if (enable_reboot && !(flags & SCGETC_CN))
> +		if (enable_reboot && !(flags & SCGETC_CN)) {
> +			mtx_unlock(&Giant);
> 			shutdown_nice(RB_HALT);
> +		}
> #endif
> 		break;
>
> 	    case PDWN:
> #ifndef SC_DISABLE_REBOOT
> -		if (enable_reboot && !(flags & SCGETC_CN))
> +		if (enable_reboot && !(flags & SCGETC_CN)) {
> +			mtx_unlock(&Giant);
> 			shutdown_nice(RB_HALT|RB_POWEROFF);
> +		}
> #endif
> 		break;

The new bugs should cause assertion failures.  shutdown_nice() just
signals init and returns.  Giant is not re-acquired, so assertions
should fail when the caller drops the lock.  So the shutdown should
be a nasty panic.

Transiently dropping the lock is probably not fatal depending on what
the caller does.  On x86 with atkbd, nested interrupts are prevented
by masking the in the ATPIC or APIC -- Giant is not really the driver
lock, but just a lock for interfacing between related screen and keyboard
drivers.

Serial console drivers with fast interrupt handlers have much more
broken locking for ddb special keys.  It is invalid to either drop locks
or call the "any" function from a fast interrupt handler, but buggy
serial console drivers calls kbd_alt_break(), and that now calls
shutdown_nice() and other functions that cannot be called from a fast
interrupt handler.  ddb keys supply most of the shutdown_nice()
functionality for serial consoles, and there are no escape sequence to
get this without ddb or maybe another debugger, so these bugs don't
affect most users.

Handling this correctly requires much the same fix as an unsafe signal
handler, and fixes have much the same problems -- not much more than
setting a flag is safe, and the flag might never be looked at if the
system is in a bad state.  However, if a nice shutdown is possible then
the sytem must be in a good enough state to poll for flags.

For normal signal handlers, there is no problem sending a signal to init
or at least with setting a flag and waking up some thread to check the
flag.

I don't quite understand this commit.  It should be valid to send a
signal to init() in proc or non-fast ithread context including with
Giant held.  shutdown_nice() starts with PROC_LOCK(initproc), so it
would be a LOR to call it with a spinlock held, and is even more
obviously wrong to call it from kbd_alt_break() with interrupts masked
and possibly a spinlock held, but Giant is a special sleep lock that
causes fewer LORs than most sleep locks.

Actual testing shows that doesn't cause a panic, but it also doesn't
actually unlock for shutdown_nice(), since the lock is acquired twice
and only released once.  syscons has much the same extra lock sprinkling
for event handling as vt:

- intr_event_execute_handlers() acquires Giant and calls atkbdintr()
- atkbdintr() calls sckbdevent()
- sckbdevent() unnecessarily acquires Giant again
- the buggy unlocking drops Giant just once
- shutdown_nice() is called with Giant held
- the buggy unlocking fails to re-acquire Giant
- sckbdevent() releases Giant, leaving it not held
- sckbdevent() returns
- atkbdintr() returns
- intr_event_execute_handlers() releases Giant.  This should panic, but
   it apparently blocks for long enough for init to shut down first.

When I trace the last step, I get a panic which might be either from the
different timing or just a bug in kdb.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180322024846.S4293>