From owner-svn-src-head@FreeBSD.ORG Mon Nov 17 11:22:53 2008 Return-Path: Delivered-To: svn-src-head@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 44814106564A; Mon, 17 Nov 2008 11:22:53 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by mx1.freebsd.org (Postfix) with ESMTP id BF61A8FC14; Mon, 17 Nov 2008 11:22:52 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-107-115-33.carlnfd1.nsw.optusnet.com.au (c122-107-115-33.carlnfd1.nsw.optusnet.com.au [122.107.115.33]) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id mAHBMb91021883 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 17 Nov 2008 22:22:42 +1100 Date: Mon, 17 Nov 2008 22:22:36 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Maksim Yevmenkin In-Reply-To: <200811162239.mAGMd4mh066533@svn.freebsd.org> Message-ID: <20081117222219.I95669@delplex.bde.org> References: <200811162239.mAGMd4mh066533@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org Subject: Re: svn commit: r185013 - head/sys/dev/syscons X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Nov 2008 11:22:53 -0000 On Sun, 16 Nov 2008, Maksim Yevmenkin wrote: > Author: emax > Date: Sun Nov 16 22:39:04 2008 > New Revision: 185013 > URL: http://svn.freebsd.org/changeset/base/185013 > > Log: > More locking for syscons(4). This should prevent races with sckbdevent(). This cannot be the correct fix. It should cause panics due to failure of the (missing) assertion that no locks may be accessed in debugger mode. Locks may not be accessed in debugger mode because: (1) locks may be in an inconsistent state (when locking code is being debugged) (unless they are made to work virtually -- switch all locks to a safe state while in debugger mode, but display the unswitched state in all debugger commands) (2) deadlock is possible (would also be avoided by virtualization). > Modified: > head/sys/dev/syscons/syscons.c > > Modified: head/sys/dev/syscons/syscons.c > ============================================================================== > --- head/sys/dev/syscons/syscons.c Sun Nov 16 21:57:54 2008 (r185012) > +++ head/sys/dev/syscons/syscons.c Sun Nov 16 22:39:04 2008 (r185013) > @@ -1572,6 +1572,7 @@ sccngetch(int flags) > int s = spltty(); /* block sckbdevent and scrn_timer while we poll */ The comment still says that this is what blocks sckbdevent(). The comment was wrong too for debugger mode -- in debugger mode, interrupts are masked in hardware and other CPUs are stopped, so nothing except possibly this functions internals can call sckbdevent(). Internal calls to scrn_timer() used to be prevented by sccndbctl() setting syscons' `debugger' flag and this function and others checking this flag. This has been lost. I think the flag isn't needed and wasn't used to protect sckbdevent(), and your problem has nothing to do with debugger mode except for breaking it. Instead your problem is a layering one (continued below (*)). > int c; > > + mtx_lock(&Giant); > /* assert(sc_console != NULL) */ > > /* Acquiring Giant in debugger mode is especially invalid (except deadlock is not so likely for a recursive lock). > @@ -1583,11 +1584,13 @@ sccngetch(int flags) > sccnupdate(scp); > > if (fkeycp < fkey.len) { > + mtx_unlock(&Giant); > splx(s); > return fkey.str[fkeycp++]; > } > > if (scp->sc->kbd == NULL) { > + mtx_unlock(&Giant); > splx(s); > return -1; > } > @@ -1610,6 +1613,7 @@ sccngetch(int flags) > scp->kbd_mode = cur_mode; > kbdd_ioctl(scp->sc->kbd, KDSKBMODE, (caddr_t)&scp->kbd_mode); > kbdd_disable(scp->sc->kbd); > + mtx_unlock(&Giant); > splx(s); > > switch (KEYFLAGS(c)) { > BTW, sccngetch() shouldn't exist. It existed to demultiplex the 2 console driver entry points sc_cngetch() and sc_cncheckc(), but sc_cncheckc() has gone away. (*) syscons has several getc routines. The layering and locking for these should be something like: sc_cngetc(): must not access any locks; currently implemented by calling scgetc(); therefore: scgetc(): must not access any locks sccngetch(): bogus layering -- see above sckbdevent(): I think this is the entry point for all normal (non- low-level-console) syscons input. It is currently implemented by calling scgetc(), which must not access any locks. Therefore, any locking must be in this function. I think it currently uses Giant locking. There is still a problem for calls to sc_cngetc() from the low-console driver. These can race with sckbdevent(). In debugger mode, no locking is permitted, so syscons' getc functions must be carefully written to do only harmless things when they lose races. They are not carefully written (e.g., almost the first thing they do, shown in the above, may give a buffer overrun: if (fkeycp < fkey.len) { mtx_unlock(&Giant); splx(s); return fkey.str[fkeycp++]; } since the increment is not atomic with the bounds check), but they mostly work. (Syscons' putc routines have a much larger number of races like this, but they mostly work too. It used to be easy to cause panics by racing normal console output with printfs from an interrupt handler (use a high frequency timeout interrupt handler that prints something), but almost any locking would fix that and I think Giant-locking everything fixed it accidentally (**). So there is only a problem modes like debugger mode where locking is not permitted.) The races for sc_cngetc() were normally limited: - most calls were in debugger mode, and debugger mode limits problems - the only other calls were for things like gets() for non-auto mountroot and cngetc() for the "hit any key to reboot". This case might even supply the needed and permitted Giant locking accidentally. However, you seem to have made the races very common by (ab)using cngetc() for keyboard multiplexing. cngetc() is not designed for this. You need to know syscons' internals and do the necessary Giant locking in the caller. (**) Giant locking for printf: I can't find any. tprintf() and uprintf() used to acquire Giant to fix related problems, and even that went away with the MPSAFE tty changes. Ah, the related problem is very closely related: ttys used to require Giant locking for everything *except* low- level consoles. tprintf() and uprintf() use normal i/o so they had to supply the Giant locking. OTOH, low-level console i/o uses (broken for some console drivers including syscons) null locking. The keyboard multiplexor needs to work mainly for normal i/o (it would be a nightmare to make it work for low-level i/o, with correct null locking, since it touches a lot of code and data). Thus it needs to supply the Giant locking if syscons still needs it. It still seems to use Giant locking after the MPSAFE tty changes. I don't completely understand those. They make it even harder supply correct locking -- now you might need Giant or the tty lock, or both... In fact, the main change to sckbdevent() to make syscons MPSAFE was to use Giant locking in sckbdevent(). That seems sort of backwards (the low levels which benefit most from low latency keep high latency), but might be needed because keyboard drivers aren't MPSAFE yet, and wouldn't want to know about the tty lock even if they were. So supplying Giant locking seems to be the correct way to interface with sckbdevent(). Bruce