Date: Thu, 27 Aug 2009 08:07:35 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Marcel Moolenaar
Cc: Ed Schouten, FreeBSD Arch
Subject: Re: mtx_lock_do_what_i_mean()
Message-ID: <20090827044818.D826@besplex.bde.org>

On Wed, 26 Aug 2009, Marcel Moolenaar wrote:

> On Aug 26, 2009, at 5:06 AM, Bruce Evans wrote:
>
>>>> Everything is in place to remove 0.1% of the coupling.  Debugger
>>>> i/o still normally goes to the same device as user and kernel
>>>> i/o, so it is strongly coupled.
>>>
>>> That's a non sequitur.  Sharing travel destinations does not mean
>>> that you travel together, for example.
>>
>> The coupling here occurs at the destination.
>
> Exactly: that's why I said everything is in place to change the
> destination of printf().

Except for printf()s in panic().  These can only be redirected to
driver(s) similar to working console drivers, and any such driver
would need similar complications to work.  One thing that could be
done now is reducing from multiple active broken console drivers to a
single least-broken one for panics and debugging, or at least as a
last resort for recursive panics.  However, this doesn't help in the
usual case where there is only 1 active console driver, or in the
fairly usual case where the user has only 1 set of console hardware.

>>> Having printf() not even write to the console does not mean that
>>> the debugger cannot keep using the low-level console interfaces...
>>
>> It just means that printf() would be slightly broken (no longer
>> synchronous, and unusable in panic()...).
>
> printf not being synchronous is actually what solves all the
> complexity.  We don't need synchronous output in the normal case.
> Only for the "broken" case (i.e. kernel panic, no root FS) do we
> need synchronous output.  It's the exceptional case.
>
> I believe common sense tells us to optimize for the common case.

We're not optimizing, but making the uncommon case actually work.

>> Note that strong coupling is simplest here.
>
> I disagree.  We've had various threads on this topic and they had
> the same theme: "we have this interlock and it's a problem.  Help!"

There is remarkably little understanding of the problem.  The
interlock is not a problem.  It is the state being protected by the
interlock that is a problem!  A working console driver cannot just
blow away the interrupted state, either hardware or software, and it
cannot rely on the interrupted state being usable.

> I believe that trying to solve the problem within the existing
> framework is not the solution, because I believe the framework
> itself is the problem.
> Rethinking it from the bottom-up helps to detangle and come up with
> a good implementation.

I've outlined a good solution many times and implemented part of it
in FreeBSD and all of it in my debugger's i/o routines:
- switch to and from console i/o mode when directed to by cn_dbctl().
  Natural switch points for ddb are on entry to and exit from
  interactive mode.  Natural switch points for other i/o are not so
  clear -- switching for every printf() is too much.
- make the switches nest as deeply as necessary (3 or 4 levels is
  enough)
- switches involve:
  on entry:
  - save all state (hardware and software) not already saved.  Here
    it helps for upper layers to not have much state in the air.
    Here it is essential for upper layers to have recorded their
    state in some atomic way.
  - initialize all state (hardware and software) used by the console
    driver.  Here it helps for the upper layers to have initialized
    most of the state.  You can always reinitialize everything, but
    then restoring the interrupted state will be hard.
  on exit:
  - restore as much state (hardware and software) as possible.
    Except, if the console i/o routines wrote to the same physical
    console that non-console routines write to, and this physical
    console is something like a frame buffer, then the
    something-like-a-frame-buffer state must not be restored, so
    that you can actually read the output.  Debugger frame buffer
    output should go to a separate virtual console, possibly in a
    different video mode.  If the video mode is not switched, then
    the hardware part of the switch involves mainly copying the
    frame buffer out and in.  If the video mode is switched, then
    the switch involves reinitializing lots of hardware.  Switching
    to a fixed mode is probably still simpler than supporting
    console output in all possible interrupted modes.  Writing to a
    separate virtual console is not so good for non-debugger output,
    since the output needs to be made visible at some point and
    switching the console on every printf() is no good.
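The nested enter/exit protocol above can be sketched roughly as
follows.  Everything here is hypothetical illustration, not a real
interface: cn_enter_dbmode()/cn_exit_dbmode() and struct cn_state are
made-up names, and plain ints stand in for the hardware and software
state that a real driver would save and restore.

```c
/*
 * Sketch of the nested console-mode switch (hypothetical names).
 * Plain ints stand in for hardware state (video mode) and software
 * state (cursor position); real code would save device registers
 * and driver variables.
 */
#define CN_MAXNEST 4			/* 3 or 4 levels is enough */

struct cn_state {
	int	video_mode;
	int	cursor_pos;
};

static struct cn_state cn_cur = { 3, 42 };	/* interrupted state */
static struct cn_state cn_saved[CN_MAXNEST];
static int cn_depth;

/*
 * On entry: save all state not already saved, then initialize the
 * state used by the console driver (switch to a fixed, known mode).
 */
static int
cn_enter_dbmode(void)
{
	if (cn_depth >= CN_MAXNEST)
		return (-1);		/* nested too deeply; refuse */
	cn_saved[cn_depth++] = cn_cur;
	cn_cur.video_mode = 0;
	cn_cur.cursor_pos = 0;
	return (0);
}

/* On exit: restore as much of the interrupted state as possible. */
static int
cn_exit_dbmode(void)
{
	if (cn_depth == 0)
		return (-1);		/* unbalanced exit */
	cn_cur = cn_saved[--cn_depth];
	return (0);
}
```

The depth check is what makes recursive entries survivable: a switch
that cannot save its state refuses instead of clobbering the state
saved by the outer switch.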
    Switching back and forth on every printf() is worst -- it makes
    the screen flicker, and the output is still invisible most of
    the time.  IIRC, old console drivers (codrv and pcvt) wrote to
    ttyv0 and forced a switch there.  This was inconvenient.  It is
    probably best to always printf() to a separate virtual console
    (different from the debugger one), and also to the current
    console if possible and implemented.
- after switching to console mode, the actual i/o is trivial, at
  least if you always switch the interrupted mode to a fixed one if
  the interrupted mode is different.

>> If debugger i/o is in a separate module then it has a hard time
>> even knowing the interrupted state.  One impossibly difficult
>> weakly-coupled case is when normal i/o is done by a proprietary X
>> driver using undocumented hardware features from userland, with
>> some undocumented features active at the time of the interrupt.
>
> The question is: why try so hard to solve a problem that's specific
> to a case we all try our best to avoid?  Isn't it much easier to
> say that debugger output and console are not the same, so that you
> can run X on syscons and DDB over a serial interface, and if all
> else fails: dump a kernel core and analyze the state offline.

Some users and/or developers don't have, or don't want to set up,
multiple i/o devices just for consoles.

> Having an in-kernel debugger is great, but it should be kept at
> "arm's length" as much as possible.  The moment you start sharing
> interfaces or mixing functionality you're setting yourself up for
> failure: either the debugger does not work in certain cases
> (running X is a perfect example of how the in-kernel debugger is
> totally useless) or you complicate the kernel unnecessarily.

Well, once you have working console i/o for debuggers, you almost
have it for panic(), and you don't have to worry about printf() from
interrupt handlers that interrupt an inadequately locked upper-level
i/o routine corrupting the upper level.
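One robust (but lossy) way to keep such a printf() from corrupting an
interrupted output routine is a try-lock.  A sketch, with made-up
names -- cnputs_robust() is not a real routine, and a C11 atomic flag
stands in here for trying a real spin mutex:

```c
#include <stdatomic.h>
#include <string.h>

/*
 * Hypothetical sketch: console output guarded by a try-lock, so a
 * printf() from a context that interrupted the lock holder drops
 * its output instead of deadlocking or corrupting the holder's
 * half-written state.  An atomic flag stands in for a spin mutex.
 */
static atomic_flag cn_mtx = ATOMIC_FLAG_INIT;
static char cn_buf[128];		/* stand-in for the device */

static int
cnputs_robust(const char *s)
{
	if (atomic_flag_test_and_set(&cn_mtx))
		return (-1);		/* lock busy: drop the output */
	strncat(cn_buf, s, sizeof(cn_buf) - strlen(cn_buf) - 1);
	atomic_flag_clear(&cn_mtx);
	return (0);
}
```

This is robust but not working, in the sense that nothing guarantees
the dropped output ever appears anywhere.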
Syscons' spltty() (and missing) locking was always inadequate for
this (it needed to be splhigh() for much the same reasons that it is
now a spin mutex, but this is ugly).  Working console i/o forces you
to find most of the inadequate locking and helps find some of it
(single-step through the driver until it crashes).

>> Non-debugger console i/o is also impossibly difficult in this
>> case.  FreeBSD doesn't attempt to support it, and uses cn*avail*
>> interfaces to give null i/o and less than null ddb support.  With
>> all the i/o in the kernel, it is possible to guess the hardware
>> and driver state by peeking at driver variables and hardware
>> registers.  With strong coupling, it is possible to do this
>> robustly.
>
> That's not true.  There's no robust way for the kernel debugger to
> use hardware that is under the control of process space.  If
> anything: output is always interrupted and disrupted by the
> debugger, so even if the hardware is left in a consistent state,
> the actual content on the screen may be garbled.

The amount of disruption is hardware-dependent.  Losing a couple of
characters is unimportant, and is also avoidable for frame buffer
type hardware.  Switching the console and its mode to avoid
disrupting output has been routine in at least userland debuggers
for at least 20 years.  Userland debuggers have fewer problems doing
this since they can't interrupt hardware mode switches, but their
switch is still a heavyweight operation.  OTOH, kernel debuggers can
avoid the mode switch and possibly the console switch in most cases
since they are close to the driver (if the driver is entirely in the
kernel).

>> Upper layers must cooperate by recording enough of their state in
>> an atomic way.  The coupling in lower layers then consists of
>> using the records and knowing that they are sufficient.
>
> Upper layers include user space in some cases.
> The state of the 3D graphics accelerator is not something you want
> to have to worry about in the kernel.  Though, you do want to know
> the "mode" if you want to write to the frame buffer.  Graphical
> displays are our weakest point and given that there's no interest
> in fixing it,

Yes, I don't want to support any userland graphics driver.

> I can say that no matter what we do in the existing framework we
> will never have robust behaviour.

Actually, it can be made robust (but not working) fairly easily:
- clear the device's cn_avail_mask bit while in critical regions.
  This could be implemented fairly easily using the macro for
  locking the critical regions.
- fix the placement of cnunavailable() in ddb, so that breakpoints
  in the instruction stream aren't fatal when they are hit while the
  console is unavailable.
- the cn_avail_mask bit is managed by calling cnavail().  It is not
  very efficient, despite having no locking whatsoever.  Fix its
  locking and try to optimize it.  The whole of kern_cons.c has no
  locking except for broken locking in cnputs().  Normal mutex
  locking cannot be used, but cnputs() uses it.  mtx_trylock() would
  be robust (but not working) in cnputs().  Accesses to
  cn_avail_mask need a bit more than atomicity, since there is
  another avail bit in cn_flags.
- fix many other races involving accesses to cn_avail_mask.  Oops,
  there are some that cannot be made robust fairly easily.  There is
  1 involving checking cnunavail() and acting on its result.  This
  one is mostly dormant.  Active ones include cnremove() called from
  the console control sysctl racing with cnavail() called from upper
  layers of device drivers that support consoles.

Bruce