Date: Tue, 9 May 2017 07:02:35 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Warner Losh <imp@bsdimp.com> Cc: Konstantin Belousov <kostikbel@gmail.com>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, Vladimir Kondratyev <vladimir@kondratyev.su>, owner-freebsd-arch@freebsd.org Subject: Re: psm(4) & atkbdc(4) locking Message-ID: <20170509055914.N18777@besplex.bde.org> In-Reply-To: <CANCZdfr-B_EBLPb2mbeNd12u6PduVrzghDTbMiN4eGAhTZEhbw@mail.gmail.com> References: <a50871663d281c388e16b146496ed035@kondratyev.su> <CANCZdfrvZ0OuQhRhcpMhxFphMjBuR22HJQTNyL=NDLb1VihJRQ@mail.gmail.com> <80600e8d3da7725a66e2a1e24cbfd916@kondratyev.su> <20170508193935.GE1622@kib.kiev.ua> <CANCZdfr-B_EBLPb2mbeNd12u6PduVrzghDTbMiN4eGAhTZEhbw@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 8 May 2017, Warner Losh wrote: > On Mon, May 8, 2017 at 1:39 PM, Konstantin Belousov <kostikbel@gmail.com> wrote: >> On Mon, May 08, 2017 at 09:58:27PM +0300, Vladimir Kondratyev wrote: >>> On 2017-05-08 21:39, Warner Losh wrote: >>>> On Mon, May 8, 2017 at 12:33 PM, Vladimir Kondratyev >>>> <vladimir@kondratyev.su> wrote: >>>>> Hi All >>>>> >>>>> In order to implement evdev support in psm(4) driver I`m going to add >>>>> mutexes to psm and atkbdc drivers and mark psm interrupt and cdev >>>>> handlers >>>>> MPSAFE. >>>>> Atkbd(4) is depending on Giant like before. >>>>> >>>>> Locking of these drivers seems to be low-hanging fruit as spl() calls >>>>> are >>>>> still in place but it has not occurred. High-hanging fruit. it is a lot of work to replace the buggy locking by something that works, without causing deadlocks instead of relatively harmless races. >>>>> Does someone know why it is not done yet? For reasons other than >>>>> "Nobody >>>>> took care of it"? >>>>> Any hidden traps? There is ddb (discussed below), and also syscons calling the keyboard driver. syscons is still Giant-locked, partly because keyboard drivers are. The common Giant lock works more simply than separate non Giant ones. The same non-Giant lock for both would would have some of the complexities of Giant without the support already being there. >>>> Working rock-solid reliably in ddb(4) is what tripped me up when I >>>> started down this path 10 years ago. >>> >>> I tried to avoid atkbd changes as much as possible. Patch adds atkbdc >>> mutex acquisition >>> before each hardware access and nothing more. I think mutexes should be >>> ignored in ddb mode. They are only ignored if you do it explicitly, but this is fragile. If the hardware works after ignoring locks while in kdb mode, then it must work after ignoring locks in all modes. >> Locks are not ignored in kdb_active mode, and this is reasonable. >> Instead, kdb should not acquire locks at all. (This should be ddb_active mode. Only ddb uses the keyboard. But there is only a flag for kdb.) See my recent fixes for syscons. These basically lock console output correctly, but for the keyboard they depend on keyboard drivers having no locking and not needing any (really, both syscons and keyboard drivers are supposed to be locked by a common lock which is Giant, but this doesn't work for i/o in kdb mode and is fragile for low-level console output not in kdb mode. My tests arrange for console output from almost to worst places where it shouldn't be done (from IPI STOP handlers, where the CPU doing the output must succeed despite other CPUs that have already been stopped holding console driver locks). Keyboard drivers are harder to test in this context, and are sure to fail. so I don't try hard to test them. However, kdb and panics in this context sometime reach the keyboard driver. Also, on syscons entry in such context, it tries to not use keyboard drivers if not necessary. >> Consider a situation where you need to send a composite command to >> the hardware, which consists of several registers accesses, and the >> whole sequence of accesses is covered by a lock to ensure atomicity. >> Kdb may be entered at arbitrary moment, including the middle of the >> lock-protected section. This means that kdb might be entered with the >> lock owned by some thread, not neccessary the thread which was executing >> when entrance occured. More, the hardware state is some intermediate >> state of being commanded the composite sequence, not yet finished. AT keyboards don't seem to mind some corruption of the hardware state. In general, provided a bad state doesn't destroy the device, then at worst the driver can probably recover by resetting everything after a timeout. My fixes are simpler and more complete for sio. They are simpler because the driver only has 1 lock for input and output (but this lock is also shared between sio devices, which gives problems related to those for Giant -- console output is supposed to be immediate and must be synchronous, but you are forced to wait for other devices). sio always did an almost complete state switch for console entry and exit. This is possible since the device registers are read-write and restoring them works as well as a reset for getting back to a usable state. My recent fixes mainly add some control of the nesting for the stacked states, and extra care for some !kdb cases. > Yes. This is the snag I ran into... > >> I do not think that atkbd code correctly handles this situation now, >> and simply adding a lock there probably make things worse. Yes, it would just give deadlock. See my recent fixes. atkbd currently requires Giant locking, but this is obviously impossible for low-level console i/o (even without ddb). Syscons used to do null locking in this case, without really trying. Initially it used non-null spls, but spls don't work at all for this use (since spl only locks out interrupts). Then when syscons was converted to Giant, Giant works too well for hiding itself, so syscons has almost no explicit references to Giant, and certainly none for low-level-console i/o where they wouldn't work. Giant also works well for locking out the interrupt handlers, but converting the spls to null ones made them do even less than before to accidentally limit console i/o. If you sprinkle locks that actually, work, then deadlock is certain in some cases. If you use recursive locks like syscons still does, then they will often avoid deadlock but won't actually work. They reduce to null spls if they are already held by the same thread. Typical examples of deadlocks are: - thread holding lock is stopped in IPI STOP handler - another thread calls the console driver, perhaps to report a problem in its IPI STOP handler Typical examples of recursive (or otherwise ignored) locks not actually working when the are needed are: - thread is in super-critical region of console driver and aquires lock. This should be a spinlock, and interrupts should be disabled (this is a side effect of spinlocks) - NMIs and traps cannot be disabled. They are often caused by NMI IPI STOPs and debugger traps. Since the region is super-critical, i/o should not be done, at least blindly. But recursive locks are useless for stopping the i/o, since the same thread owns the lock. Ignoring the lock in kdb mode is equally dangerous. > It did for me, since breaking to keyboard was totally broken by the > changes I made to try to lock things since it was quite likely to > interrupt things when locks were held... I had a very naive > implementation, which wasn't up to the task, so some care is needed. > But this was in the 5.x or 6.x time frame, and the locking situation > wrt GIANT is much better today than then... Not much better. There are just more deadlocks and less races now. Lots of deadlocks were added even at the printf() and msgbuf level. One deadlock in cnputs() was there for many years. Output is now droppped there instead. > I don't think it's a huge problem, but just one that the implementor > needs to be aware of... Since I was unaware of all the details up > front, I came up with a solution that couldn't possibly work so I > abandoned it. The difficulties are mostly in complex hardware and drivers. The only method that is sure to work is complete reinitialization/reset whenever needed (including before the normal driver is probed), without losing too much i/o. To take shortcuts in this, so that it doesn't take as much code as the normal driver, you have to understand the hardware better than is needed for writing the normal driver. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170509055914.N18777>