From owner-svn-src-all@FreeBSD.ORG Sun Jan 4 12:36:14 2015 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9F095A1E; Sun, 4 Jan 2015 12:36:14 +0000 (UTC) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 5F6691150; Sun, 4 Jan 2015 12:36:13 +0000 (UTC) Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 11B4FD64629; Sun, 4 Jan 2015 23:36:04 +1100 (AEDT) Date: Sun, 4 Jan 2015 23:36:03 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Hans Petter Selasky Subject: Re: svn commit: r276626 - head/sys/kern In-Reply-To: <54A87C28.2030304@selasky.org> Message-ID: <20150104211113.A1445@besplex.bde.org> References: <201501031721.t03HLKHp060964@svn.freebsd.org> <20150104045338.V2929@besplex.bde.org> <54A87C28.2030304@selasky.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=R8o6R7hX c=1 sm=1 tr=0 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=MvYiGNustJpYgA7wAtwA:9 a=CjuIK1q_8ugA:10 Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 04 Jan 2015 12:36:14 -0000 On Sun, 4 Jan 2015, Hans Petter Selasky wrote: > On 01/03/15 21:17, Bruce Evans wrote: >> On Sat, 3 Jan 2015, Hans Petter Selasky wrote: >> >>> Log: >>> Rework r276532 a bit. Always avoid recursing into the console drivers >>> clients, hence they might not handle it very well. This change allows >>> debugging mutex problems with kernel console drivers when >>> "debug.witness.skipspin=0" is set in the boot environment. >> >> This keeps getting worse. Console drivers are required to be reentrant >> so that console i/o works in any context, including reentering the >> console driver a couple of times. Please trim quotes. [quotes trimmed excessively since 2 copies of everything is too much already] >>> ... >>> + if (mtx_owned(&cnputs_mtx)) >>> + return; >> >> The largest obvious new bug here is losing all output from panic() >> (and possibly other critical output) for panics that occur while >> the mutex is held by the same CPU. The previous change made this >> case work (for reentrant console drivers) instead of deadlocking. > Thanks for your analysis. I can get the the full backtrace if you like, but > not right now. It is quite long! > > Quickly summarized, the problem is that printf recurses from inside a > mtx_lock_spin() because of a LOR, due to some testing I'm doing, which is > because witness calls directly to printf() again. In other words, we cannot > use printf() to debug problems in the console drivers, and actually printf() > should be disabled when we are inside certain code paths, simply. I now remember more of the problem. Broken console drivers use mutex locks. The most broken ones don't even declare these locks as MTX_NOWITNESS. Thus when a LOR is detected in one of these locks, it causes endless recursion into the console driver. Finally, LORs _should_ be detected for many uses of these locks, since the fundamantal brokenness of using mutex locks in console drivers includes there being no single correct order for them. printf() is required to work in any context. It shouldn't be disabled. Making it work for witnessing of itself is especially easy compared with making it work for ddb'ing itself. For witness, the recursion only occurs in certain contexts that can be limited. For ddb, it occurs on any instruction within the console driver (when all the instructions are stepped through). printf() is perhaps the must useful place to put a ddb breakpoint at. I often sprinkle debugging printf()s which are rarely excuted and put a breakpoint a printf() instead of in all the callers. Then when the printf() is reached, I sometimes step a fair way into it. The surprising thing is that the recursion/deadlock doesn't happen more often. This seems to be because: - most systems configure MTX_SKIPSPIN, so witness never detects LORs with spinlocks - most console drivers aren't so broken as to not use MTX_NOWITNESS You said that you force !MTX_SKIPSPIN. I suspect that you also use one of the broken console drivers. (Oops, you say later that you are using vt). vt and usb_serial are obviously broken, since they do unconditional mutex locking with a non-recursive witnessed mutex, without even the kdb_active hack to ignore the lock while in ddb. The above change helps a little for these drivers. The higher-level lock prevents interference from console calls from cnputs(). All other console calls, which include all output when PRINTF_BUFR_SIZE is configured and all ddb output, just deadlock as before. Note that when the less broken console drivers are called while kdb is active, the kdb_active hack results in the locking having no effect. Recursive cases just proceed (but without risk of recursing endless unless another lock is (mis)used), so the lock is useless in this case. This case must work without the lock, and once it does the lock is not needed in non-ddb cases either. The quick fix for recursion in witness is to use MTX_NOWITNESS for all console driver locks. This is undesireable since it breaks witnessing of these locks in other contexts, but sio always did it. MTX_RECURSE might also be needed, but is too dangerous, and it takes very large locking bugs or rare races (something like printf()s in trap handlers) for recursion to occur, once you have removed the recursion for witnessing a bug in yourself. The fundamental LOR problem with using mutex locking in console drivers is: - console drivers need to be callable from anywhere, so their locks need to be ordered after all others - however, to be useful on devices with the console shared with a normal device, the same lock needs to be used for the console part as for the main part. (Sharing the lock gives potential deadlock or just too long a delay before the console i/o is done, but normally the main part releases the lock soon enough.) In the main part, the lock is not special. It shouldn't be MTX_NOWITNESS, and other locks may be acquired while it is held. For no LOR for acquiring these other locks, the other locks need to be ordered after the console lock. This conflicts with the requirement that the console lock be ordered after all others. The locking table is ordered so that there is no LOR for the normal use. Thus there is almost always a LOR for console use. This LOR is normally harmless, but not always. It may indicate larger locking bugs in console drivers. To avoid seeing this LOR and getting endless recursion from it, MTX_NOWITNESS must be used. I tried reordering the locking table to put console locks last. This seemed to work at first, but that was only because MTX_SKIPSPIN was enabled. When spinlocks were properly witnessed, (false) LORs were detected in normal operation. I now use the following which gives only minor reording by pushing down console locks as far as possible. This was only done carefully for console and serial drivers that I used (sio, cy and syscons). X diff -c2 ./kern/subr_witness.c~ ./kern/subr_witness.c X *** ./kern/subr_witness.c~ Tue Apr 10 01:51:12 2012 X --- ./kern/subr_witness.c Tue Oct 21 19:14:20 2014 X *************** X *** 629,635 **** X #endif X { "rm.mutex_mtx", &lock_class_mtx_spin }, X { "sio", &lock_class_mtx_spin }, X { "scrlock", &lock_class_mtx_spin }, X - #ifdef __i386__ X { "cy", &lock_class_mtx_spin }, X #endif X --- 629,637 ---- X #endif X { "rm.mutex_mtx", &lock_class_mtx_spin }, X + #define MISORDERED X + #undef MISORDERED X + #ifdef MISORDERED X { "sio", &lock_class_mtx_spin }, X { "scrlock", &lock_class_mtx_spin }, X { "cy", &lock_class_mtx_spin }, X #endif cy is neither i386-specific or a console driver. X *************** X *** 639,643 **** X --- 641,647 ---- X #endif X { "scc_hwmtx", &lock_class_mtx_spin }, X + #ifdef MISORDERED X { "uart_hwmtx", &lock_class_mtx_spin }, X + #endif X { "fast_taskqueue", &lock_class_mtx_spin }, X { "intr table", &lock_class_mtx_spin }, X *************** X *** 650,662 **** X --- 654,679 ---- X { "rm_spinlock", &lock_class_mtx_spin }, X { "turnstile chain", &lock_class_mtx_spin }, X + #ifndef MISORDERED X + { "sio", &lock_class_mtx_spin }, X + { "cy", &lock_class_mtx_spin }, X + #endif X { "turnstile lock", &lock_class_mtx_spin }, X { "sched lock", &lock_class_mtx_spin }, X { "td_contested", &lock_class_mtx_spin }, When the interrupt handler calls the scheduler, its lock must be ordered before scheduler locks. X + #ifndef MISORDERED X + { "scrlock", &lock_class_mtx_spin }, X + #endif X { "callout", &lock_class_mtx_spin }, X { "entropy harvest mutex", &lock_class_mtx_spin }, X + #ifdef MISORDERED X { "syscons video lock", &lock_class_mtx_spin }, X + #endif X #ifdef SMP X { "smp rendezvous", &lock_class_mtx_spin }, X #endif X + #ifndef MISORDERED X + { "uart_hwmtx", &lock_class_mtx_spin }, X + { "syscons video lock", &lock_class_mtx_spin }, X + #endif I think uart can be pushed down this far. Its locked regions are too short, but I think it releases the lock before acquiring others, so it can be a leaf lock for all uses. This has barely been tested. X #ifdef __powerpc__ X { "tlb0", &lock_class_mtx_spin }, > Instead of a nice LOR printout, I immediately hit a panic, because other > mutexes marked as non-recursive suddenly got recursed inside the console > drivers, in this case "dev/vt". For now, printouts are not allowed if inside > "cnputc()". This at least got me on with the debugging. Previously > "cnputs_mtx" was marked non-recursive, and I think that is correct. So it was just new bug in vt. Try the quick fix for its console mutex. Or better, use syscons and use the quick fix for its mutexes. Its bugs are better understood and partly fixed in a one of my versions. vt is also missing the kdb_active hack. It also has some lower-level spin mutexes (mainly VTBUF_LOCK()?). This gives obvious deadlocks. E.g., one CPU may be doing normal outout and hold VTBUF_LOCK(). Then a debugger trap on another CPU will first stop the first CPU and then deadlock on VTBUF_LOCK(). If the kdb_active hack is used to avoid this deadlock, then the locking simply doesn't work. vt's locks don't seem to be in the static LOR table. The table is small, so there must be many other locks like that. I don't see how a default order can work. Non-recursive is mostly correct. The recursion indicates another bug. Bruce