From owner-svn-src-all@freebsd.org Thu Jul 16 17:45:46 2015 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A7A499A3389; Thu, 16 Jul 2015 17:45:46 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 517A81DC6; Thu, 16 Jul 2015 17:45:45 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7C4FA1046508; Fri, 17 Jul 2015 03:45:37 +1000 (AEST) Date: Fri, 17 Jul 2015 03:45:36 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Sean Bruno cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r285639 - head/sys/dev/e1000 In-Reply-To: <201507161632.t6GGWwJA072336@repo.freebsd.org> Message-ID: <20150717030840.B3034@besplex.bde.org> References: <201507161632.t6GGWwJA072336@repo.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=ItbjC+Lg c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=KZVWSx2yD1shGd9gcesA:9 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Jul 2015 17:45:46 -0000 On Thu, 16 Jul 2015, Sean Bruno wrote: > Log: > Add an adapter CORE lock in the DDB hook em_dump_queue to avoid WITNESS > panic in em_init_locked() while debugging. It is a bug to lock anything from within ddb. Witness or something should panic when such a lock is attempted. > Modified: head/sys/dev/e1000/if_em.c > ============================================================================== > --- head/sys/dev/e1000/if_em.c Thu Jul 16 16:08:40 2015 (r285638) > +++ head/sys/dev/e1000/if_em.c Thu Jul 16 16:32:57 2015 (r285639) > @@ -5998,7 +5998,9 @@ DB_COMMAND(em_reset_dev, em_ddb_reset_de The primary bug is probably existence of this command. You just can't call arbitary code from within ddb. It wanders into locks galore. It should detect this and panic, but is usually not that smart. It is more likely to deadlock. The existence of the dump command is an even larger error. The existence of the panic command is a smaller error. Panic enters a separate state, similar to ddb's but not as strong. Panic is a last resort, so deadlocks and recursive panics in it are acceptable. It has some defense against endless recursion. > dev = devclass_get_device(dc, index); > if (device_get_driver(dev) == &em_driver) { New-bus calls should be locked by Giant or something like that. This is probably not done, and the bug is apparently not detected. > struct adapter *adapter = device_get_softc(dev); > + EM_CORE_LOCK(adapter); > em_init_locked(adapter); A bug was previously detected here by an assertion that the lock is held. Acquiring the lock breaks the the detection. em's mutex is not recursive, so attempting to acquire it from within ddb like this gives deadlock if it is already held. In the previous version, suppose you turn off witness so that the bug is detected. Then em's mutext locking just doesn't work. > + EM_CORE_UNLOCK(adapter); > } > } > } I used checks like the following in a very old version to detect broken locking calls from within ddb. This had to be turned off because there are too many broken callers: X diff -u2 kern_mutex.c~ kern_mutex.c X --- kern_mutex.c~ Wed Apr 7 20:39:12 2004 X +++ kern_mutex.c Sat Feb 3 04:25:00 2007 X @@ -244,4 +246,9 @@ X { X X +#ifdef DDB1 X + if (db_active) X + db_printf("_mtx_lock_flags: called with db_active @ %s:%d\n", X + file, line); X +#endif X MPASS(curthread != NULL); X KASSERT(m->mtx_object.lo_class == &lock_class_mtx_sleep, X @@ -348,4 +355,10 @@ X { X X +#ifdef DDB1 X + if (db_active) X + db_printf( X + "_mtx_lock_spin_flags: called with db_active @ %s:%d\n", X + file, line); X +#endif X MPASS(curthread != NULL); X KASSERT(m->mtx_object.lo_class == &lock_class_mtx_spin, X @@ -432,4 +445,9 @@ X #endif X X +#ifdef DDB1 X + if (db_active) X + db_printf("_mtx_lock_sleep: called with db_active @ %s:%d\n", X + file, line); X +#endif X if (mtx_owned(m)) { X KASSERT((m->mtx_object.lo_flags & LO_RECURSABLE) != 0, X @@ -568,4 +585,9 @@ X int i = 0; X X +#ifdef DDB1 X + if (db_active) X + db_printf("_mtx_lock_spin: called with db_active @ %s:%d\n", X + file, line); X +#endif X if (LOCK_LOG_TEST(&m->mtx_object, opts)) X CTR1(KTR_LOCK, "_mtx_lock_spin: %p spinning", m); The correct way to implement many ddb commands and all instances of call commands from within ddb is to use a trampoline like gdb does in userland (or just like signals are handled in userland). ddb must be exited from to run the command, and must regain control after running the command. The dangers of calling arbitrary code are then more obvious. The code is then run without ddb's stopping of other CPUs and masking of interrupts on the current CPU. Sometimes that allows it to work, but sometimes it just races more with other CPUs. The code acts at it is run by a trap handler that may be entered by any CPU at any address. An intermediate mode where ddb exits but keeps other CPUs stopped and interrupts masked might be useful but is hard to implement. ddb is too stupid to implement this even for trace traps. Single stepping is implemented by exiting ddb for every step (thus starting other CPUs, etc., for every step) after setting the trace flag to arrange for a trap after the next instruction. It is the normal path of execution that is exited to, so this is relatively safe, but ddb still loses all control and other threads and interrupts may change the state that you are trying to see changed 1 step at a time. Bruce