From owner-svn-src-all@freebsd.org Thu Aug 8 10:55:50 2019 Return-Path: Delivered-To: svn-src-all@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id C0AEFCAB18; Thu, 8 Aug 2019 10:55:50 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 4644zV2GrGz49Gp; Thu, 8 Aug 2019 10:55:49 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 1922D3623A2; Thu, 8 Aug 2019 20:55:46 +1000 (AEST) Date: Thu, 8 Aug 2019 20:55:44 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Conrad Meyer cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r350713 - head/sys/kern In-Reply-To: <201908080042.x780gThf078293@repo.freebsd.org> Message-ID: <20190808183733.O1073@besplex.bde.org> References: <201908080042.x780gThf078293@repo.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=kj9zAlcOel0A:10 a=s7fTf2Y6X_s07L22htsA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 4644zV2GrGz49Gp X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-6.76 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[]; NEURAL_HAM_SHORT(-0.76)[-0.757,0] X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Aug 2019 10:55:50 -0000 On Thu, 8 Aug 2019, Conrad Meyer wrote: > Log: > ddb(4): Add 'sysctl' command > > Implement `sysctl` in `ddb` by overriding `SYSCTL_OUT`. When handling the > req, we install custom ddb in/out handlers. The out handler prints straight > to the debugger, while the in handler ignores all input. This is intended > to allow us to print just about any sysctl. > > There is a known issue when used from ddb(4) entered via 'sysctl > debug.kdb.enter=1'. The DDB mode does not quite prevent all lock > interactions, and it is possible for the recursive Giant lock to be unlocked > when the ddb(4) 'sysctl' command is used. This may result in a panic on > return from ddb(4) via 'c' (continue). Obviously, this is not a problem > when debugging already-paniced systems. Locking for kdb was grossly broken in r339917, by setting and unsetting the SCHEDULER_STOPPED() flag on entry and exit to kdb. This flag is only usable during panic() since when it is set all mutex operations potentially corrupt their mutex, and unsetting the flag doesn't undo the corruption. Many different types of corruption are possible, but my simplest example if like the one here for Giant, but for a spinlock. kdb runs happily while SCHEDULER_STOPPED() is set, but if it corrupts a mutex then the corruption is fatal later when SCHEDULER_STOPPED() is clear (setting SCHEDULER_STOPPED() breaks early detection of the corruption in most cases). kern_sysctl.c does only 1 pair of mutex calls. In sysctl_root_handle_locked(), it locks and unlocks Giant for the !MPSAFE case. If that is the only mutex call, then the problem is easy to fix by not making these calls if kdb_active. One of my versions of kdb asserts that no mutex calls are made if kdb_active. This makes it easy to find buggy mutex calls. Unfortunately, low-quality low level console drivers like vt make such calls unconditionally. I used this to avoid all the buggy calls in syscons (a typical error there was for to wander into wakeup or timeout code which uses mutexes. This was "fixed" by removing things like beeping for errors. The keyboard driver uses Giant fundamentally. This was "fixed" using the usual hack of not doing the mutex call if kdb_active). > ... > Modified: head/sys/kern/kern_sysctl.c > ============================================================================== > --- head/sys/kern/kern_sysctl.c Thu Aug 8 00:33:23 2019 (r350712) > +++ head/sys/kern/kern_sysctl.c Thu Aug 8 00:42:29 2019 (r350713) > ... > +/* > + * Run a sysctl handler with the DDB oldfunc and newfunc attached. > + * Instead of copying any output to a buffer we'll dump it right to > + * the console. > + */ > +static int > +db_sysctl(struct sysctl_oid *oidp, int *name, u_int namelen, > + void *old, size_t *oldlenp, size_t *retval, int flags) > +{ > + struct sysctl_req req; > + int error; > + > + /* Setup the request */ > + bzero(&req, sizeof req); > + req.td = kdb_thread; > + req.oldfunc = sysctl_old_ddb; > + req.newfunc = sysctl_new_ddb; > + req.lock = REQ_UNWIRED; > + if (oldlenp) { > + req.oldlen = *oldlenp; > + } > + req.validlen = req.oldlen; > + if (old) { > + req.oldptr = old; > + } > + > + /* Setup our globals for sysctl_old_ddb */ > + g_ddb_oid = oidp; > + g_ddb_sysctl_flags = flags; > + g_ddb_sysctl_printed = 0; This has many style bugs. So does the rest of the file. > + > + error = sysctl_root(0, name, namelen, &req); This makes invalid mutex calls for all !MPSAFE sysctls. Setting SCHEDULER_STOPPED() for kdb is only very broken for spin mutexes. Then for a contested mutex, mtx_lock_spin() ends up doing nothing except invalid lock and/or kdtrace profiling when it finds SCHEDULER_STOPPED() set. but mtx_unlock_spin() never checks SCHEDULER_STOPPED(), so it ends up corrupting the mutex and the spinlock state by attempting to unlock a mutex whose corresponding locking was not done. INVARIANTS causes similar bugs for recursed-on spin mutexes. Without INVARIANTS, everything is done inline without checking SCHEDULER_STOPPED(), so it works well enough in practice in uncontested non-recursive cases. With INVARIANTS, the null locking gives no increment of the recursion counter and no call to spinlock_enter(), but the non-null unlocking decrements the recursion counter and calls spinlock_exit(). For sleep mutexes, unlocking checks SCHEDULER_STOPPED() consistently with locking, so the bugs are smaller. The easiest way to demonstrate them is to configure INVARIANTS and WITNESS. Starting with Giant unheld, mtx_lock(&Giant) does nothing with the mutex except invalid lock and/or kdtrace profiling. mtx_unlock) calls __mtx_unlock_flags() which calls __mtx_assert(m, MA_OWNED), but this check is subverted by not doing it when SCHEDULER_STOPPED(). But before that, __mtx_unlock_flags() calls WITNESS_UNLOCK() and witness_unlock() is not similarly subverted. Also, WITNESS uses a critical spin mutex w_mtx, and this is corrupted as above. Bruce