From owner-svn-src-all@FreeBSD.ORG Sun May 12 06:09:39 2013 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E50912A2; Sun, 12 May 2013 06:09:39 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 7441FCDC; Sun, 12 May 2013 06:09:39 +0000 (UTC) Received: from Alfreds-MacBook-Pro-9.local (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218]) by elvis.mu.org (Postfix) with ESMTPSA id E4C2B1A3C80; Sat, 11 May 2013 23:09:36 -0700 (PDT) Message-ID: <518F320D.3070304@mu.org> Date: Sat, 11 May 2013 23:09:17 -0700 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130328 Thunderbird/17.0.5 MIME-Version: 1.0 To: Jeff Roberson Subject: Re: svn commit: r250411 - in head/sys: conf kern sys References: <201305091628.r49GSI33039873@svn.freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcel Moolenaar , jhb@freebsd.org, svn-src-all@freebsd.org, alfred@freebsd.org, attilio@freebsd.org, src-committers@freebsd.org, svn-src-head@freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 12 May 2013 06:09:40 -0000 Can we just admit to ourselves that tweaks to debugging macros/printing and WITNESS are our kernel developer's "bikeshed zone" and get over the fact that people's needs may diverge and changing non-default behavior in non-critical paths is not going to be the death of the kernel as we know it? I could certainly believe that this sort of thing needs long and thorough discussion if it wasn't the equivalent of style tweaks to manpages. Let's leave the long and lengthy discussions to things that matter such as standards compliance, ABI, API and really cool performance and stability stuff. -Alfred On 5/11/13 5:43 PM, Jeff Roberson wrote: > On Thu, 9 May 2013, Marcel Moolenaar wrote: > >> Author: marcel >> Date: Thu May 9 16:28:18 2013 >> New Revision: 250411 >> URL: http://svnweb.freebsd.org/changeset/base/250411 >> >> Log: >> Add option WITNESS_NO_VNODE to suppress printing LORs between VNODE >> locks. To support this, VNODE locks are created with the LK_IS_VNODE >> flag. This flag is propagated down using the LO_IS_VNODE flag. >> >> Note that WITNESS still records the LOR. Only the printing and the >> optional entering into the kernel debugger is bypassed with the >> WITNESS_NO_VNODE option. > > I'm replying to the original commit because the resulting thread got > way out of hand. We need to all take a deep breath and take a > pragmatic approach to solving the problem at hand. > > Let me first say I understand the utility here as this is also coming > up in my organization. Test, and users, do not want to see erroneous > warning messages. I understand that. Let's find a solution. > > Secondly, I think this project has grown too far for us to commit > changes like this without some focused discussion. We need to be more > mindful of the size of the impact and the number of people who are > interested in a particular area. I'm not picking on you Marcel > because this sort of thing has been coming up lately and we have all > been guilty of it from time to time. There are more companies and > individuals than ever trying to push work into the repository and > we're having some growing pains. > > I am intimately familiar with the problems that lead to these > erroneous witness messages as I have tracked down many of them and am > even responsible for the code that generates them in some cases. Let > me first outline a handful of generic problems. The root cause is > that witness can not determine the real order between two locks due to > relationships too complex to describe with a pair of strings. > > One example, which has been brought up, is the hierarchical nature of > vnode locks. This impacts vnodes within one filesystem but it also > involves vnodes between two different filesystems as you cross mount > points. We can construct perfectly valid and deadlock free chains of > mount points that have two different filesystem types in different > orders which will LOR at the boundaries. We already skip duplicates > to avoid this problem within each filesystem. We need to skip > cross-filesystem duplicates, most desirably at the few specific places > where this happens. This problem comes up especially for devfs because > we lock devvps while file vnodes are locked but we lock devfs > directories after the rootfs lock when crossing mountpoints in lookup. > > A second example, is locks of a fundamentally different type that have > a complex ordering relationship. For example, a vnode lock may be > acquired after a buf lock belonging to the parent's directory block. > A cg buf lock may be acquired after any file buf lock. Here we want > to ignore interactions between these two specific types at this > particular location but not others as they may be unsafe. > > The third example, is a complex locking pattern with shared locks as > presented by dirhash. We are seeing a similar pattern develop in the > vm where we are going to use an exclusive object lock to protect pages > or a shared object lock + a page lock. The semantics only get more > complex as we push for more scalability. I expect to see more of these > patterns develop. > > None of these problems can be solved with names alone. So far we've > just lived with the warnings and we're no longer willing to accept > that. What we need is a solution that blesses the specific instances > and the specific lock classes involved without silencing legitimate > warnings that may only occur after new code is added. For example, it > may be safe to add a sx lock around some vnode code but you may not > notice that you LOR if you silence all witness warnings related to the > vnode lock site. > > I believe that the perfect solution would be a mechanism that could > teach witness about and enforce these specific relationships. > However, that may be computationally prohibitive and too complex to > code. A more reasonable option would be to bless the specific > relationships at the specific call sites. Turning all witness off at > particular sites or with particular types renders important > infrastructure useless for very large functional areas. It's also > important to distinguish between squelching the error message from > eliminating the other state that is saved at lock sites. > > We already have lock names and types. What I would propose we do is > make the type 'vnode' for all vnodes and 'buf' for all bufs with the > names used for the specific filesystems. Then you could specify a > DUPOK that automatically blesses any filesystem to filesystem related > LORs. In this way witness still records the call sites and unrelated > LORs or panics still have the acquisition information. You could > eventually unwind this to only DUPOK at the specific currently known > places that we anticipate multiple vnodes. > > To solve the buf and other complex LORs involving multiple types you > would make lock variants that accept a blessed list of the specific > locks you are blessing. We already have support for a blessed list in > witness. We just want something like this per-call. > > For example; When acquiring a child vnode lock after a parent lock > when a parent's buf is held I would do something like this: > > vn_lock_flags(vp, LK_EXCLUSIVE, { "vnode", "bufwait", NULL }); > > Written properly dead code elimination will take care of it for > non-debug builds. You could come up with other ways of writing this > that don't involve passing pointers down the stack. For example, > making a unique token for this blessed pair and passing it in the high > bits of the flags. I don't really care about the particular > implementation but I think this is the right model. > > Thanks, > Jeff > >> >> Modified: >> head/sys/conf/options >> head/sys/kern/kern_lock.c >> head/sys/kern/subr_witness.c >> head/sys/kern/vfs_subr.c >> head/sys/sys/lock.h >> head/sys/sys/lockmgr.h >> >> Modified: head/sys/conf/options >> ============================================================================== >> >> --- head/sys/conf/options Thu May 9 16:09:39 2013 (r250410) >> +++ head/sys/conf/options Thu May 9 16:28:18 2013 (r250411) >> @@ -672,6 +672,7 @@ KTR_ENTRIES opt_global.h >> KTR_VERBOSE opt_ktr.h >> WITNESS opt_global.h >> WITNESS_KDB opt_witness.h >> +WITNESS_NO_VNODE opt_witness.h >> WITNESS_SKIPSPIN opt_witness.h >> >> # options for ACPI support >> >> Modified: head/sys/kern/kern_lock.c >> ============================================================================== >> >> --- head/sys/kern/kern_lock.c Thu May 9 16:09:39 2013 (r250410) >> +++ head/sys/kern/kern_lock.c Thu May 9 16:28:18 2013 (r250411) >> @@ -393,6 +393,8 @@ lockinit(struct lock *lk, int pri, const >> iflags |= LO_WITNESS; >> if (flags & LK_QUIET) >> iflags |= LO_QUIET; >> + if (flags & LK_IS_VNODE) >> + iflags |= LO_IS_VNODE; >> iflags |= flags & (LK_ADAPTIVE | LK_NOSHARE); >> >> lk->lk_lock = LK_UNLOCKED; >> >> Modified: head/sys/kern/subr_witness.c >> ============================================================================== >> >> --- head/sys/kern/subr_witness.c Thu May 9 16:09:39 2013 (r250410) >> +++ head/sys/kern/subr_witness.c Thu May 9 16:28:18 2013 (r250411) >> @@ -1289,7 +1289,19 @@ witness_checkorder(struct lock_object *l >> w->w_reversed = w1->w_reversed = 1; >> witness_increment_graph_generation(); >> mtx_unlock_spin(&w_mtx); >> - >> + >> +#ifdef WITNESS_NO_VNODE >> + /* >> + * There are known LORs between VNODE locks. They are >> + * not an indication of a bug. VNODE locks are flagged >> + * as such (LO_IS_VNODE) and we don't yell if the LOR >> + * is between 2 VNODE locks. >> + */ >> + if ((lock->lo_flags & LO_IS_VNODE) != 0 && >> + (lock1->li_lock->lo_flags & LO_IS_VNODE) != 0) >> + return; >> +#endif >> + >> /* >> * Ok, yell about it. >> */ >> >> Modified: head/sys/kern/vfs_subr.c >> ============================================================================== >> >> --- head/sys/kern/vfs_subr.c Thu May 9 16:09:39 2013 (r250410) >> +++ head/sys/kern/vfs_subr.c Thu May 9 16:28:18 2013 (r250411) >> @@ -1037,7 +1037,7 @@ alloc: >> * By default, don't allow shared locks unless filesystems >> * opt-in. >> */ >> - lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE); >> + lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE | >> LK_IS_VNODE); >> /* >> * Initialize bufobj. >> */ >> >> Modified: head/sys/sys/lock.h >> ============================================================================== >> >> --- head/sys/sys/lock.h Thu May 9 16:09:39 2013 (r250410) >> +++ head/sys/sys/lock.h Thu May 9 16:28:18 2013 (r250411) >> @@ -79,6 +79,7 @@ struct lock_class { >> #define LO_SLEEPABLE 0x00100000 /* Lock may be held while >> sleeping. */ >> #define LO_UPGRADABLE 0x00200000 /* Lock may be >> upgraded/downgraded. */ >> #define LO_DUPOK 0x00400000 /* Don't check for duplicate >> acquires */ >> +#define LO_IS_VNODE 0x00800000 /* Tell WITNESS about a >> VNODE lock */ >> #define LO_CLASSMASK 0x0f000000 /* Class index bitmask. */ >> #define LO_NOPROFILE 0x10000000 /* Don't profile this lock */ >> >> >> Modified: head/sys/sys/lockmgr.h >> ============================================================================== >> >> --- head/sys/sys/lockmgr.h Thu May 9 16:09:39 2013 (r250410) >> +++ head/sys/sys/lockmgr.h Thu May 9 16:28:18 2013 (r250411) >> @@ -146,6 +146,7 @@ _lockmgr_args_rw(struct lock *lk, u_int >> #define LK_NOWITNESS 0x000010 >> #define LK_QUIET 0x000020 >> #define LK_ADAPTIVE 0x000040 >> +#define LK_IS_VNODE 0x000080 /* Tell WITNESS about a VNODE >> lock */ >> >> /* >> * Additional attributes to be used in lockmgr(). >> >