Date: Fri, 10 May 2013 11:51:20 -0700 From: Marcel Moolenaar <marcel@xcllnt.net> To: John Baldwin <jhb@freebsd.org> Cc: attilio@freebsd.org, svn-src-head@freebsd.org, svn-src-all@freebsd.org, Marcel Moolenaar <marcel@freebsd.org>, src-committers@freebsd.org Subject: Re: svn commit: r250411 - in head/sys: conf kern sys Message-ID: <6CBEB766-087B-41F4-B549-2D60F4FD2701@xcllnt.net> In-Reply-To: <201305101211.35808.jhb@freebsd.org> References: <201305091628.r49GSI33039873@svn.freebsd.org> <201305100952.45101.jhb@freebsd.org> <405C7C78-A626-4836-BD90-16FD08DD3196@xcllnt.net> <201305101211.35808.jhb@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On May 10, 2013, at 9:11 AM, John Baldwin <jhb@freebsd.org> wrote: > On Friday, May 10, 2013 11:46:54 am Marcel Moolenaar wrote: >>> >>> 2) vnode locks from a local filesystem that report a LOR with a "devfs" >>> vnode. Typical reports are either "ufs" -> "devfs" or in some cases >>> "ufs" -> "devfs" -> "ufs". As with 1), I would much rather tag the >>> offending location than to disable all WITNESS checking on vnode locks. >> >> With more file system types in use, this will get mixed up with the >> other file systems and noise you get is rather severe. It is a big >> problem for us at Juniper. > > Note, it is very specific that the second lock is always "devfs". I think > that points to this being isolated to a few specific places, not a generic > ordering problem. Alas, that's not the case. These LORs are reported between ufs and unionfs, or ufs and isofs, etc. It's not just between something and devfs. > /* > * Locking: > * > * The relationship between inode and dirhash is protected either by an > * exclusive vnode lock or the vnode interlock where a shared vnode lock > * may be used. The dirhash_mtx is acquired after the dirhash lock. To > * handle teardown races, code wishing to lock the dirhash for an inode > * when using a shared vnode lock must obtain a private reference on the > * dirhash while holding the vnode interlock. They can drop it once they > * have obtained the dirhash lock and verified that the dirhash wasn't > * recycled while they waited for the dirhash lock. > * > * ufsdirhash_build() acquires a shared lock on the dirhash when it is > * successful. This lock is released after a call to ufsdirhash_lookup(). > * > * Functions requiring exclusive access use ufsdirhash_acquire() which may > * free a dirhash structure that was recycled by ufsdirhash_recycle(). > * > * The dirhash lock may be held across io operations. > * > * WITNESS reports a lock order reversal between the "bufwait" lock > * and the "dirhash" lock. However, this specific reversal will not > * cause a deadlock. To get a deadlock, one would have to lock a > * buffer followed by the dirhash while a second thread locked a > * buffer while holding the dirhash lock. The second order can happen > * under a shared or exclusive vnode lock for the associated directory > * in lookup(). The first order, however, can only happen under an > * exclusive vnode lock (e.g. unlink(), rename(), etc.). Thus, for > * a thread to be doing a "bufwait" -> "dirhash" order, it has to hold > * an exclusive vnode lock. That exclusive vnode lock will prevent > * any other threads from doing a "dirhash" -> "bufwait" order. > */ > > So the issue here is that shared locks and exclusive locks are not quite > the same. Perhaps this is something that WITNESS could "know" generically, > and one solution might be to use separate witness "objects" for shared > acquires on a lock vs exclusive acquires on a lock, but I'm not sure > that wouldn't result in false negatives (i.e. WITNESS not reporting a > LOR that can deadlock). My intuition is that the general rule would not > be safe, so I would rather annotate the specific false positive in this > case to remove the noise of a known-false-positive. I'm not sure the only options we have are to ignore the problem or implement a general fix. If we set out to silence witness for the known false positives then it's ok to handle them on a case by case basis. We'll see patterns soon enough and then re-code the solutions in terms of those patterns. If we're lucky we see a single general solution, but if not, then it's fine to have a handful of special case. The worse we can do is not address it at all. My commit was met with a statement that comes really close to "you made witness useless", after which the argument changed to we can't make witness better. At this time my change seems like the best of all possible suggestions. > rules to the duplicate check currently). Vnode locks are another example > of this actually. You can lock a child of a directory while holding the > vnode lock of the parent directory but not vice versa. That particular > rule would be very hard to verify in WITNESS directly (given two vnode > lock pointers, could you "prove" that one is a parent directory of the > other non-trivially in witness_checkorder()? I seriously doubt it), so we > accept some false-negatives to avoid false-positive LORs on every path > lookup. Thus, in practice I think your goal of having WITNESS understand > every possible nuance is a bit unrealistic and not consistent with our > use of it historically. I'm not sure where you get the notion that my goal is to have witness understand every possible nuance. My commit certainly is not achieving such a goal. In fact, it's accepting the imperfections we have by not have those imperfections render to tool unusable. All I want is for witness to be silent about false positives or known uncertainties so that we can establish clear processes in case witness does warn. The extend to which we can be successful is not known to me, but I do know that keeping silent for LORs between vnode locks is good enough for me now and useful to FreeBSD users in general. If a vnode lock is an example of class of locking paradigms, then we just need to rename the flags I committed and have it naturally cover the entire class. If we can add complexity to witness to handle some nuances then it's good to have it available with a kernel option. People writing file systems are helped by having vnode locks checked properly and as completely as possible (i.e. with all possible nuances if such is doable). This is not to say that such is a mode of operation then must be used or work as a default. In short: we need more flexibility in general -- at the minimum in what witness reports or keeps quiet about, but ideally we need more flexibility in the amount of checking we allow witness to do for classes or kinds of locks. Such implies providing it with more of the nuances in some cases and it also implies that it'll be more costly in general. That's all fine. We just need a good default that works for most people and kernel options to allow people to tune it further to their needs. -- Marcel Moolenaar marcel@xcllnt.net
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?6CBEB766-087B-41F4-B549-2D60F4FD2701>