Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 10 May 2013 11:51:20 -0700
From:      Marcel Moolenaar <marcel@xcllnt.net>
To:        John Baldwin <jhb@freebsd.org>
Cc:        attilio@freebsd.org, svn-src-head@freebsd.org, svn-src-all@freebsd.org, Marcel Moolenaar <marcel@freebsd.org>, src-committers@freebsd.org
Subject:   Re: svn commit: r250411 - in head/sys: conf kern sys
Message-ID:  <6CBEB766-087B-41F4-B549-2D60F4FD2701@xcllnt.net>
In-Reply-To: <201305101211.35808.jhb@freebsd.org>
References:  <201305091628.r49GSI33039873@svn.freebsd.org> <201305100952.45101.jhb@freebsd.org> <405C7C78-A626-4836-BD90-16FD08DD3196@xcllnt.net> <201305101211.35808.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

On May 10, 2013, at 9:11 AM, John Baldwin <jhb@freebsd.org> wrote:

> On Friday, May 10, 2013 11:46:54 am Marcel Moolenaar wrote:
>>> 
>>> 2) vnode locks from a local filesystem that report a LOR with a "devfs"
>>>  vnode.  Typical reports are either "ufs" -> "devfs" or in some cases
>>>  "ufs" -> "devfs" -> "ufs".  As with 1), I would much rather tag the
>>>  offending location than to disable all WITNESS checking on vnode locks.
>> 
>> With more file system types in use, this will get mixed up with the
>> other file systems and noise you get is rather severe. It is a big
>> problem for us at Juniper.
> 
> Note, it is very specific that the second lock is always "devfs".  I think
> that points to this being isolated to a few specific places, not a generic
> ordering problem.

Alas, that's not the case. These LORs are reported between ufs and unionfs,
or ufs and isofs, etc. It's not just between something and devfs.


> /*
> * Locking:
> *
> * The relationship between inode and dirhash is protected either by an
> * exclusive vnode lock or the vnode interlock where a shared vnode lock
> * may be used.  The dirhash_mtx is acquired after the dirhash lock.  To
> * handle teardown races, code wishing to lock the dirhash for an inode
> * when using a shared vnode lock must obtain a private reference on the
> * dirhash while holding the vnode interlock.  They can drop it once they
> * have obtained the dirhash lock and verified that the dirhash wasn't
> * recycled while they waited for the dirhash lock.
> *
> * ufsdirhash_build() acquires a shared lock on the dirhash when it is
> * successful.  This lock is released after a call to ufsdirhash_lookup().
> *
> * Functions requiring exclusive access use ufsdirhash_acquire() which may
> * free a dirhash structure that was recycled by ufsdirhash_recycle().
> *
> * The dirhash lock may be held across io operations.
> *
> * WITNESS reports a lock order reversal between the "bufwait" lock
> * and the "dirhash" lock.  However, this specific reversal will not
> * cause a deadlock.  To get a deadlock, one would have to lock a
> * buffer followed by the dirhash while a second thread locked a
> * buffer while holding the dirhash lock.  The second order can happen
> * under a shared or exclusive vnode lock for the associated directory
> * in lookup().  The first order, however, can only happen under an
> * exclusive vnode lock (e.g. unlink(), rename(), etc.).  Thus, for
> * a thread to be doing a "bufwait" -> "dirhash" order, it has to hold
> * an exclusive vnode lock.  That exclusive vnode lock will prevent
> * any other threads from doing a "dirhash" -> "bufwait" order.
> */
> 
> So the issue here is that shared locks and exclusive locks are not quite
> the same.  Perhaps this is something that WITNESS could "know" generically,
> and one solution might be to use separate witness "objects" for shared
> acquires on a lock vs exclusive acquires on a lock, but I'm not sure
> that wouldn't result in false negatives (i.e. WITNESS not reporting a
> LOR that can deadlock).  My intuition is that the general rule would not
> be safe, so I would rather annotate the specific false positive in this
> case to remove the noise of a known-false-positive.

I'm not sure the only options we have are to ignore the problem
or implement a general fix. If we set out to silence witness for
the known false positives then it's ok to handle them on a case
by case basis. We'll see patterns soon enough and then re-code
the solutions in terms of those patterns. If we're lucky we see
a single general solution, but if not, then it's fine to have a
handful of special case. The worse we can do is not address it
at all.

My commit was met with a statement that comes really close to
"you made witness useless", after which the argument changed to
we can't make witness better. At this time my change seems like
the best of all possible suggestions.

> rules to the duplicate check currently).  Vnode locks are another example
> of this actually.  You can lock a child of a directory while holding the
> vnode lock of the parent directory but not vice versa.  That particular
> rule would be very hard to verify in WITNESS directly (given two vnode
> lock pointers, could you "prove" that one is a parent directory of the
> other non-trivially in witness_checkorder()?  I seriously doubt it), so we
> accept some false-negatives to avoid false-positive LORs on every path
> lookup.  Thus, in practice I think your goal of having WITNESS understand
> every possible nuance is a bit unrealistic and not consistent with our
> use of it historically.

I'm not sure where you get the notion that my goal is to have witness
understand every possible nuance. My commit certainly is not achieving
such a goal. In fact, it's accepting the imperfections we have by not
have those imperfections render to tool unusable.

All I want is for witness to be silent about false positives or known
uncertainties so that we can establish clear processes in case witness
does warn. The extend to which we can be successful is not known to me,
but I do know that keeping silent for LORs between vnode locks is good
enough for me now and useful to FreeBSD users in general.

If a vnode lock is an example of class of locking paradigms, then we
just need to rename the flags I committed and have it naturally cover
the entire class.

If we can add complexity to witness to handle some nuances then it's
good to have it available with a kernel option. People writing file
systems are helped by having vnode locks checked properly and as
completely as possible (i.e. with all possible nuances if such is
doable). This is not to say that such is a mode of operation then
must be used or work as a default.

In short: we need more flexibility in general -- at the minimum in
what witness reports or keeps quiet about, but ideally we need more
flexibility in the amount of checking we allow witness to do for
classes or kinds of locks. Such implies providing it with more of
the nuances in some cases and it also implies that it'll be more
costly in general. That's all fine. We just need a good default that
works for most people and kernel options to allow people to tune it
further to their needs.

-- 
Marcel Moolenaar
marcel@xcllnt.net





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?6CBEB766-087B-41F4-B549-2D60F4FD2701>