Date: Mon, 27 May 2013 00:22:54 +0200
From: Jilles Tjoelker <jilles@stack.nl>
To: Roger Pau Monné <roger.pau@citrix.com>
Subject: Re: FreeBSD-HEAD gets stuck on vnode operations
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-current@freebsd.org,
 "current@freebsd.org" <current@freebsd.org>

On Sun, May 26, 2013 at 10:52:07PM +0200, Roger Pau Monné wrote:
> On 26/05/13 22:20, Jilles Tjoelker wrote:
> > Instead of a pause() that may be too short or too long, how about
> > waiting for the necessary lock? In other words, replace the kern_yield()
> > call with VI_LOCK(vp); VI_UNLOCK(vp);. This is also the usual approach
> > to acquire two locks without imposing an order between them.

> Since there might be more than one locked vnode, waiting on a specific
> locked vnode seemed rather arbitrary, but I agree that the pause is also
> rather arbitrary.

> Also, can we be sure that the v_interlock mutex will not be destroyed
> while the syncer process is waiting for it to be unlocked?

I think this is a major problem. My idea was too simple and will not work.
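
For illustration, the lock-then-unlock wait from the quoted suggestion
can be sketched in userspace with POSIX threads. This is only an
analogue, not kernel code: lock_both(), list_lock and item_lock are
hypothetical names standing in for vnode_free_list_mtx and a vnode's
v_interlock. The sketch also shows where the idea breaks: the waiter
touches item_lock after dropping list_lock, which is only safe if
item_lock cannot be destroyed in that window.

```c
#include <pthread.h>

/*
 * Userspace analogue (assumption: plain pthread mutexes, hypothetical
 * names). Acquire list_lock and item_lock without imposing an order
 * between them. Returns the number of retries; both locks are held on
 * return.
 */
static int
lock_both(pthread_mutex_t *list_lock, pthread_mutex_t *item_lock)
{
	int retries = 0;

	pthread_mutex_lock(list_lock);
	while (pthread_mutex_trylock(item_lock) != 0) {
		/* Drop our lock so the owner of item_lock can make progress. */
		pthread_mutex_unlock(list_lock);

		/*
		 * Block until item_lock is released, then let go of it again.
		 * This step is only valid if item_lock still exists here.
		 */
		pthread_mutex_lock(item_lock);
		pthread_mutex_unlock(item_lock);

		/* Start over in the original order. */
		pthread_mutex_lock(list_lock);
		retries++;
	}
	return (retries);
}
```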

That said, the code in mnt_vnode_next_active() appears to implement some
sort of adaptive spinning for SMP. It tries VI_TRYLOCK for 200ms
(default value of hogticks) and then yields. This is far too long for a
mutex lock. If acquiring one takes that long, either the thread owning
the lock is somehow blocked by us, or someone is abusing a mutex as a
sleepable lock, for example by spinning or calling DELAY while holding
it.

Given that it has been spinning for 200ms, it is not so bad to pause for
one additional millisecond.
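
A rough userspace sketch of the "spin on trylock for a time budget, then
yield" behaviour described above, assuming POSIX threads and a monotonic
clock. It is not the kernel code: trylock_with_budget(), now_ns() and
the budget parameter are hypothetical, with the budget playing the role
of hogticks and sched_yield() standing in for the kernel's yield.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <time.h>

/* Monotonic time in nanoseconds (hypothetical helper). */
static long long
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((long long)ts.tv_sec * 1000000000LL + ts.tv_nsec);
}

/*
 * Spin on trylock until the budget is used up. On success the mutex is
 * held and true is returned. On timeout the CPU is yielded once and
 * false is returned so the caller can restart its scan.
 */
static bool
trylock_with_budget(pthread_mutex_t *m, long long budget_ns)
{
	long long deadline = now_ns() + budget_ns;

	for (;;) {
		if (pthread_mutex_trylock(m) == 0)
			return (true);
		if (now_ns() >= deadline) {
			sched_yield();
			return (false);
		}
	}
}
```

With a 200ms budget the caller burns a full CPU for that long whenever
the owner is not making progress, which is the cost being questioned
here.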

The adaptive spinning was added fairly recently, so apparently
VI_TRYLOCK fails transiently quite often. Unfortunately, the real
adaptive spinning code cannot be used because it will spin forever as
long as the thread owning v_interlock is running, including when that
thread is itself spinning for vnode_free_list_mtx. Perhaps we can try
VI_TRYLOCK a bounded number of times. It is also possible to check the
contested bit of vnode_free_list_mtx (sys/netgraph/netflow/netflow.c
does something similar) and stop spinning in that case.

A cpu_spinwait() invocation should also be added to the spin loop.
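
As a minimal sketch of the bounded variant, assuming a userspace
analogue with POSIX threads: trylock_bounded(), max_tries and the
spin_hint() macro are hypothetical. spin_hint() stands in for
cpu_spinwait() and expands to the x86 "pause" instruction where
available, otherwise to nothing. The contested-bit check is not shown,
since it depends on mutex internals.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for cpu_spinwait(): a hint to the CPU that we are spinning. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define	spin_hint()	__asm__ __volatile__("pause")
#else
#define	spin_hint()	((void)0)
#endif

/*
 * Try the lock at most max_tries times, with a spin-wait hint between
 * attempts. Returns true with the mutex held, or false if every attempt
 * failed and the caller should back off.
 */
static bool
trylock_bounded(pthread_mutex_t *m, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (pthread_mutex_trylock(m) == 0)
			return (true);
		spin_hint();
	}
	return (false);
}
```

Unlike the time-based loop, the worst-case spin here is bounded by the
attempt count and does not depend on whether the lock owner is running.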

-- 
Jilles Tjoelker