Date:      Sun, 26 May 2013 22:20:58 +0200
From:      Jilles Tjoelker <jilles@stack.nl>
To:        Roger Pau Monné <roger.pau@citrix.com>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-current@freebsd.org, "current@freebsd.org" <current@freebsd.org>
Subject:   Re: FreeBSD-HEAD gets stuck on vnode operations
Message-ID:  <20130526202058.GA40375@stack.nl>
In-Reply-To: <51A26245.9060707@citrix.com>
References:  <5190CBEC.5000704@citrix.com> <20130514163149.GS3047@kib.kiev.ua> <51927143.4080102@citrix.com> <201305201434.55406.jhb@freebsd.org> <51A0FA43.2040503@citrix.com> <51A26245.9060707@citrix.com>

On Sun, May 26, 2013 at 09:28:05PM +0200, Roger Pau Monné wrote:
> On 25/05/13 19:52, Roger Pau Monné wrote:
> > On 20/05/13 20:34, John Baldwin wrote:
> >> On Tuesday, May 14, 2013 1:15:47 pm Roger Pau Monné wrote:
> >>> On 14/05/13 18:31, Konstantin Belousov wrote:
> >>>> On Tue, May 14, 2013 at 06:08:45PM +0200, Roger Pau Monné wrote:
> >>>>> On 13/05/13 17:00, Konstantin Belousov wrote:
> >>>>>> On Mon, May 13, 2013 at 04:33:04PM +0200, Roger Pau Monné wrote:
> >>>>>>> On 13/05/13 13:18, Roger Pau Monné wrote:

> >>>>> Thanks for taking a look,

> >>>>>>> I would like to explain this a little more: the syncer process
> >>>>>>> doesn't get blocked on the _mtx_trylock_flags_ call, it just keeps
> >>>>>>> looping forever around mnt_vnode_next_active/ffs_sync. Also, while
> >>>>>>> in this state there is no noticeable disk activity, so I'm unsure
> >>>>>>> what is happening.
> >>>>>> How many CPUs does your VM have?

> >>>>> 7 vCPUs, but I've also seen this issue with 4 and 16 vCPUs.

> >>>>>> The loop you are describing means that another thread owns the vnode
> >>>>>> interlock. Can you track what this thread does? E.g. look at
> >>>>>> vp->v_interlock.mtx_lock, which is basically a pointer to the struct
> >>>>>> thread owning the mutex (clear the low bits as needed). Then you can
> >>>>>> inspect the thread and get a backtrace.
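
For reference, the same decoding can be written down in C. A minimal
sketch, assuming the lock word encoding from sys/mutex.h (the owning
struct thread pointer stored in mtx_lock, with the state flags covered
by MTX_FLAGMASK in the low bits); the helper name is mine, not
something from the tree:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/proc.h>
    #include <sys/vnode.h>

    /*
     * Recover the thread owning a vnode interlock from the raw lock
     * word: mask off the flag bits to obtain the struct thread
     * pointer.  Returns NULL if the mutex is unowned (or destroyed).
     */
    static struct thread *
    interlock_owner(struct vnode *vp)
    {
            uintptr_t v;

            v = vp->v_interlock.mtx_lock;
            if ((v & ~MTX_FLAGMASK) == 0)
                    return (NULL);
            return ((struct thread *)(v & ~MTX_FLAGMASK));
    }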

> >>>>> There are no other threads running; only the syncer is running, on
> >>>>> CPU 1 (see the ps output in my previous email). All other CPUs are
> >>>>> idle, and as seen from the ps output quite a lot of threads are
> >>>>> blocked in vnode-related operations, on either "*Name Cac",
> >>>>> "*vnode_fr" or "*vnode in". I also attached the output of alllocks
> >>>>> in the previous email.
> >>>> This is not useful.  You need to look at the mutex on which the
> >>>> trylock operation in mnt_vnode_next_active() fails, see who owns it,
> >>>> and then 'unwind' the locking dependencies from there.

> >>> Sorry, now I get it; let's see if I can find the locked vnodes and the
> >>> thread that owns them...

> >> You can use 'show lock <address of vp->v_interlock>' to find the owning
> >> thread and then use 'show sleepchain <thread>'.  If you are using kgdb
> >> on the live system (probably easier) then you can grab my scripts at
> >> www.freebsd.org/~jhb/gdb/ (do 'cd /path/to/scripts; source gdb6').  You
> >> can then find the offending thread and do 'mtx_owner &vp->v_interlock'
> >> and then 'sleepchain <tid>'.

> I've been looking into this issue a little more, and the lock
> dependencies look right to me. The lockup happens when the thread owning
> the v_interlock mutex tries to acquire the vnode_free_list_mtx mutex,
> which is already owned by the syncer thread. At that point the thread
> owning the v_interlock mutex goes to sleep, and the syncer process
> starts doing a sequence of:

> VI_TRYLOCK -> mtx_unlock vnode_free_list_mtx -> kern_yield -> mtx_lock
> vnode_free_list_mtx ...

> It seems like kern_yield, which I assume was placed there in order to
> allow the thread owning v_interlock to also lock vnode_free_list_mtx,
> doesn't open a window big enough to wake up the waiting thread and let
> it take the vnode_free_list_mtx mutex. Since the syncer is the only
> runnable process on that CPU there is no context switch, and the syncer
> process just keeps running.

> Relying on kern_yield to open a window big enough for another thread
> waiting on vnode_free_list_mtx to run doesn't seem like a good idea on
> SMP systems. I've not tested this on bare metal, but waking up an idle
> CPU in a virtualized environment might well be more expensive than
> doing so on bare metal.

> Bear in mind that I'm not familiar with either the scheduler or the ufs
> code. My proposed naive fix is to replace the kern_yield call with a
> pause, which will allow any other thread waiting on vnode_free_list_mtx
> to lock the vnode_free_list_mtx mutex, finish whatever it is doing, and
> release the v_interlock mutex, so the syncer thread can also finish its
> work. I've tested the patch for a couple of hours and it seems fine; I
> haven't been able to reproduce the issue anymore.

Instead of a pause() that may be too short or too long, how about
waiting for the necessary lock? In other words, replace the kern_yield()
call with VI_LOCK(vp); VI_UNLOCK(vp);. This is also the usual approach
to acquire two locks without imposing an order between them.
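
For illustration, the generic shape of that pattern, with two
placeholder mutexes a and b (just a sketch, not code from the tree):

    /*
     * Acquire mutexes a and b without a defined lock order: try the
     * second lock opportunistically; on failure drop the first,
     * block until the second is free, and retry from the top.
     * Neither lock is held while blocking, so no deadlock is
     * possible.
     */
    for (;;) {
            mtx_lock(&a);
            if (mtx_trylock(&b))
                    break;          /* got both */
            mtx_unlock(&a);
            mtx_lock(&b);           /* wait until b is free */
            mtx_unlock(&b);
    }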

I expect blocking on a mutex to be safe enough; a mutex may not be held
across waiting for hardware or other events.
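
Concretely, I would expect the restart path in mnt_vnode_next_active()
to end up looking something like the sketch below. The surrounding
lines are taken from the context of your patch; the VI_LOCK/VI_UNLOCK
pair is my suggestion and is untested:

    if (mp_ncpus == 1 || should_yield()) {
            TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
            mtx_unlock(&vnode_free_list_mtx);
            /*
             * Block on the vnode interlock itself instead of
             * yielding: once the owner has taken and dropped
             * vnode_free_list_mtx and released the interlock,
             * acquire and immediately release it, then retake
             * vnode_free_list_mtx and restart the scan with the
             * locks taken in the usual order.
             */
            VI_LOCK(vp);
            VI_UNLOCK(vp);
            mtx_lock(&vnode_free_list_mtx);
            goto restart;
    }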

> From fec90f7bb9cdf05b49d11dbe4930d3c595c147f5 Mon Sep 17 00:00:00 2001
> From: Roger Pau Monne <roger.pau@citrix.com>
> Date: Sun, 26 May 2013 19:55:43 +0200
> Subject: [PATCH] mnt_vnode_next_active: replace kern_yield with pause
> 
> On SMP systems there is no way to assure that a kern_yield will allow
> any other thread waiting on the vnode_free_list_mtx to acquire it. The
> syncer process can get stuck in a loop trying to lock the v_interlock
> mutex, without allowing other threads waiting on vnode_free_list_mtx
> to run. Replace the kern_yield with a pause, which should allow any
> thread owning v_interlock and waiting on vnode_free_list_mtx to finish
> its work and release v_interlock.
> ---
>  sys/kern/vfs_subr.c |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
> 
> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
> index 0da6764..597f4b7 100644
> --- a/sys/kern/vfs_subr.c
> +++ b/sys/kern/vfs_subr.c
> @@ -4703,7 +4703,15 @@ restart:
>  			if (mp_ncpus == 1 || should_yield()) {
>  				TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
>  				mtx_unlock(&vnode_free_list_mtx);
> -				kern_yield(PRI_USER);
> +				/*
> +				 * There is another thread owning the
> +				 * v_interlock mutex and possibly waiting on
> +				 * vnode_free_list_mtx, so pause in order for
> +				 * it to acquire the vnode_free_list_mtx
> +				 * mutex and finish the work, releasing
> +				 * v_interlock when finished.
> +				 */
> +				pause("vi_lock", 1);
>  				mtx_lock(&vnode_free_list_mtx);
>  				goto restart;
>  			}
> -- 
> 1.7.7.5 (Apple Git-26)

-- 
Jilles Tjoelker