From: Jilles Tjoelker <jilles@stack.nl>
Date: Sun, 26 May 2013 22:20:58 +0200
To: Roger Pau Monné
Cc: Konstantin Belousov, freebsd-current@freebsd.org, "current@freebsd.org"
Subject: Re: FreeBSD-HEAD gets stuck on vnode operations
Message-ID: <20130526202058.GA40375@stack.nl>
In-Reply-To: <51A26245.9060707@citrix.com>

On Sun, May 26, 2013 at 09:28:05PM +0200, Roger Pau Monné wrote:
> On 25/05/13 19:52, Roger Pau Monné wrote:
> > On 20/05/13 20:34, John Baldwin wrote:
> >> On Tuesday, May 14, 2013 1:15:47 pm Roger Pau Monné wrote:
> >>> On 14/05/13 18:31, Konstantin Belousov wrote:
> >>>> On Tue, May 14, 2013 at 06:08:45PM +0200, Roger Pau Monné wrote:
> >>>>> On 13/05/13 17:00, Konstantin Belousov wrote:
> >>>>>> On Mon, May 13, 2013 at 04:33:04PM +0200, Roger Pau Monné wrote:
> >>>>>>> On 13/05/13 13:18, Roger Pau Monné wrote:

> >>>>> Thanks for taking a look.

> >>>>>>> I would like to explain this a little bit more: the syncer
> >>>>>>> process doesn't get blocked on the _mtx_trylock_flags_ call, it
> >>>>>>> just continues looping forever in what seems to be an endless
> >>>>>>> loop around mnt_vnode_next_active/ffs_sync. Also, while in this
> >>>>>>> state there is no noticeable disk activity, so I'm unsure what
> >>>>>>> is happening.

> >>>>>> How many CPUs does your VM have?

> >>>>> 7 vCPUs, but I've also seen this issue with 4 and 16 vCPUs.

> >>>>>> The loop you are describing means that another thread owns the
> >>>>>> vnode interlock. Can you track what this thread does? E.g. look
> >>>>>> at vp->v_interlock.mtx_lock, which is basically a pointer to the
> >>>>>> struct thread owning the mutex; clear the low bits as needed.
> >>>>>> Then you can inspect the thread and get a backtrace.
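A minimal sketch of that recipe (the helper name vi_owner is invented
here, not from the thread; MTX_UNOWNED and MTX_FLAGMASK are the
lock-word flag definitions from <sys/mutex.h>):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/proc.h>
    #include <sys/vnode.h>

    /*
     * Recover the thread owning a vnode's interlock: the lock word of
     * an owned mutex is the owning struct thread pointer with flag
     * bits OR'ed into the low bits, so masking them off yields the
     * owner.
     */
    static struct thread *
    vi_owner(struct vnode *vp)
    {
            uintptr_t v = vp->v_interlock.mtx_lock;

            if (v == MTX_UNOWNED)
                    return (NULL);  /* nobody holds the interlock */
            return ((struct thread *)(v & ~MTX_FLAGMASK));
    }

The 'mtx_owner' helper in jhb's kgdb scripts mentioned below does
essentially the same computation.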
> >>>>> There are no other threads running; only the syncer is running,
> >>>>> on CPU 1 (see the ps in the previous email). All other CPUs are
> >>>>> idle, and as seen from the ps quite a lot of threads are blocked
> >>>>> in vnode-related operations, either "*Name Cac", "*vnode_fr" or
> >>>>> "*vnode in". I've also attached the output of alllocks in the
> >>>>> previous email.

> >>>> This is not useful. You need to look at the mutex which fails the
> >>>> trylock operation in mnt_vnode_next_active(), see who owns it,
> >>>> and then 'unwind' the locking dependencies from there.

> >>> Sorry, now I get it, let's see if I can find the locked vnodes and
> >>> the thread that owns them...

> >> You can use 'show lock <address of vp->v_interlock>' to find an
> >> owning thread and then use 'show sleepchain <tid>'. If you are
> >> using kgdb on the live system (probably easier) then you can grab
> >> my scripts at www.freebsd.org/~jhb/gdb/ (do 'cd /path/to/scripts;
> >> source gdb6'). You can then find the offending thread and do
> >> 'mtx_owner &vp->v_interlock' and then 'sleepchain <tid>'.

> I've been looking into this issue a little bit more, and the lock
> dependencies look right to me. The lockup happens when the thread
> owning the v_interlock mutex tries to acquire the vnode_free_list_mtx
> mutex, which is already owned by the syncer thread. At this point the
> thread owning the v_interlock mutex goes to sleep, and the syncer
> process starts doing a sequence of:

> VI_TRYLOCK -> mtx_unlock vnode_free_list_mtx -> kern_yield ->
> mtx_lock vnode_free_list_mtx ...
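Condensed, the loop Roger describes looks like this (a paraphrase of
the iterator behind mnt_vnode_next_active() in sys/kern/vfs_subr.c;
list bookkeeping and error handling are elided, and the real context
is visible in the patch hunk below):

    restart:
            /* Iterating the active list; vnode_free_list_mtx held. */
            ...
            if (!VI_TRYLOCK(vp)) {
                    if (mp_ncpus == 1 || should_yield()) {
                            /* Park the marker vnode so we can resume. */
                            TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
                            mtx_unlock(&vnode_free_list_mtx);
                            /*
                             * Intended as a window for the interlock
                             * owner to run.  But if that owner is
                             * asleep on vnode_free_list_mtx and
                             * nothing else is runnable on this CPU,
                             * kern_yield() returns immediately, the
                             * syncer retakes vnode_free_list_mtx
                             * first, and the trylock fails again: a
                             * livelock.
                             */
                            kern_yield(PRI_USER);
                            mtx_lock(&vnode_free_list_mtx);
                            goto restart;
                    }
            }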
> It seems like kern_yield, which I assume is placed there in order to
> allow the thread owning v_interlock to also lock vnode_free_list_mtx,
> doesn't get a window big enough to wake up the waiting thread and let
> it take the vnode_free_list_mtx mutex. Since the syncer is the only
> process runnable on the CPU, there is no context switch, and the
> syncer process continues to run.

> Relying on kern_yield to provide a window big enough to allow any
> other thread waiting on vnode_free_list_mtx to run doesn't seem like
> a good idea on SMP systems. I've not tested this on bare metal, but
> waking up an idle CPU in a virtualized environment might be more
> expensive than doing it on bare metal.

> Bear in mind that I'm not familiar with either the scheduler or the
> ufs code. My proposed naive fix is to replace the kern_yield call
> with a pause, which will allow any other threads waiting on
> vnode_free_list_mtx to lock the vnode_free_list_mtx mutex, finish
> whatever they are doing and release the v_interlock mutex, so the
> syncer thread can also finish its work. I've tested the patch for a
> couple of hours and it seems to be fine; I haven't been able to
> reproduce the issue anymore.

Instead of a pause() that may be too short or too long, how about
waiting for the necessary lock? In other words, replace the
kern_yield() call with VI_LOCK(vp); VI_UNLOCK(vp);. This is also the
usual approach to acquire two locks without imposing an order between
them.

I expect blocking on a mutex to be safe enough; a mutex may not be
held across waiting for hardware or other events.
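Applied to the hunk quoted below, that would amount to something like
this (a sketch only, not a tested patch; it assumes vp remains valid
across the window where vnode_free_list_mtx is dropped, which I have
not verified):

            TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
            mtx_unlock(&vnode_free_list_mtx);
            /*
             * Block until whoever holds the vnode interlock releases
             * it, then drop it again right away: no guessed sleep
             * duration, and no lock order is imposed because the two
             * mutexes are never held simultaneously.
             */
            VI_LOCK(vp);
            VI_UNLOCK(vp);
            mtx_lock(&vnode_free_list_mtx);
            goto restart;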
> From fec90f7bb9cdf05b49d11dbe4930d3c595c147f5 Mon Sep 17 00:00:00 2001
> From: Roger Pau Monne
> Date: Sun, 26 May 2013 19:55:43 +0200
> Subject: [PATCH] mnt_vnode_next_active: replace kern_yield with pause
>
> On SMP systems there is no way to assure that a kern_yield will allow
> any other threads waiting on the vnode_free_list_mtx to acquire it.
> The syncer process can get stuck in a loop trying to lock the
> v_interlock mutex, without allowing other threads waiting on
> vnode_free_list_mtx to run. Replace the kern_yield with a pause,
> which should allow any thread owning v_interlock and waiting on
> vnode_free_list_mtx to finish its work and release v_interlock.
> ---
>  sys/kern/vfs_subr.c | 10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
> index 0da6764..597f4b7 100644
> --- a/sys/kern/vfs_subr.c
> +++ b/sys/kern/vfs_subr.c
> @@ -4703,7 +4703,15 @@ restart:
>  		if (mp_ncpus == 1 || should_yield()) {
>  			TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
>  			mtx_unlock(&vnode_free_list_mtx);
> -			kern_yield(PRI_USER);
> +			/*
> +			 * There is another thread owning the
> +			 * v_interlock mutex and possibly waiting on
> +			 * vnode_free_list_mtx, so pause in order for
> +			 * it to acquire the vnode_free_list_mtx
> +			 * mutex and finish the work, releasing
> +			 * v_interlock when finished.
> +			 */
> +			pause("vi_lock", 1);
>  			mtx_lock(&vnode_free_list_mtx);
>  			goto restart;
>  		}
> --
> 1.7.7.5 (Apple Git-26)

-- 
Jilles Tjoelker