From owner-freebsd-current@FreeBSD.ORG Sun May 26 19:28:19 2013
From: Roger Pau Monné <roger.pau@citrix.com>
Date: Sun, 26 May 2013 21:28:05 +0200
To: John Baldwin
Cc: Konstantin Belousov, freebsd-current@freebsd.org, current@freebsd.org
Subject: Re: FreeBSD-HEAD gets stuck on vnode operations
Message-ID: <51A26245.9060707@citrix.com>
In-Reply-To: <51A0FA43.2040503@citrix.com>

On 25/05/13 19:52, Roger Pau Monné wrote:
> On 20/05/13 20:34, John Baldwin wrote:
>> On Tuesday, May 14, 2013 1:15:47 pm Roger Pau Monné wrote:
>>> On 14/05/13 18:31, Konstantin Belousov wrote:
>>>> On Tue, May 14, 2013 at 06:08:45PM +0200, Roger Pau Monné wrote:
>>>>> On 13/05/13 17:00, Konstantin Belousov wrote:
>>>>>> On Mon, May 13, 2013 at 04:33:04PM +0200, Roger Pau Monné wrote:
>>>>>>> On 13/05/13 13:18, Roger Pau Monné wrote:
>>>>>
>>>>> Thanks for taking a look.
>>>>>
>>>>>>> I would like to explain this a little bit more: the syncer process
>>>>>>> doesn't get blocked on the _mtx_trylock_flags_ call, it just keeps
>>>>>>> looping in what seems to be an endless loop around
>>>>>>> mnt_vnode_next_active/ffs_sync. Also, while in this state there is no
>>>>>>> noticeable disk activity, so I'm unsure what is happening.
>>>>>>
>>>>>> How many CPUs does your VM have?
>>>>>
>>>>> 7 vCPUs, but I've also seen this issue with 4 and 16 vCPUs.
>>>>>
>>>>>> The loop you are describing means that another thread owns the vnode
>>>>>> interlock. Can you track what that thread does? E.g. look at
>>>>>> vp->v_interlock.mtx_lock, which is basically a pointer to the struct
>>>>>> thread owning the mutex (clear the low bits as needed). Then you can
>>>>>> inspect the thread and get a backtrace.
>>>>>
>>>>> There are no other threads running, only the syncer is running on CPU 1
>>>>> (see the ps output in the previous email).
>>>>> All other CPUs are idle, and as seen from the ps quite a lot of threads
>>>>> are blocked in vnode related operations, either "*Name Cac", "*vnode_fr"
>>>>> or "*vnode in". I've also attached the output of alllocks in the
>>>>> previous email.
>>>>
>>>> This is not useful. You need to look at the mutex which fails the
>>>> trylock operation in mnt_vnode_next_active(), see who owns it, and then
>>>> 'unwind' the locking dependencies from there.
>>>
>>> Sorry, now I get it. Let's see if I can find the locked vnodes and the
>>> thread that owns them...
>>
>> You can use 'show lock <address of v_interlock>' to find the owning
>> thread and then use 'show sleepchain <thread>'. If you are using kgdb on
>> the live system (probably easier) then you can grab my scripts at
>> www.freebsd.org/~jhb/gdb/ (do 'cd /path/to/scripts; source gdb6'). You
>> can then find the offending thread and do 'mtx_owner &vp->v_interlock'
>> and then 'sleepchain <thread>'.

Hello,

I've been looking into this issue a little bit more, and the lock
dependencies look right to me. The lockup happens when the thread owning
the v_interlock mutex tries to acquire the vnode_free_list_mtx mutex,
which is already owned by the syncer thread. At this point the thread
owning the v_interlock mutex goes to sleep, and the syncer process starts
doing the following sequence over and over:

VI_TRYLOCK -> mtx_unlock vnode_free_list_mtx -> kern_yield ->
mtx_lock vnode_free_list_mtx -> ...

It seems like kern_yield, which I assume is placed there in order to give
the thread owning v_interlock a chance to also lock vnode_free_list_mtx,
doesn't open a window big enough to wake up the waiting thread and let it
take the vnode_free_list_mtx mutex. Since the syncer is the only process
runnable on that CPU there is no context switch, and the syncer just
keeps running.

Relying on kern_yield to provide a window big enough for any other thread
waiting on vnode_free_list_mtx to run doesn't seem like a good idea on
SMP systems. I haven't tested this on bare metal, but waking up an idle
CPU in a virtualized environment might be more expensive than doing it on
bare metal.

Bear in mind that I'm not familiar with either the scheduler or the UFS
code. My proposed naive fix is to replace the kern_yield call with a
pause, which will allow any other thread waiting on vnode_free_list_mtx
to take the vnode_free_list_mtx mutex, finish whatever it is doing and
release the v_interlock mutex, so the syncer thread can also finish its
work. I've tested the patch for a couple of hours and it seems to be
fine; I haven't been able to reproduce the issue anymore.
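To make the interaction easier to follow, here is a minimal userland model
of the two threads involved. This is not the kernel code: free_list_mtx,
interlock, vnode_user, syncer and the staging sleeps are just names I made
up to stand in for vnode_free_list_mtx, vp->v_interlock, the blocked
thread and the syncer, and sched_yield stands in for kern_yield(PRI_USER).

/*
 * Userland model of the lockup described above.  The sleeps only stage
 * the initial state: vnode_user owns the interlock and blocks on the
 * free list mutex, while syncer owns the free list mutex and loops on a
 * trylock of the interlock, dropping and retaking the free list mutex
 * around a yield after every failed attempt.
 */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t free_list_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t interlock = PTHREAD_MUTEX_INITIALIZER;

static void *
vnode_user(void *arg)
{

	pthread_mutex_lock(&interlock);		/* owns "v_interlock" */
	usleep(1000);				/* let the syncer take free_list_mtx */
	pthread_mutex_lock(&free_list_mtx);	/* blocks behind the syncer */
	pthread_mutex_unlock(&free_list_mtx);
	pthread_mutex_unlock(&interlock);	/* only now can the trylock succeed */
	return (NULL);
}

static void *
syncer(void *arg)
{

	pthread_mutex_lock(&free_list_mtx);
	usleep(2000);		/* let the other thread block on free_list_mtx */
	for (;;) {
		if (pthread_mutex_trylock(&interlock) == 0)	/* VI_TRYLOCK */
			break;
		pthread_mutex_unlock(&free_list_mtx);
		sched_yield();				/* kern_yield(PRI_USER) */
		pthread_mutex_lock(&free_list_mtx);
	}
	pthread_mutex_unlock(&interlock);
	pthread_mutex_unlock(&free_list_mtx);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, vnode_user, NULL);
	pthread_create(&t2, NULL, syncer, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("resolved: the yield opened a big enough window\n");
	return (0);
}

Whether the yield actually gives vnode_user a chance to take free_list_mtx
before the syncer retakes it is entirely up to the scheduler; when the
syncer is the only runnable thread on its CPU that window may never show
up, which is exactly the situation described above.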
Attached: 0001-mnt_vnode_next_active-replace-kern_yield-with-pause.patch

>From fec90f7bb9cdf05b49d11dbe4930d3c595c147f5 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne
Date: Sun, 26 May 2013 19:55:43 +0200
Subject: [PATCH] mnt_vnode_next_active: replace kern_yield with pause

On SMP systems there is no way to assure that a kern_yield will allow
any other thread waiting on the vnode_free_list_mtx to acquire it. The
syncer process can get stuck in a loop trying to lock the v_interlock
mutex, without allowing other threads waiting on vnode_free_list_mtx to
run. Replace the kern_yield with a pause, which should allow any thread
owning v_interlock and waiting on vnode_free_list_mtx to finish its work
and release v_interlock.
---
 sys/kern/vfs_subr.c | 10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 0da6764..597f4b7 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -4703,7 +4703,15 @@ restart:
 			if (mp_ncpus == 1 || should_yield()) {
 				TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
 				mtx_unlock(&vnode_free_list_mtx);
-				kern_yield(PRI_USER);
+				/*
+				 * There is another thread owning the
+				 * v_interlock mutex and possibly waiting on
+				 * vnode_free_list_mtx, so pause in order for
+				 * it to acquire the vnode_free_list_mtx
+				 * mutex and finish the work, releasing
+				 * v_interlock when finished.
+				 */
+				pause("vi_lock", 1);
 				mtx_lock(&vnode_free_list_mtx);
 				goto restart;
 			}
--
1.7.7.5 (Apple Git-26)