From: Roger Pau Monné <roger.pau@citrix.com>
Date: Mon, 27 May 2013 10:19:51 +0200
To: Konstantin Belousov
Cc: freebsd-current@freebsd.org, Jilles Tjoelker
Subject: Re: FreeBSD-HEAD gets stuck on vnode operations
Message-ID: <51A31727.9040909@citrix.com>
In-Reply-To: <20130527060727.GW3047@kib.kiev.ua>

On 27/05/13 08:07, Konstantin Belousov wrote:
> On Mon, May 27, 2013 at 12:22:54AM +0200, Jilles Tjoelker wrote:
>> On Sun, May 26, 2013 at 10:52:07PM +0200, Roger Pau Monné wrote:
>>> On 26/05/13 22:20, Jilles Tjoelker wrote:
>>>> Instead of a pause() that may be too short or too long, how about
>>>> waiting for the necessary lock? In other words, replace the
>>>> kern_yield() call with VI_LOCK(vp); VI_UNLOCK(vp);. This is also
>>>> the usual approach to acquire two locks without imposing an order
>>>> between them.
>>
>>> Since there might be more than one locked vnode, waiting on a
>>> specific locked vnode seemed rather arbitrary, but I agree that the
>>> pause is also rather arbitrary.
>>
>>> Also, can we be sure that the v_interlock mutex will not be
>>> destroyed while the syncer process is waiting for it to be
>>> unlocked?
>>
>> I think this is a major problem. My idea was too easy and will not
>> work.
>>
>> That said, the code in mnt_vnode_next_active() appears to implement
>> some sort of adaptive spinning for SMP. It tries VI_TRYLOCK for
>> 200ms (the default value of hogticks) and then yields. This is far
>> too long for a mutex lock, and if it takes that long it means that
>> either the thread owning the lock is blocked by us somehow, or
>> someone is abusing a mutex to work like a sleepable lock, such as
>> by spinning or DELAY.
>>
>> Given that it has been spinning for 200ms, it is not so bad to
>> pause for one additional microsecond.
>>
>> The adaptive spinning was added fairly recently, so apparently it
>> happens fairly frequently that VI_TRYLOCK fails transiently.
>> Unfortunately, the real adaptive spinning code cannot be used
>> because it will spin forever as long as the thread owning
>> v_interlock is running, including when that is because it is
>> spinning for vnode_free_list_mtx. Perhaps we can try VI_TRYLOCK a
>> certain number of times. It is also possible to check the contested
>> bit of vnode_free_list_mtx (sys/netgraph/netflow/netflow.c does
>> something similar) and stop spinning in that case.
>>
>> A cpu_spinwait() invocation should also be added to the spin loop.
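To make sure I understand the suggestion, here is a rough and
completely untested sketch of such a bounded spin. The helper name and
the retry count are made up by me, and I'm assuming the
vnode_free_list_mtx declaration and MTX_CONTESTED are visible here:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/vnode.h>

#include <machine/cpu.h>

/* Made-up tuning knob: how many times to retry before giving up. */
#define	VNODE_SPIN_TRIES	200

/*
 * Try to take the vnode interlock while vnode_free_list_mtx is held,
 * spinning a bounded number of times and backing off as soon as some
 * other thread blocks on the free list lock.  Returns 1 on success.
 */
static int
vi_trylock_bounded(struct vnode *vp)
{
	int tries;

	for (tries = 0; tries < VNODE_SPIN_TRIES; tries++) {
		if (VI_TRYLOCK(vp))
			return (1);
		/* Somebody is blocked on vnode_free_list_mtx: bail out. */
		if ((vnode_free_list_mtx.mtx_lock & MTX_CONTESTED) != 0)
			return (0);
		cpu_spinwait();
	}
	return (0);
}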
> There are two 'proper' solutions for this issue:
>
> One is to change the handling of the vnode lifecycle to allow safely
> blocking for the vnode interlock acquisition. In particular, the
> change would add some stability to the vnode memory when the vnode
> is put on the free list. For example, the vnode zone could be marked
> as type-stable again, and then the vnode interlock could be obtained
> with the free list lock dropped. Arguably, marking the zone as
> non-freeable would be a regression, especially for the zone which
> accounts for the largest allocation of kernel memory.
>
> Another one is to somehow ensure that the priority is properly
> propagated from the spinning thread to the vnode interlock owner. I
> think that it is desirable to donate some amount of priority from
> the spinning thread. Unfortunately, I was unable to come up with an
> elegant solution for this that would also be contained and would not
> require a revamp of the mutex interfaces.
>
> BTW, if anybody comes up with an idea for restructuring the free
> list handling to avoid the free list/vnode interlock LOR altogether,
> that would be best.
>
> I do not have objections to the pause() addition, but I would argue
> that should_yield() should then be removed, switching the code to
> pause unconditionally when a collision is detected.

Taking the idea from Jilles, what about replacing should_yield() with
a check to see whether the vnode_free_list_mtx mutex is contended?
That would prevent us from doing unnecessary pauses, releasing
vnode_free_list_mtx only when there is someone else that actually
needs it; see the sketch below.
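Something along these lines in mnt_vnode_next_active() is what I have
in mind. Again only a sketch and untested; the helper name, the wmesg
and the one-tick timeout are arbitrary choices of mine:

/*
 * Instead of should_yield(), drop the free list lock and pause only
 * when another thread is actually blocked on vnode_free_list_mtx.
 * Returns 1 when the caller should drop back to the marker vnode and
 * restart the iteration.
 */
static int
vnlru_pause_if_contended(void)
{

	if ((vnode_free_list_mtx.mtx_lock & MTX_CONTESTED) == 0)
		return (0);	/* Nobody is waiting; keep spinning. */
	mtx_unlock(&vnode_free_list_mtx);
	pause("vnacty", 1);	/* Arbitrary wmesg, one tick. */
	mtx_lock(&vnode_free_list_mtx);
	return (1);
}

The VI_TRYLOCK failure path would then 'goto restart' when this
returns 1, and simply retry the trylock otherwise.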