FreeBSD Mail Archives

Date:      Mon, 27 May 2013 09:07:27 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Jilles Tjoelker <jilles@stack.nl>
Cc:        freebsd-current@freebsd.org, "current@freebsd.org" <current@freebsd.org>, Roger Pau Monn? <roger.pau@citrix.com>
Subject:   Re: FreeBSD-HEAD gets stuck on vnode operations
Message-ID:  <20130527060727.GW3047@kib.kiev.ua>
In-Reply-To: <20130526222254.GB40375@stack.nl>
References:  <5190CBEC.5000704@citrix.com> <20130514163149.GS3047@kib.kiev.ua> <51927143.4080102@citrix.com> <201305201434.55406.jhb@freebsd.org> <51A0FA43.2040503@citrix.com> <51A26245.9060707@citrix.com> <20130526202058.GA40375@stack.nl> <51A275F7.9030401@citrix.com> <20130526222254.GB40375@stack.nl>

index | next in thread | previous in thread | raw e-mail

[-- Attachment #1 --]
On Mon, May 27, 2013 at 12:22:54AM +0200, Jilles Tjoelker wrote:
> On Sun, May 26, 2013 at 10:52:07PM +0200, Roger Pau Monn? wrote:
> > On 26/05/13 22:20, Jilles Tjoelker wrote:
> > > Instead of a pause() that may be too short or too long, how about
> > > waiting for the necessary lock? In other words, replace the kern_yield()
> > > call with VI_LOCK(vp); VI_UNLOCK(vp);. This is also the usual approach
> > > to acquire two locks without imposing an order between them.
> 
> > Since there might be more than one locked vnode, waiting on a specific
> > locked vnode seemed rather arbitrary, but I agree that the pause is also
> > rather arbitrary.
> 
> > Also, can we be sure that the v_interlock mutex will not be destroyed
> > while the syncer process is waiting for it to be unlocked?
> 
> I think this is a major problem. My idea was too easy and will not work.
> 
> That said, the code in mnt_vnode_next_active() appears to implement some
> sort of adaptive spinning for SMP. It tries VI_TRYLOCK for 200ms
> (default value of hogticks) and then yields. This is far too long for a
> mutex lock and if it takes that long it means that either the thread
> owning the lock is blocked by us somehow or someone is abusing a mutex
> to work like a sleepable lock such as by spinning or DELAY.
> 
> Given that it has been spinning for 200ms, it is not so bad to pause for
> one additional microsecond.
> 
> The adaptive spinning was added fairly recently, so apparently it
> happens fairly frequently that VI_TRYLOCK fails transiently.
> Unfortunately, the real adaptive spinning code cannot be used because it
> will spin forever as long as the thread owning v_interlock is running,
> including when that is because it is spinning for vnode_free_list_mtx.
> Perhaps we can try to VI_TRYLOCK a certain number of times. It is also
> possible to check the contested bit of vnode_free_list_mtx
> (sys/netgraph/netflow/netflow.c does something similar) and stop
> spinning in that case.
> 
> A cpu_spinwait() invocation should also be added to the spin loop.

There are two 'proper' solutions for this issue:

One is to change the handling of the vnode lifecycle to allow the
safe block for the vnode interlock acquisition. In particular, the
change would add some stability of the vnode memory when vnode is
put on the free list. As example, the vnode zone could be marked as
type-stable again, and then the vnode interlock can be obtained with
dropped free list lock. Arguably, marking the zone as non-freeable would
be a regression, esp. for the zone which accounts for largest allocation
on the kernel memory.

Another one is to somehow ensure that the priority is properly
propagated from the spinning thread to the vnode interlock owner.
I think that it is desirable to donate some amount of priority
from the spinning thread.  Unfortunately, I was unable to come
up with elegant solution for this which would be also contained
and did not require rewamp of the mutex interfaces.

BTW, if anybody come up with the idea of the restructuring the free list
handling to avoid the free list/vnode interlock LOR altogether, it would
be the best.

I do not have objections against the pause() addition, but I would
argue that should_yield() should be removed then, switching the code to
unconditionally pause when the collision detected.

[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (FreeBSD)

iQIcBAEBAgAGBQJRovgfAAoJEJDCuSvBvK1BmNIP/je3R6P5ySJnhK1RdeHvYCuV
GElJUmlP7IC8b+PLK3Hr+FcjWITFPIpGsjDR4znGJ1tgrbBlEA41BbZbZYHMhBG2
2VrK4yJmFZVSV2SAMjaFEORqz8vbI9fJrt7SUSbYFUtPudM3Iyt6KtBjuEfR5oWA
t2NYpN8T5Kxp027kKr5Yy335bDhLYSOSQ0OoFdJjcCle4ZHjSfKzPYNW/NJ6QVVE
m4VXvHXSzWznKwl2M9lgvVb6PLFgwkoNEYqNb1qVgtrGSehJAdfBfm/1XoJbnt7F
Jm7jOKBNfnttY26DH3gJLFN02dw6G0BLwJ3Krf/a6xmUxtNAWOMxNHd4/u33yqD+
oxM3QQh/Sbb4GwRpRYxsh97qZ7vuGLuhQ5OwneJ1colRobktHkfhvxM12VGRf/iX
+s06qmBIeRdpjv78SvCGFdpQGUuaKQH0w3XCnEZ5uA9h5WFCptB888EKGpoPRnCO
R8cjgZ3FiJTmEpHRPhWDZdAvxoEiNaRVPJuoPYKR3UnCJ+QJMLkC4xgQflzZAs94
WjwBKc4JVbz4AV4YDjYkkWok5Tz2O9ZKP8vxJv1UAEhqAu+mzLcr3nbdhdOUoIw3
uKh5z6AwpCa+TPju2wm5mZPpjt26jb94sKpL0yGk74xTFZNAzjh6AwUY/rvAxLP4
+wdJmMMp5HXNhOQnaada
=B3in
-----END PGP SIGNATURE-----

help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130527060727.GW3047>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation