Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 27 May 2013 13:23:06 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Roger Pau Monn? <roger.pau@citrix.com>
Cc:        freebsd-current@freebsd.org, "current@freebsd.org" <current@freebsd.org>, Jilles Tjoelker <jilles@stack.nl>
Subject:   Re: FreeBSD-HEAD gets stuck on vnode operations
Message-ID:  <20130527102306.GB3047@kib.kiev.ua>
In-Reply-To: <51A31727.9040909@citrix.com>
References:  <20130514163149.GS3047@kib.kiev.ua> <51927143.4080102@citrix.com> <201305201434.55406.jhb@freebsd.org> <51A0FA43.2040503@citrix.com> <51A26245.9060707@citrix.com> <20130526202058.GA40375@stack.nl> <51A275F7.9030401@citrix.com> <20130526222254.GB40375@stack.nl> <20130527060727.GW3047@kib.kiev.ua> <51A31727.9040909@citrix.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--xcDMtsbEj34k8mAw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, May 27, 2013 at 10:19:51AM +0200, Roger Pau Monn? wrote:
> On 27/05/13 08:07, Konstantin Belousov wrote:
> > On Mon, May 27, 2013 at 12:22:54AM +0200, Jilles Tjoelker wrote:
> >> On Sun, May 26, 2013 at 10:52:07PM +0200, Roger Pau Monn? wrote:
> >>> On 26/05/13 22:20, Jilles Tjoelker wrote:
> >>>> Instead of a pause() that may be too short or too long, how about
> >>>> waiting for the necessary lock? In other words, replace the kern_yie=
ld()
> >>>> call with VI_LOCK(vp); VI_UNLOCK(vp);. This is also the usual approa=
ch
> >>>> to acquire two locks without imposing an order between them.
> >>
> >>> Since there might be more than one locked vnode, waiting on a specific
> >>> locked vnode seemed rather arbitrary, but I agree that the pause is a=
lso
> >>> rather arbitrary.
> >>
> >>> Also, can we be sure that the v_interlock mutex will not be destroyed
> >>> while the syncer process is waiting for it to be unlocked?
> >>
> >> I think this is a major problem. My idea was too easy and will not wor=
k.
> >>
> >> That said, the code in mnt_vnode_next_active() appears to implement so=
me
> >> sort of adaptive spinning for SMP. It tries VI_TRYLOCK for 200ms
> >> (default value of hogticks) and then yields. This is far too long for a
> >> mutex lock and if it takes that long it means that either the thread
> >> owning the lock is blocked by us somehow or someone is abusing a mutex
> >> to work like a sleepable lock such as by spinning or DELAY.
> >>
> >> Given that it has been spinning for 200ms, it is not so bad to pause f=
or
> >> one additional microsecond.
> >>
> >> The adaptive spinning was added fairly recently, so apparently it
> >> happens fairly frequently that VI_TRYLOCK fails transiently.
> >> Unfortunately, the real adaptive spinning code cannot be used because =
it
> >> will spin forever as long as the thread owning v_interlock is running,
> >> including when that is because it is spinning for vnode_free_list_mtx.
> >> Perhaps we can try to VI_TRYLOCK a certain number of times. It is also
> >> possible to check the contested bit of vnode_free_list_mtx
> >> (sys/netgraph/netflow/netflow.c does something similar) and stop
> >> spinning in that case.
> >>
> >> A cpu_spinwait() invocation should also be added to the spin loop.
> >=20
> > There are two 'proper' solutions for this issue:
> >=20
> > One is to change the handling of the vnode lifecycle to allow the
> > safe block for the vnode interlock acquisition. In particular, the
> > change would add some stability of the vnode memory when vnode is
> > put on the free list. As example, the vnode zone could be marked as
> > type-stable again, and then the vnode interlock can be obtained with
> > dropped free list lock. Arguably, marking the zone as non-freeable would
> > be a regression, esp. for the zone which accounts for largest allocation
> > on the kernel memory.
> >=20
> > Another one is to somehow ensure that the priority is properly
> > propagated from the spinning thread to the vnode interlock owner.
> > I think that it is desirable to donate some amount of priority
> > from the spinning thread.  Unfortunately, I was unable to come
> > up with elegant solution for this which would be also contained
> > and did not require rewamp of the mutex interfaces.
> >=20
> > BTW, if anybody come up with the idea of the restructuring the free list
> > handling to avoid the free list/vnode interlock LOR altogether, it would
> > be the best.
> >=20
> > I do not have objections against the pause() addition, but I would
> > argue that should_yield() should be removed then, switching the code to
> > unconditionally pause when the collision detected.
>=20
> Taking the idea from Jilles, what about replacing should_yield with a
> check to see if the vnode_free_list_mtx mutex is contented?
>=20
> That would prevent us from doing unnecessary pauses, and only releasing
> the vnode_free_list_mtx mutex when there's someone else that actually
> needs it.
This would still be racy, and possibly allows the lock convoy in the same
manner as the current code.  Also, AFAIR, the real problem was when
two iterators start synchronized, it usually ended in live-lock.

If you are willing, try this, of course, but I tend to agree with just
an addition of pause() for now.

--xcDMtsbEj34k8mAw
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (FreeBSD)

iQIcBAEBAgAGBQJRozQJAAoJEJDCuSvBvK1B1mwP/1llpTg+aF+mzgnLt7f+uP4u
QSpSmB6D5tdDnGZWVgS+cMv3oR/SiQsS8yi65gFhskqPJputimSK52sy/Uq/VwJO
lxnnWwrIc72fSmBu/b1kr9V8roXpDgKt6o9/eSvRqk1WNBXSw4zfOXz96t6hfajU
3mAfOIBjnXg/y0umt/ymaKVxLPhie6KHsxIBxaCtP+VTgt2VoSxOJvYx5G2LkCq7
6Lr68qi+aDADRsaZq9xmdqfsHNYUJrW+0ewtmeHHLlV/RTQPx6Zn9wXbBP+uPwRA
1YVz0pNae0+wTQ+twlWPVycz2tZKYrItcw9eIYUKJxGkPZ+RUaAFNympGgUkJD0l
S2aOG33nK9moodCu1ThwgEdndIhsaFyeQnfqa3p2v2SdSZBcfbvzjsijFf3K7ZYL
MXfsx5hxPQ8RRBkSn1864mv5N3BfV/ZyjR0OIsmdEyocBqceYFYXeY8NN6hc5Oor
8L/CK8vOdEEbPtfcsKFyA5s0cN7uICyk9t4T1OhjElorrXU+AXArcoAn16Nhbdqm
Q4FGQQz4eODd2PdFmOOAo+UB68lYDMKBhx+pSz7/AYvw2a9I7rI3k7fnmuh3puqS
I7UIlJidCJknA+ulhiPIe6hnpexjrsA3TYbhvKn1o3BAEt33DlWltkuYyUO37p/Y
pyJ9nFTEcsmZ8mV7JPnF
=WTjv
-----END PGP SIGNATURE-----

--xcDMtsbEj34k8mAw--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130527102306.GB3047>