Date:      Sat, 29 Jun 2013 17:48:59 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        Adrian Chadd <adrian@freebsd.org>, hackers@freebsd.org
Subject:   Re: b_freelist TAILQ/SLIST
Message-ID:  <20130629144859.GB91021@kib.kiev.ua>
In-Reply-To: <51CE8763.2090406@FreeBSD.org>
References:  <51CCAE14.6040504@FreeBSD.org> <20130628065732.GL91021@kib.kiev.ua> <51CE0AF7.6090906@FreeBSD.org> <20130629023532.GW91021@kib.kiev.ua> <51CE8763.2090406@FreeBSD.org>


On Sat, Jun 29, 2013 at 10:06:11AM +0300, Alexander Motin wrote:
> I understand that lock attempt will steal cache line from lock owner.
> What I don't very understand is why avoiding it helps performance in
> this case. Indeed, having mutex on own cache line will not let other
> cores to steal also bswlist, but it also means that bswlist should be
> prefetched separately (and profiling shows resource stalls there). Or in
> this case separate speculative prefetch will be better then forced one
> which could be stolen? Is there cases when it is not, or the only reason
> to not pad all global mutexes is only saving memory?

I can speculate that this is a case where speculative execution helps.
If the mutex and the list head are on different cache lines, then the
CPU can speculatively read the head, and then prove that executing the
read before the lock acquisition does not break the ordering rules
(because the lock protects the head, another core indeed cannot modify
the head if the lock acquisition was successful).

I think this is very similar to the reason why locked instructions used
as barriers are faster than the explicit barriers: the CPU can still
execute speculatively past the lock prefix if the ordering is provably
consistent.

Please see section 8.4.5 of the Intel IA32 architecture optimization
manual for the recommendations (though it gives little explanation).

Yes, I think putting all locks on dedicated cache lines is a waste;
only hot locks need this.
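
To make the layout question concrete, here is a minimal C11 sketch of
the two placements discussed above. It is only an illustration: the
64-byte CACHE_LINE_SIZE is an assumption about the target CPU, and
demo_mtx, demo_head, shared_line, padded_lock and same_line() are
hypothetical stand-ins, not the kernel's struct mtx or the real bswlist
declarations.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, typical for current x86 CPUs. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-ins for a kernel mutex and a TAILQ-style head. */
struct demo_mtx  { volatile unsigned long owner; };
struct demo_head { void *first; void **last; };

/* Lock and head declared back to back: they will normally share a line. */
struct shared_line {
	struct demo_mtx  lock;
	struct demo_head head;
};

/* Hot lock on a dedicated line; the head starts on the following line. */
struct padded_lock {
	alignas(CACHE_LINE_SIZE) struct demo_mtx  lock;
	alignas(CACHE_LINE_SIZE) struct demo_head head;
};

/* Returns 1 if two byte offsets within a line-aligned object share a line. */
static int
same_line(size_t a, size_t b)
{
	return (a / CACHE_LINE_SIZE == b / CACHE_LINE_SIZE);
}

/* Compile-time check that the padded layout really separates the two. */
static_assert(offsetof(struct padded_lock, head) == CACHE_LINE_SIZE,
    "head must start on the cache line after the lock");
```

The padded variant costs two full lines per lock/head pair, which is
why it only makes sense for the few locks that are actually contended.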



