Date: Sun, 7 May 2006 17:04:26 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: Kris Kennaway <kris@obsecurity.org>
Cc: performance@FreeBSD.org, Robert Watson <rwatson@FreeBSD.org>, current@FreeBSD.org
Subject: Re: Fine-grained locking for POSIX local sockets (UNIX domain sockets)
Message-ID: <20060507210426.GA4422@xor.obsecurity.org>
In-Reply-To: <20060506221908.GB51268@xor.obsecurity.org>
References: <20060506150622.C17611@fledge.watson.org> <20060506221908.GB51268@xor.obsecurity.org>
On Sat, May 06, 2006 at 06:19:08PM -0400, Kris Kennaway wrote:
> x norwatson-8
> + rwatson-8
> +------------------------------------------------------------+
> |                                                 +           |
> |  +        +     +                      x       x x          |
> |+ +  +++ +                        x  xxxxx    x            x |
> |   |_________A___M______|              |___A___|             |
> +------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x  10       2601.46       2700.26       2650.52      2653.441     30.758034
> +  10       2240.86       2516.87      2496.085      2468.468     81.868576
> Difference at 95.0% confidence
>         -184.973 +/- 58.1052
>         -6.97106% +/- 2.1898%
>         (Student's t, pooled s = 61.8406)
> 
> We see the drop in performance in both cases indicating that we are in
> the "overloaded" regime.  The fact that your patch seems to give worse
> performance is puzzling at first sight.
> 
> Running mutex profiling (and only keeping the unp mutex entries and
> the 10 most contended for clarity) shows the following:
> 
> norwatson, 8 clients:
> 
> debug.mutex.prof.stats:
>   max        total     count   avg   cnt_hold  cnt_lock  name
>     5           40         9     4          0         3  kern/uipc_usrreq.c:170 (unp)
>     8            8         1     8          0         0  vm/uma_core.c:2101 (unpcb)
>    13          283        52     5          0         0  vm/uma_core.c:890 (unpcb)
>    14         1075       200     5          0         0  vm/uma_core.c:1885 (unpcb)
>     4           52        18     2          4         6  kern/uipc_usrreq.c:577 (unp)
>     5           39         9     4          4         2  kern/uipc_usrreq.c:534 (unp)
>     5           35        11     3          6         6  kern/uipc_usrreq.c:974 (unp)
>     5           45        11     4          7         4  kern/uipc_usrreq.c:210 (unp)
>   171         1164         9   129          7         2  kern/uipc_usrreq.c:917 (unp)
>    14           78        20     3         11   2872481  kern/uipc_usrreq.c:709 (unp)
>    70          156        11    14         13         4  kern/uipc_usrreq.c:895 (unp)
>    43          581        20    29         24         6  kern/uipc_usrreq.c:239 (unp)
>    44          429        18    23         26         8  kern/uipc_usrreq.c:518 (unp)
>    55          491        12    40         30        10  kern/uipc_usrreq.c:251 (unp)
> ...
>   449     20000519    320038    62      15158         0  kern/uipc_usrreq.c:431 (so_rcv)
>   459     86616085   2880079    30      15699      4944  kern/uipc_usrreq.c:319 (so_snd)
>   146      2273360    640315     3      27918     29789  kern/kern_sig.c:1002 (process lock)
>   387      3325481    640099     5      38143     47670  kern/kern_descrip.c:420 (filedesc structure)
>   150      1881990    640155     2      64111     49033  kern/kern_descrip.c:368 (filedesc structure)
>   496     13792853   3685885     3     101692    132480  kern/kern_descrip.c:1988 (filedesc structure)
>   207      4061793    551604     7     115427    118242  kern/kern_synch.c:220 (process lock)
>   391     10332282   3685885     2     194387    129547  kern/kern_descrip.c:1967 (filedesc structure)
>   465     25504709    320042    79    1632192    294498  kern/uipc_usrreq.c:364 (unp)
>   470    124263922   2880084    43   13222757   2685853  kern/uipc_usrreq.c:309 (unp)
> 
> i.e. there is indeed heavy contention on the unp lock (column 5 counts
> the number of times we tried to acquire it and failed because someone
> else had the lock) - in fact about 5 times as many contentions as
> successful acquisitions.
> 
> With your patch and the same load:
> 
>     3           20         9     2          0         0  kern/uipc_usrreq.c:1028 (unp_mtx)
>     3           22         9     2          0         0  kern/uipc_usrreq.c:1161 (unp_mtx)
>     5           29         9     3          0         2  kern/uipc_usrreq.c:1065 (unp_global_mtx)
>     5           53        18     2          0     76488  kern/uipc_usrreq.c:287 (unp_global_mtx)
>     6           33         9     3          0         0  kern/uipc_usrreq.c:236 (unp_mtx)
>     6           37         9     4          0         0  kern/uipc_usrreq.c:819 (unp_mtx)
>     7            7         1     7          0         0  vm/uma_core.c:2101 (unpcb)
>     8           49         9     5          0         0  kern/uipc_usrreq.c:1101 (unp_mtx)
>    11          136        18     7          0         1  kern/uipc_usrreq.c:458 (unp_global_mtx)
>    32          143         9    15          0         1  kern/uipc_usrreq.c:1160 (unp_global_mtx)
>    44          472        18    26          0         0  kern/uipc_usrreq.c:801 (unp_mtx)
>   123          310         9    34          0         0  kern/uipc_usrreq.c:1100 (unp_mtx)
>   147          452         9    50          0         0  kern/uipc_usrreq.c:1099 (unp_mtx)
>   172          748         9    83          0         0  kern/uipc_usrreq.c:473 (unp_mtx)
>   337         1592         9   176          0         0  kern/uipc_usrreq.c:1147 (unp_mtx)
>   350         1790         9   198          0         0  kern/uipc_usrreq.c:1146 (unp_mtx)
>   780     39405928    320038   123          0         0  kern/uipc_usrreq.c:618 (unp_mtx)
>    18          140         9    15          1         0  kern/uipc_usrreq.c:235 (unp_global_mtx)
>    70          717        18    39          1         3  kern/uipc_usrreq.c:800 (unp_global_mtx)
>   528         2444         9   271          1         1  kern/uipc_usrreq.c:1089 (unp_global_mtx)
>   158          616         9    68          2         2  kern/uipc_usrreq.c:476 (unp_mtx)
>   794    175382857   2880084    60          2      7686  kern/uipc_usrreq.c:574 (unp_mtx)
>     4           25         9     2          3         2  kern/uipc_usrreq.c:422 (unp_global_mtx)
>   186          874         9    97          3         3  kern/uipc_usrreq.c:472 (unp_global_mtx)
>   768     33783759    320038   105       7442         0  kern/uipc_usrreq.c:696 (unp_mtx)
> ...
>   465       913127    320045     2      43130     35046  kern/uipc_socket.c:1101 (so_snd)
>   483      2453927    628737     3      44768     46177  kern/kern_sig.c:1002 (process lock)
>   767    124298544   2880082    43      70037     59994  kern/uipc_usrreq.c:581 (so_snd)
>   794     45176699    320038   141      83252     72140  kern/uipc_usrreq.c:617 (unp_global_mtx)
>   549      9858281   3200210     3     579269    712643  kern/kern_resource.c:1172 (sleep mtxpool)
>   554     17122245    631715    27     641888    268243  kern/kern_descrip.c:420 (filedesc structure)
>   388      3009912    631753     4     653540    260590  kern/kern_descrip.c:368 (filedesc structure)
>   642     49626755   3681446    13    1642954    682669  kern/kern_descrip.c:1988 (filedesc structure)
>   530     13802687   3681446     3    1663244    616899  kern/kern_descrip.c:1967 (filedesc structure)
>   477     23472709   2810986     8    5671248   1900047  kern/kern_synch.c:220 (process lock)
> 
> The top 10 heavily contended mutexes are very different (but note the
> number of mutex acquisitions, column 3, is about the same).
> 
> There is not much contention on unp_global_mtx any longer, but there
> is a lot more on some of the other mutexes, especially the process
> lock via msleep().  Off-hand I don't know what is the cause of this
> bottleneck (note: libthr is used as threading library and libpthread
> is not ported to sparc64).
> 
> Also, a lot of the contention that used to be on the unp lock seems to
> have fallen through onto contending *two* of the filedesc locks (all
> about 1.6 million contentions).  This may also help to explain the
> performance drop.

I profiled the wait channels in msleep and found:

Before:

      1 getblk
      3 wait
   1113 sigwait
   2228 umtxq_busy
   4505 fdesc
   9405 umtx
  14264 sbwait
  -----
  31515 total

After:

      1 getblk
      1 wait
   1413 sigwait
   2794 umtxq_busy
   4638 fdesc
  10779 umtx
  11663 sbwait
  -----
  31287 total

i.e. the same number of msleep calls in both cases, but the patch shuffles
them from sbwait onto everything else.

Relative change:

  sbwait      -19%
  umtx        +14%
  fdesc        +3%
  umtxq_busy  +25%
  sigwait     +27%

This may help to understand why performance drops at high load when unp
locking is moved out of the way.
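The mail does not say how the msleep wait-channel tallies above were
gathered.  As a rough, userland-only approximation (not the method used
here), one could periodically sample every process's wait channel with
ps(1) and count the samples; the script below is such a sketch, with the
sample count, interval, and output file chosen arbitrarily.

    #!/bin/sh
    # Hedged sketch: approximate a wait-channel breakdown by sampling ps(1)
    # output rather than counting msleep() calls in the kernel (which is what
    # the tallies above appear to do).  Sampled numbers will not match an
    # in-kernel count, but the relative shape should be comparable.
    SAMPLES=60      # arbitrary: 60 one-second samples
    OUT=/tmp/wchan.samples

    : > "$OUT"
    i=0
    while [ "$i" -lt "$SAMPLES" ]; do
        # 'wchan=' suppresses the header line; '-' means not sleeping.
        ps -ax -o wchan= | awk 'NF && $1 != "-"' >> "$OUT"
        sleep 1
        i=$((i + 1))
    done

    # Tally the sampled wait channels, smallest count first, plus a total,
    # mirroring the layout of the "Before"/"After" lists above.
    sort "$OUT" | uniq -c | sort -n
    wc -l < "$OUT"

Note that sampling counts sleeping processes at the sample instants, so
short sleeps (such as a briefly held sbwait) are under-represented relative
to a per-call count.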
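For reference, the contention tables quoted above come from the kernel's
mutex profiling machinery (debug.mutex.prof.stats is named in the output).
A minimal sketch of taking such a snapshot, assuming a kernel built with
the mutex profiling option of that era and assuming the companion enable
and reset sysctls exist alongside debug.mutex.prof.stats, might look like:

    #!/bin/sh
    # Minimal sketch, assuming mutex profiling is compiled into the kernel.
    # debug.mutex.prof.stats is shown in the mail; the 'enable' and 'reset'
    # sysctls are assumed companions and may differ by kernel version.

    sysctl debug.mutex.prof.reset=1     # zero any existing counters (assumed)
    sysctl debug.mutex.prof.enable=1    # start collecting (assumed)

    ./run-workload                      # hypothetical: the 8-client benchmark

    sysctl debug.mutex.prof.enable=0    # stop collecting (assumed)
    sysctl -n debug.mutex.prof.stats > /tmp/mtxprof.txt

    # Reduce the table the same way the mail does: keep the unp-related
    # entries and the 10 rows with the largest column 5 (cnt_hold, which the
    # mail reads as failed acquisition attempts).
    grep uipc_usrreq /tmp/mtxprof.txt
    sort -nk5 /tmp/mtxprof.txt | tail -n 10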
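Similarly, the throughput comparison at the top of the mail is in the
output format of ministat(1), which ships in the FreeBSD source tree.
Assuming norwatson-8 and rwatson-8 each hold one throughput figure per run
(ten lines each, matching the N column above), a comparison like the one
quoted can be produced with:

    # Sketch: compare the two throughput files; ministat prints the
    # histogram, the summary statistics, and the Student's t difference.
    # -c sets the confidence level (95 is the default), -w the plot width.
    ministat -c 95 -w 60 norwatson-8 rwatson-8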
Kris