Date: Sun, 14 May 2006 22:21:45 +0300
From: Sven Petai <hadara@bsd.ee>
To: freebsd-current@freebsd.org
Cc: Robert Watson <rwatson@freebsd.org>, Kris Kennaway <kris@obsecurity.org>
Subject: Re: Fine-grained locking for POSIX local sockets (UNIX domain sockets)
Message-ID: <200605142221.46093.hadara@bsd.ee>
In-Reply-To: <20060508065207.GA20386@xor.obsecurity.org>
References: <20060506150622.C17611@fledge.watson.org> <20060507230430.GA6872@xor.obsecurity.org> <20060508065207.GA20386@xor.obsecurity.org>

On Monday 08 May 2006 09:52, Kris Kennaway wrote:
> The other big gain is to sleep
> mtxpool contention, which roughly doubled:
>
> /*
>  * Change the total socket buffer size a user has used.
>  */
> int
> chgsbsize(uip, hiwat, to, max)
> 	struct uidinfo *uip;
> 	u_int *hiwat;
> 	u_int to;
> 	rlim_t max;
> {
> 	rlim_t new;
>
> 	UIDINFO_LOCK(uip);
>
> So the next question is how can that be optimized?
>
> Kris

Hi,

On the 8-core machine this lock was the most contended one with rwatson's
patch, with over 8 million failed acquire attempts. Originally the unp lock
had only ~3 million of those, so I suppose this explains the sharp drop with
larger numbers of threads.

I feel like I'm missing some very obvious reason, but wouldn't the simplest
workaround be to just return 1 right away if the limit is set to infinity,
which is almost always the case since it's the default, and to document in
the login.conf manpage that you might take a performance hit with this type
of workload when you set sbsize limits?

--- /usr/src/sys_clean/kern/kern_resource.c	Sat Mar 11 12:48:19 2006
+++ /usr/src/sys/kern/kern_resource.c	Sun May 14 05:34:02 2006
@@ -1169,6 +1169,10 @@
 {
 	rlim_t new;
 
+	if (max == RLIM_INFINITY) {
+		*hiwat = to;
+		return (1);
+	}
 	UIDINFO_LOCK(uip);
 	new = uip->ui_sbsize + to - *hiwat;
 	/* Don't allow them to exceed max, but allow subtraction. */

The 8-core machine that I originally used for benchmarking was shipped out
to the client, so I couldn't test how it would have performed with the
uidinfo contention out of the way, but results from a 1 * dual-core machine
look good:
http://bsd.ee/~hadara/debug/mysql4/dualcore/stats.html

Several interesting things can be noticed from this data:

* On the dual-core machine rwatson's patch gives a consistent performance
  boost with all the thread settings tested; there is no sharp drop after 20
  threads like the one I had on the 8-core machine.
* With a thread count in the range [3;10], an even number of threads usually
  performs ~4-5% better than an odd number.
* With the uidinfo + rwatson patches there were some significant outliers
  where one result was more than 30% better than the others with the same
  settings; these were removed before calculating the mean values for the
  graphs.

After I had finished benchmarking I discovered that the new malloc library
has debugging turned on. After turning it off I see large (20-25%)
performance boosts across the range, so I started a new round of testing
with NO_MALLOC_EXTRAS defined. I'll update the results ASAP.

I wonder if I should set up an automatic, periodic performance testing
system that would run all the tests, for example once a week, with the
latest current and stable, so that it would be easier for developers to see
how changes affect different workloads.
If you guys think it would be worthwhile, what benchmarks would you like to
see in addition to mysql+supersmack?
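
For anyone who wants to play with the early-return idea outside the kernel, here is a rough user-space sketch. Everything in it is an assumption made for illustration: uidinfo_sketch, chgsbsize_sketch, rlim_sketch_t and RLIM_SKETCH_INFINITY are hypothetical stand-ins, a pthread mutex takes the place of UIDINFO_LOCK, and the lock_acquisitions counter exists only to make the skipped lock visible. It is not the real kern_resource.c code.

```c
#include <limits.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins; NOT the kernel definitions. */
typedef long long rlim_sketch_t;
#define RLIM_SKETCH_INFINITY LLONG_MAX

struct uidinfo_sketch {
	pthread_mutex_t lock;            /* stands in for UIDINFO_LOCK */
	long long sbsize;                /* stands in for ui_sbsize */
	unsigned long lock_acquisitions; /* illustration only */
};

/*
 * Change the total socket buffer size a user has used.
 * Returns 1 if the change was accepted, 0 if it would exceed max.
 */
int
chgsbsize_sketch(struct uidinfo_sketch *uip, unsigned int *hiwat,
    unsigned int to, rlim_sketch_t max)
{
	long long new_total;

	/* Fast path from the patch: no limit set, so take no lock. */
	if (max == RLIM_SKETCH_INFINITY) {
		*hiwat = to;
		return (1);
	}

	pthread_mutex_lock(&uip->lock);
	uip->lock_acquisitions++;
	new_total = uip->sbsize + to - *hiwat;
	/* Don't allow them to exceed max, but allow subtraction. */
	if (to > *hiwat && new_total > max) {
		pthread_mutex_unlock(&uip->lock);
		return (0);
	}
	uip->sbsize = new_total;
	*hiwat = to;
	pthread_mutex_unlock(&uip->lock);
	return (1);
}
```

Note that in this sketch the fast path also leaves the per-user total untouched, so users without a limit are simply not accounted for. Whether that is acceptable in the real kernel is exactly the "obvious reason" I may be missing.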