From owner-freebsd-arch@FreeBSD.ORG Fri Aug 3 00:52:40 2007
Date: Thu, 2 Aug 2007 17:55:12 -0700 (PDT)
From: Jeff Roberson
To: Kris Kennaway
Cc: arch@freebsd.org, Alfred Perlstein
Subject: Re: Fine grain select locking.
Message-ID: <20070802174819.S561@10.0.0.1>
In-Reply-To: <20070729180722.GB85196@rot26.obsecurity.org>
References: <20070702230728.E552@10.0.0.1> <20070703181242.T552@10.0.0.1>
 <20070704105525.GU45894@elvis.mu.org> <20070704114005.X552@10.0.0.1>
 <20070729180722.GB85196@rot26.obsecurity.org>

On Sun, 29 Jul 2007, Kris Kennaway wrote:

> On Wed, Jul 04, 2007 at 11:47:35AM -0700, Jeff Roberson wrote:
>
>>>> http://people.freebsd.org/~jeff/select2.diff
>>>
>>> Jeff, I understand you're trying to speed up mysql micro benchmarks,
>>> but have you done any benchmarking on large select operations?
>>
>> I don't know that I'd call mysql a micro-benchmark.  This patch also
>> didn't help there as much as I had hoped and I'm still trying to
>> understand why.
>
> Here is a graph of the performance effects to sysbench with this
> patch:
>
> http://obsecurity.dyndns.org/select.png
>

Kris,

Thanks very much for looking into this.  The pgsql numbers and lock
profiling output seem to verify the concurrency of the patch.  Hopefully
this and db's microbenchmark are enough to convince people this should
go in after 7.0 branches.

Regarding the proc locking, I believe we need to make separate locks
for signal processing, the various limits, etc.  I think if we add one
or two more locks per-proc we won't need to do rwlocks and we can fix
most of this contention.

I believe file descriptor locking is the place where we are most
lacking.  The new sx helped tremendously.  However, this is still going
to be a scalability limiter.

I have looked into both Linux's and Solaris's solutions to this
problem.  Briefly, Linux uses RCU to protect the list, which is close to
ideal as this is certainly a read-heavy workload.  Solaris, on the other
hand, uses the actual file lock to protect the descriptor slot: they
fetch the file pointer, lock it, and then check whether they lost a race
with the slot being reassigned while they were acquiring the lock.  This
approach is perhaps better than RCU in many cases, except when the
descriptor set is expanded.  Then they have to lock every file in the
set.

I hope we can hash out a good plan to resolve this for 8.0.  filedesc
and lockmgr are the biggest hitters on mysql writes.  I suspect this is
also the case for pgsql and likely other network server type programs.

Thanks,
Jeff

> mysql
> =====
>
> It appears that at higher loads most of the contention is now in
> userland, no longer within the kernel.  There is also significant
> contention on the proc lock.
>
> Peak load (8 clients):
>  max  total wait_total  count avg wait_avg cnt_hold cnt_lock name
>   31   1137        169   1280   0        0      126       56 kern/kern_umtx.c:325 (sleep mutex:umtxql)
>    7   5688        554   4750   1        0      722      294 kern/subr_sleepqueue.c:388 (sleep mutex:process lock)
>    3   2335       1155   8732   0        0      669      571 kern/sys_generic.c:955 (sleep mutex:process lock)
>
> Higher load (20 clients):
>   88   6714        807   4763   1        0      754      276 kern/subr_sleepqueue.c:388 (sleep mutex:process lock)
>    3   2342       1228   7656   0        0      650      550 kern/sys_generic.c:955 (sleep mutex:process lock)
>    2    431       1299   1023   0        1       53       77 kern/kern_sig.c:996 (sleep mutex:process lock)
>    7    371       3545    635   0        5       58      131 kern/kern_mutex.c:141 (sleep mutex:umtxql)
>   70   5085       7433   3184   1        2      507      377 kern/kern_umtx.c:325 (sleep mutex:umtxql)
>
> I looked in the past at replacing the proc mutex with a rwlock and
> looking for places where shared locking could be used, but at least as
> the code is written currently I don't think any of those apply here.
>
> With the select locking patch overall mysql performance does not
> change much, but the total amount of time spent waiting for locks is
> greatly reduced (by about an order of magnitude), so system time
> should be lower with these changes (unless it's counterbalanced by
> greater time spent doing other things than lock waits).  I have not
> measured this though.
>
> We might be able to obtain some further improvement at higher loads by
> improving the contention behaviour of umtx objects (the kernel part of
> the libthr pthread mutex).  I suspect most of the problem is in mysql
> itself.  What we need is a userland counterpart of lock profiling, for
> profiling contention on pthread mutexes.
>
> pgsql
> =====
>
> Clear performance benefit from select locking, on the order of 5-10%.
> Reduction in lock wait time is about *two* orders of magnitude.
>
> Peak load:
>
>  max  total wait_total  count avg wait_avg cnt_hold cnt_lock name
>    5   2942       1437   2607   1        0      818      446 kern/subr_turnstile.c:546 (spin mutex:turnstile chain)
>   13   9250       1474   9572   0        0     1405      585 kern/subr_sleepqueue.c:388 (sleep mutex:process lock)
>   39   3019       2856   9458   0        0     1613     1131 kern/sys_generic.c:955 (sleep mutex:process lock)
>  120   5540       5494  16954   0        0     3536     2017 kern/kern_sig.c:996 (sleep mutex:process lock)
>
> 20 clients:
>    8   5336       3506   4338   1        0     1610      910 kern/subr_turnstile.c:546 (spin mutex:turnstile chain)
>    2   2828       4261   8787   0        0     1749     1298 kern/sys_generic.c:955 (sleep mutex:process lock)
>   56  10717       4568   8968   1        0     3092     1382 kern/subr_sleepqueue.c:388 (sleep mutex:process lock)
>    4   5390       7646  15766   0        0     3325     2568 kern/kern_sig.c:996 (sleep mutex:process lock)
>   79   9423      70619  33525   0        2      154       92 kern/uipc_syscalls.c:135 (sleep mutex:sleep mtxpool)
>
> i.e. much the same lock workload as mysql except for no umtx
> contention (pgsql is not threaded), and huge wait time (but not much
> contention) on the following:
>
> static int
> getsock(struct filedesc *fdp, int fd, struct file **fpp, u_int *fflagp)
> {
> ...
>
>     if (fdp == NULL)
>         error = EBADF;
>     else {
>         FILEDESC_SLOCK(fdp);
>         fp = fget_locked(fdp, fd);
>         if (fp == NULL)
>             error = EBADF;
>         else if (fp->f_type != DTYPE_SOCKET) {
>             fp = NULL;
>             error = ENOTSOCK;
>         } else {
>             fhold(fp);
> ...
>
> }
>
> I think this is mostly because it's called so often, with small
> incremental but large total cost.
>
> Kris
>
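
For illustration only, the lock-then-recheck lookup that the reply
above attributes to Solaris might look roughly like the userland sketch
below.  It is not the Solaris or FreeBSD code.  The names sk_file,
sk_fdtable, sk_install and sk_fget are hypothetical, pthread mutexes
stand in for kernel locks, the table has a fixed size, and file
structures are assumed never to be freed while a lookup can still reach
them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SK_NSLOTS 16

/* Hypothetical stand-ins; these are not the FreeBSD or Solaris structures. */
struct sk_file {
	pthread_mutex_t	lock;		/* per-file lock; also guards the slot */
	int		refcount;	/* protected by lock */
};

struct sk_fdtable {
	_Atomic(struct sk_file *) slots[SK_NSLOTS];
};

/*
 * Point slot fd at fp.  A writer that replaces a file must hold that
 * file's lock, so a reader holding the same lock sees a stable slot.
 * Dropping the table's reference on the old file is omitted here.
 */
static void
sk_install(struct sk_fdtable *t, int fd, struct sk_file *fp)
{
	struct sk_file *old, *expected;

	assert(fd >= 0 && fd < SK_NSLOTS);
	for (;;) {
		old = atomic_load(&t->slots[fd]);
		if (old == NULL) {
			/* Empty slot: publish with compare-and-swap. */
			expected = NULL;
			if (atomic_compare_exchange_strong(&t->slots[fd],
			    &expected, fp))
				return;
			continue;
		}
		pthread_mutex_lock(&old->lock);
		if (atomic_load(&t->slots[fd]) == old) {
			atomic_store(&t->slots[fd], fp);
			pthread_mutex_unlock(&old->lock);
			return;
		}
		/* The slot changed under us; retry with the new occupant. */
		pthread_mutex_unlock(&old->lock);
	}
}

/*
 * Fetch the file pointer, lock the file, then check that the slot
 * still refers to the file we locked.  Returns the file with an extra
 * reference, or NULL if the slot is empty or fd is out of range.
 */
static struct sk_file *
sk_fget(struct sk_fdtable *t, int fd)
{
	struct sk_file *fp;

	if (fd < 0 || fd >= SK_NSLOTS)
		return (NULL);
	for (;;) {
		fp = atomic_load(&t->slots[fd]);
		if (fp == NULL)
			return (NULL);
		pthread_mutex_lock(&fp->lock);
		if (atomic_load(&t->slots[fd]) == fp) {
			fp->refcount++;
			pthread_mutex_unlock(&fp->lock);
			return (fp);
		}
		/* Lost the race: the slot was reassigned while we waited. */
		pthread_mutex_unlock(&fp->lock);
	}
}
```

No table-wide lock is taken on the lookup path in this sketch.  The
cost shows up when the table has to grow: as noted in the reply, every
file in the set would then have to be locked before the slots could be
moved, which the fixed-size table here sidesteps.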