From owner-freebsd-arch@FreeBSD.ORG  Fri Aug  3 00:52:40 2007
Date: Thu, 2 Aug 2007 17:55:12 -0700 (PDT)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@10.0.0.1
To: Kris Kennaway <kris@obsecurity.org>
In-Reply-To: <20070729180722.GB85196@rot26.obsecurity.org>
Message-ID: <20070802174819.S561@10.0.0.1>
References: <20070702230728.E552@10.0.0.1> <20070703181242.T552@10.0.0.1>
	<20070704105525.GU45894@elvis.mu.org> <20070704114005.X552@10.0.0.1>
	<20070729180722.GB85196@rot26.obsecurity.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org, Alfred Perlstein <alfred@freebsd.org>
Subject: Re: Fine grain select locking.

On Sun, 29 Jul 2007, Kris Kennaway wrote:

> On Wed, Jul 04, 2007 at 11:47:35AM -0700, Jeff Roberson wrote:
>
>>>> http://people.freebsd.org/~jeff/select2.diff
>>>
>>> Jeff, I understand you're trying to speed up mysql micro benchmarks,
>>> but have you done any benchmarking on large select operations?
>>
>> I don't know that I'd call mysql a micro-benchmark.  This patch also
>> didn't help there as much as I had hoped and I'm still trying to
>> understand why.
>
> Here is a graph of the performance effects to sysbench with this
> patch:
>
>  http://obsecurity.dyndns.org/select.png
>

Kris,

Thanks very much for looking into this.  The pgsql numbers and lock
profiling output seem to confirm that the patch improves concurrency.
Hopefully this and db's microbenchmark are enough to convince people this
should go in after 7.0 branches.

With regard to the proc locking: I believe we need to make separate locks
for signal processing, the various limits, etc.  I think if we add one or
two more locks per-proc we won't need to do rwlocks and we can fix most of
this contention.

I believe file descriptor locking is the place where we are most lacking.
The new sx helped tremendously.  However, this is still going to be a
scalability limiter.  I have looked into both Linux's and Solaris's
solutions to this problem.  Briefly, Linux uses RCU to protect the list,
which is close to ideal as this is certainly a read-heavy workload.
Solaris, on the other hand, uses the actual file lock to protect the
descriptor slot.  So they fetch the file pointer, lock it, and then check
to see if they lost a race with the slot being reassigned while they were
acquiring the lock.  This approach is perhaps better than RCU in many
cases except when the descriptor set is expanded.  Then they have to lock
every file in the set.
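
As a rough illustration of the Solaris-style scheme described above, here
is a minimal userland sketch using pthreads and C11 atomics.  It is not
Solaris or FreeBSD code: toy_file, toy_fdtable, toy_fget and toy_fset are
hypothetical names, the table has a fixed size (so the expensive expansion
case is not modeled), files are assumed never to be freed while a lookup
may still touch them, and only one thread is assumed to update a given
slot at a time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_NFILES 8    /* fixed table size; growth is not modeled */

struct toy_file {
        pthread_mutex_t f_lock;   /* per-file lock, also guards the slot */
        int             f_count;  /* reference count, protected by f_lock */
};

struct toy_fdtable {
        _Atomic(struct toy_file *) slots[TOY_NFILES];
};

/*
 * Look up fd without taking a table-wide lock.  Returns the file locked
 * and with an extra reference, or NULL if fd is out of range or empty.
 */
static struct toy_file *
toy_fget(struct toy_fdtable *t, int fd)
{
        struct toy_file *fp;

        if (fd < 0 || fd >= TOY_NFILES)
                return (NULL);
        for (;;) {
                fp = atomic_load(&t->slots[fd]);
                if (fp == NULL)
                        return (NULL);
                pthread_mutex_lock(&fp->f_lock);
                /* Was the slot reassigned while we waited for the lock? */
                if (atomic_load(&t->slots[fd]) == fp) {
                        fp->f_count++;
                        return (fp);
                }
                /* Lost the race: drop the stale file and retry. */
                pthread_mutex_unlock(&fp->f_lock);
        }
}

/*
 * Reassign a slot.  Holding the old file's lock while storing is what
 * lets toy_fget() detect the change after it acquires that same lock.
 */
static void
toy_fset(struct toy_fdtable *t, int fd, struct toy_file *newfp)
{
        struct toy_file *old;

        assert(fd >= 0 && fd < TOY_NFILES);
        old = atomic_load(&t->slots[fd]);
        if (old != NULL)
                pthread_mutex_lock(&old->f_lock);
        atomic_store(&t->slots[fd], newfp);
        if (old != NULL)
                pthread_mutex_unlock(&old->f_lock);
}
```

A caller uses the returned file and then unlocks f_lock.  Lookups on
different files never touch a shared lock, which is the attraction; the
cost shows up when the table itself has to be replaced, since every
file's lock would then have to be taken.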

I hope we can hash out a good plan to resolve this for 8.0.  filedesc and 
lockmgr are the biggest hitters on mysql writes.  I suspect this is also 
the case for pgsql and likely other network server type programs.

Thanks,
Jeff

> mysql
> =====
>
> It appears that at higher loads most of the contention is now in
> userland, no longer within the kernel.  There is also significant
> contention on the proc lock.
>
> Peak load (8 clients):
>    31         1137          169        1280     0     0          126           56 kern/kern_umtx.c:325 (sleep mutex:umtxql)
>     7         5688          554        4750     1     0          722          294 kern/subr_sleepqueue.c:388 (sleep mutex:process lock)
>     3         2335         1155        8732     0     0          669          571 kern/sys_generic.c:955 (sleep mutex:process lock)
>
> Higher load (20 clients):
>    88         6714          807        4763     1     0          754          276 kern/subr_sleepqueue.c:388 (sleep mutex:process lock)
>     3         2342         1228        7656     0     0          650          550 kern/sys_generic.c:955 (sleep mutex:process lock)
>     2          431         1299        1023     0     1           53           77 kern/kern_sig.c:996 (sleep mutex:process lock)
>     7          371         3545         635     0     5           58          131 kern/kern_mutex.c:141 (sleep mutex:umtxql)
>    70         5085         7433        3184     1     2          507          377 kern/kern_umtx.c:325 (sleep mutex:umtxql)
>
> I looked in the past at replacing the proc mutex with a rwlock and
> looking for places where shared locking could be used, but at least as
> the code is written currently I don't think any of those apply here.
>
> With the select locking patch overall mysql performance does not
> change much, but the total amount of time spent waiting for locks is
> greatly reduced (by about an order of magnitude), so system time
> should be lower with these changes (unless it's counterbalanced by
> greater time spent doing other things than lock waits).  I have not
> measured this though.
>
> We might be able to obtain some further improvement at higher loads by
> improving the contention behaviour of umtx objects (the kernel part of
> the libthr pthread mutex).  I suspect most of the problem is in mysql
> itself.  What we need is a userland counterpart of lock profiling, for
> profiling contention on pthread mutexes.
>
> pgsql
> =====
>
> Clear performance benefit from select locking, on the order of 5-10%.
> Reduction in lock wait time is about *two* orders of magnitude.
>
> Peak load:
>
>     5         2942         1437        2607     1     0          818          446 kern/subr_turnstile.c:546 (spin mutex:turnstile chain)
>    13         9250         1474        9572     0     0         1405          585 kern/subr_sleepqueue.c:388 (sleep mutex:process lock)
>    39         3019         2856        9458     0     0         1613         1131 kern/sys_generic.c:955 (sleep mutex:process lock)
>   120         5540         5494       16954     0     0         3536         2017 kern/kern_sig.c:996 (sleep mutex:process lock)
>
> 20 clients:
>     8         5336         3506        4338     1     0         1610          910 kern/subr_turnstile.c:546 (spin mutex:turnstile chain)
>     2         2828         4261        8787     0     0         1749         1298 kern/sys_generic.c:955 (sleep mutex:process lock)
>    56        10717         4568        8968     1     0         3092         1382 kern/subr_sleepqueue.c:388 (sleep mutex:process lock)
>     4         5390         7646       15766     0     0         3325         2568 kern/kern_sig.c:996 (sleep mutex:process lock)
>    79         9423        70619       33525     0     2          154           92 kern/uipc_syscalls.c:135 (sleep mutex:sleep mtxpool)
>
> i.e. much the same lock workload as mysql except for no umtx
> contention (pgsql is not threaded), and huge wait time (but not much
> contention) on the following:
>
> static int
> getsock(struct filedesc *fdp, int fd, struct file **fpp, u_int *fflagp)
> {
>        ...
>
>        if (fdp == NULL)
>                error = EBADF;
>        else {
>                FILEDESC_SLOCK(fdp);
>                fp = fget_locked(fdp, fd);
>                if (fp == NULL)
>                        error = EBADF;
>                else if (fp->f_type != DTYPE_SOCKET) {
>                        fp = NULL;
>                        error = ENOTSOCK;
>                } else {
>                        fhold(fp);
>                        ...
>
> }
>
> I think this is mostly because it's called so often, with small
> incremental but large total cost.
>
> Kris
>
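
For anyone reading the lock profiling rows quoted above without the
header line: the columns appear to be max, total, wait_total, count, avg,
wait_avg, cnt_hold, cnt_lock and name.  That layout is an assumption
inferred from the data (in each row the fifth and sixth fields match
total/count and wait_total/count rounded down); it is not stated in the
thread, and neither is the time unit.  Under that assumption, a small
sketch for parsing a row and computing the mean wait per acquisition
might look like this (lp_row, lp_parse and lp_mean_wait are hypothetical
names):

```c
#include <assert.h>
#include <stdio.h>

/* One row of lock profiling output, using the assumed column layout. */
struct lp_row {
        unsigned long max, total, wait_total, count;
        unsigned long avg, wait_avg, cnt_hold, cnt_lock;
        char          name[128];
};

/* Parse one row; returns 0 on success, -1 if the row does not match. */
static int
lp_parse(const char *line, struct lp_row *r)
{
        int n;

        assert(line != NULL && r != NULL);
        n = sscanf(line, "%lu %lu %lu %lu %lu %lu %lu %lu %127[^\n]",
            &r->max, &r->total, &r->wait_total, &r->count,
            &r->avg, &r->wait_avg, &r->cnt_hold, &r->cnt_lock, r->name);
        return (n == 9 ? 0 : -1);
}

/* Mean wait per acquisition, in whatever time unit the row uses. */
static double
lp_mean_wait(const struct lp_row *r)
{
        if (r->count == 0)
                return (0.0);
        return ((double)r->wait_total / (double)r->count);
}
```

Fed the kern/uipc_syscalls.c:135 row from the pgsql 20-client run, this
gives a mean wait of a little over 2 per acquisition, which fits the
"called so often, with small incremental but large total cost" reading.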