From owner-freebsd-performance@FreeBSD.ORG  Mon May  8 10:43:37 2006
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: performance@freebsd.org
Delivered-To: freebsd-performance@FreeBSD.ORG
Received: from localhost.my.domain (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP id D34D816A402;
	Mon,  8 May 2006 10:43:36 +0000 (UTC)
	(envelope-from davidxu@freebsd.org)
From: David Xu <davidxu@freebsd.org>
To: freebsd-performance@freebsd.org
Date: Mon, 8 May 2006 18:43:31 +0800
User-Agent: KMail/1.8.2
References: <20060506150622.C17611@fledge.watson.org>
	<20060507230430.GA6872@xor.obsecurity.org>
	<20060508065207.GA20386@xor.obsecurity.org>
In-Reply-To: <20060508065207.GA20386@xor.obsecurity.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="gb2312"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200605081843.31825.davidxu@freebsd.org>
Cc: Robert Watson <rwatson@freebsd.org>, performance@freebsd.org,
	current@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject: Re: Fine-grained locking for POSIX local sockets (UNIX domain
	sockets)
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 May 2006 10:43:37 -0000

On Monday 08 May 2006 14:52, Kris Kennaway wrote:
> OK, David's patch fixes the umtx thundering herd (and seems to give a
> 4-6% boost).  I also fixed a thundering herd in FILEDESC_UNLOCK (which
> was also waking up 2-7 CPUs at once about 30% of the time) by doing
> s/wakeup/wakeup_one/.  This did not seem to give a performance impact
> on this test though.
>....
> filedesc contention is down by a factor of 3-4, with corresponding
> reduction in the average hold time.  The process lock contention
> coming from the signal delivery wakeup has also gone way down for some
> reason.
>

I found that mysqld frequently calls alarm() (in its file thr_alarm.c) and
thr_kill() to send SIGALRM to its timer thread to wake it up; the timer
thread itself is blocked in sigwait(). Normally the alarm timer expires
within a second, so the kernel periodically calls psignal() to find a thread
that can handle the signal. This means the kernel has to periodically walk
the thread list with the process lock and the scheduler lock held, which is
very expensive.
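
A minimal userland sketch of the wake-up pattern described above, assuming
POSIX threads: a "timer thread" blocks in sigwait() for SIGALRM and another
thread kicks it with pthread_kill(), used here as the portable analogue of
thr_kill(). The names struct timer_state, timer_thread() and run_timer_demo()
are hypothetical and are not taken from mysqld's thr_alarm.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for mysqld's timer thread state. */
struct timer_state {
    sigset_t set;          /* contains only SIGALRM */
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int wakeups;           /* protected by mu */
    int target;            /* exit after this many wakeups */
};

static void *timer_thread(void *arg)
{
    struct timer_state *st = arg;
    int sig;

    for (;;) {
        /* Sleep until a SIGALRM is pending for this thread. */
        if (sigwait(&st->set, &sig) != 0 || sig != SIGALRM)
            continue;
        pthread_mutex_lock(&st->mu);
        int done = ++st->wakeups >= st->target;
        pthread_cond_signal(&st->cv);
        pthread_mutex_unlock(&st->mu);
        if (done)
            return NULL;
    }
}

/* Kick the timer thread nkicks times and return the wakeups it observed. */
int run_timer_demo(int nkicks)
{
    struct timer_state st;
    sigset_t old;
    pthread_t tid;

    if (nkicks <= 0)
        return 0;
    sigemptyset(&st.set);
    sigaddset(&st.set, SIGALRM);
    pthread_mutex_init(&st.mu, NULL);
    pthread_cond_init(&st.cv, NULL);
    st.wakeups = 0;
    st.target = nkicks;

    /* Block SIGALRM before creating the thread so it inherits the mask
       and the signal is only ever consumed through sigwait(). */
    pthread_sigmask(SIG_BLOCK, &st.set, &old);
    if (pthread_create(&tid, NULL, timer_thread, &st) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    for (int i = 0; i < nkicks; i++) {
        pthread_kill(tid, SIGALRM);
        /* Standard signals do not queue, so wait until this kick has
           been consumed before sending the next one. */
        pthread_mutex_lock(&st.mu);
        while (st.wakeups <= i)
            pthread_cond_wait(&st.cv, &st.mu);
        pthread_mutex_unlock(&st.mu);
    }
    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    pthread_mutex_destroy(&st.mu);
    pthread_cond_destroy(&st.cv);
    assert(st.wakeups == nkicks);
    return st.wakeups;
}
```

Each pthread_kill() here is one targeted signal delivery; in the kernel, each
such delivery is where the thread-list walk under the process lock happens.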

Most of the time, thr_kill() wakes the timer thread up earlier than that.
In the thr_kill syscall, the kernel has to walk the thread list to find the
thread whose ID matches the given one, and the function thread_find() uses a
linear search. This is slow: if there are many threads in the process, the
process lock is held for too long. I think that is the reason why you have
seen so much process lock contention. If you define USE_ALARM_THREAD in the
mysql header file, the contention should decrease (I hope). Patch:

--- my_pthread.h.old	Mon May  8 18:16:56 2006
+++ my_pthread.h	Mon May  8 18:17:07 2006
@@ -267,6 +267,8 @@
 
 /* Test first for RTS or FSU threads */
 
+#define USE_ALARM_THREAD
+
 #if defined(PTHREAD_SCOPE_GLOBAL) && !defined(PTHREAD_SCOPE_SYSTEM)
 #define HAVE_rts_threads
 extern int my_pthread_create_detached;
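
To illustrate why a linear thread_find() is costly while the process lock is
held, here is a simplified model next to one common alternative, a hash table
keyed by thread ID. This is a sketch only: struct kthread, find_linear(),
tidhash_insert(), find_hashed() and TIDHASH_SIZE are hypothetical names, not
FreeBSD kernel code, and the hash table is a swapped-in technique that this
mail does not propose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified thread record (not FreeBSD's struct thread). */
struct kthread {
    int tid;
    struct kthread *next_in_proc;   /* per-process thread list */
    struct kthread *next_in_hash;   /* hash bucket chain */
};

/* O(n): examine every thread in the process until the ID matches. */
struct kthread *find_linear(struct kthread *head, int tid)
{
    for (struct kthread *td = head; td != NULL; td = td->next_in_proc)
        if (td->tid == tid)
            return td;
    return NULL;
}

#define TIDHASH_SIZE 64             /* hypothetical; must be a power of two */
static struct kthread *tidhash[TIDHASH_SIZE];

static unsigned tid_bucket(int tid)
{
    return (unsigned)tid & (TIDHASH_SIZE - 1);
}

void tidhash_insert(struct kthread *td)
{
    assert(td != NULL);
    unsigned b = tid_bucket(td->tid);
    td->next_in_hash = tidhash[b];
    tidhash[b] = td;
}

/* Expected O(1): only threads that share a bucket are examined. */
struct kthread *find_hashed(int tid)
{
    for (struct kthread *td = tidhash[tid_bucket(tid)]; td != NULL;
         td = td->next_in_hash)
        if (td->tid == tid)
            return td;
    return NULL;
}
```

With hundreds of mysqld threads, find_linear() may touch every entry on each
thr_kill(), while find_hashed() touches only one short chain.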


> unp contention has risen a bit.  The other big gain is to sleep
> mtxpool contention, which roughly doubled:
>
> /*
>  * Change the total socket buffer size a user has used.
>  */
> int
> chgsbsize(uip, hiwat, to, max)
>         struct  uidinfo *uip;
>         u_int  *hiwat;
>         u_int   to;
>         rlim_t  max;
> {
>         rlim_t new;
>
>         UIDINFO_LOCK(uip);
>
> So the next question is how can that be optimized?
>
One could use atomic_cmpset_int() in a loop to avoid a context switch, or use
an adaptive mutex, but there is no adaptive mutex type you can specify.
rlim_t is a 64-bit integer, so an atomic operation cannot be used, but a
64-bit integer might not be necessary for the socket buffer size.
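
A minimal sketch of the compare-and-swap loop idea, under stated assumptions:
atomic_cmpset_int() is the FreeBSD kernel primitive, and C11
atomic_compare_exchange_weak() stands in for it here so the example runs in
userland. The quoted chgsbsize() stops at UIDINFO_LOCK(), so the accounting
rule below (add to - *hiwat to the per-user total, refuse growth past max,
always allow shrinking) is an assumption, and struct uidinfo_sketch and
chgsbsize_cas() are hypothetical names. The counter is 32-bit, following the
remark that 64 bits may not be needed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-user record; the real struct uidinfo differs. */
struct uidinfo_sketch {
    _Atomic unsigned int ui_sbsize;   /* total socket buffer bytes */
};

/* Returns 1 and updates *hiwat on success, 0 if growth would exceed max. */
int chgsbsize_cas(struct uidinfo_sketch *uip, unsigned int *hiwat,
                  unsigned int to, unsigned int max)
{
    unsigned int old = atomic_load(&uip->ui_sbsize);
    unsigned int new;

    do {
        if (to > *hiwat) {
            unsigned int delta = to - *hiwat;
            /* Refuse growth past max, written to avoid unsigned overflow. */
            if (old > max || delta > max - old)
                return 0;
            new = old + delta;
        } else {
            /* Shrinking is always allowed; the total includes *hiwat. */
            assert(old >= *hiwat - to);
            new = old - (*hiwat - to);
        }
        /* On failure, old is reloaded with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(&uip->ui_sbsize, &old, new));

    *hiwat = to;
    return 1;
}
```

Under contention a losing CPU simply retries with the freshly observed total
instead of sleeping on a pool mutex; the cost is that the limit check is
repeated on each retry.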

> Kris

David Xu