Date: Wed, 28 Feb 2007 23:51:01 +0000 (GMT)
From: Robert Watson
To: "Stephane E. Potvin"
Cc: brooks@FreeBSD.org, current@FreeBSD.org
Subject: Re: HEADS UP: UNIX domain socket locking changes merged to CVS HEAD
In-Reply-To: <45E5D589.3080202@FreeBSD.org>
Message-ID: <20070228234754.Q13593@fledge.watson.org>
References: <20070226204916.C56223@fledge.watson.org> <45E5D589.3080202@FreeBSD.org>

On Wed, 28 Feb 2007, Stephane E. Potvin wrote:

>> Please let me know if you experience any problems with UNIX domain sockets
>> -- these changes will affect applications that consume UNIX domain sockets
>> directly, like MySQL and Postfix, as well as consumers of POSIX fifos,
>> which are implemented using UNIX domain sockets in-kernel.
>
> Since this commit, I've been observing frequent deadlocks on my laptop,
> mostly when starting up GNOME.
> It usually takes less than 5 to 10 minutes for the deadlock to happen.
>
> I was able to drop into ddb once and got the following information: (there
> might be some typos as I had to copy this manually)

Thanks, this information was very helpful, and indeed the problem is as you
surmise: cases existed where more than one unpcb lock was acquired at a time
when holding only a global read lock, not a global write lock.  I guess these
slipped through from an earlier version of the patch.  In any case, could you
try the patch at:

  http://www.watson.org/~robert/freebsd/netperf/20070228-unp_deadlock.diff

This eliminates overlapped unpcb lock acquisition in both datagram and stream
cases, and with any luck will fix the deadlock problem.  It may also
marginally improve performance by further reducing unpcb lock contention.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> show alllocks
> Process 906 (gnome-power-manager) thread 0xc553c570 (100126)
> exclusive sleep mutex unp_mtx r = 0 (0xc5573bb8) locked @
> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
> shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @
> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
> Process 860 (dbus-daemon) thread 0xc4d001d0 (100095)
> exclusive sleep mutex unp_mtx r = 0 (0xc5573b10) locked @
> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
> shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @
> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
>
> show lock 0xc5573bb8
> class: sleep mutex
> name: unp_mtx
> flags: {DEF, RECURSE, DUPOK}
> state: {OWNED, CONTESTED}
> owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>
> show turnstile 0xc5573bb8
> Lock: 0xc5573bb8 - (sleep mutex) unp_mtx
> Lock Owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
> Shared Waiters:
> empty
> Exclusive Waiters:
> 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
> Pending Threads:
> empty
>
> show lock 0xc5573b10
> class: sleep mutex
> name: unp_mtx
> flags: {DEF, RECURSE, DUPOK}
> state: {OWNED, CONTESTED}
> owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>
> show turnstile 0xc5573b10
> Lock: 0xc5573b10 - (sleep mutex) unp_mtx
> Lock Owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
> Shared Waiters:
> empty
> Exclusive Waiters:
> 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
> Pending Threads:
> empty
>
> show lock 0xc06d1dac
> class: rw
> name: unp_global_rwlock
> state: RLOCK: 2 locks
> waiters: writers
>
> show turnstile 0xc06d1dac
> Lock: 0xc06d1dac - (rw) unp_global_rwlock
> Lock Owner: none
> Shared Waiters:
> empty
> Exclusive Waiters:
> 0xc4d00000 (tid 100096, pid 857, "gconfd-2")
> 0xc4d01570 (tid 100085, pid 804, "login")
> 0xc4fcaae0 (tid 100133, pid 887, "bonobo-activation-s")
> 0xc48c23a0 (tid 100106, pid 897, "gaim")
> 0xc4d01910 (tid 100120, pid 909, "gnome-screensaver")
> 0xc553cae0 (tid 100123, pid 905, "gnome-mount")
> Pending Threads:
> empty
>
> bt 100095
> Tracing pid 860 tid 100095 td 0xc4d001d0
> sched_switch(3301966288,0,1,3226391662,3310601584,...) at 3226314602 =
> sched_switch+303
> mi_switch(1,0,3227647346,647,3228084884,...) at 3226245932 = mi_switch+489
> turnstile_wait(3310828472,3310601584,0,3310601586,3310828472,...) at
> 3226393861 = turnstile_wait+633
> _mtx_lock_sleep(3310828472,3301966288,0,3227660663,877,...) at 3226177946 =
> _mtx_lock_sleep+261
> _mtx_lock_flags(3310828472,0,3227660663,877,3310833112,...) at 3226177102 =
> _mtx_lock_flags+102
> uipc_send(3310832888,0,3296484864,0,0,...) at 3226561343 = uipc_send+1058
> sosend_generic(3310832888,0,3302262848,3296484864,0,...) at 3226529764 =
> sosend_generic+1067
> sosend(3310832888,0,3302262848,0,0,...) at 3226530139 = sosend+63
> soo_write(3304721288,3302262848,3297254528,0,3301966288,...)
> at 3226433647 = soo_write+121
> dofilewrite
> kern_writev
> writev
> syscall
>
> bt 100126
> Tracing pid 906 tid 100126 td 0xc553c570
> sched_switch
> mi_switch
> turnstile_wait
> _mtx_lock_sleep
> _mtx_lock_flags
> uipc_send
> sosend_generic
> sosend
> soo_write
> dofilewrite
> kern_writev
> writev
> syscall
>
> As you can see, threads 100095 and 100126 are each waiting on the other's
> lock. The function uipc_send tries to lock two unp_mtx mutexes without
> holding a write lock on unp_global_rwlock. It seems that the write lock is
> taken by uipc_send only if nam is not NULL or the PRUS_EOF flag is set.
> Both of these conditions are false in this particular call scenario. From the
> comments just above the second lock in uipc_usrreq.c, the global write lock
> should already be acquired by the time we get there. I'm not sure where or
> under what condition the write lock should be acquired to correctly fix this.
> I'll keep the core around in case you want me to provide more information.
>
> Regards,
>
> Steph