From owner-freebsd-current@FreeBSD.ORG Thu Mar 1 00:04:20 2007
Message-ID: <45E6178F.8040302@cisco.com>
Date: Wed, 28 Feb 2007 19:00:15 -0500
From: Randall Stewart
To: Robert Watson
References: <20070226204916.C56223@fledge.watson.org> <45E5D589.3080202@FreeBSD.org> <20070228234754.Q13593@fledge.watson.org>
In-Reply-To: <20070228234754.Q13593@fledge.watson.org>
Cc: brooks@freebsd.org, "Stephane E. Potvin", current@freebsd.org
Subject: Re: HEADS UP: UNIX domain socket locking changes merged to CVS HEAD
List-Id: Discussions about the use of FreeBSD-current

Robert Watson wrote:
>
> On Wed, 28 Feb 2007, Stephane E. Potvin wrote:
>
>>> Please let me know if you experience any problems with UNIX domain
>>> sockets -- these changes will affect applications that consume UNIX
>>> domain sockets directly, like MySQL and Postfix, as well as consumers
>>> of POSIX fifos, which are implemented using UNIX domain sockets
>>> in-kernel.
>>
>> Since this commit, I've been observing frequent deadlocks on my
>> laptop, mostly when starting up GNOME. It usually takes less than 5 to
>> 10 minutes for the deadlock to happen.
>>
>> I was able to drop into ddb once and got the following information:
>> (there might be some typos as I had to copy this manually)
>
> Thanks, this information was very helpful, and indeed the problem is as
> you surmise: cases existed where more than one unpcb lock was acquired
> at a time when holding only a global read lock, not a global write
> lock. I guess these slipped through from an earlier version of the
> patch. In any case, could you try the patch at:
>
> http://www.watson.org/~robert/freebsd/netperf/20070228-unp_deadlock.diff
>
> This eliminates overlapped unpcb lock acquisition in both datagram and
> stream cases, and with any luck will fix the deadlock problem. It may
> also marginally improve performance by further reducing unpcb lock
> contention.
>
> Thanks,
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>
>>
>> show alllocks
>> Process 906 (gnome-power-manager) thread 0xc553c570 (100126)
>> exclusive sleep mutex unp_mtx r = 0 (0xc5573bb8) locked @
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
>> shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
>> Process 860 (dbus-daemon) thread 0xc4d001d0 (100095)
>> exclusive sleep mutex unp_mtx r = 0 (0xc5573b10) locked @
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
>> shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
>>
>> show lock 0xc5573bb8
>> class: sleep mutex
>> name: unp_mtx
>> flags: {DEF, RECURSE, DUPOK}
>> state: {OWNED, CONTESTED}
>> owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>>
>> show turnstile 0xc5573bb8
>> Lock: 0xc5573bb8 - (sleep mutex) unp_mtx
>> Lock Owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>> Shared Waiters:
>> empty
>> Exclusive Waiters:
>> 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>> Pending Threads:
>> empty
>>
>> show lock 0xc5573b10
>> class: sleep mutex
>> name: unp_mtx
>> flags: {DEF, RECURSE, DUPOK}
>> state: {OWNED, CONTESTED}
>> owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>>
>> show turnstile 0xc5573b10
>> Lock: 0xc5573b10 - (sleep mutex) unp_mtx
>> Lock Owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>> Shared Waiters:
>> empty
>> Exclusive Waiters:
>> 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>> Pending Threads:
>> empty
>>
>> show lock 0xc06d1dac
>> class: rw
>> name: unp_global_rwlock
>> state: RLOCK: 2 locks
>> waiters: writers
>>
>> show turnstile 0xc06d1dac
>> Lock: 0xc06d1dac - (rw) unp_global_rwlock
>> Lock Owner: none
>> Shared Waiters:
>> empty
>> Exclusive Waiters:
>> 0xc4d00000 (tid 100096, pid 857, "gconfd-2")
>> 0xc4d01570 (tid 100085, pid 804, "login")
>> 0xc4fcaae0 (tid 100133, pid 887, "bonobo-activation-s")
>> 0xc48c23a0 (tid 100106, pid 897, "gaim")
>> 0xc4d01910 (tid 100120, pid 909, "gnome-screensaver")
>> 0xc553cae0 (tid 100123, pid 905, "gnome-mount")
>> Pending Threads:
>> empty
>>
>> bt 100095
>> Tracing pid 860 tid 100095 td 0xc4d001d0
>> sched_switch(3301966288,0,1,3226391662,3310601584,...) at 3226314602 =
>> sched_switch+303
>> mi_switch(1,0,3227647346,647,3228084884,...) at 3226245932 =
>> mi_switch+489
>> turnstile_wait(3310828472,3310601584,0,3310601586,3310828472,...) at
>> 3226393861 = turnstile_wait+633
>> _mtx_lock_sleep(3310828472,3301966288,0,3227660663,877,...) at
>> 3226177946 = _mtx_lock_sleep+261
>> _mtx_lock_flags(3310828472,0,3227660663,877,3310833112,...) at
>> 3226177102 = _mtx_lock_flags+102
>> uipc_send(3310832888,0,3296484864,0,0,...) at 3226561343 = uipc_send+1058
>> sosend_generic(3310832888,0,3302262848,3296484864,0,...) at 3226529764
>> = sosend_generic+1067
>> sosend(3310832888,0,3302262848,0,0,...) at 3226530139 = sosend+63
>> soo_write(3304721288,3302262848,3297254528,0,3301966288,...)
>> at 3226433647 = soo_write+121
>> dofilewrite
>> kern_writev
>> writev
>> syscall
>>
>> bt 100126
>> Tracing pid 906 tid 100126 td 0xc553c570
>> sched_switch
>> mi_switch
>> turnstile_wait
>> _mtx_lock_sleep
>> _mtx_lock_flags
>> uipc_send
>> sosend_generic
>> sosend
>> soo_write
>> dofilewrite
>> kern_writev
>> writev
>> syscall
>>
>> As you can see, threads 100095 and 100126 are each waiting on the
>> other's lock. The function uipc_send tries to lock two unp_mtx without
>> holding a write lock on unp_global_rwlock. It seems that uipc_send
>> takes the write lock only if nam is not NULL or the PRUS_EOF flag is
>> set. Both of these conditions are false in this particular call
>> scenario. From the comments just above the second lock in
>> uipc_usrreq.c, the global write lock should already be acquired by the
>> time we get there. I'm not sure where or under what condition the
>> write lock should be acquired to correctly fix this. I'll keep the
>> core around in case you want me to provide more information.
>>
>> Regards,
>>
>> Steph
>>

Robert:

I have been having the same problem and thought it was some of
my code ;-o ... but I see now it's not (after more testing).

I will try your patch and get back to you :-D

R

-- 
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369
803-317-4952 (cell)
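
For readers following the thread, the deadlock pattern in the quoted
traces (two threads, each holding one unp_mtx and waiting for the other
thread's) can be illustrated with a small user-space sketch. This is
not the kernel code and not what the patch does: the patch is described
above as eliminating overlapped unpcb lock acquisition, whereas the
sketch keeps both locks held and uses a different, generic technique
(always taking the two locks in one fixed order). The names Endpoint,
lock_pair, send and run_pair are hypothetical and exist only for this
illustration.

```python
import threading


class Endpoint:
    """Hypothetical stand-in for one socket endpoint with its own lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.received = 0


def lock_pair(a, b):
    """Return both endpoints in one fixed global order (here: by id())."""
    return (a, b) if id(a) < id(b) else (b, a)


def send(src, dst, count):
    """Deliver `count` messages to dst while holding both endpoint locks."""
    for _ in range(count):
        # Taking "own lock first, peer lock second" here would let two
        # threads sending in opposite directions block each other forever.
        first, second = lock_pair(src, dst)
        with first.lock:
            with second.lock:
                dst.received += 1


def run_pair(count=10000):
    """Run two senders in opposite directions; report whether both finished."""
    a, b = Endpoint("a"), Endpoint("b")
    t1 = threading.Thread(target=send, args=(a, b, count), daemon=True)
    t2 = threading.Thread(target=send, args=(b, a, count), daemon=True)
    t1.start()
    t2.start()
    t1.join(timeout=30)
    t2.join(timeout=30)
    finished = not (t1.is_alive() or t2.is_alive())
    return finished, a.received, b.received
```

Because both threads agree on which lock comes first, neither can hold
one lock while waiting for a peer that holds the other, so run_pair()
completes with every message counted.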