Date: Wed, 28 Feb 2007 19:00:15 -0500
From: Randall Stewart <rrs@cisco.com>
To: Robert Watson <rwatson@freebsd.org>
Cc: brooks@freebsd.org, "Stephane E. Potvin" <sepotvin@freebsd.org>, current@freebsd.org
Subject: Re: HEADS UP: UNIX domain socket locking changes merged to CVS HEAD
Message-ID: <45E6178F.8040302@cisco.com>
In-Reply-To: <20070228234754.Q13593@fledge.watson.org>
References: <20070226204916.C56223@fledge.watson.org> <45E5D589.3080202@FreeBSD.org> <20070228234754.Q13593@fledge.watson.org>

Robert Watson wrote:
>
> On Wed, 28 Feb 2007, Stephane E. Potvin wrote:
>
>>> Please let me know if you experience any problems with UNIX domain
>>> sockets -- these changes will affect applications that consume UNIX
>>> domain sockets directly, like MySQL and Postfix, as well as consumers
>>> of POSIX fifos, which are implemented using UNIX domain sockets
>>> in-kernel.
>>
>> Since this commit, I've been observing frequent deadlocks on my
>> laptop, mostly when starting-up gnome. It usually takes less than 5 to
>> 10 minutes for the deadlock to happen.
>>
>> I was able to drop into ddb once and got the following information:
>> (there might be some typos as I had to copy this manually)
>
> Thanks, this information was very helpful, and indeed the problem is as
> you surmise: cases existed where more than one unpcb lock was acquired
> at a time when holding only a global read lock, not a global write
> lock. I guess these slipped through from an earlier version of the
> patch. In any case, could you try the patch at:
>
> http://www.watson.org/~robert/freebsd/netperf/20070228-unp_deadlock.diff
>
> This eliminates overlapped unpcb lock acquisition in both datagram and
> stream cases, and with any luck will fix the deadlock problem. It may
> also marginally improve performance by further reducing unpcb lock
> contention.
>
> Thanks,
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>
>>
>> show alllocks
>> Process 906 (gnome-power-manager) thread 0xc553c570 (100126)
>> exclusive sleep mutex unp_mtx r = 0 (0xc5573bb8) locked @
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
>> shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
>> Process 860 (dbus-daemon) thread 0xc4d001d0 (100095)
>> exclusive sleep mutex unp_mtx r = 0 (0xc5573b10) locked @
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
>> shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
>>
>> show lock 0xc5573bb8
>> class: sleep mutex
>> name: unp_mtx
>> flags: {DEF, RECURSE, DUPOK}
>> state: {OWNED, CONTESTED}
>> owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>>
>> show turnstile 0xc5573bb8
>> Lock: 0xc5573bb8 - (sleep mutex) unp_mtx
>> Lock Owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>> Shared Waiters:
>> empty
>> Exclusive Waiters:
>> 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>> Pending Threads:
>> empty
>>
>> show lock 0xc5573b10
>> class: sleep mutex
>> name: unp_mtx
>> flags: {DEF, RECURSE, DUPOK}
>> state: {OWNED, CONTESTED}
>> owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>>
>> show turnstile 0xc5573b10
>> Lock: 0xc5573b10 - (sleep mutex) unp_mtx
>> Lock Owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>> Shared Waiters:
>> empty
>> Exclusive Waiters:
>> 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>> Pending Threads:
>> empty
>>
>> show lock 0xc06d1dac
>> class: rw
>> name: unp_global_rwlock
>> state: RLOCK: 2 locks
>> waiters: writers
>>
>> show turnstile 0xc06d1dac
>> Lock: 0xc06d1dac - (rw) unp_global_rwlock
>> Lock Owner: none
>> Shared Waiters:
>> empty
>> Exclusive Waiters:
>> 0xc4d00000 (tid 100096, pid 857, "gconfd-2")
>> 0xc4d01570 (tid 100085, pid 804, "login")
>> 0xc4fcaae0 (tid 100133, pid 887, "bonobo-activation-s")
>> 0xc48c23a0 (tid 100106, pid 897, "gaim")
>> 0xc4d01910 (tid 100120, pid 909, "gnome-screensaver")
>> 0xc553cae0 (tid 100123, pid 905, "gnome-mount")
>> Pending Threads:
>> empty
>>
>> bt 100095
>> Tracing pid 860 tid 100095 td 0xc4d001d0
>> sched_switch(3301966288,0,1,3226391662,3310601584,...) at 3226314602 =
>> sched_switch+303
>> mi_switch(1,0,3227647346,647,3228084884,...) at 3226245932 =
>> mi_switch+489
>> turnstile_wait(3310828472,3310601584,0,3310601586,3310828472,...) at
>> 3226393861 = turnstile_wait+633
>> _mtx_lock_sleep(3310828472,3301966288,0,3227660663,877,...) at
>> 3226177946 = _mtx_lock_sleep+261
>> _mtx_lock_flags(3310828472,0,3227660663,877,3310833112,...) at
>> 3226177102 = _mtx_lock_flags+102
>> uipc_send(3310832888,0,3296484864,0,0,...) at 3226561343 = uipc_send+1058
>> sosend_generic(3310832888,0,3302262848,3296484864,0,...) at 3226529764
>> = sosend_generic+1067
>> sosend(3310832888,0,3302262848,0,0,...) at 3226530139 = sosend+63
>> soo_write(3304721288,3302262848,3297254528,0,3301966288,...) at
>> 3226433647 = soo_write+121
>> dofilewrite
>> kern_writev
>> writev
>> syscall
>>
>> bt 100126
>> Tracing pid 906 tid 100126 td 0xc553c570
>> sched_switch
>> mi_switch
>> turnstile_wait
>> _mtx_lock_sleep
>> _mtx_lock_flags
>> uipc_send
>> sosend_generic
>> sosend
>> soo_write
>> dofilewrite
>> kern_writev
>> writev
>> syscall
>>
>> As you can see, the threads 100095 and 100126 both are waiting on each
>> other's lock. The function uipc_send tries to lock two unp_mtx without
>> holding a write lock on unp_global_rwlock. It seems that the write
>> ownership is taken by uipc_send only if nam is not NULL or the
>> PRUS_EOF flag is set. Both of these conditions are false in this
>> particular call scenario. From the comments just above the second lock
>> in uipc_usrreq.c, the global write lock should already be acquired by
>> the time we get there.
>> I'm not sure where or under what condition the
>> write lock should be acquired to correctly fix this. I'll keep the
>> core around in case you want me to provide more information.
>>
>> Regards,
>>
>> Steph
>>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>

Robert:

I have been having the same problem and thought it was some of my
code ;-o ... but I see now it's not (after more testing).

I will try your patch and get back to you :-D

R

-- 
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 <or> 803-317-4952 (cell)